heeen
May 14, 2005

CAT NEVER STOPS
Display lists do have great performance, especially on NVIDIA hardware; the compiler does a very good job of optimizing them. But as soon as you're dealing with shaders, things start to get ugly because there are problems with storing uniforms and the like.

While you're at it, stick to the generic glVertexAttrib functions instead of the to-be-deprecated glVertexPointer/glNormalPointer/... functions.
You will probably need to write simple shaders for the generic attrib functions, though.

UraniumAnchor
May 21, 2006

Not a walrus.
Is there a way to write shader attribs straight from CUDA without having to pass through main memory?

Specifically what I'd like to do is simulate some 'terrain' morphing (in this case shifting water levels) where the morphing computation is done in CUDA, and passes the updated height information right into the vertex pipeline without ever leaving the card.

UraniumAnchor fucked around with this message at 02:51 on May 8, 2010

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

UraniumAnchor posted:

Is there a way to write shader attribs straight from CUDA without having to pass through main memory?

Specifically what I'd like to do is simulate some 'terrain' morphing (in this case shifting water levels) where the morphing computation is done in CUDA, and passes the updated height information right into the vertex pipeline without ever leaving the card.

You can write to any sort of buffer (Vertex/Index/Constant, Texture, DrawIndirect) with CUDA using the D3D/OpenGL interop API. I'm not sure precisely what you are saying you want to do, but you should be able to:

1) Create a texture/constant buffer in your graphics API
2) register it with CUDA
3) map the resource(s), getting back a void pointer corresponding to the device memory address of the buffer
4) run the CUDA kernel using that pointer,
5) un-map the resource (releasing it to the graphics API), and
6) Bind the resource and use it in a shader
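
In terms of actual calls, that cycle looks something like this (rough sketch using the cudaGraphics* interop entry points; the older cudaGL*BufferObject calls work the same way, and the kernel name and flags here are just placeholders):
code:
// One-time setup: register an existing GL buffer with CUDA.
cudaGraphicsResource* cudaRes = 0;
cudaGraphicsGLRegisterBuffer(&cudaRes, vbo, cudaGraphicsRegisterFlagsWriteDiscard);

// Per frame:
float* devPtr = 0;
size_t numBytes = 0;
cudaGraphicsMapResources(1, &cudaRes, 0);                        // GL gives up the buffer
cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &numBytes, cudaRes);
updateHeights<<<blocks, threads>>>(devPtr /* , ... */);          // write straight into the VBO
cudaGraphicsUnmapResources(1, &cudaRes, 0);                      // hand it back to GL
// now bind 'vbo' and draw as usual

// Shutdown:
cudaGraphicsUnregisterResource(cudaRes);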

UraniumAnchor
May 21, 2006

Not a walrus.

Hubis posted:

You can write to any sort of buffer (Vertex/Index/Constant, Texture, DrawIndirect) with CUDA using the D3D/OpenGL interop API. I'm not sure precisely what you are saying you want to do, but you should be able to:

1) Create a texture/constant buffer in your graphics API
2) register it with CUDA
3) map the resource(s), getting back a void pointer corresponding to the device memory address of the buffer
4) run the CUDA kernel using that pointer,
5) un-map the resource (releasing it to the graphics API), and
6) Bind the resource and use it in a shader

I mostly want to avoid having to pass the heightfield over the bus since the only thing that even cares about it is the GPU. I assume this approach is smart enough to realize that the memory address lives on the card and doesn't need to transfer it around? And it can handle a somewhat large (2000 on a side square) dataset?

heeen
May 14, 2005

CAT NEVER STOPS

UraniumAnchor posted:

Is there a way to write shader attribs straight from CUDA without having to pass through main memory?

Specifically what I'd like to do is simulate some 'terrain' morphing (in this case shifting water levels) where the morphing computation is done in CUDA, and passes the updated height information right into the vertex pipeline without ever leaving the card.

allocating:
code:
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, pre_numvertices * sizeof(vertex), 0, GL_DYNAMIC_COPY);
CheckGLError(__FILE__, __LINE__);
glBindBuffer(GL_ARRAY_BUFFER, 0);
cutilSafeCall(cudaGLRegisterBufferObject(vbo));
main loop:
code:
cutilSafeCall(cudaGLMapBufferObject((void**)&faces, vbo));
// call kernels to work on "faces"
cutilSafeCall(cudaGLUnmapBufferObject(vbo));
// draw vbo
deallocating:
code:
cutilSafeCall(cudaGLUnregisterBufferObject(vbo));
glDeleteBuffers(1, &vbo);

Spite
Jul 27, 2001

Small chance of that...

heeen posted:

Display lists do have great performance, especially on NVIDIA hardware; the compiler does a very good job of optimizing them. But as soon as you're dealing with shaders, things start to get ugly because there are problems with storing uniforms and the like.

While you're at it, stick to the generic glVertexAttrib functions instead of the to-be-deprecated glVertexPointer/glNormalPointer/... functions.
You will probably need to write simple shaders for the generic attrib functions, though.

Display lists: that really depends on the platform, and the driver just makes them into VBOs anyway. The original idea of display lists is basically what DX11's deferred context paradigm is trying to get at, and even that has its problems. Everyone in the OGL world is trying to kill display lists, so it's really a bad idea to use them. (The problem being that you can't really optimize them, since they tell you nothing about the state at the time the list is created/used. Most CPU overhead is spent validating state and in its associated costs, at least as long as you aren't sending lots of data to the GPU and converting between formats.)

And if you still have VertexPointer etc., I'd still use them. Optimizations can be made in the driver (i.e. in clip space) if the driver knows what the position attribute and the modelview and projection matrices are. This won't be true forever, but you only have a limited number of vertex attribs, and some are reserved if you aren't running OGL 3.0.

heeen
May 14, 2005

CAT NEVER STOPS
Does anyone have a uniform buffer object class I could have a look at? I'm trying to figure out a good way to bring global uniforms, material-specific uniforms, and maybe surface-specific uniforms together.
Standard uniforms were giving me a headache because I didn't want to query uniforms by name every time I set them (hundreds of times per frame), so I had to cache uniform locations somehow, which is tricky: a global uniform can have a different location/index in every shader it is used in, but the material and the "global uniform manager" each needed to track them individually.
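
Roughly the kind of thing I mean, for context (untested sketch, all names made up) - look the location up once per program and remember it:
code:
#include <map>
#include <string>
// assumes a GL header/loader (e.g. GLEW) is already included for
// GLuint, glGetUniformLocation and glUniform1f

struct UniformCache {
    GLuint program;
    std::map<std::string, GLint> locations;

    explicit UniformCache(GLuint prog) : program(prog) {}

    GLint location(const std::string& name) {
        std::map<std::string, GLint>::iterator it = locations.find(name);
        if (it != locations.end())
            return it->second;
        GLint loc = glGetUniformLocation(program, name.c_str()); // -1 if unused
        locations[name] = loc;
        return loc;
    }

    void set1f(const std::string& name, float value) {
        GLint loc = location(name);
        if (loc >= 0) glUniform1f(loc, value); // skip uniforms this shader doesn't have
    }
};
The annoying part is that every material needs one of these per program, since the same "global" uniform can land at a different index in every shader.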

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

UraniumAnchor posted:

I mostly want to avoid having to pass the heightfield over the bus since the only thing that even cares about it is the GPU. I assume this approach is smart enough to realize that the memory address lives on the card and doesn't need to transfer it around? And it can handle a somewhat large (2000 on a side square) dataset?

Yeah, this is exactly what you'd want -- this lets you generate the data on the GPU via CUDA directly into device memory, and then simply re-map the device buffer as a GL texture without transferring from device to host. There's some driver overhead in this mapping, so you want to make the map call as rarely as possible (interestingly, the overhead is per-call, not per-resource, so map as many resources as you can in each call); however, it's going to be a whole lot better than the Device->Host->Device memcpy latency you'd otherwise have to deal with. The only limits on size are (a) your available texture memory, and (b) the texture/buffer size limitations of your graphics API.

Another thing to consider is that you should make use of both the Base and Pitch properties when writing to the memory in CUDA, as the graphics API may have specific requirements for how the elements are laid out based on the element format (RGBA8 vs RGB8 vs R32f, etc.).
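
For example, if you end up with a heightfield buffer plus, say, a normals buffer, map them together in one call rather than one at a time (sketch; the resource and kernel names are placeholders):
code:
// The mapping overhead is per call, not per resource, so batch them.
cudaGraphicsResource* resources[2] = { heightRes, normalRes };
cudaGraphicsMapResources(2, resources, 0);

void* heightPtr = 0;   size_t heightBytes = 0;
void* normalPtr = 0;   size_t normalBytes = 0;
cudaGraphicsResourceGetMappedPointer(&heightPtr, &heightBytes, heightRes);
cudaGraphicsResourceGetMappedPointer(&normalPtr, &normalBytes, normalRes);

updateTerrain<<<grid, block>>>((float*)heightPtr, (float4*)normalPtr);

cudaGraphicsUnmapResources(2, resources, 0);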

Deep Dish Fuckfest
Sep 6, 2006

Advanced
Computer Touching


Toilet Rascal
There's something I've been wondering about, but I'm not entirely sure where to look for the answer; I figured someone here might know. Basically, just how well do modern GPUs handle branching in shader code or whatever else they're running? As far as I know, GPUs are far better at straight numerical computation, but seeing as fixed transform pipelines are pretty much gone and with the trend towards general-purpose GPUs, how true is that nowadays?

Related to that: if branching does have a significant cost, do built-in functions like min() or abs() or clamp() in HLSL, which would involve some form of branching if implemented as shader code, also carry that same penalty?

I'm largely asking because I tend to be really careful to minimize the amount of branching in the form of explicit if statements in shader code I write, but I realized I have no idea whether it's really necessary or not.

haveblue
Aug 15, 2005



Toilet Rascal
It is necessary; branching and loops are extremely expensive in shaders.

The built-in functions map to functionality of the silicon, so something like min() or clamp() doesn't generate a branch and is nowhere near as expensive as writing "if(x<0) x = 0" in your own code.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

haveblue posted:

nowhere near as expensive as writing "if(x<0) x = 0" in your own code.
This isn't really true. While min/max are usually built-in, what you just typed out is exactly the kind of instruction that gets emitted in shader models that don't support branching.

There are instructions that take two values, compare them, and if the comparison checks out, assign a value to a register. These are fairly cheap; what isn't cheap is being forced to evaluate every execution path and then use a conditional to choose the result, or splitting an operation into multiple draw calls so you can switch to a different shader permutation.

That's what branching was put in to improve, and it's not TOO expensive provided you follow a simple rule: only use conditions that are likely to produce the same result on nearby pixels. GPUs are optimized for processing blocks of pixels the same way, which makes divergent branches within a block VERY expensive.

PDP-1
Oct 12, 2004

It's a beautiful day in the neighborhood.
Are there any tips or tricks for debugging shader files?

I'm using DirectX/HLSL and finding writing custom shaders to be very frustrating since it isn't possible to step through the code like you can in a normal IDE and there's no way to print values to a debug file.

Are there any kind of emulators that will take a .fx file as input, let you specify the extern values and vertex declaration, and walk through the code one line at a time?

Hobnob
Feb 23, 2006

Ursa Adorandum

PDP-1 posted:

Are there any tips or tricks for debugging shader files?

I'm using DirectX/HLSL and finding writing custom shaders to be very frustrating since it isn't possible to step through the code like you can in a normal IDE and there's no way to print values to a debug file.

Are there any kind of emulators that will take a .fx file as input, let you specify the extern values and vertex declaration, and walk through the code one line at a time?

I don't think it will let you single-step, but Rendermonkey is useful for developing shaders. You can tweak inputs, see intermediate and final outputs, and tweak the shader code very easily.

heeen
May 14, 2005

CAT NEVER STOPS

PDP-1 posted:

Are there any tips or tricks for debugging shader files?

I'm using DirectX/HLSL and finding writing custom shaders to be very frustrating since it isn't possible to step through the code like you can in a normal IDE and there's no way to print values to a debug file.

Are there any kind of emulators that will take a .fx file as input, let you specify the extern values and vertex declaration, and walk through the code one line at a time?

There's glslDevil and gDEBugger.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

OneEightHundred posted:

This isn't really true. While min/max are usually built-in, what you just typed out is exactly the kind of instruction that gets emitted in shader models that don't support branching.

There are instructions that take two values, compare them, and if the comparison checks out, assign a value to a register. These are fairly cheap; what isn't cheap is being forced to evaluate every execution path and then use a conditional to choose the result, or splitting an operation into multiple draw calls so you can switch to a different shader permutation.

That's what branching was put in to improve, and it's not TOO expensive provided you follow a simple rule: only use conditions that are likely to produce the same result on nearby pixels. GPUs are optimized for processing blocks of pixels the same way, which makes divergent branches within a block VERY expensive.

This is basically it. Branching comes in two forms: conditional instructions (such as "if (a>b) {a=b}") which cost nothing extra, and divergent code paths (such as "if (a>b) { a = tex2D(texFoo); } else { b = tex2D(texFoo); }") which may potentially have the cost of executing both code paths.

In the first case, you are just running the same instructions, and each thread may get different results based on the values in their registers; in that sense, "a = min(a, b)" is no different than "a = a+b".

In the second case, you can think of the GPU as processing fragments/threads in "clusters" which all execute together, with a mask for each thread in the cluster saying whether to use or discard the results for a given instruction. When all the threads in a cluster go down the same path (such as all the fragments generated by a small triangle) the GPU is smart enough to detect this, and only execute instructions in that path. If the cluster is split (i.e. it is "divergent") then you have to issue instructions for both paths, even though only a subset of the threads will actually use the results from each.

So, if you've got high-level branching, such as changing material type based on constant buffer values or low-frequency changes, you won't really see any penalty; if you've got very locally divergent execution patterns, then you'll see worst-case performance.
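
If it helps to see the same thing outside of shader code, here's a toy CUDA kernel (CUDA uses the same SIMT execution model; the names are made up) - the min is just another instruction, while the if/else can cost both paths whenever a warp straddles the condition:
code:
__global__ void divergenceDemo(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Cheap: every thread executes the same instruction; only register values differ.
    float lo = fminf(a[i], b[i]);

    // Potentially divergent: if threads in one warp take different sides,
    // the warp runs both paths with the inactive lanes masked off.
    float r;
    if (a[i] > b[i])
        r = expf(a[i]);            // path A
    else
        r = logf(1.0f + b[i]);     // path B

    out[i] = lo + r;
}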

stramit
Dec 9, 2004
Ask me about making games instead of gains.

PDP-1 posted:

Are there any tips or tricks for debugging shader files?

I'm using DirectX/HLSL and finding writing custom shaders to be very frustrating since it isn't possible to step through the code like you can in a normal IDE and there's no way to print values to a debug file.

Are there any kind of emulators that will take a .fx file as input, let you specify the extern values and vertex declaration, and walk through the code one line at a time?

Use PIX for Windows. It comes with the DirectX SDK and lets you do a variety of things: you can debug a pixel (step through each render call that wrote to that pixel), check render state, and use a whole bunch of other features. If your application is in DX10 then it should be pretty easy to use. DX9 PIX is a bit flakey but will still get the job done.

I prefer these to specific 'shader' debuggers as you can check a variety of things and it is based on what YOUR application is doing.

stramit fucked around with this message at 00:09 on May 12, 2010

PDP-1
Oct 12, 2004

It's a beautiful day in the neighborhood.
Thanks for all the replies on shader debuggers. I downloaded RenderMonkey and it helped get through my original problem so I now have a working texture/lighting model to base further work on. I'll take a look at PIX but I'm using an XP machine constrained to DX9 so it'd be the flaky version instead of the nicer DX10+ versions.

edit: had a question here, figured it out on my own.

PDP-1 fucked around with this message at 11:41 on May 12, 2010

Scaevolus
Apr 16, 2007

In OpenGL, what's the best way to render a million cubes? (in a 100x100x100 grid) They won't be moving. Should I use display lists or vertex buffer objects?

haveblue
Aug 15, 2005



Toilet Rascal
Definitely the vertex buffer. It is the fastest for everything that is not frequently updated.

Zerf
Dec 17, 2004

I miss you, sandman

Scaevolus posted:

In OpenGL, what's the best way to render a million cubes? (in a 100x100x100 grid) They won't be moving. Should I use display lists or vertex buffer objects?

Take this with a grain of salt since I haven't worked with GL for a few years, but why don't you just use both? VBOs and display lists aren't mutually exclusive.

VBOs make your data reside on the graphics card, which will be much faster than immediate mode (glVertex3f etc.). Display lists just record your GL function calls.

If we just play with the thought that you would issue a draw call for each of the million boxes (which you're not, I hope), you could make a VBO for the box and record the million draw calls into a display list, so when you need to draw everything you have an already-compiled command buffer to use (i.e. you've traded function call time for memory).

Spite
Jul 27, 2001

Small chance of that...
Do not use display lists. They are deprecated and disgusting. Most drivers will convert them to VBO/VAO under the hood anyway.
They DO NOT help performance in the way most people think - because of their design, the driver can't cache state and validation work, which is what takes all the time anyway.

Use VBOs and put everything you can into VRAM. Keep in mind that stuff may be paged on and off the card as the driver needs. Use as few draw calls as possible - if your hardware supports instancing, use that.
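
For the cube grid specifically, the instanced version is roughly this shape (sketch; assumes GL 3.1-style instancing, and 'cubeVao' is a VAO you've set up with one cube's vertex/index buffers):
code:
// One cube's worth of geometry, drawn a million times with a single call.
glBindVertexArray(cubeVao);
glDrawElementsInstanced(GL_TRIANGLES,
                        36,                 // 12 triangles per cube
                        GL_UNSIGNED_SHORT,
                        0,
                        100 * 100 * 100);   // instance count

// In the vertex shader, derive each cube's cell from gl_InstanceID, e.g.:
//   int i = gl_InstanceID;
//   vec3 offset = vec3(i % 100, (i / 100) % 100, i / 10000);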

As for UBOs, that spec is a mess. It's probably not that much faster than making a bunch of uniform arrays and updating those - although you can't update pieces of them that way. You can also try gpu_program4 and just update the constant arrays.

UraniumAnchor
May 21, 2006

Not a walrus.
How do I get the current pixel's depth in a GL pixel shader? I know gl_FragCoord.z is the incoming fragment's depth, and I write to gl_FragDepth if I want to modify the final depth, but how do I read what's already in the depth buffer?

Specifically I want the incoming pixel to be more opaque if the depth difference is greater.

haveblue
Aug 15, 2005



Toilet Rascal

UraniumAnchor posted:

How do I get the current pixel's depth in a GL pixel shader? I know gl_FragCoord.z is the incoming fragment's depth, and I write to gl_FragDepth if I want to modify the final depth, but how do I read what's already in the depth buffer?

Specifically I want the incoming pixel to be more opaque if the depth difference is greater.

You can't read directly from the depth buffer; you have to bind the previously rendered depth buffer as a texture and render the shader output into a different target.

Spite
Jul 27, 2001

Small chance of that...

haveblue posted:

You can't read directly from the depth buffer; you have to bind the previously rendered depth buffer as a texture and render the shader output into a different target.

Yeah - do a Z-prepass with color writes turned off. You might also be able to do something by mucking with the depth test and blending, but using a shader will be more straightforward.

UraniumAnchor
May 21, 2006

Not a walrus.
Well, here's a few code fragments to make sure I'm at least going the right direction:

code:

...
   glGenTextures(1, &depthTex);
   glBindTexture(GL_TEXTURE_2D, depthTex);
   glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT, vx, vy, 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, 0);
...

[... after drawing ...]

...
   glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT, 0, 0, vx, vy, 0);
...

Is that the right approach?

edit: Figured the other part out. I guess it detects unused variables and junks them.

UraniumAnchor fucked around with this message at 22:34 on May 14, 2010

Spite
Jul 27, 2001

Small chance of that...
Don't use CopyTexImage. Attach a depth texture to an FBO and render a z-prepass into it.

Then bind that texture and draw into a different FBO with the shader that reads the depth value. You can also turn off depth writes, turn on color writes, and use the same FBO.

Also, don't use GL_LUMINANCE - use ARB_depth_texture.
GL_DEPTH_COMPONENT24 and GL_DEPTH_COMPONENT are the <internalformat> and <format>, respectively.
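
Something along these lines, as a starting point (untested sketch using the GL 3.0 FBO entry points - swap in the EXT_framebuffer_object versions if that's what you're on; vx/vy are the viewport size from your snippet):
code:
// Depth texture + FBO for the z-prepass.
GLuint depthTex, fbo;
glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, vx, vy, 0,
             GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, 0);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0);

// z-prepass: depth writes on, color writes off.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
// ... draw the scene ...
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

// Afterwards, bind depthTex like any other texture for the pass that compares depths.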

UraniumAnchor
May 21, 2006

Not a walrus.
Is there a way to draw to both the visible framebuffer and the FBO? I assume I can just copy afterwards, but whatever steps I can skip...

Edit: Wait, nevermind, I think I see what the proper way to do this is...

UraniumAnchor fucked around with this message at 22:20 on May 15, 2010

Spite
Jul 27, 2001

Small chance of that...
No, you can't draw to both the backbuffer and an FBO, nor should you want to. You can render to multiple color attachments via gl_FragData.

It's much better to get into the habit of rendering to an FBO and then blitting that to the screen. The iPhone, for example, requires you to render into a renderbuffer and then give that to the windowing system to present.

You can just draw a fullscreen quad with an identity projection matrix - that also lets you do most post-processing effects easily.
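
And if you don't need a shader pass on the way out, a plain blit does the copy (sketch, assuming GL 3.0 / EXT_framebuffer_blit; fbo/width/height are whatever you created):
code:
// Copy the FBO's color buffer to the window's framebuffer (0 = default framebuffer).
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
glBlitFramebuffer(0, 0, width, height,     // source rect
                  0, 0, width, height,     // destination rect
                  GL_COLOR_BUFFER_BIT, GL_NEAREST);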

haveblue
Aug 15, 2005



Toilet Rascal

Spite posted:

It's much better to get into the habit of rendering to an FBO and then blitting that to the screen. The iPhone, for example, requires you to render into a renderbuffer and then give that to the windowing system to present.

To be pedantic, I think that's a property of OpenGL ES, not the iPhone specifically.

Also, nobody learns to do this because one of the Xcode project templates contains all the GL setup and frame submission code :v:

newsomnuke
Feb 25, 2007

What's up with this vertex shader?

code:
void main(void)
{
	vec4 pos = vec4(gl_Vertex);
	pos.w = 1;

	gl_Position = gl_ModelViewProjectionMatrix * pos;
}
It only works when I set vertices' 'w' coordinate to 1 before sending them to the GPU, for instance:

code:
for (size_t i = 0; i < vertSize; i += 4)
{
	verts[i + 0] = x;
	verts[i + 1] = y;
	verts[i + 2] = z;
	verts[i + 3] = distance; // only works if 'distance' is 1
}

...
glBufferDataARB(GL_ARRAY_BUFFER_ARB, vertSize * sizeof(GLfloat), verts, GL_STATIC_DRAW_ARB);
...
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vertexId);
glVertexPointer(4, GL_FLOAT, 0, 0);
...
glDrawElements(GL_TRIANGLES, numFaces * 3, GL_UNSIGNED_SHORT, 0);
The vertices end up on screen and run through the fragment shader properly; it's just that their positions are completely wrong. Why isn't the 'w' component being reset properly?

PDP-1
Oct 12, 2004

It's a beautiful day in the neighborhood.
The w component of your vector should pretty much always be 1. Points are translated via a 4x4 matrix with the amount of the translation stored in the 4th column (or row, depending on how you set up your system). If you use a number other than 1 to pad out your vector, you are introducing an extra scaling of those translation components, resulting in your points not ending up where you want them.

Even if you aren't trying to move the points around directly with a translation matrix, the view and projection matrices will be affected by having something other than 1 in the last position of your vector, and you'll still get screwed-up vertex positions.
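
Concretely (column-vector convention, with the translation (tx, ty, tz) in the 4th column), the multiply works out to
x' = x + w*tx, y' = y + w*ty, z' = z + w*tz, w' = w,
so anything other than w = 1 scales every translation relative to the point itself.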

newsomnuke
Feb 25, 2007

PDP-1 posted:

The w component of your vector should pretty much always be 1.
I know, which is why I reset it to 1 in the shader before applying the transform. But it doesn't seem to work!

edit: I'm using the 'w' component to store extra information which I use in the vertex shader, if you were wondering.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
Are you sure you've got the vertex shader set up properly? The behavior you're describing is what you'd get if it skipped the vertex shader completely, so you at least need to rule that out. Remember that pixel and vertex shaders get linked to a single "program" in OpenGL and you can only have one program bound at a time.
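
For reference, the setup being described is roughly this (sketch with error checking stripped; vsSrc/fsSrc stand in for your shader source strings) - if the linked program isn't the one bound when you draw, GL falls back to the fixed-function path and your vertex shader never runs:
code:
GLuint vs = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vs, 1, &vsSrc, NULL);
glCompileShader(vs);

GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fs, 1, &fsSrc, NULL);
glCompileShader(fs);

GLuint prog = glCreateProgram();
glAttachShader(prog, vs);
glAttachShader(prog, fs);
glLinkProgram(prog);

glUseProgram(prog);   // this must be the program in use at draw time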

Spite
Jul 27, 2001

Small chance of that...

haveblue posted:

To be pedantic, I think that's a property of OpenGL ES, not the iPhone specifically.

Also, nobody learns to do this because one of the Xcode project templates contains all the GL setup and frame submission code :v:

Well, if you mean that there's no backbuffer and all rendering must be done into an FBO, then sure. However, it would be nice if you could present a texture instead of a renderbuffer, etc.


ultra-inquisitor:
Pass your modified 'pos' as a varying and set the output red channel to w and see what it's being set to. I agree with OneEightHundred though, it sounds like the shader isn't bound.

newsomnuke
Feb 25, 2007

edit: ok, it's working. Turns out the problem was actually in the fragment shader where I was doing something with gl_FrontColor which I shouldn't have been doing.

newsomnuke fucked around with this message at 13:15 on May 16, 2010

hey mom its 420
May 12, 2007

My little OpenGL game performs fine in Linux and Windows XP but is really slow under Windows 7. Has anyone had any similar experiences or knows off the top of their head what could be wrong?

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Bonus posted:

My little OpenGL game performs fine in Linux and Windows XP but is really slow under Windows 7. Has anyone had any similar experiences or knows off the top of their head what could be wrong?
Do you have an ATI video card? Some of the ATI drivers that Windows Update provides don't support OpenGL, resulting in it being emulated in software.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Bonus posted:

My little OpenGL game performs fine in Linux and Windows XP but is really slow under Windows 7. Has anyone had any similar experiences or knows off the top of their head what could be wrong?
No OpenGL driver for the card (nor a wrapper to DirectX) installed in Win7, so it's falling back to the software renderer? (Or something wrong with the startup code that's making it fall back to the software renderer - I've had that happen from requesting a resolution the Windows driver didn't support, back in the day.)

I have a 3D math question again, which is pretty much the same question as I had a while back, but a bit simpler now. I have no problem with a hierarchical skeleton made up of a bone length and a quaternion rotation for each bone, except when it comes to skinning, which requires a different transform.

Here's a simple diagram (picture the bones chained along the Y axis), where the L-vars are translations along the Y axis and the R-vars are rotations.

Now, if I wanted to attach an object to the end of the third bone, the transform for that would be simply L3*R3*L2*R2*L1*R1 - that would put a point that was originally at (0,0,0) at wherever the tip of bone 3 is. But for skinning that's not what you need - you want a point that was originally at L3*L2*L1 to end up at L3*R3*L2*R2*L1*R1.

Basically, I need to turn all the rotations, and where they are, into one transformation matrix. (For clarity, assume that I want to put a single mesh around the skeleton in the diagram - that is, a tentacle.)

This isn't really a code problem so much as a math problem - what's the correct skinning transformation for each of these 3 bones? For the first bone it's obviously just R1, but it can't be R2*R1 for the second because that would do both rotations around the origin.

roomforthetuna fucked around with this message at 05:41 on May 17, 2010

Fecotourist
Nov 1, 2008

roomforthetuna posted:

This isn't really a code problem so much as a math problem - what's the correct skinning transformation for each of these 3 bones? For the first bone it's obviously just R1, but it can't be R2*R1 for the second because that would do both transformations around the origin.

Is this relevant?
http://isg.cs.tcd.ie/projects/DualQuaternions/

The dual quaternion may be what you're looking for in terms of putting rotation and translation into one entity that can be interpolated, etc.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Fecotourist posted:

Is this relevant?
http://isg.cs.tcd.ie/projects/DualQuaternions/
Nah, I'm looking for something much much simpler than that - I'm just trying to get the right matrix for ordinary old-fashioned skinning. I tried Google for it, but it just gives me a shitload of "how to make the hierarchical position matrix" which is the one I already have that's no use for skinning, and a couple of "there's this library function in XNA" for which I couldn't find an explanation for what it does (if it even does the thing I'm looking for at all, which was unclear). Oh, and a lot of ways to calculate the skinning weight for when you're making a model, which I've already done.

I'm pretty sure this particular question is so remedial that nobody's bothered to publish an answer to it anywhere.

Edit: The XNA functions that appear to be the closest thing I've found to an answer are CopyAbsoluteBoneTransformsTo combined with CopyBoneTransformsFrom. It appears that they're doing the skin matrix thus:
code:
//precalc, done once
model.setToBindPose();
model.copyAbsoluteBoneTransformsTo(bindTransforms);
for (int b=0; b<bindTransforms.Length; b++) {
  bindTransforms[b]=MatrixInvert(bindTransforms[b]);
}

//render
model.copyBoneTransformsFrom(boneTransforms);
model.copyAbsoluteBoneTransformsTo(boneAbsoluteTransforms);
for (int b=0; b<boneTransforms.Length; b++) {
  skinTransforms[b]=bindTransforms[b]*boneAbsoluteTransforms[b];
}
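Which, if I'm reading the above right, boils down to one matrix per bone (sketch in my own notation): skin = animatedAbsolute * inverse(bindAbsolute), where the bind absolutes are the same chains with every rotation left out - e.g. for bone 3 that's (L3*R3*L2*R2*L1*R1) * inverse(L3*L2*L1), which is exactly "a point that was at L3*L2*L1 ends up at L3*R3*L2*R2*L1*R1".
code:
// Per-bone skinning matrices, in the same ordering convention as above (my names).
for (int b = 0; b < numBones; ++b)
  skinTransforms[b] = animatedAbsolute[b] * Inverse(bindAbsolute[b]);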
If that works, then my question is: what exactly do those two functions do? I'm guessing the 'from' version is taking its "boneTransforms" from some sort of animation data and putting those into the model, and then the 'Absolute' transform would be the ... (R1), (R2*L1*R1) and (R3*L2*R2*L1*R1) full-transform values I already have in my skeleton... I guess using the inverse of the bind-time transforms is something I haven't tried; maybe this is the answer to my question. Will come back tomorrow and report.

roomforthetuna fucked around with this message at 06:09 on May 17, 2010
