Xerophyte
Mar 17, 2008

This space intentionally left blank


peepsalot posted:

Yeah I saw Inigo's site, and its crazy impressive but really light on details of how the hell most of it is done.

The distance functions page he has covers most of the functions used. IQ is a crazy demoscener savant at actually modelling things with nothing but simple distance fields and transforms combined in nutty ways, but that covers most of the operations you have outside of highly specific fractals.

If you're wondering about how to do the actual ray marching, it both is and is not complex. Making a simple ray marcher is easy. Making one that's both fast and robust tends to involve a lot of scene-specific tweaking.

A very basic conservative SDF ray marcher will look something like
C++ code:
struct Ray {
  float3 o;   // origin
  float3 dir; // direction
};

// Minimum threshold: any point closer to the surface than this is counted as part of it.
const float kMinThreshold = 1e-5;

// Maximum threshold: if we are ever this far away from the surface then assume we will never
// intersect. In practice you'll probably clamp the ray to some bounding volume instead.
const float kMaxThreshold = 1e2;

bool march_ray(Ray& r, SDF f) {
  float distance;
  while(true) {
    // Compute the distance to the surface. We can move this far without intersecting.
    distance = f(r.o);

    // Specifically, we can move this far in the ray direction without intersecting.
    r.o = r.o + distance * r.dir;

    // If we're sufficiently close to the surface, count as an intersection.
    if (distance < kMinThreshold) {
      return true;
    }

    // If we're sufficiently far away from the surface, count as escaping.
    if (kMaxThreshold < distance) {
      return false;
    }
  }
}
This has a couple of problems:
1. It only works if the SDF is an actual SDF. If it's an approximation that can be larger than the actual distance to the surface then the marching breaks. Such approximations are common in practice. They occur if you have some complex fractal that you can't get a true distance field for, or if you are applying any sort of warping or displacement function to a real SDF, or if you are using a non-Euclidean distance norm, and so on.
2. If the ray is moving perpendicular to the gradient of the distance field (i.e. the ray is parallel with a plane, slab or box) then this can be horrifically slow as you take a ton of tiny steps without ever getting closer.

In practice, most SDFs people use to do cool things are not really SDFs, and you probably need to replace the naive r.o = r.o + distance * r.dir; marching step with something more careful. Exactly what "more careful" means tends to be very scene-specific. Common tweaks include:
- Multiplying the step size with a tweakable global scale factor.
- Adaptively increasing the step size if the distance is changing slower than expected between iterations.
- Clamping the step size to some min and max bounds, then doing a binary search to refine the intersection point once you've concluded one exists.
Finding the right set of tweaks for your scene tends to be challenging. If you get them wrong then you get artifacts where the marcher sometimes fails to intersect the surface.
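
For illustration, here is a minimal sketch of two of those tweaks (a global step scale and step clamping), plus a hard iteration cap so grazing rays terminate. The Vec3 type, the unit-sphere scene and every constant below are assumptions made up for the example, not values from the post above, and the binary search refinement is left out.

```cpp
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

struct Ray {
  Vec3 o;    // origin
  Vec3 dir;  // direction, assumed to be normalized
};
using SDF = std::function<float(Vec3)>;

// Hypothetical tuning constants; good values are scene-specific.
constexpr float kMinThreshold = 1e-4f;  // closer than this counts as a hit
constexpr float kMaxThreshold = 1e2f;   // farther than this counts as a miss
constexpr float kStepScale    = 0.8f;   // < 1 compensates for SDFs that overestimate
constexpr float kMinStep      = 1e-4f;  // lower clamp on the step size
constexpr float kMaxStep      = 10.0f;  // upper clamp on the step size
constexpr int   kMaxSteps     = 256;    // hard cap on the number of iterations

// Example scene (assumption): a unit sphere centered at the origin.
inline float unit_sphere(Vec3 p) { return length(p) - 1.0f; }

// Returns true and leaves r.o near the surface on a hit.
// Returns false if the ray escapes or runs out of steps.
bool march_ray_scaled(Ray& r, const SDF& f) {
  for (int i = 0; i < kMaxSteps; ++i) {
    float d = f(r.o);
    if (d < kMinThreshold) return true;
    if (d > kMaxThreshold) return false;
    // Scale the conservative step, then clamp it to [kMinStep, kMaxStep].
    float step = std::fmin(std::fmax(kStepScale * d, kMinStep), kMaxStep);
    r.o = r.o + step * r.dir;
  }
  return false;
}
```

A scale below 1 trades speed for robustness: more steps per ray, but less risk of stepping through the surface when the distance estimate is too large.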

For non-SDF raymarching -- when your surface is implicitly defined with something like a bool is_inside(point p) -- it's common to just use a fixed step size, possibly with a binary search step to refine intersections. This can be very, very slow, which is why even approximate SDFs are nice.
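
A rough sketch of that fixed-step approach, assuming a hypothetical Point type and an is_inside predicate for a unit sphere (both invented for this example). It assumes the ray origin starts outside the solid, and any feature thinner than the step size can be skipped entirely.

```cpp
#include <functional>

struct Point { float x, y, z; };
using InsideFn = std::function<bool(Point)>;

// Point at parameter t along the ray o + t * dir.
inline Point point_at(Point o, Point dir, float t) {
  return {o.x + t * dir.x, o.y + t * dir.y, o.z + t * dir.z};
}

// Example implicit solid (assumption): a unit sphere centered at the origin.
inline bool inside_unit_sphere(Point p) {
  return p.x * p.x + p.y * p.y + p.z * p.z < 1.0f;
}

// Marches from t = 0 towards t_max in fixed increments. On the first sample
// that lands inside the solid, bisects between the last outside sample and
// that inside sample. Returns the refined hit parameter, or -1 on a miss.
float march_fixed(const InsideFn& is_inside, Point o, Point dir,
                  float t_max, float step, int refine_iters) {
  float prev = 0.0f;  // last parameter known to be outside
  for (float t = step; t <= t_max; t += step) {
    if (is_inside(point_at(o, dir, t))) {
      float lo = prev;  // outside
      float hi = t;     // inside
      for (int i = 0; i < refine_iters; ++i) {
        float mid = 0.5f * (lo + hi);
        if (is_inside(point_at(o, dir, mid))) {
          hi = mid;
        } else {
          lo = mid;
        }
      }
      return hi;
    }
    prev = t;
  }
  return -1.0f;
}
```

Every sample costs one is_inside evaluation regardless of how far away the surface is, which is where the slowness comes from.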

E: The code initially used do-while for some reason, but I decided that this made me feel unclean.

Xerophyte fucked around with this message at Oct 28, 2017 around 14:04


Spatial
Nov 15, 2007



Most of IQ's demos are up on ShaderToy. Full source and realtime editing to learn from.

peepsalot
Apr 24, 2007

††††††††PEEP THIS...
†††††††††††BITCH!



Xerophyte posted:

[...]
OK, but that example code would just draw a sphere (for example) as completely flat / indistinguishable from a solid circle, right? How is it shaded? Is the surface normal also computed?
Also, I'm interested in creating a closed tessellated mesh (exporting to STL format) from this surface. I am starting to understand how it can be rendered on screen with shaders, but is there any particular approach to tessellating based on SDFs? Or would SDF be a poor fit for a task like that?

Xerophyte
Mar 17, 2008

This space intentionally left blank


peepsalot posted:

OK, but that example code would just draw a sphere (for example) as completely flat / indistinguishable from a solid circle, right? How is it shaded? Is the surface normal also computed?
Also, I'm interested in creating a closed tessellated mesh (exporting to STL format) from this surface. I am starting to understand how it can be rendered on screen with shaders, but is there any particular approach to tessellating based on SDFs? Or would SDF be a poor fit for a task like that?

For shading you need the surface normal, yes. For the surface of a solid defined by an SDF the normal can be computed: the normal of the surface is the normalized gradient of the distance field. In some cases the gradient can be computed analytically, e.g. for a sphere at point m_center with a radius of m_radius you'd do something like
C++ code:
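void sdf(float3 p, float* distance, float3* normal) {
  float3 sphere_to_point       = p - m_center;
  float  distance_from_center  = length(sphere_to_point);

  // Signed distance: negative inside the sphere, positive outside.
  *distance = distance_from_center - m_radius;
  // The gradient points radially away from the center (undefined at the center itself).
  *normal   = sphere_to_point / distance_from_center;
}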
When that's not possible you can also compute the gradient numerically by computing the finite difference.
C++ code:
// Compute the gradient of a distance field by using the forward difference. This is cheap but can 
// have issues with floating point precision since you need a small EPSILON. Using a central
// difference is more accurate, but then you need to compute the SDF 6 times instead of 3 times.
float3 normalized_gradient(SDF sdf, float3 p, float sdf_at_p) {
  const float EPSILON = 1e-5f;
  float3      gradient =
      float3(sdf(float3(p.x + EPSILON, p.y, p.z)) - sdf_at_p,
             sdf(float3(p.x, p.y + EPSILON, p.z)) - sdf_at_p,
             sdf(float3(p.x, p.y, p.z + EPSILON)) - sdf_at_p);
  return gradient / length(gradient);
}
The finite difference approach is usually necessary since a lot of the more complex transforms (soft minimums, domain warps, etc) you can do on an SDF make it hard to compute the gradient analytically.
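
As a rough sketch of the central-difference variant mentioned in the code comment above, together with the simplest possible use of the normal for shading (a Lambertian diffuse term). The Vec3 type, the unit-sphere field, the epsilon of 1e-3 and the lambert helper are all assumptions for this example; real scenes need their own epsilon and shading model.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 normalize(Vec3 v) {
  float len = std::sqrt(dot(v, v));
  return {v.x / len, v.y / len, v.z / len};
}
using SDF = std::function<float(Vec3)>;

// Example field (assumption): a unit sphere centered at the origin.
inline float unit_sphere(Vec3 p) { return std::sqrt(dot(p, p)) - 1.0f; }

// Central difference: six SDF evaluations, but the error shrinks faster with
// eps than for the forward difference, so eps can be larger and less fragile.
Vec3 central_difference_normal(const SDF& sdf, Vec3 p, float eps = 1e-3f) {
  Vec3 gradient = {
      sdf({p.x + eps, p.y, p.z}) - sdf({p.x - eps, p.y, p.z}),
      sdf({p.x, p.y + eps, p.z}) - sdf({p.x, p.y - eps, p.z}),
      sdf({p.x, p.y, p.z + eps}) - sdf({p.x, p.y, p.z - eps})};
  return normalize(gradient);
}

// Lambertian diffuse term: cosine of the angle between the normal and the
// direction towards the light, clamped to zero for surfaces facing away.
float lambert(Vec3 normal, Vec3 to_light) {
  return std::max(0.0f, dot(normal, normalize(to_light)));
}
```

Multiplying a surface color by the lambert term at each hit point is enough to make a marched sphere read as a sphere rather than a flat disc.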

Texturing is trickier. You can do various projections (e.g. planar, cylindrical), but there are no custom uv wraps like you might do for a mesh. For fractal SDFs people sometimes texture with things like some iteration count or a projected coordinate.


I'm not really that familiar with meshing algorithms. I don't believe SDFs can be meshed in a better way than any other implicit surface. Marching cubes is dead simple and a good place to start if you want to make your own mesher, but the resulting mesh quality is pretty crappy. Higher-quality meshing algorithms exist but they're complex and involve various trade-offs. My impression is that you probably don't want to roll your own. CGAL is apparently a library that exists for this sort of thing, I have no idea how good it is.

Anecdotally, I know the approach Weta took for meshing the various SDF fractals they used for Ego in Guardians of the Galaxy 2 was to do a bunch of simple renders of the SDF, feed those renders into their photogrammetry software, then get a point cloud, then mesh that. I don't think I'd recommend that approach, but apparently it's good enough for production.

Xerophyte fucked around with this message at Oct 30, 2017 around 06:22

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today


Xerophyte posted:

I'm not really that familiar with meshing algorithms. I don't believe SDFs can be meshed in a better way than any other implicit surface.
The fact that you can fairly reliably extract a normal means they're easier to mesh well than things that only give you in/out, at least. Obviously that's only relevant if you're using a more advanced meshing algo than marching cubes.

Xerophyte posted:

Anecdotally, I know the approach Weta took for meshing the various SDF fractals they used for Ego in Guardians of the Galaxy 2 was to do a bunch of simple renders of the SDF, feed those renders into their photogrammetry software, then get a point cloud, then mesh that. I don't think I'd recommend that approach, but apparently it's good enough for production.
That's hilariously hacky.

Xerophyte
Mar 17, 2008

This space intentionally left blank


Ralith posted:

That's hilariously hacky.

Basically, their use case was that they were rendering scenes set inside of a giant Sierpinski gasket planet, and they wanted to feed that geometry to their artists for manual edits. They first tried various more direct meshing approaches, but those gave a uniform level of detail, which was either too big to use or lacked detail in the foreground. Feeding the photogrammetry software a bunch of renders from the right position gave them a point cloud of appropriate density, which became meshes that were good enough for artists to work with.

You could definitely just generate the point cloud from the SDF or directly generate a Delaunay triangulation which takes the camera position into account, but the photogrammetry round trip was good enough and saved time so...

peepsalot
Apr 24, 2007

††††††††PEEP THIS...
†††††††††††BITCH!



Is there any online calculator or utility that helps create transformation matrices?

Xerophyte
Mar 17, 2008

This space intentionally left blank


peepsalot posted:

Is there any online calculator or utility that helps create transformation matrices?

There's a bunch of matrix and linear algebra libraries that can help you. Eigen is pretty popular.

lord funk
Feb 16, 2004



Anyone have a good example of using touch movement to rotate a 3D object around its center axis? I know I have to transform the angle based on the camera view matrix, but I also have that problem where when you lift your touch and put it back down, the model matrix is oriented to its last position and doesn't match what you might consider 'up' and 'down'.

edit: nm this looks like a good one:
http://www.learnopengles.com/rotati...h-touch-events/

lord funk fucked around with this message at Nov 16, 2017 around 16:37

UncleBlazer
Jan 27, 2011



lord funk posted:

Anyone have a good example of using touch movement to rotate a 3D object around its center axis? I know I have to transform the angle based on the camera view matrix[...]

If you're happy doing matrix manipulation then it's cool but I'd recommend quaternions for rotations, I found them less of a headache. Not that it answers your touch issue though, sorry!

Zerf
Dec 17, 2004

I miss you, sandman


peepsalot posted:

Is there any online calculator or utility that helps create transformation matrices?

I use WolframAlpha quite a lot; it's super handy. For example, it can do symbolic matrix inversions etc.

What are you after specifically?

lord funk
Feb 16, 2004



UncleBlazer posted:

If you're happy doing matrix manipulation then it's cool but I'd recommend quaternions for rotations, I found them less of a headache. Not that it answers your touch issue though, sorry!

Yep, quaternions are great!

New question: I want to do a kind of hybrid ortho / projection display of 3D objects. I want to be able to project them, but then translate along the screen's 2D plane wherever I want them on the screen. Like this:



So the hex shapes there are orthographic projection, and I want the shapes to 'hover' over them.

How should I do this? I thought I could make a vertex shader that just takes the final position and translates the x/y coordinate, but that's still in 3D projected view space. I want to just shift the final projected image.

Absurd Alhazred
Mar 27, 2010

BETTER LIBERAL THAN SKELETAL!


lord funk posted:

Yep, quaternions are great!

New question: I want to do a kind of hybrid ortho / projection display of 3D objects. I want to be able to project them, but then translate along the screen's 2D plane wherever I want them on the screen. Like this:



So the hex shapes there are orthographic projection, and I want the shapes to 'hover' over them.

How should I do this? I thought I could make a vertex shader that just takes the final position and translates the x/y coordinate, but that's still in 3D projected view space. I want to just shift the final projected image.

If you just translate them they're going to get projected towards the center of the screen rather than where you want them.

I think your best bet is to perspective render them to a texture once, and then write that texture orthographically in multiple places.

Xerophyte
Mar 17, 2008

This space intentionally left blank


Caveat: I don't code much for GPUs, so I may have gotten some of the normalized device coordinate vs clip space stuff backwards.

If you want an in-viewport translation with the perspective preserved then you can also just change the viewport transform. Render to texture makes sense if you intend to reuse the result in several places or over several frames.

If you actually want to center the object at some specific screen space position with a perspective appropriate for that position, then you basically need to reverse-project the clip space or normalized device space coordinate. Say you have an object currently at world position p_world. You want to put it at some device coordinate (x_NDC, y_NDC, _) (I'm going to assume we don't care that much about the depth). What world space position p_world' does that match?
1. Compute the object's current clip space position p_clip = (x_clip, y_clip, z_clip, w_clip) = projection * view * (p_world, 1).
2. Modify the clip space x and y coordinates so they match the NDC you want: p_clip' = (x_NDC * w_clip, y_NDC * w_clip, z_clip, w_clip).
3. Invert the view + projection to take your p_clip' back to world space: (p_world', 1) = invert(projection * view) * p_clip'.
4. Translate the object from p_world to p_world'. It'll show up at the NDC position you want.

This technically relies on the clip space w coordinate not changing with translations in the clip space's XY plane, which it doesn't so we should be good. You can probably simplify the matrix math but that depends on the specifics of your projection for the device.
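
The steps above can be sketched on the CPU roughly as follows. This is only an illustration under stated assumptions: the Vec4/Mat4 aliases, the Gauss-Jordan inverse and the OpenGL-style perspective matrix are all stand-ins chosen for the example, and in practice you would use your math library's own types and inverse.

```cpp
#include <array>
#include <cmath>
#include <utility>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;  // m[row][col], column vectors

Mat4 identity() {
  Mat4 m{};
  for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
  return m;
}

Vec4 mul(const Mat4& m, const Vec4& v) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j) r[i] += m[i][j] * v[j];
  return r;
}

Mat4 mul(const Mat4& a, const Mat4& b) {
  Mat4 r{};
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j)
      for (int k = 0; k < 4; ++k) r[i][j] += a[i][k] * b[k][j];
  return r;
}

// Gauss-Jordan elimination with partial pivoting. Assumes m is invertible.
Mat4 inverse(Mat4 m) {
  Mat4 inv = identity();
  for (int c = 0; c < 4; ++c) {
    int pivot = c;
    for (int r = c + 1; r < 4; ++r)
      if (std::fabs(m[r][c]) > std::fabs(m[pivot][c])) pivot = r;
    std::swap(m[c], m[pivot]);
    std::swap(inv[c], inv[pivot]);
    double d = m[c][c];
    for (int j = 0; j < 4; ++j) { m[c][j] /= d; inv[c][j] /= d; }
    for (int r = 0; r < 4; ++r) {
      if (r == c) continue;
      double f = m[r][c];
      for (int j = 0; j < 4; ++j) {
        m[r][j] -= f * m[c][j];
        inv[r][j] -= f * inv[c][j];
      }
    }
  }
  return inv;
}

// OpenGL-style perspective projection (assumption: camera looks down -z).
Mat4 perspective(double fovy_rad, double aspect, double z_near, double z_far) {
  double f = 1.0 / std::tan(fovy_rad / 2.0);
  Mat4 m{};
  m[0][0] = f / aspect;
  m[1][1] = f;
  m[2][2] = (z_far + z_near) / (z_near - z_far);
  m[2][3] = 2.0 * z_far * z_near / (z_near - z_far);
  m[3][2] = -1.0;
  return m;
}

// Steps 1-3: returns the world position (w = 1) that projects to the
// requested NDC x/y while keeping the original clip-space z and w.
Vec4 world_position_for_ndc(const Mat4& view_proj, const Vec4& p_world,
                            double x_ndc, double y_ndc) {
  Vec4 clip = mul(view_proj, p_world);                                    // step 1
  Vec4 moved = {x_ndc * clip[3], y_ndc * clip[3], clip[2], clip[3]};      // step 2
  return mul(inverse(view_proj), moved);                                  // step 3
}
```

Step 4 is then just translating the object by the difference between the returned position and p_world. Doing the same thing in a vertex shader would avoid the per-object inverse on the CPU, at the cost of passing the extra matrix in.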

Absurd Alhazred
Mar 27, 2010

BETTER LIBERAL THAN SKELETAL!


Xerophyte posted:

If you want an in-viewport translation with the perspective preserved then you can also just change the viewport transform. Render to texture makes sense if you intend to reuse the result in several places or over several frames.

Yeah, the problem with multiple viewports is that you have to change viewports per instance. That's not really all that difficult to do: most modern GPUs support it in the geometry shader, and an extension lets you push it to an option in the vertex shader instead. But from the use case it seems better to just render once and use many times, and render to texture is the best way to do that, I think. It really depends on the use case. You're right if the image needs to change every frame (although you could just render all frames to a sprite sheet and then sample a different one each frame, assuming there is a finite set of sprites that will cover that for you).

lord funk
Feb 16, 2004



Xerophyte posted:

If you want an in-viewport translation with the perspective preserved then you can also just change the viewport transform. Render to texture makes sense if you intend to reuse the result in several places or over several frames.

Yeah that makes total sense! Thanks for the approach details.

I do want to render the objects each frame, so they can react to environment lighting changes.

Absurd Alhazred
Mar 27, 2010

BETTER LIBERAL THAN SKELETAL!


lord funk posted:

Yeah that makes total sense! Thanks for the approach details.

I do want to render the objects each frame, so they can react to environment lighting changes.

You could take care of that with a normal map, but yeah, maybe instancing with multiple viewports is the thing to do.

Zerf
Dec 17, 2004

I miss you, sandman


lord funk posted:

Yeah that makes total sense! Thanks for the approach details.

I do want to render the objects each frame, so they can react to environment lighting changes.

Heh, funny you should bring this up. I just implemented this last week. My solution was to handle it in the shader. Since perspective transform is non-linear, it means that each affected vertex now needs to be multiplied by two matrices instead of one and some meddling between the multiplications. Quite a simple solution, but it works well.

lord funk
Feb 16, 2004



Zerf posted:

Heh, funny you should bring this up. I just implemented this last week. My solution was to handle it in the shader. Since perspective transform is non-linear, it means that each affected vertex now needs to be multiplied by two matrices instead of one and some meddling between the multiplications. Quite a simple solution, but it works well.

Really? Cool! Would you be willing to share a bit of your shader transformation code? I tried Xerophyte's answer, but I'm probably just messing something up along the way.

Again, just to be clear that we're on the same page, this is what I'm on about:

https://i.imgur.com/R2LKQY3.mp4

lord funk
Feb 16, 2004



Oh my god, what is it about posting on the internet that makes you immediately figure it out? Done. Thanks all, and especially thanks Xerophyte, 'cause your answer was awesome.

Absurd Alhazred
Mar 27, 2010

BETTER LIBERAL THAN SKELETAL!


I finally got some geometry shading going in our codebase. Not too difficult to do, but are there any nice best-practices guides? I know you're not supposed to multiply too many primitives, but it is standard to use it for points -> billboards, right?

Suspicious Dish
Sep 24, 2011



Fun Shoe

Absurd Alhazred posted:

I finally got some geometry shading going in our codebase. Not too difficult to do, but are there any nice best-practices guides?

don't

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today


Absurd Alhazred posted:

I know you're not supposed to multiply too many primitives, but it is standard to use it for points -> billboards, right?
This is massive overkill, and I wouldn't be shocked if it actually performed slower than, say, instancing.

Absurd Alhazred
Mar 27, 2010

BETTER LIBERAL THAN SKELETAL!


Ralith posted:

This is massive overkill, and I wouldn't be shocked if it actually performed slower than, say, instancing.

So having most of the information in the instance variables and applying it to a single quad is better than having the same number of vertices and using a geometry shader to expand them into quads?

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today


Absurd Alhazred posted:

So having most of the information in the instance variables and applying it to a single quad is better than having the same number of vertices and using a geometry shader to expand them into quads?
That's my intuition, yes. Geometry shaders are a big hammer, for when there's no other viable approach, i.e. when your geometry is actually unpredictable in advance. Just turning them on may make your driver pessimize the whole pipeline.

The only way to be sure is to find or build some benchmarks that model your usecase on your hardware, of course.

Suspicious Dish
Sep 24, 2011



Fun Shoe

instancing is likely slower than geometry shaders, too

http://www.joshbarczak.com/blog/?p=667

schme
May 28, 2013


I have failed to find what this is about :

code:
vec3 thing = (0., 1., time);
It seems to just make thing equal to the rightmost value, but I've no idea why that syntax works or what it does. Found it while removing a function in Kodelife. My google-fu is weak.

Doc Block
Apr 15, 2003

ProRes 4444 Only


Fun Shoe

A vec3 holds 3 values (presumably floats), so that's assigning all three values at once.

schme
May 28, 2013


Ah right. It works with a float too, so I guess it just assigns each one after the other.

Suspicious Dish
Sep 24, 2011



Fun Shoe

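vec3 foo = vec3(1.0); is the same as vec3 foo = vec3(1.0, 1.0, 1.0); and some lenient compilers also accept vec3 foo = 1.0; with the same result, although the GLSL spec has no implicit float-to-vec3 conversion, so stricter compilers reject it.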

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today


schme posted:

I have failed to find what this is about :

code:
vec3 thing = (0., 1., time);
It seems to just make thing equal to the rightmost value, but I've no idea why that syntax works or what it does. Found it while removing a function in Kodelife. My google-fu is weak.
You're using the comma operator, which evaluates to the right-hand value. The other two are being discarded.
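
A small C++ sketch of the same behaviour, since the comma operator is easier to poke at there. The two function names are made up for the example, and compilers will typically warn that the discarded operands have no effect.

```cpp
// The built-in comma operator evaluates its left operand, discards the
// result, and yields its right operand.
inline float last_of_three(float time) {
  float thing = (0.0f, 1.0f, time);  // same value as: float thing = time;
  return thing;
}

// Side effects in the discarded operands still happen, left to right.
inline int count_then_yield(int& counter) {
  return (++counter, ++counter, counter * 10);
}
```

Note that the parentheses matter: inside a constructor call such as vec3(0., 1., time) the commas separate arguments and are not the comma operator.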

Doc Block
Apr 15, 2003

ProRes 4444 Only


Fun Shoe

Ah, right, missed that the code is doing vec3 thing = (0., 1., time); instead of vec3 thing = vec3(0., 1., time);.

schme
May 28, 2013


Alright, thanks everyone. For completeness' sake: all expressions are evaluated, but only the last one is returned as the value. Therefore you can do stuff like (blatantly stolen from stackoverflow):
code:
string s;
while(read_string(s), s.length() > 5)
{
   //do something
}
Some of these are C/C++ examples but I assume they work the same in GLSL, couldn't confirm.

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today


Side-effects are rarer in GLSL, so it's even more esoteric a pattern, but yeah the semantics are the same.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Absurd Alhazred posted:

So having most of the information in the instance variables and applying it to a single quad is better than having the same number of vertices and using a geometry shader to expand them into quads?

Geometry shader billboards are bad.
4-vertex instances are probably worse (depending on your vertex count).

The best approach is to use 32-vertex instances made up of 8 quads, so your "billboard Id" will be "(InstanceID << 3) + (VertexId >> 2)" and your "billboard vertex id" will be "VertexId & 3" (on Nvidia -- can't speak for Intel/amd)
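
To make the index math concrete, here is the same decoding written as plain C++ so it can be checked on the CPU. The struct and function names are hypothetical, and how the four corners of each quad get turned into triangles (index buffer or otherwise) is not covered.

```cpp
#include <cstdint>

// 8 quads per instance * 4 vertices per quad = 32 vertices per instance.
struct BillboardIndex {
  std::uint32_t billboard_id;  // which billboard overall
  std::uint32_t corner_id;     // which of the 4 corners of that billboard
};

// Mirrors the expressions quoted above:
//   billboard id        = (InstanceID << 3) + (VertexId >> 2)
//   billboard vertex id = VertexId & 3
constexpr BillboardIndex decode_billboard(std::uint32_t instance_id,
                                          std::uint32_t vertex_id) {
  return {(instance_id << 3) + (vertex_id >> 2), vertex_id & 3u};
}
```

For example, vertex 5 of instance 2 is corner 1 of billboard 17, and the last vertex of instance 0 (vertex 31) is corner 3 of billboard 7.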

Absurd Alhazred
Mar 27, 2010

BETTER LIBERAL THAN SKELETAL!


Hubis posted:

Geometry shader billboards are bad.
4-vertex instances are probably worse (depending on your vertex count).

The best approach is to use 32-vertex instances made up of 8 quads, so your "billboard Id" will be "(InstanceID << 3) + (VertexId >> 2)" and your "billboard vertex id" will be "VertexId & 3" (on Nvidia -- can't speak for Intel/amd)

Any specific reason for that, or is it empirical?

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Absurd Alhazred posted:

Any specific reason for that, or is it empirical?

Sorry, was phone-posting!

Geometry Shaders: The reason they're usually bad is because DirectX did not provide any relaxation to the "Rasterization Order" requirement -- the primitives must be rasterized downstream in the exact order in which they are generated (at least in circumstances where they would overlap). This can become a problem if you do expansion (or culling) in the GS because now each GS invocation has to serialize to make sure the outputs are written to a buffer for later rasterization in the right order. It might not be an issue if you're not actually geometry-limited, but it's generally something to be concerned about. Slow isn't useless though, and NVIDIA has come up with some cool ways to use the GS without invoking any major performance penalty (like Multi-Projection) but it has a bad rap in general.

Quad-Per-Instance: GPUs are wide processors. NVIDIA shader units essentially process 32 threads in parallel each instruction (and they have many such shader units). One quirk is that, at least on some iterations of the hardware, a given 32-thread "warp" can only process one instance at a time when executing vertex shaders. This means that if you have a 4-vertex instance then 4 threads are going to be enabled and 28 threads are going to be predicated off (essentially idle). Your vertex processing will be running at 12.5% efficiency! If you're doing particle rendering it might be that you're going to be pixel shader/blending rate bound before the vertex shading becomes an issue, but often you have enough vertices that it will bite you.

So if you use instancing (but have multiple quads/instance so you are using all 32 threads) then you avoid all these potholes.
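
The efficiency figure is simple arithmetic, sketched here under the assumption stated above that a 32-thread warp never mixes vertices from different instances. The function name is made up for the example, and real hardware behaviour varies by generation.

```cpp
constexpr int kWarpSize = 32;

// Fraction of vertex-shader lanes doing useful work when each instance is
// padded up to a whole number of warps.
constexpr double warp_utilization(int vertices_per_instance) {
  return static_cast<double>(vertices_per_instance) /
         (((vertices_per_instance + kWarpSize - 1) / kWarpSize) * kWarpSize);
}
```

With this model a 4-vertex quad gives 4 / 32 = 12.5%, a 32-vertex instance of 8 quads gives 100%, and a 48-vertex instance gives 48 / 64 = 75%.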

Graphics is fun!

Absurd Alhazred
Mar 27, 2010

BETTER LIBERAL THAN SKELETAL!


Hubis posted:

[...]

This is very informative. Thanks!

Zerf
Dec 17, 2004

I miss you, sandman


Hubis posted:

[...]

Nice, currently sitting doing some Vulkan stuff, enjoying bindless, and doing a lot of stuff in batches, so this was really informative. But I take it then, that instancing in itself isn't that bad, it's just the extremes when you get really low vertex count per instance?


Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Zerf posted:

Nice, currently sitting doing some Vulkan stuff, enjoying bindless, and doing a lot of stuff in batches, so this was really informative. But I take it then, that instancing in itself isn't that bad, it's just the extremes when you get really low vertex count per instance?
Exactly.

Instancing Is Good (tm) because it can dramatically reduce the CPU->GPU memory traffic and potentially let the GPU do smart things about pre-loading instance data (as opposed to doing effectively the same thing with just regular Draw/DrawIndexed and either transforming the verts on the CPU and mapping/unmapping the vertex buffer, or putting the data into a structured buffer and doing it in the VS). In theory you run into that inefficiency any time your instance vertex count is not a multiple of 32, but for most larger instances it's in the noise anyways.

And again, this isn't to say that small instances are inherently bad, just that it reduces efficiency in the VS in such a way that it might become a worse performance limiter than whatever you are trying to fix.
