roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Paniolo posted:

Yes, you are assuming that performance data gathered from a debug build means anything at all. That's seriously wrong.
Can someone else please confirm or deny this, because to me it seems a ridiculous premise that a debug build will have a performance problem where a release build will have no performance problem (not ridiculous that it will perform slower, that's a given, but that it will specifically perform badly, below a reasonable expectation for a given operation.)


MasterSlowPoke
Oct 9, 2005

Our courage will pull us through
How about you simply compile it for release and profile it there?

wellwhoopdedooo
Nov 23, 2007

Pound Trooper!

roomforthetuna posted:

Can someone else please confirm or deny this, because to me it seems a ridiculous premise that a debug build will have a performance problem where a release build will have no performance problem (not ridiculous that it will perform slower, that's a given, but that it will specifically perform badly, below a reasonable expectation for a given operation.)

No.

e: Also, if it's faster than 60 FPS but you want to see relative performance, why not take the cap off?

wellwhoopdedooo fucked around with this message at 14:38 on May 3, 2011

HauntedRobot
Jun 22, 2002

an excellent mod
a simple map to my heart
now give me tilt shift
Trying to enable multisampling in OpenGL and I'm missing something. I am using glew elsewhere, but I know that all this stuff has to be initialised before glew, so there's been no glewInit() yet. This is the (I think) minimal failing code in my window setup routine.

code:
  (snipped)
Everything goes great up until the call to wglChoosePixelFormatARB which crashes with a segfault, though it appears to be pointing at a valid function. Why am I crashing out here?

Edit: Answer - it's NOT crashing there, the debugger just can't cope with functions defined that way and skips a bit, crashing in a perfectly sensible place later. Carry on.

HauntedRobot fucked around with this message at 16:11 on May 3, 2011

haveblue
Aug 15, 2005



Toilet Rascal

roomforthetuna posted:

Can someone else please confirm or deny this, because to me it seems a ridiculous premise that a debug build will have a performance problem where a release build will have no performance problem (not ridiculous that it will perform slower, that's a given, but that it will specifically perform badly, below a reasonable expectation for a given operation.)

If the compiler optimization setting is different between debug and release then performance could indeed be radically different in certain areas. Also, running in debug mode may be enabling extra logging, doing more bounds/sanity checks, skipping optimizing data transformations, and so on in the system libraries, especially inside 3D graphics drivers.

"Badly" is subjective but you could gain 20-30% performance just by switching from debug to release, depending on what you are doing.

Deep Dish Fuckfest
Sep 6, 2006

Advanced
Computer Touching


Toilet Rascal
It's not entirely graphics related, but if you want an idea of just how much difference debug/release builds can make, here's an example. Some time ago I was doing some physics stuff and I looked around for a linear algebra library, because gently caress implementing that myself. I stumbled on a library (Eigen) that makes heavy use of templates and promises both solid performance and ease of use. I shoved it into my application, made a really simple test with a deformable cube, and compiled with a debug profile.

That gave me about 10 FPS. Obviously that wasn't acceptable, so I messed around for a couple hours until I was ready to give up and decided to try a release build because, hey, I might get all the way up to 12 FPS. I ended up with ~2000 FPS.

That's an extreme case, but it can happen.

Paniolo
Oct 9, 2007

Heads will roll.

YeOldeButchere posted:

That gave me about 10 FPS. Obviously that wasn't acceptable, so I messed around for a couple hours until I was ready to give up and decided to try a release build because, hey, I might get all the way up to 12 FPS. I ended up with ~2000 FPS.

If it used STL that's not too surprising - checked iterators can add a pretty enormous performance penalty.

The main point here is that a debug build isn't 30% slower because every function is exactly 30% slower - it's slower because it introduces bottlenecks that don't occur in release builds. It's not uncommon to profile a debug build and see most of the time spent in a function which would be inlined out in a release build.

With DirectX the debug runtime does a ton of validation that the release runtime doesn't do. That can create bottlenecks in functions which would otherwise be a simple memcpy. Since you don't have access to the DirectX source code, if you profile an operation as taking a long time in debug mode, you really have no way of knowing if it's a genuine performance issue, or slowdown caused by the debug layer. Hence, the performance data you're collecting in debug mode is worthless.

Just to be as clear as possible, the point isn't "switch to release mode and your problem will go away", it's "you cannot effectively troubleshoot your problem, or even be sure you have one, in a debug build."

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

YeOldeButchere posted:

It's not entirely graphics related,
Oh, yeah, I know a debug build in general can make a huge performance difference for some things. I just don't think the debug DirectX runtime could account for the huge performance problem I was seeing, because it only does anything different at the API boundaries, not in the performance-critical stuff (and indeed it didn't - I did a release build using the proper runtime and it was still only 58fps versus 50fps in the debug build).

Anyway, now I feel really stupid because I found what I was doing wrong - I may not have been using a reference renderer, but I was using software vertex processing!
:ughh: :downs: etc.

On the bright side I suppose, accidentally using software vertex processing did get me to make my vertex shader a little more efficient! Sigh. Always remember to undo a little debugging test change! My graphics library had been initializing with software vertex processing for probably the last 2 months or so.

Deep Dish Fuckfest
Sep 6, 2006

Advanced
Computer Touching


Toilet Rascal

Paniolo posted:

Just to be as clear as possible, the point isn't "switch to release mode and your problem will go away", it's "you cannot effectively troubleshoot your problem, or even be sure you have one, in a debug build."

Yeah, I know that, I was just giving an extreme example of how debug builds can do weird things performance-wise in general.

And no, it wasn't due to the STL. It's because the library is built with layers upon layers of templates which allow pretty complex linear algebra expressions that get broken up into inlined vectorized code by the compiler if optimizations are on. I was honestly impressed by what C++ templates could do.

Madox
Oct 25, 2004
Recedite, plebes!

roomforthetuna posted:

Can someone else please confirm or deny this, because to me it seems a ridiculous premise that a debug build will have a performance problem where a release build will have no performance problem (not ridiculous that it will perform slower, that's a given, but that it will specifically perform badly, below a reasonable expectation for a given operation.)

I can confirm from my own experience that when you are using DirectX in debug mode (set in the DirectX control panel) it does a lot more sanity checks and bad state detection than in release mode. Release mode assumes you are doing everything right and fires it all through with almost no checks.

Also, it's been a few years since I did C++ (all XNA now), but I never call these functions from inside a BeginPass/EndPass:
code:
pD3DDevice->SetVertexDeclaration();
pD3DDevice->SetStreamSource();
pD3DDevice->SetIndices();

I'll dig up some old code to see what I did.

Edit: I am a moron, I made my calls the exact same way you have it posted.

Madox fucked around with this message at 18:24 on May 6, 2011

a slime
Apr 11, 2005

I don't really know where to post this but I think it fits here. I'm having trouble wrapping my head around a problem and I think at this point I've spent too long staring at it to see a simple solution.

I have a 3d polygon displayed with an orthographic projection. For an arbitrary rotation, I want to be able to "snap" the vertices to the nearest point on a 2d grid overlaid on the orthographic projection.

Right now what I do is the following: I use gluProject on three axis-aligned unit vectors, then subtract a projected zero vector from each to get x_part, y_part, and z_part - each 3D axis's effect on 2D translation in the projection. Then I take the minimum nonzero component of each to be x_width, y_width, and z_width, and use (2D_GRID_SIZE / foo_width) as the width of a grid on foo's respective axis. Any vertex snapped to this 3D grid will align with the 2D grid on the scene's projection, and I can do this with some simple rounding magic.

First of all, this solution is not very general and makes a million assumptions, the most ugly of which is that everything breaks if the 2d grid has a different size on each axis. Second of all, everything breaks if the origin is not on a gridpoint.

I arrived here by trial and error and I can't really justify anything that I've done so far. Any ideas? Can anyone give me an idea where to start in building a general solution to this problem? I feel like there has to be an obvious solution that I just completely missed.

edit: think I got it... Post following shortly

a slime fucked around with this message at 14:53 on May 10, 2011

HauntedRobot
Jun 22, 2002

an excellent mod
a simple map to my heart
now give me tilt shift
Edit: Once again, stupid problem that had nothing to do with the code I posted.

HauntedRobot fucked around with this message at 14:37 on May 16, 2011

PDP-1
Oct 12, 2004

It's a beautiful day in the neighborhood.
I've got an odd problem where the bottom 1/10th of the screen or so seems to lag behind the rest of the scene when the camera rotates. It doesn't show up in screen shots, but looks like this simulated pic in the live program if the camera was rotating counter-clockwise:



Any ideas? This is DX9 via XNA if that matters. Everything looks fine the second the camera is stopped.

haveblue
Aug 15, 2005



Toilet Rascal
That is "tearing" and it happens because your screen updates are out of sync with the monitor displaying the new image.

http://msdn.microsoft.com/en-us/library/bb174576(VS.85).aspx

Mata
Dec 23, 2003
Can OpenGL triangle strips only ever be 1 triangle "tall"? For example, I would describe this image as 1x4 triangles. What's the simplest way to instead draw 4x4 triangles? Should I use multiple index buffers on one vertex buffer?

Mata fucked around with this message at 08:16 on Jun 20, 2011

zzz
May 10, 2008

Mata posted:

Can triangle strips only ever be 1 triangle "tall"? For example this image I would describe as 1x4 triangles. What's the simplest way to instead draw 4x4 triangles? Should I use multiple index buffers on one vertex buffer?

You can use degenerate triangles: by repeating vertices, you create invisible triangles that you can use to include a discontinuity in a single triangle strip (like a jump to a second layer)

If it's supported, you can also use an index of -1 (0xffff or 0xffffffff) to do the same. This is called strip-cut index in D3D and the Primitive Restart extension in OGL.
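For what it's worth, here's a sketch of building a single strip index buffer for a grid of quads using a restart index (the helper name and grid layout are my own; this assumes 16-bit indices and row-major vertices, with (cols+1) vertices per row):

```cpp
#include <cstdint>
#include <vector>

// Build one triangle strip covering a (cols x rows) grid of quads,
// separating each row with a primitive-restart index (0xFFFF for
// 16-bit indices).
std::vector<uint16_t> buildGridStrip(int cols, int rows) {
    const uint16_t kRestart = 0xFFFF;
    std::vector<uint16_t> indices;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c <= cols; ++c) {
            // lower vertex, then upper vertex of the current column
            indices.push_back(static_cast<uint16_t>((r + 1) * (cols + 1) + c));
            indices.push_back(static_cast<uint16_t>(r * (cols + 1) + c));
        }
        if (r + 1 < rows)
            indices.push_back(kRestart); // cut the strip before the next row
    }
    return indices;
}
```

In GL you'd then enable it with glEnable(GL_PRIMITIVE_RESTART) and glPrimitiveRestartIndex(0xFFFF), and draw the whole grid with one glDrawElements call.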

Mata
Dec 23, 2003

zzz posted:

You can use degenerate triangles: by repeating vertices, you create invisible triangles that you can use to include a discontinuity in a single triangle strip (like a jump to a second layer)

If it's supported, you can also use an index of -1 (0xffff or 0xffffffff) to do the same. This is called strip-cut index in D3D and the Primitive Restart extension in OGL.

Looks like primitive restart is exactly what I was looking for :) thanks!

ShinAli
May 2, 2003

The Kid better watch his step.
I don't know if I asked before in this thread but I'll go ahead.

How would I go about making multiple lights in my phong lighting shader? What I've done is have a uniform array of a fixed size (say 100), through which I pass lighting information like position/direction/size/type, and loop through the array about 100 times. From this thread I've heard that a variable loop is pretty bad, so I kept it fixed at 100 and put in an if statement to check whether the current element is enabled. If there are more than 100 lights, I just render the scene again with the lights it didn't go through and blend that with the previously rendered scene.

I'm not sure if this is the right way, and if you guys want I'll put up the source code. It has some if statements in it anyways and I'm not sure how well shaders handle branching.

I'd also like to know how to handle attenuation of spot lights, as everywhere I looked they seem to use fixed values. I assumed I'd just linearly reduce the light depending on the distance, but went with the fixed attenuation values.

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe

ShinAli posted:

I don't know if I asked before in this thread but I'll go ahead.

How would I go about making multiple lights in my phong lighting shader? What I've done is have a uniform array attribute which I pass lighting information like position/direction/size/type of a fixed size (say 100) and loop through about a 100 times through the array. From this thread I've heard that a variable loop is pretty bad so I kept it fixed at 100 and put in an if statement to see if the current element is enabled. If there are more than a 100 lights, I just render the scene again with the lights it didn't go through and blend it with the previous rendered scene.

I'm not sure if this is the right way, and if you guys want I'll put up the source code. It has some if statements in it anyways and I'm not sure how well shaders handle branching.

I'd also liked to know how to handle attenuation of spot lights as everywhere I looked, they seem to use fixed values. I assumed I'd just linearly make less light depending on the distance but went with the fixed attenuation values.

If you can give up blended transparency look into using 'deferred shading', it's complicated but oh so good for lots of lights. (and there's ways to get transparency back if you really want it)

ShinAli
May 2, 2003

The Kid better watch his step.

Unormal posted:

If you can give up blended transparency look into using 'deferred shading', it's complicated but oh so good for lots of lights. (and there's ways to get transparency back if you really want it)

That's exactly what I've been using, and I seem to be able to go to about a 1000 lights before it slows down below 30 fps. I just don't know if I'm doing it right.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

ShinAli posted:

I'd also liked to know how to handle attenuation of spot lights as everywhere I looked, they seem to use fixed values.
angle = dot(normalize(point - lightOrigin), normalize(lightDirection))
attenuation = saturate((angle - cosMaxAngle) / (cosMinAngle - cosMaxAngle))

Min angle = Minimum angle where the light source stops being visible at full intensity (might still be visible through a diffuser)
Max angle = End of the fade-out angle, i.e. the angle where neither the light source nor diffusers are visible.
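Those two lines as a small C++ helper, in case it helps (function and parameter names are mine; note the "angle" in the formula is really a cosine, since it comes straight from a dot product, so cosMinAngle > cosMaxAngle):

```cpp
#include <algorithm>

// Spot-cone attenuation per the formula above. cosMinAngle and
// cosMaxAngle are the cosines of the inner (full intensity) and
// outer (fully faded) cone angles.
float spotAttenuation(float cosAngle, float cosMinAngle, float cosMaxAngle) {
    // saturate() in HLSL is just a clamp to [0, 1]
    float t = (cosAngle - cosMaxAngle) / (cosMinAngle - cosMaxAngle);
    return std::clamp(t, 0.0f, 1.0f);
}
```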

OneEightHundred fucked around with this message at 21:36 on Jul 15, 2011

haveblue
Aug 15, 2005



Toilet Rascal

ShinAli posted:

I'm not sure how well shaders handle branching.

Very, very poorly. Avoid if at all possible.

Depending on what you are doing it may be faster to evaluate both branches and multiply the one you don't want to use by zero before combining it with the final result.
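A sketch of that trick, written as C++ but the same shape in GLSL/HLSL (names are mine): evaluate the contribution unconditionally and scale it by a 0/1 enable flag instead of branching.

```cpp
struct Vec3 { float r, g, b; };

// Branchless "select": always compute the light's contribution, then
// multiply by 0.0 or 1.0 instead of wrapping it in an if statement.
Vec3 addLightBranchless(Vec3 accum, Vec3 contribution, float enabled) {
    return { accum.r + contribution.r * enabled,
             accum.g + contribution.g * enabled,
             accum.b + contribution.b * enabled };
}
```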

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe

ShinAli posted:

That's exactly what I've been using, and I seem to be able to go to about a 1000 lights before it slows down below 30 fps. I just don't know if I'm doing it right.

Generally if you're using a deferred shader, you shouldn't be branching in a single shader, you should be rendering a single quad (or sphere or whatever) per light volume.

ShinAli
May 2, 2003

The Kid better watch his step.

OneEightHundred posted:

angle = dot(normalize(point - lightOrigin), normalize(lightDirection))
attenuation = saturate((angle - cosMaxAngle) / (cosMinAngle - cosMaxAngle))

Min angle = Minimum angle where the light source stops being visible at full intensity (might still be visible through a diffuser)
Max angle = End of the fade-out angle, i.e. the angle where neither the light source nor diffusers are visible.

Argh, I actually meant point lights but this is still helpful as I need to implement spot lights anyways.

For point lights, I'd assume you'd use two "sizes" where one is full intensity and the other is where the fall off would end. Would I just measure up the fall off size in some proportion of the intensity size? I'm trying to think on how to use angles as a part of this but it would seem that I need to know the range of the light anyways before I can take the angle into consideration.

Unormal posted:

Generally if you're using a deferred shader, you shouldn't be branching in a single shader, you should be rendering a single quad (or sphere or whatever) per light volume.

I mostly wanted to batch as many lights as possible in a single pass to avoid doing a draw call for every light. Does it not matter as much if I do one light per pass?

haveblue posted:

Very, very poorly. Avoid if at all possible.

Depending on what you are doing it may be faster to evaluate both branches and multiply the one you don't want to use by zero before combining it with the final result.

Actually I don't know why I didn't think of that, as I use a 1.0 for on and 0.0 for off.

ShinAli fucked around with this message at 22:37 on Jul 15, 2011

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

ShinAli posted:

For point lights, I'd assume you'd use two "sizes" where one is full intensity and the other is where the fall off would end.
Distance falloff, if you want to be realistic, is just inverse square. An easy way to think of this: if you take a picture of a 1x1 square, then take a picture of it at twice the distance, it takes up half as much distance in each direction, resulting in a quarter as much area. Since light is collected or distributed uniformly across the entire photo, that means a quarter as much hits the camera.

You can clamp the distance at a minimum value to avoid "hot spots" from the hyperbolic growth, but that distance is usually best kept low.

There is no distance at which a light stops affecting a visible surface entirely, but if you want to cull away negligibly-affected surfaces, a common criterion is sqrt(intensity*256), which is the distance beyond which the light would fail to change the value of a pixel by itself. Intensity in this case would be the highest of the three RGB intensity values.
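Putting the falloff and the culling radius into code (a sketch; names are mine):

```cpp
#include <algorithm>
#include <cmath>

// Inverse-square distance falloff, with a minimum-distance clamp to
// avoid the hot spot as dist approaches zero.
float distanceFalloff(float intensity, float dist, float minDist) {
    float d = std::max(dist, minDist);
    return intensity / (d * d);
}

// Distance beyond which the light can no longer change an 8-bit pixel
// value by itself (intensity / r^2 < 1/256).
float cullRadius(float maxChannelIntensity) {
    return std::sqrt(maxChannelIntensity * 256.0f);
}
```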

OneEightHundred fucked around with this message at 18:02 on Jul 17, 2011

PDP-1
Oct 12, 2004

It's a beautiful day in the neighborhood.
I ran into something I don't understand today while working on a shader - I was sampling a mipmapped texture and the shader ran fine. Then I changed the texture and forgot to generate the mipmap and the framerate absolutely tanked. When I generated the mipmap on the new texture things ran great again.

The obvious conclusion is to be sure to use mipmaps, but I don't understand why that makes such a difference in the framerate. Sampling a texture is sampling a texture, and if anything I'd have guessed that translating the UV coords to the mipmap would be more work for the GPU.

Why is sampling a mipmapped texture so much faster than sampling a non-mipmapped texture?

haveblue
Aug 15, 2005



Toilet Rascal
Two possibilities:
  • The higher mip levels fit into cache better.
  • The driver is doing something dumb like generating the mip levels on-demand for each fragment evaluated.

shodanjr_gr
Nov 20, 2007

PDP-1 posted:

I ran into something I don't understand today while working on a shader - I was sampling a mipmapped texture and the shader ran fine. Then I changed the texture and forgot to generate the mipmap and the framerate absolutely tanked. When I generated the mipmap on the new texture things ran great again.

The obvious conclusion is to be sure to use mipmaps, but I don't understand why that makes such a difference in the framerate. Sampling a texture is sampling a texture, and if anything I'd have guessed that translating the UV coords to the mipmap would be more work for the GPU.

Why is sampling a mipmapped texture so much faster than sampling a non-mipmapped texture?

Better data locality. If you map the same region of the texture to a surface, the mipmap one requires fewer memory fetches.

PDP-1
Oct 12, 2004

It's a beautiful day in the neighborhood.
I doubt that the driver is generating anything on-demand since the un-mipped texture turns into visual noise at long draw distances. This was my clue to look at the mipmap status to begin with, but also suggests that the full size texture is being used directly if lower detail levels aren't available.

The cache/data locality issue seems like it could be the cause. I have a shitton of data set in vertex buffers so it is likely that a lot of cache swapping is going on in general and loading the full 512x512 texture + bumpmap would take some time.

Thanks for the help.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

haveblue posted:

The driver is doing something dumb like generating the mip levels on-demand for each fragment evaluated.
I think something sort of similar to this can happen if you have anisotropic filtering enabled on a non-mipmapped surface.

If this is with D3D, remember that the sampler state is not part of the texture state like it is with OpenGL, so it's possible that the sampler state doesn't match up with how the texture data is stored.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!
I'm thinking about rendering a translucent spheroid as a "shield", and it strikes me that rendering it simply as a colored polygonal approximation of a spheroid with an alpha value will result in a totally flat appearance on screen. In reality, a transparent spheroid would show both its front and back surfaces, and appear 'denser' at the edges because you're looking through a thicker piece of the surface. Would this sort of effect be done with a shader that increases the alpha the more perpendicular the normal is to the camera?

Alternatively, what is a nicer way of rendering a visible forcefield around an object?

haveblue
Aug 15, 2005



Toilet Rascal
That would be the easiest way to fake it, yes. Comparing the normal to the camera vector is just a dot product.
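Something like this, as a CPU-side sketch of the math (the names, the rim exponent, and the base alpha are all my own choices; in practice this would live in the fragment shader). Both vectors are assumed normalized.

```cpp
#include <algorithm>
#include <cmath>

// Cheap "shield" edge glow: alpha grows as the surface normal turns
// perpendicular to the view direction. `power` sharpens the rim.
float shieldAlpha(float nx, float ny, float nz,   // surface normal
                  float vx, float vy, float vz,   // view direction
                  float baseAlpha, float power) {
    float facing = std::abs(nx * vx + ny * vy + nz * vz); // |dot(N, V)|
    float rim = 1.0f - facing;  // 0 facing the camera, 1 at the silhouette
    float a = baseAlpha + (1.0f - baseAlpha) * std::pow(rim, power);
    return std::clamp(a, 0.0f, 1.0f);
}
```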

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
Just curious, having seen this on the feature list of OpenGL 4.2:

quote:

modifying an arbitrary subset of a compressed texture, without having to re-download the whole texture to the GPU for significant performance improvements;

Couldn't you already do that with glCompressedTexSubImage2D?

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!
Vector math question - if I have an "up" and "forward" vector that I manipulate (in an ongoing cumulative manner) using pitch and roll rotations, how can I correct for the creep of float inaccuracy? I can renormalize them, obviously, but I imagine they'd still slowly creep away from being at right angles - how would I best bring them back in order?

Thinking something like, in pseudocode:
code:
normalize(forward);
vector right=crossproduct(forward,up);
up=crossproduct(right,forward);
normalize(up);
Would that do the trick?

roomforthetuna fucked around with this message at 04:24 on Aug 21, 2011

ynohtna
Feb 16, 2007

backwoods compatible
Illegal Hen
Yeah, orthonormalizing axes is a common need when you're independently tweaking them over time. Have a look around - there are a few algorithms that minimize inaccuracy.

Alternatively: quaternions.
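For reference, here's that pseudocode as compilable C++ (the types are mine). One tweak: normalizing `right` means the recomputed `up` comes out unit-length already, since it's the cross product of two perpendicular unit vectors, so the final normalize becomes redundant.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Rebuild an orthonormal frame from drifting forward/up vectors:
// derive right from forward x up, then recompute up so all three
// axes are mutually perpendicular again.
void orthonormalize(Vec3& forward, Vec3& up) {
    forward = normalize(forward);
    Vec3 right = normalize(cross(forward, up));
    up = cross(right, forward); // already unit length
}
```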

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

ynohtna posted:

Alternatively: quaternions.
Doh, I knew that, but I've not had the call to use them for about a year now so it didn't occur to me. Now refactoring all the code over would probably be more trouble than it's worth. Thanks.

lord funk
Feb 16, 2004

I'm getting some pretty crusty lines with OpenGL ES 2.0. I am multisampling / anti-aliasing, but lines with glLineWidth(1.0) are pretty chunky:


(image zoomed a bit)

Is there a common solution to this that I'm missing?

FlyingDodo
Jan 22, 2005
Not Extinct
I am new to GL 4.1 and haven't made use of shaders before. I have a GLSL shader with an in vec3 variable called v_position in the vertex shader and an out vec4 called outputColour in the fragment shader. I'm wondering why it still works if I comment out the following code:

code:
	glBindFragDataLocation(m_shaderId, 0, "outputColour");
	glBindAttribLocation(m_shaderId,0,"v_position");
As far as I know this binds the fragment variable outputColour to the screen buffer, and the vertex variable v_position to attribute pointer index 0. Surely this is needed - otherwise how would OpenGL know what to do?

pseudorandom name
May 6, 2007

Psychic debugging: your shader uses the location keyword.


FlyingDodo
Jan 22, 2005
Not Extinct
It doesn't, which adds to my confusion.
