Orzo
Sep 3, 2004

IT! IT is confusing! Say your goddamn pronouns!

HappyHippo posted:

Could you post some of the code (especially the vertex declaration and the shader code)? How different is SlimDX from XNA? I don't have experience with SlimDX, but I do with XNA (like VertexDeclaration). I assume you are using HLSL.
Thanks; I forgot about this, but I actually figured it out: when I switched to my own vertex shader, my coordinate system went from (0,0)-(sizeW,sizeH) (where size was more or less the resolution) to (-1,-1)-(1,1).

Seabert
Apr 13, 2008
Hey guys,

I recently bought a WP7 phone and decided to create a Sim City clone game on it using XNA. Problem is, how to handle the tiles? Especially without access to custom shaders!

I definitely want to use 3D tiles so I came up with the following options:

1) Create a vertexbuffer for every tile. Easy to change the texture when a road is placed or something. However it seems like a horrible waste of resources for such small buffers.

2) Create a vertexbuffer for a bunch of tiles and use a texture atlas. But this means that if I want to change the texture of a tile, I would need to modify the texture UV coordinates and rebuild the vertexbuffer quite often.

Does anyone have any ideas / tips?

Thanks in advance!

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!
Isn't it possible to lock only a subset of a large vertex buffer? So if a tile changes you could lock only the part of the vertex buffer that comprises that tile, rather than rebuild the whole buffer. (I know you can do that in DirectX but I'm not sure about XNA.)

For a Sim City style game I like the idea of using a texture atlas because it means you can do your Sim City style animation just by having a few copies of the texture each at different stages of the animation for every tile-type. One texture-select per frame and everything's animated! (It's also all unfortunately very synchronised, but I think Sim City was like that anyway.)

Edit: Rebuilding the whole vertex buffer on a tile change would probably be okay anyway really, if you can't lock parts of a buffer - tiles change only on user input or rare events, not every frame.

Seabert
Apr 13, 2008

roomforthetuna posted:

Isn't it possible to lock only a subset of a large vertex buffer? So if a tile changes you could lock only the part of the vertex buffer that comprises that tile, rather than rebuild the whole buffer. (I know you can do that in DirectX but I'm not sure about XNA.)

Checked out the documentation and it looks like you can in XNA! Thanks for the information, this looks like an excellent way of handling it :)
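
Something along these lines looks like the shape of it (untested sketch; it assumes the SetData overload that takes a byte offset, 4 vertices per tile stored contiguously, and BuildTileVertices is a made-up helper):

code:
// overwrite just one tile's quad inside the shared vertex buffer (XNA 4.0)
int stride = VertexPositionTexture.VertexDeclaration.VertexStride;
int offsetInBytes = tileIndex * 4 * stride;   // 4 vertices per tile, laid out contiguously

// made-up helper: builds the tile's 4 vertices with the new atlas UVs
VertexPositionTexture[] tileVerts = BuildTileVertices(tileIndex, newAtlasRect);

tileVertexBuffer.SetData(offsetInBytes, tileVerts, 0, tileVerts.Length, stride);
If the tiles change often it probably wants to be a DynamicVertexBuffer so the update doesn't stall on the GPU, but the byte-offset overload is the important part.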

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
Locking a buffer doesn't change the contents anyway unless you use the discard flag.

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe
Question for the gurus: I'm working on writing a little engine using deferred shading. I've got it rendering to a g-buffer of 3 FP16 color attachment textures plus a depth texture, and I've got it rendering some crappy point lights, so that's all good.

So I'm basically new to OpenGL, and writing this from scratch, and my main question at this point is around the depth component of my g-buffer. I've got a GL_DEPTH_COMPONENT32 texture attached to depth on my g-buffer, but I don't really know how I'm 'supposed' to work with it.

1. Specifically, if I bind the texture, how do I access the value in a shader reasonably? texture2D returns four byte values; do I have to composite them manually into a single 32-bit value? Is there some easy GLSL way to access a 32-bit value from an FP texture that I'm missing?

2. What I really want to do is re-use the depth stored in my g-buffer during my lighting pass, in my destination framebuffer, since I'm rendering my light volumes as little cubes. If I could enable depth testing, I could early-out those pixels without running the fragment shader. I'd rather not do a new depth-only render for my framebuffer; that seems wasteful since I already have a depth texture generated for my g-buffer.

I'm not sure what the 'right' way (or if there even is a way) is to re-use the depth texture from my g-buffer as depth in my framebuffer (or some other FBO). Could I use a full-screen quad and a fragment shader to load depth values from my g-buffer's depth texture into my framebuffer somehow? I also store the fragment z in my g-buffer; could I manually load the depth from that, or is the fragment depth implementation-dependent?

I'll figure it out eventually, but maybe you guys could reduce the head-bashing. Appreciate it!

(Last time I wrote graphics code was in the 90s, writing a software rasterizer in the days before hardware acceleration; poo poo sure is fast now, no joke! You driver authors/hardware guys are great, I don't have to do any of the hard work anymore. :))

e: Minor question: Does noperspective not work on ATI cards or something? I'm getting texture coordinates for sampling my g-buffer by doing an (xy/w) for the vertices of my light cubes. I was having distortion due to perspective-correct interpolation to the fragments, but noperspective fixed that on my NVidia card. The distortion still happens on an ATI card someone tested on, even though it doesn't throw an error building the shaders with noperspective in them:

noperspective out vec2 vOutTexturePos;

Unormal fucked around with this message at 16:43 on Feb 14, 2011

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

Unormal posted:

I'm getting texture coordinates for sampling my g-buffer by doing an (xy/w) for the vertices of my light cubes.
Don't divide by W to undo perspective correction, multiply by it.

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe

OneEightHundred posted:

Don't divide by W to undo perspective correction, multiply by it.

Well, but the code below works perfectly, except for very minor distortion in vOutTexturePos caused by perspective-correct interpolation of those values to the fragment shader. On an NVidia card, where vOutTexturePos interpolation respects "noperspective", it works perfectly. vOutTexturePos is correctly the x,y coordinate of the g-buffer texel I want to sample for any point represented in world space by vVertex (mMVP is my model-view-projection matrix). On an ATI card I tested on, I get very minor distortion, which is reproduced exactly on an NVidia card by removing noperspective and letting it interpolate vOutTexturePos perspective-correct. So it seems the ATI driver isn't respecting noperspective in this case.

code:
noperspective out vec2 vOutTexturePos;  // interpolated without perspective correction
...
vOutVertex = mMVP * vVertex;                                     // clip-space position
vOutTexturePos.xy = (vOutVertex.xy / vOutVertex.w);              // perspective divide -> NDC
vOutTexturePos.xy = (vOutTexturePos.xy * 0.5) + vec2(0.5, 0.5);  // NDC -> [0,1] texture space
If I change it to multiply by W, it doesn't seem to work at all. It's certainly possible I'm not understanding something here. :)

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
Well, I only have limited experience with deferred so take it with a grain of salt, but I'm pretty sure deferred stuff is usually done with screen-space quads anyway, or even full-screen quads with scissoring, and 3D stuff is just used to apply stencil buffer values.

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe

OneEightHundred posted:

Well, I only have limited experience with deferred so take it with a grain of salt, but I'm pretty sure deferred stuff is usually done with screen-space quads anyway, or even full-screen quads with scissoring, and 3D stuff is just used to apply stencil buffer values.

Sure, billboarding the light volumes instead of using cubes or scissoring would be a workaround that should work fine on both, since a billboard would have no perspective to correct for; cubes seem like a perfectly reasonable implementation if ATI would just respect noperspective, though :) It seems to me (newb that I am) that rendering depth-tested cubes would let me skip running the fragment shader at all, instead of running it on the light billboard just to discard pixels that are 'behind' something. It seems way better to let the depth test do it.

However, if I do more extensive post-processing, like blending in transparent objects, it'd be nice to be able to re-use my depth-buffer, so using 3d for light volumes aside, if anyone has answers for the depth buffer questions, I'm all ears :)

Unormal fucked around with this message at 02:12 on Feb 15, 2011

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

Unormal posted:

rendering depth-tested cubes would allow me to not have to run the fragment shader at all, instead of running the fragment shader on the light billboard, just to discard it because it's 'behind' a pixel. It seems way better to let the depth test do it.
I believe what is normally done is using stencil volumes, i.e. render the back of a sphere or cube with stencil on increment, then render the front with decrement, and then draw with stencil test on things above 0.

This has the additional effect of only shading pixels lit by the light, as opposed to doing it in 3D, which shades any pixel where line of sight hit the light volume before hitting a solid (even though it may have exited the light volume and hit a solid behind it), and it's basically required for doing directional projections.

OneEightHundred fucked around with this message at 02:46 on Feb 15, 2011

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe

OneEightHundred posted:

I believe what is normally done is using stencil volumes, i.e. render the back of a sphere or cube with stencil on increment, then render the front with decrement, and then draw with stencil test on things above 0.

This has the additional effect of only shading pixels lit by the light, as opposed to doing it in 3D, which shades any pixel where line of sight hit the light volume before hitting a solid (even though it may have exited the light volume and hit a solid behind it), and it's basically required for doing directional projections.

Ah right, that makes sense!

Pretty new to stencil volumes, but was just reading about shadowing using stencil volumes tonight. Seems to be a pretty reasonable approach.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
I hosed up that description because I haven't done it in a while.

Disable color/depth write, set depth test to only render behind the target pixels, set stencil to increment, draw the back of the light cube/sphere, set stencil to decrement, draw the front of it, then re-enable color write and draw your lighting shader in screen space with stencil test set to only draw for stencil values above zero.

If you're good with screen-space partitioning (i.e. you can prevent lights from overlapping on screen) then you can do this with multiple lights at once.

e: Actually you can do this with stencil XOR too, since you're not dealing with intersected volumes.
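
In GL state terms, the INCR/DECR version is roughly this (from memory and not compile-checked, written in the same Gl.* binding style used elsewhere in the thread; the two Draw calls are stand-ins, and it assumes the scene depth is already in the depth attachment of whatever you're lighting into):

code:
// passes 1+2: mark the stencil where the light volume encloses scene geometry
Gl.glColorMask(Gl.GL_FALSE, Gl.GL_FALSE, Gl.GL_FALSE, Gl.GL_FALSE);
Gl.glDepthMask(Gl.GL_FALSE);                          // depth is read-only here
Gl.glEnable(Gl.GL_DEPTH_TEST);
Gl.glDepthFunc(Gl.GL_GEQUAL);                         // pass where the volume is behind the scene
Gl.glEnable(Gl.GL_STENCIL_TEST);
Gl.glStencilFunc(Gl.GL_ALWAYS, 0, 0xFF);
Gl.glEnable(Gl.GL_CULL_FACE);

Gl.glCullFace(Gl.GL_FRONT);                           // back faces: increment on depth pass
Gl.glStencilOp(Gl.GL_KEEP, Gl.GL_KEEP, Gl.GL_INCR);
DrawLightVolume();                                    // stand-in: draws the cube/sphere

Gl.glCullFace(Gl.GL_BACK);                            // front faces: decrement on depth pass
Gl.glStencilOp(Gl.GL_KEEP, Gl.GL_KEEP, Gl.GL_DECR);
DrawLightVolume();

// pass 3: shade in screen space only where the stencil is above zero
Gl.glColorMask(Gl.GL_TRUE, Gl.GL_TRUE, Gl.GL_TRUE, Gl.GL_TRUE);
Gl.glDisable(Gl.GL_DEPTH_TEST);
Gl.glStencilFunc(Gl.GL_LESS, 0, 0xFF);                // passes where 0 < stencil
Gl.glStencilOp(Gl.GL_KEEP, Gl.GL_KEEP, Gl.GL_KEEP);
DrawFullScreenLightQuad();                            // stand-in: screen-space lighting shader

Gl.glClear(Gl.GL_STENCIL_BUFFER_BIT);                 // reset for the next light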

OneEightHundred fucked around with this message at 02:52 on Feb 15, 2011

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe

OneEightHundred posted:

I hosed up that description because I haven't done it in a while.

Disable color/depth write, set depth test to only render behind the target pixels, set stencil to increment, draw the back of the light cube/sphere, set stencil to decrement, draw the front of it, then re-enable color write and draw your lighting shader in screen space with stencil test set to only draw for stencil values above zero.

If you're good with screen-space partitioning (i.e. you can prevent lights from overlapping on screen) then you can do this with multiple lights at once.

e: Actually you can do this with stencil XOR too, since you're not dealing with intersected volumes.

Yeah, your initial description was enough to get my light bulb to go off. I've got it rendering a couple thousand totally dynamic lights in a big outdoor scene at the moment, so I'll approach it pretty generically rather than doing preliminary culling. Though I guess I could throw them all in an octree or something; I dunno how much CPU that would use as they move. (I don't have a good instinctual feeling for how much poo poo CPUs and GPUs can do in graphical scenes these days, but it's *a lot*, and the CPU seems to be the :downs: step-child of a modern GPU.) I'll probably just use a much-simplified shader on far-away lights to keep the fill rate issues down.

e: Though thinking about it, I need the depth for the stencil volume rendering as described here, so my initial question of whether I can re-use my g-buffer depth texture in some intelligent way still stands. Or do I just need to do a fast depth-only pass on my final target framebuffer first? (That seems wasteful.) [Also, since my volumes are simple cubes, at least for my point lights, XOR stenciling seems like it'd work fine.]

Unormal fucked around with this message at 03:02 on Feb 15, 2011

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
Depth pre-pass depends on the cost of the initial draw pass. If you're doing something that's almost entirely dynamic and all the initial pass does is dump gbuffer data, then it probably won't give you much benefit. If you're using a hybrid lighting model (i.e. lightmaps, environment maps, or other poo poo that isn't dependent on lights) then the initial pass might be more expensive and worth doing a depth prepass for.

It's one of those cases where you have to try it and see how well it works out.

quote:

the CPU seems to be the :downs: step-child of a modern GPU
The entire point of trying to throw as much stuff as possible into single passes is the CPU. Draw calls incur cost, and most of that cost is CPU. It's much cheaper with D3D10 and OpenGL, but D3D9 draw calls are a bit on the expensive side.


As for depth buffer reuse, a pretty common approach with deferred rendering is using the depth buffer, combined with the projection matrix and screen-space coordinates, to determine what the world-space coordinates are for a pixel without explicitly storing it.


I'm unfortunately not too knowledgeable about what you can read out of the depth buffer with shaders. I know it used to be kind of convoluted since depth components had their own format, but I think on modern hardware you can just read them as floats.

You never have to composite values, though; they're always in a format you can use directly.
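
The reconstruction itself is shaped something like this (untested sketch; the uniform names are made up, and it assumes the depth comes back as a plain 0..1 float):

code:
// fragment shader snippet: depth texel + screen-space texcoord -> world-space position
uniform sampler2D tDepth;        // the g-buffer depth texture
uniform mat4 mInvViewProj;       // inverse of the view-projection matrix

vec3 WorldPosFromDepth(vec2 texCoord)
{
    float depth = texture2D(tDepth, texCoord).r;                     // 0..1
    vec4 clip = vec4(texCoord * 2.0 - 1.0, depth * 2.0 - 1.0, 1.0);  // back to NDC
    vec4 world = mInvViewProj * clip;
    return world.xyz / world.w;                                      // undo the perspective divide
}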

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe

OneEightHundred posted:

As for depth buffer reuse, a pretty common approach with deferred rendering is using the depth buffer, combined with the projection matrix and screen-space coordinates, to determine what the world-space coordinates are for a pixel without explicitly storing it.


Right, that's the plan; I'm just curious how I'm 'supposed' to be doing the depth buffer extraction. I tried for a half hour or so after I got my dorky implementation working by storing it explicitly, but couldn't figure out an easy way, so I'll just go bash on it till I figure it out. :)

Thanks for your help, I'll tinker some more and see how it goes. The conversation at least shows me I'm tracking on a path that makes overall sense; I just have to hammer out the details.

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe
Figured I'd post some of my results for posterity as I figure them out.

So for depth value sampling, just attaching the depth component as a texture and using the r channel of the sampler2D worked fine.

So you can calculate and display a greyscale linear depth with code like this:

code:
float n = 0.01; // camera z near
float f = 512.0; // camera z far
float v = texture2D(tFrameDepth, vTexCoord.xy).r;
float c = (2.0 * n) / (f + n - v * (f - n));
vFragColor = vec4(c,c,c,0);
I've found lots of references to detaching the depth texture and attaching it to a new FBO, saying that it should work, but I haven't gotten it working yet. That's probably next on the docket.

E: So it seems the depth attachment works fine; I was just trying to do it to my "final" window framebuffer, but that's apparently a no-no. As long as I create an intermediate framebuffer to do my blending in, I can detach the depth texture attachment from my geometry FBO and attach it to my compositing/lighting FBO for the lighting stage, and it works fine.

code:
// unbind the depth texture from the current FBO
Gl.glFramebufferTexture2DEXT(Gl.GL_FRAMEBUFFER_EXT, Gl.GL_DEPTH_ATTACHMENT_EXT, Gl.GL_TEXTURE_2D, 0, 0);
// Bind the compositing buffer
LightingBuffer.Bind(); 
// unbind anything bound to the lighting FBO
Gl.glFramebufferTexture2DEXT(Gl.GL_FRAMEBUFFER_EXT, Gl.GL_DEPTH_ATTACHMENT_EXT, Gl.GL_TEXTURE_2D, 0, 0);
// bind the main buffer's depth to the lighting depth attachment, viola
Gl.glFramebufferTexture2DEXT(Gl.GL_FRAMEBUFFER_EXT, Gl.GL_DEPTH_ATTACHMENT_EXT, Gl.GL_TEXTURE_2D, MainBuffer.fboDepth, 0);

Unormal fucked around with this message at 17:54 on Feb 16, 2011

Spite
Jul 27, 2001

Small chance of that...
FBO 0 isn't actually an object - it's the system drawable. So attaching things to it may have...odd effects (though it should just throw an error).

One quick note about CPU overhead: OpenGL is pretty bad about CPU overhead as well. Using 3.1+ will mitigate this, as they removed a bunch of junk. But anything earlier (ie, anything that still has fixed function, etc) will require validation of all that legacy state which sucks rear end.

Unormal
Nov 16, 2004

Mod sass? This evening?! But the cakes aren't ready! THE CAKES!
Fun Shoe

Spite posted:

FBO 0 isn't actually an object - it's the system drawable. So attaching things to it may have...odd effects (though it should just throw an error).

One quick note about CPU overhead: OpenGL is pretty bad about CPU overhead as well. Using 3.1+ will mitigate this, as they removed a bunch of junk. But anything earlier (ie, anything that still has fixed function, etc) will require validation of all that legacy state which sucks rear end.

Am I using anything fixed-function here? I figured since I'm entirely shader driven I was bypassing the 'fixed' pipeline, though I don't really know how using the framebuffer EXT functions vs the built-in framebuffer functions for 3.x would affect things. I guess I figured driver writers would just implement the EXT functions as special cases of the more general 3.0 functionality, and EXT would actually be more portable, even though the OpenGL pages tell you to use the more updated built-in functions if you can.

The only thing that feels 'built in/fixed' to me is using the OpenGL blend mode to render each deferred light into the intermediary buffer. That feels a little more auto-magic than the rest of the rendering I do, which is much more manual-direct-writes via shaders. Though I guess there's a lot of magic going on under there anyway. I can't figure out any way I could do the blending manually other than having 2 FBOs and swapping them each time I render a new light, which seems ridiculous and I can't imagine would be speedier than just using the built-in blend, though I haven't actually benchmarked it.
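
(For concreteness, what I mean is just plain additive blending per light, roughly this, in the Gl.* bindings I'm using:)

code:
// accumulate each light into the lighting FBO with fixed-function additive blending
Gl.glEnable(Gl.GL_BLEND);
Gl.glBlendFunc(Gl.GL_ONE, Gl.GL_ONE);   // src + dst: each light adds its contribution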

E: Is there any kind of good comprehensive guide to mainstream video cards and their capabilities in terms of OpenGL? (i.e. how many render targets they support, how many texture units, etc?)

E2: VV Nice thanks!

Unormal fucked around with this message at 02:17 on Feb 17, 2011

Spite
Jul 27, 2001

Small chance of that...

Unormal posted:

Am I using anything fixed-function here? I figured since I'm entirely shader driven I was bypassing the 'fixed' pipeline, though I don't really know how using the framebuffer EXT functions vs the built-in framebuffer functions for 3.x would affect things. I guess I figured driver writers would just implement the EXT functions as special cases of the more general 3.0 functionality, and EXT would actually be more portable, even though the OpenGL pages tell you to use the more updated built-in functions if you can.

The only thing that feels 'built in/fixed' to me is using the OpenGL blend mode to render each deferred light into the intermediary buffer. That feels a little more auto-magic than the rest of the rendering I do, which is much more manual-direct-writes via shaders. Though I guess there's a lot of magic going on under there anyway. I can't figure out any way I could do the blending manually other than having 2 FBOs and swapping them each time I render a new light, which seems ridiculous and I can't imagine would be speedier than just using the built-in blend, though I haven't actually benchmarked it.

E: Is there any kind of good comprehensive guide to mainstream video cards and their capabilities in terms of OpenGL? (i.e. how many render targets they support, how many texture units, etc?)

Well, it doesn't matter if you're not using the fixed-function pipeline. If it's there, it needs to be validated because the spec says so. Hooray. Of course, every modern implementation will have a bunch of dirty bits and won't do validation if nothing's changed (i.e., it won't re-validate the really old image-processing-type state if it hasn't been touched). GL 3.1+ doesn't _have_ all that old crap, so it can be totally ignored.

Blending will absolutely be faster if you use the fixed hardware. Programmable blending doesn't exist on the desktop yet - and using multiple render targets will suck for perf.

For limits, etc, this isn't a bad reference:
http://developer.apple.com/graphicsimaging/opengl/capabilities/

FlyingDodo
Jan 22, 2005
Not Extinct
Is there a standard way of fixing t-junctions for a bunch of polygons? At the moment all I am doing is checking each edge of every polygon against each edge of other polygons and inserting vertices/polygons as needed, with the whole thing accelerated with a BSP tree to remove checks that are not needed, but it still requires quite a number of polygon vs polygon checks. Anything to do with t-junction fixing on google is mostly related to how to fix them as an artist in 3d editing programs, not a programming solution.

Optimus Prime Ribs
Jul 25, 2007

Is there ever a justifiable reason to use display lists in OpenGL?
I've read online that they can sometimes be a viable option over VBOs or shaders, but I don't know how accurate or dated that is.

Right now I have VBOs and shaders implemented, and I'm just wondering if I should even bother with display lists.

haveblue
Aug 15, 2005



Toilet Rascal

Optimus Prime Ribs posted:

Is there ever a justifiable reason to use display lists in OpenGL?
I've read online that they can sometimes be a viable option over VBOs or shaders, but I don't know how accurate or dated that is.

Right now I have VBOs and shaders implemented, and I'm just wondering if I should even bother with display lists.

Don't bother, they are a very early and obsolete system.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
Current best practice is VBOs, glDrawRangeElements for everything, and use SSE uncached writes to VBOs if you're mapping (glBufferSubData does uncached writes).

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

Current best practice is VBOs, glDrawRangeElements for everything, and use SSE uncached writes to VBOs if you're mapping (glBufferSubData does uncached writes).

As a quick note, this will depend on the driver and OS (BufferSubData, I mean). Remember to keep an eye on your alignments with SSE. But yeah, you totally want to use uncached writes, especially for a big data set.

If you do use MapBuffer for your VBO, remember to use FlushMappedBufferRange/MapBufferRange.
And move to generic vertex attributes instead of using builtins if you can.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Optimus Prime Ribs posted:

Is there ever a justifiable reason to use display lists in OpenGL?
I've read online that they can sometimes be a viable option over VBOs or shaders, but I don't know how accurate or dated that is.

Right now I have VBOs and shaders implemented, and I'm just wondering if I should even bother with display lists.

ARB is trying their darndest to make them disappear, despite their potential convenience in reducing API overhead (mostly because as they currently exist, they put a lot of constraints on future extensions).

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

Hubis posted:

ARB is trying their darndest to make them disappear, despite their potential convenience in reducing API overhead (mostly because as they currently exist, they put a lot of constraints on future extensions).
Reducing API overhead wasn't a terrific reason in the first place, since user-mode drivers already make API overhead very low, and it's even weaker now that state objects are becoming increasingly common and do a better job of basically the same thing.

UraniumAnchor
May 21, 2006

Not a walrus.
Right now the OpenGL wiki suggests using display lists to improve performance. :what:

Come on, vertex buffers aren't THAT hard.

Paniolo
Oct 9, 2007

Heads will roll.
The funny thing about display lists is how similar they are in concept to the command buffers introduced in DX11. OpenGL's now in the weird position of needing to phase out display lists while phasing in something that's almost identical, except thread-safe.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
The point of command buffers is that they can be stuffed with commands from any thread. Display lists don't really work that way because they're filled based on API calls that operate on the state machine, which is bound to a single thread.

Personally I think they're kind of pointless, because the inherent overhead of making API calls when you have user-mode drivers is practically nothing.

The King of Swag
Nov 10, 2005

To escape the closure,
is to become the God of Swag.
Edit: I've managed to solve the problem after realizing that GLEW utilizes BOOL, which, for whatever reason, is entirely undefined in windows.h (even though it should be). No matter: I'm programming in ObjC, which also defines a compatible BOOL type, so I just included that and it compiled with no problem.

I have a GLEW question for you guys. As background, I'm using MinGW-w64 with whatever the newest version of GCC is.

I've successfully built and used GLEW both as a static library and by including it directly into my project, but now I need to use some extensions listed in wglew.h (the wgl extensions), but I get a tremendous stream of errors when I do so.

code:
GL\wglew.h|113|error: expected declaration specifiers or '...' before '*' token|
GL\wglew.h|113|warning: type defaults to 'int' in declaration of 'BOOL'|
GL\wglew.h|113|error: 'BOOL' declared as function returning a function|
GL\wglew.h|140|error: 'PFNWGLDELETEASSOCIATEDCONTEXTAMDPROC' declared as function returning a function|
GL\wglew.h|145|error: 'PFNWGLMAKEASSOCIATEDCONTEXTCURRENTAMDPROC' declared as function returning a function|

... (there are a lot of these 'function returning a function' errors, so I cut out the ones in the middle) ...

GL\wglew.h|1047|error: 'PFNWGLWAITFORMSCOMLPROC' declared as function returning a function|
GL\wglew.h|1048|error: 'PFNWGLWAITFORSBCOMLPROC' declared as function returning a function|
GL\wglew.h|1074|error: expected '=', ',', ';', 'asm' or '__attribute__' before '__wglewSetStereoEmitterState3DL'|
||=== Build finished: 75 errors, 1 warnings ===|
I've spent hours searching for a solution on Google, but all I've found is a problem related to building the source (needing to change __int64 in "typedef __int64 ptrdiff_t" to long long), and I've done that as well. GLEW itself (as defined in glew.h) is not my problem; I can use it just fine in my program. It's the wglew.h header that is giving me a nightmare of a problem.

The King of Swag fucked around with this message at 09:50 on Mar 24, 2011

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

The point of command buffers is that they can be stuffed with commands from any thread. Display lists don't really work that way because they're filled based on API calls that operate on the state machine, which is bound to a single thread.

Personally I think they're kind of pointless, because the inherent overhead of making API calls when you have user-mode drivers is practically nothing.

I agree that they are pointless, and pretty flawed from a design point of view. You can't optimize anything if the display list still obeys the state machine.

But GL calls do have significant CPU overhead, especially on OS X. There's a ton of validation, conversion and other stuff that needs to be done because it's demanded by the spec. D3D10 is way better than GL is, at this point.

speng31b
May 8, 2010

Okay, so I have a few (hopefully not-too-dumb) questions about OpenGL and VBO usage.

Currently I'm working on a game that basically renders the world similar to minecraft, i.e., the world is made up of a lot of smaller blocks. I've been keeping all the world geometry in a giant VBO, but I've also read that VBOs are really optimized for holding about 2-4MB of data (or possibly a max of 8MB?).

My question is, would it be better to divide the world up into smaller sections that each have their own VBO in order to get my individual VBO size smaller? Should I look into making my vertex data smaller by using something like triangle fans with primitive restart, or what else would be a good way to maximize vertex sharing/reduce the size?

speng31b fucked around with this message at 21:33 on Mar 27, 2011

Spite
Jul 27, 2001

Small chance of that...

octoroon posted:

Okay, so I have a few (hopefully not-too-dumb) questions about OpenGL and VBO usage.

Currently I'm working on a game that basically renders the world similar to minecraft, i.e., the world is made up of a lot of smaller blocks. I've been keeping all the world geometry in a giant VBO, but I've also read that VBOs are really optimized for holding about 2-4MB of data (or possibly a max of 8MB?).

My question is, would it be better to divide the world up into smaller sections that each have their own VBO in order to get my individual VBO size smaller? Should I look into making my vertex data smaller by using something like triangle fans with primitive restart, or what else would be a good way to maximize vertex sharing/reduce the size?

I'd say it's more a caching and locality issue than a pure VBO size issue. There are paging and VRAM constraints to think of, but each vertex is just a number in VRAM to the GPU. Sourcing that number efficiently can be affected by layout and cache, however. Consider that if your entire world is in one VBO, pieces that are next to each other spatially may not be next to each other in memory. It's similar to why you'd store an image in Z-order or Hilbert order instead of just straight linear.

Plus, you do NOT want to be drawing things you can't see - so you'll either be sending multiple DrawRangeElements calls, or you can decompose your world into smaller chunks and bind and draw them individually. I'd prefer this myself, as you can then do frustum checks and just skip the draws if they aren't visible.
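
Rough shape of what I mean (just a sketch, in the same Gl.* binding style used earlier in the thread; Chunk, the camera frustum, and the intersection test are stand-ins):

code:
// per frame: draw only the chunks whose bounds survive a frustum test
foreach (Chunk chunk in world.Chunks)
{
    if (!cameraFrustum.Intersects(chunk.Bounds))
        continue;                                        // off-screen: skip the bind and the draw

    Gl.glBindBuffer(Gl.GL_ARRAY_BUFFER, chunk.VertexBufferId);
    Gl.glBindBuffer(Gl.GL_ELEMENT_ARRAY_BUFFER, chunk.IndexBufferId);
    // ... set up vertex attribute pointers for the chunk's layout ...
    Gl.glDrawRangeElements(Gl.GL_TRIANGLES, 0, chunk.MaxVertexIndex,
                           chunk.IndexCount, Gl.GL_UNSIGNED_SHORT, IntPtr.Zero);
}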

And as a last caveat - how is your performance thus far? If it's not bad, you may not want to overcomplicate your problem just yet.

speng31b
May 8, 2010

Spite posted:

I'd say it's more a caching and locality issue than a pure VBO size issue. There are paging and VRAM constraints to think of, but each vertex is just a number in VRAM to the GPU. Sourcing that number efficiently can be affected by layout and cache, however. Consider that if your entire world is in one VBO, pieces that are next to each other spatially may not be next to each other in memory. It's similar to why you'd store an image in Z-order or Hilbert order instead of just straight linear.

Plus, you do NOT want to be drawing things you can't see - so you'll either be sending multiple DrawRangeElements calls, or you can decompose your world into smaller chunks and bind and draw them individually. I'd prefer this myself, as you can then do frustum checks and just skip the draws if they aren't visible.

I've been thinking about the best way to do this, and I think I've settled on the latter method. It's just a matter of deciding how many pieces to divide the world up into for checking, and how to organize these pieces into multiple or single VBOs.

The real problem I'm having is memory usage. Representing the world as a bunch of blocks means that I have a huge number of vertices in VRAM, whether I divide them into multiple VBOs or keep them in a few big ones. I can't think of a good way to handle this.

Spite posted:

And as a last caveat - how is your performance thus far? If it's not bad, you may not want to overcomplicate your problem just yet.

My performance is pretty good thus far; the memory usage is just a little bit on the high side. I just started on this project very recently, and I have some really basic performance-necessity things, like frustum culling, that still need to be hooked up. The thing holding me back right now is deciding how to divide up my world and store it in VRAM.

-----

One more question I had, though: if I were to represent each cube as 8 vertices in a VBO, how would I deal with texture coordinates? Each vertex is shared by 3 faces of the cube, so it seems like there's no good way to represent those shared vertex texture coordinates without multitexturing or representing the cubes as 6 faces for a total of 24 vertices. I'm sure it's something really simple, I'm just drawing a blank.

speng31b fucked around with this message at 03:44 on Mar 28, 2011

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

Spite posted:

But GL calls do have significant CPU overhead, especially on OS X. There's a ton of validation, conversion and other stuff that needs to be done because it's demanded by the spec. D3D10 is way better than GL is, at this point.
Last time I saw the numbers, D3D10 was ahead, but only marginally (compared to the huge D3D9 difference). I'm not sure I recall anything where OpenGL had to do more validation and such than D3D10 did. This is especially the case with repeated draw calls, since if the state hasn't actually changed, the drivers tend to be intelligent enough to skip the unnecessary verification.

They can get a bit dumb in the sense that almost any amount of state change tends to have the same cost.

quote:

One more question I had, though: if I were to represent each cube as 8 vertices in a VBO, how would I deal with texture coordinates? Each vertex is shared by 3 faces of the cube, so it seems like there's no good way to represent those shared vertex texture coordinates without multitexturing or representing the cubes as 6 faces for a total of 24 vertices. I'm sure it's something really simple, I'm just drawing a blank.
You need multiple vertexes with the same point and different texture coordinates.
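
i.e. for one face, something like this (plain sketch with a made-up vertex layout; the UV numbers are just some rectangle in an atlas):

code:
// made-up interleaved layout: position xyz + atlas texcoord uv
struct BlockVertex
{
    public float X, Y, Z, U, V;
    public BlockVertex(float x, float y, float z, float u, float v)
    { X = x; Y = y; Z = z; U = u; V = v; }
}

// one face of a unit cube at the origin: four vertices of its own, sharing corner
// positions with the other faces but carrying this face's own UVs (u0..v1 = the
// block type's rectangle in the atlas); six faces -> 24 vertices, not 8
static BlockVertex[] TopFace(float u0, float v0, float u1, float v1)
{
    return new[]
    {
        new BlockVertex(0f, 1f, 0f, u0, v0),
        new BlockVertex(1f, 1f, 0f, u1, v0),
        new BlockVertex(1f, 1f, 1f, u1, v1),
        new BlockVertex(0f, 1f, 1f, u0, v1),
    };
}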

OneEightHundred fucked around with this message at 14:30 on Mar 28, 2011

speng31b
May 8, 2010

OneEightHundred posted:

You need multiple vertexes with the same point and different texture coordinates.

Guess I'll need to think of another way to get my RAM usage down, then. I guess I could stream vertices instead of having the whole world in memory. Thanks.

Goreld
May 8, 2002

"Identity Crisis" MurdererWild Guess Bizarro #1Bizarro"Me am first one I suspect!"

FlyingDodo posted:

Is there a standard way of fixing t-junctions for a bunch of polygons? At the moment all I am doing is checking each edge of every polygon against each edge of other polygons and inserting vertices/polygons as needed, with the whole thing accelerated with a BSP tree to remove checks that are not needed, but it still requires quite a number of polygon vs polygon checks. Anything to do with t-junction fixing on google is mostly related to how to fix them as an artist in 3d editing programs, not a programming solution.

There's no perfect solution.*

If your mesh is fully connected with, say, half-edges, then you could detect open regions by looking for half-edges without opposites. Then you could perform zippering and devise some way to split edges for the t-junctions without compromising the topology. Since you'd only be working on the detected boundary edges, this would be pretty fast, as you'd be culling out the majority of the mesh.

If you're dealing with a polygon soup, then you're going to deal with a whole bunch of robustness problems, and in that case, have fun! (I've spent plenty of time working on robust CSG operations, and my only advice is, well, you'll find out pretty fast whether or not you hate geometric modeling when dealing with problems like these)


*unless you're working with exact numerical computation, which is opening a huge can of worms

Goreld fucked around with this message at 21:23 on Mar 28, 2011

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
So, one thing I haven't been keeping up on: How important is throughput now, if at all? That is, 2003-ish, static buffers were becoming hot poo poo to reduce data throughput to the card compared to the previously-favored approach of essentially bulk-copying data from system memory to the GPU every frame.

However, it's always had drawbacks in ease of use and draw call reduction: Instancing can be used to handle duplicated models, but it can't handle models at different LOD. Accessing the data for CPU-side tasks (i.e. decals) requires storing it twice. Some things are practical to render from static data in some scenarios, but not others (i.e. skeletal animation can be easily done with vertex shaders, but large numbers of morph targets run out of attribute streams on older hardware). Some things could in theory be done statically, but would require stupid amounts of storage (i.e. undergrowth).

Is the performance hit of using only dynamic buffers and just bulk-copying data even noticeable any more, or is the bottleneck pretty much all draw calls and shaders now?

Paniolo
Oct 9, 2007

Heads will roll.

OneEightHundred posted:

Is the performance hit of using only dynamic buffers and just bulk-copying data even noticeable any more, or is the bottleneck pretty much all draw calls and shaders now?

I'm no expert, but GPU processing power and storage are always increasing while bus speeds are not. Doing as much work as possible on the GPU seems to be the way to go on the PC, but most graphics engines are going to be optimized for console hardware anyway, which has different performance characteristics.
