Animal
Apr 8, 2003

Alereon posted:

Do make sure you have it running in DX11 mode; accidentally falling back to an older DX mode will significantly impair performance. Another Metro 2033 benchmarking pro-tip: there's significant run-to-run variability, so you need to do multiple runs and average them. Also make sure you have your fan speed turned up high enough to get meaningful results; if the videocard breaks 69C it throttles, so you have to throw out the results for that run.

Yes, it's on DX11; everything is set just like AnandTech's. Disabling DoF did the trick.

Thanks for the fan trick, will monitor that.

Animal fucked around with this message at 20:18 on Oct 17, 2012


Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

Metro 2033 does so much right in terms of their engine and scalability that their crap implementation of ADoF is kind of baffling. The underlying engine similarities to GSC's (rest in peace :qq:) X-Ray rendering engine are profound and yet S.T.A.L.K.E.R. CoP, even with extraordinary texture mods and added visual goodies from the fanbase, runs like greased lightning on a 670/680. View distance: forever, all the bells and whistles, and even some supersampling forced through the CC, runs like a champ.

In Metro you're in these super enclosed environments most of the time, and their turn-everything-up settings have a ton of root similarities to how the shaders and particles and poo poo work in CoP, yet the ADoF is a totally disproportionate FPS hog.

Animal
Apr 8, 2003

Agreed posted:

Metro 2033 does so much right in terms of their engine and scalability that their crap implementation of ADoF is kind of baffling. The underlying engine similarities to GSC's (rest in peace :qq:) X-Ray rendering engine are profound and yet S.T.A.L.K.E.R. CoP, even with extraordinary texture mods and added visual goodies from the fanbase, runs like greased lightning on a 670/680. View distance: forever, all the bells and whistles, and even some supersampling forced through the CC, runs like a champ.

In Metro you're in these super enclosed environments most of the time, and their turn-everything-up settings have a ton of root similarities to how the shaders and particles and poo poo work in CoP, yet the ADoF is a totally disproportionate FPS hog.

You must be right. I couldn't believe DoF was the cause; why should it have such a big impact?

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.

Alereon posted:

I'm pretty sure VRAM and system RAM do use the same address space; that's why 32-bit systems can only address (4GB - VRAM - all other hardware reservations) worth of system RAM.

I was hesitant to post because I'm not sure either, but I think the graphics card maps only a portion of its RAM into the system's address space. Recall that older AGP cards had a configurable aperture that defined how much data could flow through (which was subject to a lot of enthusiast nonsense, since it turned out that aperture size didn't really affect performance). Googling around suggests that's still the case. For example, this TechNet blog describing the addressing limit shows that, on a 32-bit system with 4GB of RAM and 2.2 GB accessible, his two 1GB GeForce 280s were using 256 MB of address space each. Much of the rest was apparently either reserved for dynamic mapping or over-conservative reservation.

Other devices have changed the paradigm somewhat. I've watched the RAM-as-VRAM allocations on my laptop's HD 3000, and it only takes a very small amount out of the address space for the framebuffer. Other allocations are done dynamically by the driver and show up the way other system services' allocations do.

Nevertheless, the current Intel (and AMD) ISA maintains separate address spaces for VRAM and system RAM, as far as the GPU and CPU are concerned. Changing this ISA is something AMD is betting the farm on.

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

Animal posted:

You must be right. I couldn't believe DoF was the cause; why should it have such a big impact?

It could be similar to what's wrong with their PhysX - people don't look at Metro as a very good example of PhysX, because you basically get two or three times the sparks or particles when you shoot a thing, and they immediately go away... Icicles are a little more interesting... Grenades have a bigger cloud... And the fog's kinda different.

The big deal is 1) double PhysX precision, and 2) the fog. Volumetric fog at that level of calculation difficulty is a HUGE performance hit. A single card pretty much eats poo poo trying to do PhysX and rendering at the same time with all details maxed (even with ADoF off). I personally think it's an incredibly good-looking and subtle touch, very different from the CPU physics version, which is extremely sparse by comparison, but if they had an option for "whole lotta fog" or "maybe somewhere in between" it'd be a good idea for non-coprocessor users to select the latter :v:

But then they wouldn't be pulling out all the stops, I guess, and that's part of the selling point of the game. The engine does scale extremely well, though I hope they give users more control over the specific graphical options instead of just showing off a giant :smuggo: list of what they've got going on at each setting. Nobody gripes about Crysis having the option to be ludicrously tough on your graphics card considering it's 5 years old now.

movax
Aug 30, 2008

Alereon posted:

I'm pretty sure VRAM and system RAM do use the same address space; that's why 32-bit systems can only address (4GB - VRAM - all other hardware reservations) worth of system RAM. This isn't relevant for the case of a 32-bit app running on a 64-bit system because Skyrim doesn't care about the VRAM, only the video driver does, and that's a 64-bit application.

VRAM is accessible via a PCI BAR most of the time. I can only assume that as VRAM sizes grew while 64-bit adoption was slower, GPU manufacturers added some kind of pseudo-VM to address the full VRAM while only requesting a 256 or 512MB BAR. So I guess you'd have four 512MB pages on a 2GB card. On modern NV, I think BAR0 is command/control and BAR1 is VRAM; I don't know what the other BARs do off-hand.
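
Very rough sketch of what I mean - the register names and offsets below are completely made up for illustration, not any real GPU's programming model - where a 512MB window gets banked around to reach the full 2GB:

code:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define WINDOW_SIZE  (512u * 1024 * 1024)  /* size of the VRAM BAR (BAR1)      */
#define WINDOW_PAGE  0x1000u               /* invented page-select reg in BAR0 */

struct gpu {
    volatile uint32_t *mmio;   /* mapping of BAR0 (control registers) */
    volatile uint8_t  *window; /* mapping of BAR1 (512MB VRAM window) */
};

/* Copy a buffer into VRAM at an arbitrary offset, re-pointing the window
 * whenever the destination crosses a 512MB boundary. */
void vram_write(struct gpu *g, uint64_t vram_off, const void *src, size_t len)
{
    const uint8_t *s = src;
    while (len > 0) {
        uint32_t page  = (uint32_t)(vram_off / WINDOW_SIZE);
        size_t   in_pg = (size_t)(vram_off % WINDOW_SIZE);
        size_t   chunk = WINDOW_SIZE - in_pg;
        if (chunk > len)
            chunk = len;

        g->mmio[WINDOW_PAGE / 4] = page;  /* select which 512MB slice the BAR shows */
        memcpy((void *)(g->window + in_pg), s, chunk);

        vram_off += chunk;
        s        += chunk;
        len      -= chunk;
    }
}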

The video driver is furiously executing memory read/write commands that get routed over PCIe to communicate with the GPU; this is why a GPU with an autonomous on-board CPU (like NV + ARM) gets interesting, because the video driver could just send a high-level command over PCIe and the on-board CPU would take care of programming the registers.

Most PCI MMIO space gets mapped below 4G by the BIOS to retain compatibility with 32-bit OSes; it'll remap any remaining DRAM above the 4G limit where 64-bit OSes can get to it. MTRRs will be set accordingly (on Linux, cat /proc/mtrr to see them). A 32-bit OS gets whatever RAM fits below the 4G barrier after the PCI MMIO is placed; 64-bit OSes get that plus whatever memory is remapped at 0x100000000 and above.
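
If you want to eyeball that layout on a Linux box, here's a trivial dump of the top-level /proc/iomem entries (you may need root to see the actual addresses): you should see System RAM stop short of 4G, the PCI MMIO hole, and RAM picking back up at 0x100000000 on a 64-bit setup.

code:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/iomem", "r");
    if (!f) {
        perror("/proc/iomem");
        return 1;
    }

    char line[256];
    while (fgets(line, sizeof line, f)) {
        if (line[0] != ' ')      /* top-level entries are not indented */
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}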

It is totally possible on buggier BIOSes to brick your system by adding so many PCI devices that you literally cannot place any user RAM below 4G, and the MRC eventually ends up with TOLUD (top of lower usable DRAM) at 0.

e: discrete GPUs only consume address space whereas an IGP that needs system memory will actually steal physical memory for its needs

e2: Go to Device Manager and open the properties for your GPU. Go to the "Resources" tab and you'll see the memory/IO resources there. It should have some small amount of IO address space for legacy reasons, and the rest should all be BARs. At work, my 256MB Radeon HD 5700, for instance, has:
0x(00000000)E0000000-0x(00000000)EFFFFFFF - 256MB BAR
0x(00000000)F7DE0000-0x(00000000)F7DFFFFF - Control registers?
0xDC00-0xDCFF (+ various) - IO Memory (completely discrete from memory space)

I have all those extra 0s because I'm on 64-bit Win 7. Looks like the control BAR got squeezed into the little region available near the top of lower memory where a lot of tiny BARs end up on Intel platforms.

On Linux, just do lspci -v.

movax fucked around with this message at 20:59 on Oct 17, 2012

Jan
Feb 27, 2008

The disruptive powers of excessive national fecundity may have played a greater part in bursting the bonds of convention than either the power of ideas or the errors of autocracy.

Factory Factory posted:

I was hesitant to post because I'm not sure either, but I think the graphics card maps only a portion of its RAM into the system's address space. Recall that older AGP cards had a configurable aperture that defined how much data could flow through (which was subject to a lot of enthusiast nonsense, since it turned out that aperture size didn't really affect performance). Googling around suggests that's still the case. For example, this TechNet blog describing the addressing limit shows that, on a 32-bit system with 4GB of RAM and 2.2 GB accessible, his two 1GB GeForce 280s were using 256 MB of address space each. Much of the rest was apparently either reserved for dynamic mapping or over-conservative reservation.

Other devices have changed the paradigm somewhat. I've watched the RAM-as-VRAM allocations on my laptop's HD 3000, and it only takes a very small amount out of the address space for the framebuffer. Other allocations are done dynamically by the driver and show up the way other system services' allocations do.

Nevertheless, the current Intel (and AMD) ISA maintains separate address spaces for VRAM and system RAM, as far as the GPU and CPU are concerned. Changing this ISA is something AMD is betting the farm on.

Since the subject piqued my curiosity, I did some extra research, and that does sort of match what I've found. What I'm unsure about is that while bus I/O (AGP, PCI-E or otherwise) does seem to require some shared address space (for memory-mapped I/O, at least), there shouldn't be any correlation between the amount of VRAM a GPU has and the amount of address space that mapping takes up. All it does is create a buffer through which the CPU and GPU can communicate, and there's no point making that buffer larger than the bus bandwidth can keep fed.

It's not really clear to me how much of this memory responsibility belongs to the program, the GPU driver or the OS... I hadn't realised how much simpler unified memory (on 360) is. I will definitely have to read that article.

movax
Aug 30, 2008

Jan posted:

Since the subject piqued my curiosity, I did some extra research, and that does sort of match what I've found. What I'm unsure about is that while bus I/O (AGP, PCI-E or otherwise) does seem to require some shared address space (for memory-mapped I/O, at least), there shouldn't be any correlation between the amount of VRAM a GPU has and the amount of address space that mapping takes up. All it does is create a buffer through which the CPU and GPU can communicate, and there's no point making that buffer larger than the bus bandwidth can keep fed.

It's not really clear to me how much of this memory responsibility belongs to the program, the GPU driver or the OS... I hadn't realised how much simpler unified memory (on 360) is. I will definitely have to read that article.

The problem is on a 32-bit OS with only 4GB of addressable memory, you very quickly run out of address space for physical DRAM when you have to devote gigs of memory to memory-mapped I/O. Theoretically, you could lose up to 256MB of addressable memory just for the entirety of PCIe config space, if you wanted to support all of it.

e: theoretical max:
256 buses * 32 devices * 8 functions * 4KB config space = 256MB
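
Worked out, if anyone wants to check the arithmetic (nothing vendor-specific here, just the multiplication above):

code:

#include <stdio.h>

int main(void)
{
    /* 4KB of extended config space per function, 8 functions per device,
     * 32 devices per bus, 256 possible buses */
    unsigned long long bytes = 256ull * 32 * 8 * 4096;
    printf("%llu MB of address space\n", bytes >> 20);  /* prints 256 */
    return 0;
}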

movax fucked around with this message at 21:05 on Oct 17, 2012

KillHour
Oct 28, 2007


I don't know why they don't just program games to be 64 bit nowadays. Is anyone really still using a 32 bit OS for gaming?

Killer robot
Sep 6, 2010

I was having the most wonderful dream. I think you were in it!
Pillbug

KillHour posted:

I don't know why they don't just program games to be 64 bit nowadays. Is anyone really still using a 32 bit OS for gaming?

I don't know; how many vocal gamers are left bitterly clinging to XP because they're certain of how badly Windows jumped the shark in reliability and performance after it (from reading forum discussions/jokes rather than from using anything newer)?

It also doesn't help that a lot of computers have still been sold with 32-bit Windows for no good reason over the last several years, and I'm sure a fair number are owned by people who buy games. Even past that, marketing never wants to list minimum requirements they feel will leave anyone out, even if it means giving processor/video requirements that leave customers upset with the slideshow they're viewing.

Rap Game Goku
Apr 2, 2008

Word to your moms, I came to drop spirit bombs


KillHour posted:

I don't know why they don't just program games to be 64 bit nowadays. Is anyone really still using a 32 bit OS for gaming?

69.18% (including Macs) are using a 64-bit OS according to the Steam Hardware Survey.
http://store.steampowered.com/hwsurvey

So, yeah they probably should.

movax
Aug 30, 2008

KillHour posted:

I don't know why they don't just program games to be 64 bit nowadays. Is anyone really still using a 32 bit OS for gaming?

I could see a few reasons:

- Engine / toolkit might not support it. Developers like Crytek might not care about this so much, but other houses that license an engine might be limited by the version/release of the engine they are using

- Going to a 64-bit release means that no 32-bit system can run that code. Deploying both versions doubles your QA load, I would imagine, not to mention the pointer hell involved in figuring out what's broken in each version.

My first point might be moot though, as obviously some engines are capable of targeting PS2/PS3/Wii/X360/etc with the same codebase. A gamedev could probably comment better than I can.

The biggest benefit of 64-bit (IMO) for "most people" is that a 64-bit OS essentially removes any limitation on addressable memory. Some chipsets / platforms "only" support 40/48 bits or so, which is still a stupid large amount of memory. Once everyone's on 64-bit, you could have PCIe devices exposing stupid large BARs that encompass their entire onboard memory without any paging. Who cares about burning 8GB of memory space when you're not going to run a 32-bit OS and have exabytes of memory space?
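
To put rough numbers on "stupid large" (just arithmetic, not any specific platform's limits):

code:

#include <stdio.h>

int main(void)
{
    const int widths[] = { 32, 40, 48 };
    for (int i = 0; i < 3; i++) {
        unsigned long long gb = (1ull << widths[i]) >> 30;
        printf("%2d-bit physical addressing: %llu GB\n", widths[i], gb);
    }
    return 0;   /* 4 GB, 1024 GB (1 TB), 262144 GB (256 TB) */
}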

movax fucked around with this message at 21:33 on Oct 17, 2012

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Athenry posted:

69.18% (including Macs) are using a 64-bit OS according to the Steam Hardware Survey.
http://store.steampowered.com/hwsurvey

So, yeah they probably should.

32-bit apps can run on both, 64-bit apps can't, so they probably shouldn't pick a platform that limits who can play their games.

Let's also remember that your average PC probably has between 1 and 2GB of RAM, so 64-bit would make no difference for it whatsoever.

MrBadidea
Apr 1, 2009

movax posted:

I could see a few reasons:

- Engine / toolkit might not support it. Developers like Crytek might not care about this so much, but other houses that license an engine might be limited by the version/release of the engine they are using

- Going to a 64-bit release means that no 32-bit system can run that code. Deploying both versions doubles your QA load, I would imagine, not to mention the pointer hell involved in figuring out what's broken in each version.

My first point might be moot though, as obviously some engines are capable of targeting PS2/PS3/Wii/X360/etc with the same codebase. A gamedev could probably comment better than I can.

The biggest benefit of 64-bit (IMO) for "most people" is that a 64-bit OS essentially removes any limitation on addressable memory. Some chipsets / platforms "only" support 40/48 bits or so, which is still a stupid large amount of memory. Once everyone's on 64-bit, you could have PCIe devices exposing stupid large BARs that encompass their entire onboard memory without any paging. Who cares about burning 8GB of memory space when you're not going to run a 32-bit OS and have exabytes of memory space?

Pretty much the most important reason, bar none: middleware/third-party stuff is still only really focusing on 32-bit platforms, and nobody is building a complete game engine from scratch without said middleware; it'd be pretty stupid to try, and it would make development costs astronomical.

Combined with that, we still don't absolutely need 64-bit's extended address space for games yet. Things like UE3's content streaming have gotten some real polish over the years, developers have become more familiar with it, and so on, so selective and intelligent preloading/unloading of content on the fly is taking out a lot of the need for huge amounts of memory. It also does a pretty good number on load times.

That isn't to say there aren't benefits to 64-bit other than the potential extra memory space; they've just not been important enough to performance to be a consideration yet.

Jan
Feb 27, 2008

The disruptive powers of excessive national fecundity may have played a greater part in bursting the bonds of convention than either the power of ideas or the errors of autocracy.

KillHour posted:

I don't know why they don't just program games to be 64 bit nowadays. Is anyone really still using a 32 bit OS for gaming?

I can answer this one from a game developer standpoint.

The #1 reason, as movax mentions, really is the effort porting an existing engine to 64 bit. If you're licensing an engine that only supports 32 bit, it'd be a tremendous waste of effort to do that conversion yourself.

And for those with in-house engines, sure, fundamentally all it means is that all the pointers in your engine will now be 64 bits instead of 32... But depending on how the engine was written, there are so many places where this can go wrong. The most trivial example that comes to mind is the variable type for a pointer -- it's not uncommon to see a programmer take a pointer and cast it to int. Bam, you just lost half your pointer on a 64 bit system.
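
A toy version of that bug, for anyone who hasn't been bitten by it (this is just generic C illustrating the point, not code from any particular engine):

code:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int value = 42;
    int *p = &value;

    /* The classic mistake: int is still 32 bits on 64-bit builds, so the
     * top half of the address silently disappears. */
    int      truncated = (int)(intptr_t)p;
    /* What was actually meant: an integer type wide enough for a pointer. */
    intptr_t safe      = (intptr_t)p;

    printf("original:     %p\n", (void *)p);
    printf("via int:      %p  (likely wrong on 64-bit)\n", (void *)(intptr_t)truncated);
    printf("via intptr_t: %p  (always round-trips)\n", (void *)safe);
    return 0;
}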

Most pointer-based fuckery and low-level optimisations probably won't convert over very well either.

So, as a corollary, an engine that foresaw this eventuality and imposed specific types for pointers from the get go will have a much easier time doing the switch. But since current consoles all use 32 bit, there was really not much of an incentive for cross-platform developers to make said switch. However, that choice will be forced on developers soon, both first party and middleware, as Durango and PS4 have more than 4GB and obviously need 64 bit OSes. This likely will be the moment at which most PC games will make the switch and stop supporting a dwindling 32-bit userbase.

Jan
Feb 27, 2008

The disruptive powers of excessive national fecundity may have played a greater part in bursting the bonds of convention than either the power of ideas or the errors of autocracy.

So, taking another pass at trying to figure out my terrible Skyrim Crossfire performance. Out of curiosity, I decided to have a look at what PCIe speed my GPUs are currently using... According to GPU-Z, the primary one is PCIe 3.0 16x, and the secondary is PCIe 3.0 4x. Shouldn't they both instead be set to 8x?

From a cursory search, my motherboard (Gigabyte P55 USB3) does some PCIe lane sharing with the main 16x slot in order to support USB 3.0 and SATA3's 6Gb/s speed. What this apparently means (according to the engrish from the manual, as shown here) is that in a single GPU context, the 16x slot will be slowed down to 8x if USB 3.0 and SATA3 are both enabled. And indeed, turning USB 3.0 turbo on reduces the 16x to 8x. But the secondary slot stays at 4x.

Now, I really couldn't give a poo poo if USB is running at super duper turbo speeds, as long as I'm getting the most from my GPUs and SSDs. But I can't find an option or a combination of options that raises the secondary slot to 8x. (And yes, I plugged in the Crossfire ribbon connector between the GPUs.) Evidently, from the benchmarks quoted above, there isn't a huge difference between 16x/8x/4x with PCIe 3.0. Still, in the default Crossfire context (AFR), it sounds to me like the primary GPU would end up being hobbled down to 4x, since both GPUs need to sync the same data...

Is this something I can fix? Or, rather, is this something I want to fix?

Edit: Searching for Gigabyte P55 Crossfire instead of USB3 seems to indicate that the secondary slot is capped at 4x, in spite of that message from Gigabyte in my link above, which seems to hint at Crossfire making it run 8x/8x. Oh well. That's what I get for going for Crossfire on a budget motherboard. :downs:

Jan fucked around with this message at 00:04 on Oct 18, 2012

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

PCI-e 3.0 8x is equivalent to PCI-e 2.0 16x. Look specifically at this page of the testing and note that the top-of-the-line cards are the ones that show issues with lower than PCI-e 2.0 16x this generation. Since that is equivalent in bandwidth to PCI-e 3.0 8x, and you're using cards that are significantly slower than the top of the line of this generation, I would venture that if you've got a problem it definitely isn't bandwidth related.

To put it in context, performing as well as a GTX 580 (when overclocked quite a bit) is the 7850's awesome value proposition, right? Well, last generation, top of the line cards were pretty much the same on PCI-e 2.0 16x or PCI-e 2.0 8x. It was considered a-okay to run an 8x/8x configuration for SLI or Crossfire on GTX 580s/Radeon HD 6970s.

Even now, the very top of the line cards suffer a maximum of about 18% going from their stated bandwidth capability of PCI-e 3.0 16x down to PCI-e 2.0 8x. That bears out across the testing from my glance over it as well. And it only happens in circumstances that really tax a specific kind of relationship between resources and processing.

Your dual 7850 setup is almost certainly not suffering from a bandwidth issue at PCI-e 3.0 8x/8x, if that's what it's running at. I'd start looking at other possible causes.



A broader lesson to take away from it is that "bandwidth limited" really has to be contextualized in order for it to make sense. It's crazy to me that the highest averaged performance delta, shown with the GTX 680 when you look at its performance between PCI-e 3.0 16x vs. PCI-e 1.1 4x, is only 33%, and it manifests more at lower resolutions than higher ones with just 20% lower performance in 1600p.

That's a HUGE drop in bandwidth, in a generation that we have characteristically described as bandwidth dependent, yet the card would still be usable; it just probably wouldn't be worth upgrading to from a top-of-the-line card of the previous generation.

Makes me feel a hell of a lot better about running my GTX 680 in PCI-e 2.0 8x mode so that I can run the GTX 580 for PhysX as well (also in 8x). Difference in best-case vs. compromise is practically negligible, and I don't have to worry about PCH fuckery.

Agreed fucked around with this message at 01:05 on Oct 18, 2012

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
It's not, they're running PCIe 2.0 x16 and PCIe 2.0 x4. Jan's got CF going over PCH.

There's no way to fix that other than replacing the motherboard, by the way.

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

Factory Factory posted:

It's not, they're running PCIe 2.0 x16 and PCIe 2.0 x4. Jan's got CF going over PCH.

There's no way to fix that other than replacing the motherboard, by the way.

What am I missing? He just said he's running at PCI-e 3.0 16x and 4x; is he mistaken, or have I misread?

movax
Aug 30, 2008

Agreed posted:

What am I missing? He just said he's running at PCI-e 3.0 16x and 4x; is he mistaken, or have I misread?

Typo, I assume; a P55 mobo wouldn't have PCIe 3.0 support.

A card strapped to an x4 link off the PCH also has to travel over DMI to reach the CPU.

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice

Factory Factory posted:

I was hesitant to post because I'm not sure either, but I think the graphics card maps only a portion of its RAM into the system's address space. Recall that older AGP cards had a configurable aperture that defined how much data could flow through (which was subject to a lot of enthusiast nonsense, since it turned out that aperture size didn't really affect performance). Googling around suggests that's still the case. For example, this TechNet blog describing the addressing limit shows that, on a 32-bit system with 4GB of RAM and 2.2 GB accessible, his two 1GB GeForce 280s were using 256 MB of address space each. Much of the rest was apparently either reserved for dynamic mapping or over-conservative reservation.

I could be wrong, but I think that TechNet blog contains a typo. He says 2.2GB remaining, but later in the text he refers to the "over 2GB hole", which indicates to me he meant 1.8GB/4.0GB remaining. This matches up with (4GB - 2x1GB - some more hardware reservations). Unless I'm misreading or misunderstanding, his screenshots also show a ~2.2GB hole. I'm not going to pretend I was able to understand this, but is there maybe a difference between the "right" way to do things and the way it gets done in practice? Or is that what you're saying? I've never seen a system with a 32-bit OS and a discrete videocard have more available RAM than what would be expected from (4GB - VRAM - other hardware reservations), and it seems like if it was possible to do without a hell of a lot of development, they would have done it for the competitive advantage.

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

movax posted:

Typo, I assume; a P55 mobo wouldn't have PCIe 3.0 support.

A card strapped to an x4 link off the PCH also has to travel over DMI to reach the CPU.

P55, yep, that's me misreading the situation based on a few key errors in the meat of the post. Aligning my answer with Factory Factory's, now. New motherboard time. Shouldn't be too hard to find an inexpensive price:performance board from last gen, should it? Considering we're talking about unleashing two cards' performance across a broad range of scenarios, he might even think forward and upgrade to a price:performance Ivy Bridge setup, both to guarantee performance for his current setup, and to allow for what will be the most meaningful upgrades during the likely life of the system (GPU).

As far as the rest, still impressed that properly done PCI-e 2.0 is holding its own just fine despite our general concern about bandwidth limitations. I'm actually better off, it appears, by running on the Asus Sabertooth P67 with an in-spec 8x/8x split than I would be if I were running 16x/4x, or trying to mess around with PCH to fudge the bandwidth.

Can't last, I understand that, but this is REALLY looking like the last generation where PCI-e 2.0 8x is going to offer performance within a few percentage points of PCI-e 3.0 16x in the vast majority of scenarios. It does make me much more comfortable waiting 'til Haswell for my next major system upgrade, though. Probably just need to enable the Marvell SATA controller for my optical drive to free up one SATA port in case I need to expand storage; otherwise it looks like I should be fine until Haswell. Which is pretty exciting in itself :dance:

Agreed fucked around with this message at 02:18 on Oct 18, 2012

Chuu
Sep 11, 2004

Grimey Drawer
One more question about 4GB cards: if you have two 2GB cards in SLI, are the textures duplicated, or is it essentially the same memory addressing as a single 4GB card?

Jan
Feb 27, 2008

The disruptive powers of excessive national fecundity may have played a greater part in bursting the bonds of convention than either the power of ideas or the errors of autocracy.

movax posted:

Typo I assume, a P55 mobo wouldn't have PCIe 3.0 support.

A card strapped to a x4 link off the PCH also has to travel DMI to the CPU.

Yeah, that's totally my bad, sorry! I was misreading the GPU-Z display:



The primary (first) GPU says it's 1.1, but running the render test shows it at 2.0. So: 16x 2.0, and... 4x 1.1. Ick. No wonder my Crossfire scaling is so bad and the whole upgrade from a 5870 felt rather underwhelming.

Agreed posted:

Aligning my answer with Factory Factory's, now. New motherboard time. Shouldn't be too hard to find an inexpensive price:performance board from last gen, should it?

I'd gotten this P55 from the top picks back when LGA 1156 was still the reasonable choice (over LGA 1366). I suppose it's still perfectly adequate as a single GPU motherboard, but that ship sort of sailed when I sprung for a 30" monitor. In retrospect, I should've gotten a good single GPU Kepler card, instead of assuming that the previous idiom of running 6850s in Crossfire for higher resolutions would extend itself to 7850s.

I guess if I'm going to replace the motherboard, I could consider upgrading the CPU as well. But I'll move this imminent talk of parts picking to the upgrade thread, heh.

Thanks for clearing this up, guys. :unsmith:

vvvvv

Edit: I never benchmarked it proper -- I got over my "zomg 3dmark points" phase a while ago, and now pretty much just estimate performance on "yep, this feels smooth". Which, incidentally, is also why I never really bothered checking why my Crossfire upgrade didn't feel so awesome up until now.

I suppose I'll have to try it out now. It'll be an excuse to finally install Metro 2033, too! :v:

Jan fucked around with this message at 03:05 on Oct 18, 2012

DaNzA
Sep 11, 2001

:D
Grimey Drawer
So how much faster are your games in CF configuration under that condition vs single card?

movax
Aug 30, 2008

Alereon posted:

I'm not going to pretend I was able to understand this, but is there maybe a difference between the "right" way to do things and the way it gets done in practice? Or is that what you're saying? I've never seen a system with a 32-bit OS and a discrete videocard have more available RAM than what would be expected from (4GB - VRAM - other hardware reservations), and it seems like if it was possible to do without a hell of a lot of development, they would have done it for the competitive advantage.

Sorry, I was just finishing up patches to an internal platform that corrected some issues with BIOS MMIO assignment so the acronyms just kinda flowed :shobon:

Not quite sure what you're asking, but in terms of PCI MMIO (memory-mapped IO) there's nothing terribly special about a GPU, other than it being the consumer device most likely to take a huge bite of address space. Your ethernet controllers and such also eat up MMIO space, but their BARs are less than a megabyte in size.

In my case, I have custom data acquisition hardware that eats ~128MB worth of MMIO per unit, and a given system supports up to 8-10 of these things hooked up at once. The customer has a 32-bit Linux kernel that they have no plans to upgrade from soon, so they have to suffer with only ~2GB of usable RAM in the system.

Chuu posted:

One more question about 4GB cards: if you have two 2GB cards in SLI, are the textures duplicated, or is it essentially the same memory addressing as a single 4GB card?

Not sure, actually; that's an interesting question. From a hardware perspective, I could see the GPUs recognizing that an SLI bridge is present and changing the BARs they request appropriately.

Agreed posted:

Can't last, I understand that, but this is REALLY looking like the last generation where PCI-e 2.0 8x is going to offer performance within a few percentage points of PCI-e 3.0 16x in the vast majority of scenarios. It does make me much more comfortable waiting 'til Haswell for my next major system upgrade, though. Probably just need to enable the Marvell SATA controller for my optical drive to free up one SATA port in case I need to expand storage; otherwise it looks like I should be fine until Haswell. Which is pretty exciting in itself :dance:

The downside of PCIe 3.0 is that it's pricier to develop for. Due to the increased speed, you probably need a 12.5GHz or 16GHz scope to properly debug signal integrity issues. Granted, a lot of the cost of testing is eaten by companies like Altera or Xilinx (plus the usual Intel, AMD) that developed PCIe 3.0 IP and validated their transceivers/solutions against PCI-SIG specs.

At PCIe 3.0 speeds you have to use a time-domain reflectometer, an accurate model of your board (HyperLynx SI or similar), or brute math to get the S-parameters of your board (a Touchstone file) and properly apply emphasis/de-emphasis to your captured waveforms.

Basically PCIe 3.0 is fast as hell and requires some investment in development tools and increased development time. It lowers pin-count sure, but a lot of companies will still find it cheaper to push out PCIe 1.1/2.0 devices, especially if they started developing their ASIC with older-generation IP and SerDes.

The lower pin-count is awesome, but peripheral vendors need to catch up. Think of how many RAID/HBA controllers you could run from 1 x16 PCIe 3.0 link, heh. Could even throw a PCIe switch into the mix to use as a bandwidth bridge.

e: BAR is Base Address Register. Software writes all 1s to this register and then reads the value back. Hardware ties certain bits to 0, thereby reporting to the host system how much memory it wants.
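
Here's that handshake decoded on a canned value rather than real config space accesses (the 0xF0000000 read-back is just an example of what a 256MB memory BAR would answer):

code:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t readback = 0xF0000000u;         /* pretend answer after writing 0xFFFFFFFF */

    uint32_t mask = readback & ~0xFu;        /* strip the low flag bits (type/prefetch) */
    uint32_t size = ~mask + 1u;              /* lowest writable bit == requested size   */

    printf("BAR wants %u MB of address space\n", size >> 20);  /* prints 256 */
    return 0;
}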

e2: Yeah Jan, even just going up to PCIe 2.0 should result in a nice performance boost for you.

Jan
Feb 27, 2008

The disruptive powers of excessive national fecundity may have played a greater part in bursting the bonds of convention than either the power of ideas or the errors of autocracy.

movax posted:

Not sure, actually; that's an interesting question. From a hardware perspective, I could see the GPUs recognizing that an SLI bridge is present and changing the BARs they request appropriately.

Not sure about SLI, but I know that for Crossfire in Alternate Frame Rendering mode, you essentially need each GPU to have a copy of the exact same resources (give or take one frame :v:), otherwise they won't be able to render said frames efficiently. So in AFR, you essentially have only as much usable VRAM as a single one of your GPUs. I see no reason SLI would be any different with regards to AFR.

Now, for the fancier multi-GPU techniques, things get a lot more complicated than "render one frame on each GPU", so all bets are off. I haven't really looked into the techniques involved, but an engine that explicitly supports multiple GPUs definitely could make effective use of both cards' VRAM. But then, the challenge becomes finding a way to divide the work of one frame evenly across all GPUs, in a manner as independent as possible of the nature of said frame.

Depending on the renderer, there are lots of possible ways one could divide draw calls between multiple GPUs... But absolutely none could guarantee a perfectly even workload without sharing a considerable amount of resources. So you might as well stick to the incredibly trivial AFR.

DaNzA posted:

So how much faster are your games in CF configuration under that condition vs single card?

So, just ran the Metro 2033 benchmark with the following settings:

Options: Resolution: 2560 x 1600; DirectX: DirectX 11; Quality: Very High; Antialiasing: AAA; Texture filtering: AF 16X; Advanced PhysX: Disabled; Tesselation: Enabled; DOF: Disabled

With Crossfire:

Average Framerate: 41.33
Max. Framerate: 174.89
Min. Framerate: 9.08

Without Crossfire:

Average Framerate: 22.67
Max. Framerate: 77.41
Min. Framerate: 5.43

So I guess it's not a complete loss. But I suspect both would sink pretty low if I used 4x MSAA instead, or turned on DoF.

Edit: Huh, turned on MSAA and DoF, and didn't quite get what I expected.

Crossfire + DOF

Average Framerate: 26.33
Max. Framerate: 188.01
Min. Framerate: 6.18

Crossfire + MSAA 4x + DOF

Average Framerate: 23.00
Max. Framerate: 154.42
Min. Framerate: 4.63

Crossfire + MSAA 4x

Average Framerate: 32.33
Max. Framerate: 149.84
Min. Framerate: 5.48

I would've expected the hit from MSAA to be greater than that of DoF. I guess I can conclude that in this particular case, the game isn't memory bound (even with 2560x1600 resolution) as much as ROP fragment bound. Which makes the memory bandwidth thing sort of moot. :v:

Edit 2: I probably meant fragment bound, not ROP bound. GPU programming is still sort of new to me.

Jan fucked around with this message at 04:58 on Oct 18, 2012

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

movax posted:

Not sure, actually; that's an interesting question. From a hardware perspective, I could see the GPUs recognizing that an SLI bridge is present and changing the BARs they request appropriately.

BAR size is generally set in the GPU VBIOS nowadays, and it can't be changed after POST. Also, BARs on GPUs are barely used except to program the DMA controllers that actually do the work, because otherwise you spend way too much CPU time interacting with the GPU. So every card contains basically the same data set in SLI/CF.

(important note: one PCI device can have more than one BAR, and there are fun alignment restrictions with BARs that may cause the actual physical address space consumed to be much greater than what you expect versus the sum of the size of the BARs)

movax
Aug 30, 2008

Professor Science posted:

(important note: one PCI device can have more than one BAR, and there are fun alignment restrictions with BARs that may cause the actual physical address space consumed to be much greater than what you expect versus the sum of the size of the BARs)

Right, that's why I mentioned older Nvidia cards having multiple BARs, one of them being command/control, one (presumably) aliased to VRAM, etc. At least that's what the Nouveau docs seem to suggest up to NV50 or so. A Type 0 header allows for up to six 32-bit BARs, though I haven't run into a device with that many in the field. BAR sizing as described (software writing all 1s and reading back, hardware tying certain bits to 0) ends up being power-of-two, so yeah, if you need 90MB you end up burning 128MB.
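
And the 90MB-needs-128MB rounding, as a quick sketch (generic power-of-two rounding, not tied to any particular device):

code:

#include <stdint.h>
#include <stdio.h>

/* BAR sizes come in powers of two, so whatever you actually need gets
 * rounded up to the next one. */
static uint64_t bar_size_for(uint64_t bytes_needed)
{
    uint64_t size = 16;                /* smallest possible memory BAR */
    while (size < bytes_needed)
        size <<= 1;
    return size;
}

int main(void)
{
    uint64_t need = 90ull * 1024 * 1024;
    printf("%llu MB needed -> %llu MB BAR\n",
           (unsigned long long)(need >> 20),
           (unsigned long long)(bar_size_for(need) >> 20));   /* 90 MB -> 128 MB */
    return 0;
}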

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
Oh boy oh boy, I'm not even reading the intro paragraph before posting:

AnandTech: Intel HD 4000 from DDR3-1333 to DDR3-2400.

BusinessWallet
Sep 13, 2005
Today has been the most perfect day I have ever seen
So I have an XFX Radeon 5850. When I first got it, I didn't use the HDMI port for a while, but in the past year and a half it has been in my HTPC, so I use the HDMI port to connect to my home theater. Since the first time I plugged a cable into it, the port has felt lovely and loose. The port looks fine, but there is clearly something wrong with it. It actually only works with one of my HDMI cables; I have tested tons of cables with it, and it only works with that one. Furthermore, even with the one cable that does work, if it isn't inserted just the right way, the connection breaks. There is no "click" or anything when a cable is inserted into the port on the card like any other HDMI port would have.

It is a major annoyance because it will disconnect if someone is even walking across the room and their steps are a little heavy. I sent it back to XFX and this was their reply today:

[KEVIN_C 10/19/2012 12:03:35 AM] Hello we tested the card extensively and found no issues and the HDMI port doesn`t feel loose at all. At this point I am confident that the product is fully functional and the issue you described is not related to the card. We would like to send the card back as it is fully functional. Please let us know if there is anything in particular you would like to have tested before we do so. Thanks, Kevin

(And my reply to them):

[ 10/19/2012 12:11:01 AM] The HDMI port feels and looks fine, but I have tested it with several HDMI cables, and it only seems to work with one cable. I have used all the HDMI cables with other devices with no issue. If the cable is even slightly touched, it will stop the connection. I have tested other videocards with HDMI in the same PC using the same cables connecting to the same devices and had no issues. I have a brand new 30 foot HDMI cable that works with every device other than that videocard, I have even brought it to work and tested it on laptops and projectors and found no issues with that particular cable. I am positive it is the card, since as I have described to you, I have done extremely thorough testing. I have tried different driver versions with this card as well and found that not to be the issue. For what it`s worth, the HDMI port on my card, the one you`re testing right now, feels more loose than any other port on any other device that I have. I would really appreciate it if you could replace the port or the card. This has been a very inconvenient issue since I`ve had the card, but again, more of an issue now, since the HDMI port is being used to plug into a home theater and before it was not.

I have done extensive testing and I know the port is bad, but they are denying my claim. I have a double lifetime warranty (whatever that means); do I have any recourse here?

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
Short answer: your story is another one for the pile of reasons we don't recommend XFX.

I'm not sure there's much you can do other than be a pain in their rear end about it, or sell the card and get a new one.

BusinessWallet
Sep 13, 2005
Today has been the most perfect day I have ever seen
Yeah I heard they weren't the greatest anymore, but back when I bought this card, everyone said they were great. It's almost insulting that they'd take a few hours with my card, say they tested it extensively and tell me they're sending it back. I'm a loving sysadmin, I know how to test hardware.

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice

Factory Factory posted:

Oh boy oh boy, I'm not even reading the intro paragraph before posting:

AnandTech: Intel HD 4000 from DDR3-1333 to DDR3-2400.

The big surprise to me is how little difference there is; I guess the HD 4000 still doesn't have the throughput for memory bandwidth to matter. It's also very interesting how significant the real-world performance improvements are for things like copying files over USB when moving from DDR3-1333 to 1600. It really supports the conventional wisdom that EVERYONE should be using DDR3-1600, with enthusiasts potentially benefiting from DDR3-1866 (if they're not taking the money from something else).

More directly on-topic for this thread: EVGA has released their PrecisionX 3.0.4 software; the interesting new feature is what they're calling K-Boost:

EVGA posted:

This feature will allow you to “lock” GTX 600 series cards to Boost Clock/Voltage, even in 2D mode. Some important notes about this new feature:
-If using SLI, please disable SLI prior to enabling this feature, you can re-enable SLI once system is rebooted.
-Please disable EVGA K-Boost before reinstalling, or uninstalling the NVIDIA driver.
I haven't done any testing yet but it seems like this could come in pretty handy for overclock testing.

craig588
Nov 19, 2005

by Nyc_Tattoo

quote:

GPU Utilization will show as 0% when this is enabled.

I can take a guess as to how this works, and I'll bet it has the same effect as all of the power target talk I had in the overclocking thread. Instead of giving a card effectively no limit, it just ignores or overwrites the sensors so the card thinks it's never maxed out. The only disadvantage I see with this software hack is that you lose power management with it enabled. I changed my card's BIOS from 170 watts to 225 watts and gained over 150MHz on average with no side effects.

I'm too busy at the moment to check if that's actually how it works, but it's how I read it.

TheRevolution1
Sep 21, 2011
I have the worst luck with graphics cards ever. Basically every single one I've gotten has been unstable at basically all clocks. I got my MSI 560 Ti a year and a few months ago and it had artifacts when it got stressed too much, mainly in BF3. Tweaking the voltage didn't help much, so I returned it to Newegg for another MSI 560 Ti; this one had basically the same problem. I lived with it for a year, since it only occurred in some games and I didn't want to be out of a graphics card. Now, a few months ago I bought an MSI 7950 and it's the same poo poo all over again, except this one mainly has problems with PlanetSide 2 and Dota 2. It either blackscreens with a "the AMD driver has crashed" message or it just throws artifacts all over the place. I usually don't let my own personal experiences affect my PC part buying decisions, because everyone gets unlucky sometimes, but this is just too much. Really not looking forward to dealing with the RMA.

Beautiful Ninja
Mar 26, 2009

Five time FCW Champion...of my heart.

TheRevolution1 posted:

I have the worst luck with graphics cards ever. Basically every single one I've gotten has been unstable at basically all clocks. I got my MSI 560 Ti a year and a few months ago and it had artifacts when it got stressed too much, mainly in BF3. Tweaking the voltage didn't help much, so I returned it to Newegg for another MSI 560 Ti; this one had basically the same problem. I lived with it for a year, since it only occurred in some games and I didn't want to be out of a graphics card. Now, a few months ago I bought an MSI 7950 and it's the same poo poo all over again, except this one mainly has problems with PlanetSide 2 and Dota 2. It either blackscreens with a "the AMD driver has crashed" message or it just throws artifacts all over the place. I usually don't let my own personal experiences affect my PC part buying decisions, because everyone gets unlucky sometimes, but this is just too much. Really not looking forward to dealing with the RMA.

Have you been using the same motherboard this entire time? If two 560 Tis and a 7950, even if all from MSI, have continually demonstrated problems, I'd be more likely to believe something is wrong with the mobo and not the video cards.

I'd also not rule out a weak PSU as a source of instability.

TheRevolution1
Sep 21, 2011

Beautiful Ninja posted:

Have you been using the same motherboard this entire time? If two 560 Tis and a 7950, even if all from MSI, have continually demonstrated problems, I'd be more likely to believe something is wrong with the mobo and not the video cards.

I'd also not rule out a weak PSU as a source of instability.

How would I even tell if it was the motherboard or the PSU? The motherboard has been the same Gigabyte Z68 the whole time. The PSU is a 650W XFX.

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice

TheRevolution1 posted:

How would I even tell if it was the motherboard or the PSU? The motherboard has been the same Gigabyte Z68 the whole time. The PSU is a 650W XFX.

Post your own thread in the Haus of Tech Support; use the template in the sticky Rules thread and include the exact model of your motherboard and power supply. Gigabyte motherboards are notorious for their poor power delivery quality, though that usually manifests as hangs, restarts, or power-offs.


GRINDCORE MEGGIDO
Feb 28, 1985


The upcoming Catalyst 12.11 drivers have been previewed by a few sites, and it looks like they'll bring pretty decent performance increases - if you're on a 7-series card.

http://www.techpowerup.com/reviews/AMD/Catalyst_12.11_Performance/
http://hexus.net/tech/reviews/graphics/46905-amd-catalyst-1211-benchmarked-surprising-performance-gains/
http://www.anandtech.com/show/6393/amds-holiday-plans-cat1211-new-bundle
