movax
Aug 30, 2008



Mod Note: This thread is for general GPU and videocard discussion; head over to the parts picking megathread if you just need help picking a card to buy.



Hi and welcome to the GPU Megathread! The idea is for this to be a discussion thread for GPUs...architectural discussion, how to min/max settings, rampant speculation about upcoming GPUs, etc.

Thanks to Factory Factory for co-authoring this beast, and RizieN for the awesome imagery.

Version 2012-05-10
Table of Contents
Post 1 - General GPU Information
Post 2 - Explanation of all the different models and model numbers
Post 3 - Optimizing Videogame Settings
Post 4 - Quotes of Knowledge from the thread
Post 5 - GPU Architecture
Post 6 - LucidLogix, Links to good posts in thread, etc.
Post 7 - Factory Factory Says Hi!
Post 8 - News and Updates


Dear reader - if you read nothing else in this thread, read this: Don't get a pricey GPU if you don't game or need a crazy number of monitors.

Do you want free performance from your nVidia card? Check out Agreed's bad-rear end guide here

Now, as of 2012-07-31:

The current best values for 1080p gaming are:
Nvidia GeForce 660 Ti (memory overclocking will make it even better)
And, if you must have an AMD card, the Radeon 7950 will serve you as well. It's just a bit pricier than the 660 Ti is all.

If you want the fastest single-card publicly available:
GeForce GTX 690 (really two GPUs on one card). Really, you should only get this if for some reason you only have room for one card and you need balls-out performance and money is no object.

Fastest single-GPU, single-card:
GeForce GTX 680 (stock), AMD Radeon 7970 (overclocked matches 680’s overclocked performance). (note: the GTX 670 nips right on the heels of the 680 for much less cash, but technically it is slower)

The fastest mobile GPU:
AMD Radeon HD 7970M

Cards from "last-generation" that will still do the job for most games at 1080p:
Nvidia GeForce GTX 560Ti
AMD Radeon 7850

Price/performance chart: http://www.craftools.com/gpu/ (Goon-made, courtesy of Top Quark!)

Is x card faster than y card?
AnandTech GPU Bench; select both cards and see what does better in the games you play.

Now, you can read the rest of this thread for more detail on GPUs!





A GPU is a “Graphics Processing Unit,” or “a thingy that draws the thingies on the big flat thingy, you know, the hard drive.” These chips evolved toward their current form after the computer industry realized that with rapidly increasing complexity of graphical displays/technologies, the CPU could not perform graphical tasks without suffering performance penalties. Also, another device to sell to people!

The GPU is now generally dedicated silicon, either on an entirely separate chip, or integrated with the CPU. It is mandatory for display support on every modern computer, and every PC features a GPU in some capacity or another. Every system from a $300 eMachines to a $3000 Alienware has at least one GPU.



Every desktop/laptop computer will require a GPU for the bare minimum task of displaying something useful on your monitor. The amount of GPU power you need is governed by the software you want to run. If you want to play the latest video games, you need a powerful GPU. If you have some productivity software that can utilize GPU acceleration, you may need a more capable GPU as well.

If you just play 2D “casual” games and don’t require support for a crazy number of monitors, you probably don’t need a discrete GPU (that is, a separate and dedicated device) if your system has integrated graphics capability.



As of 2012, there are only three major players in the PC GPU market that we'll consider. Companies like Matrox, 3dfx, S3, etc. have all sadly fallen by the wayside or ceased to exist entirely, but feel free to wax nostalgic!


Nvidia (Team Green)
Founded: 1993
Fanboy Support Material: GeForce 256, GeForce 260 Core 216, CUDA
Fanboy Flame Material: GeForce FX Series (Dustbuster), model name overloading, batch of poorly assembled/failing mobile GPUs

Nvidia has a ridiculously bro-d out CEO, Jen-Hsun Huang. They’re based in Santa Clara, CA, and got their start in GPUs (Nvidia is often credited for the popularization of the term ‘GPU’). Early successes included the Riva TNT, but things really took off with the very first GeForce. They also make motherboard chipsets under the ‘nForce’ line, and have begun aggressively targeting the mobile market with System-On-Chips like the Tegra.

Nvidia got the contracts to supply the physical Xbox GPU (A Geforce3 derivative) and delivered a GPU to Sony for the PS3 in the form of an IP Core (GeForce 7 derivative). Combined with aggressive marketing and partner programs with dev houses, Nvidia is now the world’s dominant high-performance GPU manufacturer.

Nvidia has recently put a lot of development into general-purpose GPU (GPGPU) computing. The power to push pixels in video games can also be used for less-frivolous things to great effect. To this end, Nvidia has lines of GPUs not just for consumers and for workstations, but a line for high-performance computing (i.e. supercomputers) as well. As such, many of its recent GPU architectures have been focused as much around GPGPU performance as graphics performance.


AMD (Formerly ATI, Team Red)
Founded: ATI in 1985 as Array Technology Inc., AMD (Advanced Micro Devices) in 1969
Fanboy Support Material: RV770 (Radeon 4850/4870), Eyefinity
Fanboy Flame Material: Driver support prior to RV770

Team Red started life as ATI Technologies, based in the great north. They enjoyed great success in delivering integrated solutions for major OEMs (server motherboards can still be found shipping with ATI Rage XL), and their early GPUs fell under the Mach and Rage series. They began to deliver consumer-3D accelerators with the ‘Radeon’ series, and this has been the model name ever since.

For a while, ATI lived in the shadow of Nvidia when it came to raw 3D performance; ATI delivered versatile solutions like the All-In-Wonder Radeon variants while Nvidia delivered video cards that would utterly destroy in 3D gaming performance. This turned around with the R300, aka Radeon 9700, which thrust ATI back into the eyes of gamers everywhere with its utter domination of the cringe-worthy Nvidia GeForce FX series (AKA Dustbuster).

In 2006, ATI was acquired by AMD, in a nearly $6 billion deal. The ATI brand has been fully retired, to be replaced by the AMD name...in red. This can be a bit confusing, as AMD’s traditional corporate theming has been green, which they retain for their CPUs/original business units.

AMD’s big thing right now is what they call an APU - Accelerated Processing Unit. Right now, that just means a CPU and a GPU on the same chip, and making the GPU not suck so consumers will want to buy it. But for the future, AMD is aggressively pushing towards a system architecture that allows complex CPU cores and arrays of simpler, GPU-style processors to act as a seamless functional unit, sharing memory and address space. Such technology would allow a system to switch between CPU processing and GPGPU processing much more efficiently than is possible today, allowing more powerful and power-efficient processing of many tasks.


Intel (Team Blue)
Founded: 1968 (entry into GPU market much later)
Fanboy Support Material: Making lovely little discrete GPUs obsolete in 2010
Fanboy Flame Material: Everything else

Intel has been in the game for a long time. When it was founded in 1968, the state of technology was only then finding ways to fit more than a few tens of transistors on a chip. Intel’s first commercial successes were primarily memory chips - chips faster and more-reliably built than competitors’ thanks to its unparalleled braintrust on semiconductor technology. In 1971, Intel brought the first commercial microprocessor to market. In 1983, it built and sold a graphics controller board around its 82720 graphics display controller, the first commercial GPU years before they were called “GPU.”

Fast forward to 1998, Intel releases the i740, an inexpensive AGP-based graphics card with many important innovations that also performed like a week-dead hog. And so began Intel’s GPU legacy: cheap graphics cards that displayed an image but choked hard on any intensive workload. Not that Intel stopped trying, mind you - their chips got better and started offering features like video acceleration. But the market they could sell to was cheap graphics integrated on the motherboard as “good enough” for the bottom of the market and business desktops.

But in today’s world of ARM tablets, Nvidia’s GPGPU success, and AMD’s aggressive push towards CPU/GPU compute integration, Intel decided it needed to raise the bar if its GPUs were to continue to be “good enough.” As part of its “tick-tock” CPU microarchitecture strategy, it has started updating its GPU architectures on a similar schedule. Intel’s GPUs no longer eschew the 3D performance that consumers and modern desktop compositing demand, and, starting with the graphics cores in its Ivy Bridge CPU “tick,” its GPUs will support GPGPU computing.

Because Intel builds its GPUs into its CPU packages (or even on the same silicon), almost every consumer Intel processor currently on the market comes with an Intel GPU. As such, Intel holds the majority share of the GPU market. Looking forward, Intel plans to continue its aggressive pursuit of increased GPU performance, raising the bar for “slowest” much faster than Nvidia and AMD can do the same for “fastest.”



You buy one, silly. It comes with a new computer, or you buy a graphics card from a tech vendor like Newegg or a general retail place like Amazon. See further on in OP for purchasing advice (i.e., which one to get, and from WHO).

If you’re buying any modern Intel CPU, chances are it’ll have a GPU as well. Market penetration! :giggity: AMD’s A-series APUs have Radeon hardware on-die, and occasionally its non-APU motherboards will have a crappy IGP on them.

How do I get a GPU (Laptop)?

You buy a laptop that already has the one you want in it. Laptop GPUs are a “losing” proposition most of the time. One of the biggest flaws is the inability to upgrade the chip; it is generally a BGA soldered to the laptop motherboard. There was an initiative to switch to pluggable modules (MXM), but this never really took off beyond ultra-high-end GPUs in very large laptops, and both of those things are very expensive.

Discrete laptop GPUs (as opposed to graphics integrated in the chipset or CPU) have a distinct negative effect on battery life. However, discrete laptop GPUs are needed for any high-detail or high-resolution gaming on the go. Most modern laptops offer some form of “switchable” graphics, where you can switch between using the integrated graphics and discrete graphics. These solutions get better all the time, but the switching can be fiddly and the power savings aren’t perfect even now.

If you want a specific laptop, your GPU options are limited to whatever the laptop manufacturer offers. If Dell or whoever sells its sexy new laptop with only one particular Nvidia chip, you get that one particular Nvidia chip, and that’s final. Of course, “that one particular chip” may not actually refer to one particular chip, because the intentionally-ambiguous model numbering for mobile GPUs can often fool you into thinking you’re getting a new-generation card when it is in fact a re-labeled older card. But more on that in Post 2.

Finally, because laptops have such tight design limits for heat and power draw, top-end laptop hardware is simply not as powerful as top-end desktop hardware. For example, the GeForce GTX 580M is actually the same exact GPU as a desktop GeForce GTX 560 Ti, significantly less powerful than a desktop GTX 580. Similarly, the Radeon HD 7970M is the exact same GPU as the desktop Radeon 7870, just clocked about 15% lower.

And yet laptop GPUs cost as much as the top-end desktop parts, if not more. Why is this? Because they can do the job with only 35W-100W of electricity when a desktop part needs twice that or more. Chips that can pull that off aren’t easy to design or produce. But from a price/performance standpoint, it can be pretty bad.

So if you want a powerful GPU and mobility at the same time, it’ll cost ya.



Ever wonder if say, two GPUs would be faster than one? So did the GPU makers. SLI stands for Scalable Link Interface, and is Nvidia’s technology for linking two or more GPUs together to boost performance. CrossFire is AMD’s name for the technology which does the same thing.

You will need to purchase two of the same GPU, as well as a beefier PSU to support the greater power draw of the extra hardware. This doesn’t mean that the cards have to be from the same card vendor (i.e. Asus or EVGA), but depending on whether you have Nvidia or AMD hardware, the matching may otherwise be fairly strict. Nvidia hardware requires the same GPU (e.g. GF114), the same number of enabled CUDA cores (e.g. 384 vs. 336 vs. 288), and the same memory bus bandwidth and memory type for SLI. This can be hell to get just right sometimes. AMD cards are more forgiving: as long as the first two digits of the model number are the same, they can be paired in CrossFire (e.g. Radeon HD 5830 and HD 5870); it’s better to match cards as closely as possible, though.
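To make the AMD rule of thumb above concrete, here is a minimal Python sketch. The function name `crossfire_rule_of_thumb` is made up for this example, and it only encodes the "same first two digits" rule as stated in this post; it is not an official AMD compatibility check.

```python
def crossfire_rule_of_thumb(model_a: str, model_b: str) -> bool:
    """Return True if two Radeon model names share their first two digits.

    Encodes only the rule of thumb from this post, nothing more.
    """
    # Keep just the digits, so "HD 5830" becomes "5830".
    digits_a = "".join(ch for ch in model_a if ch.isdigit())
    digits_b = "".join(ch for ch in model_b if ch.isdigit())
    if len(digits_a) < 2 or len(digits_b) < 2:
        return False
    return digits_a[:2] == digits_b[:2]


print(crossfire_rule_of_thumb("HD 5830", "HD 5870"))  # the pairing from the post
print(crossfire_rule_of_thumb("HD 5870", "HD 6870"))  # different generations
```

The first call reports a match and the second does not. Even when the rule says yes, matching the cards as closely as possible is still the safer bet.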

SLI/CrossFire has the advantage of letting you build a system capable of pushing insane resolutions at high FPS, but it can also have some disadvantages. Some games may actually see a decrease in FPS due to vagaries in programming, and you may see some micro-stuttering. Micro-stuttering isn’t usually obvious at first glance. You could say that GPU A and GPU B consistently perform at an average of 60FPS (16.7 milliseconds per frame). But GPU A could exhibit micro-stuttering, with a majority of frames much, much faster than 16.7ms, but an odd frame that exhibits a massive delay. This is opposed to GPU B, which plods along at a steady 16.7ms per frame. Newer cards contain logic to reduce micro-stutter, though, which is nice.
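A quick way to see why average FPS hides micro-stutter is to compare two frame-time traces with the same average. The sketch below is only an illustration: the traces are invented numbers, not measurements, and `frame_stats` is a hypothetical helper.

```python
def frame_stats(frame_times_ms):
    """Return (average FPS, worst frame time in ms) for a list of frame times."""
    avg_ms = sum(frame_times_ms) / len(frame_times_ms)
    return 1000.0 / avg_ms, max(frame_times_ms)


# Hypothetical traces, invented for illustration only.
steady = [16.7] * 10            # "GPU B": every frame takes 16.7 ms
stutter = [10.0] * 9 + [77.0]   # "GPU A": nine fast frames, then one long hitch

for name, trace in (("steady", steady), ("stutter", stutter)):
    fps, worst = frame_stats(trace)
    print(f"{name}: {fps:.1f} FPS average, worst frame {worst:.1f} ms")
```

Both traces average out to roughly the same FPS, but the second one has a single frame several times longer than the rest, which is the hitch you actually feel. That is why reviews that report frame times tell you more than ones that only report average FPS.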

So, what are the pluses? Generally, increased performance in your games - more frames per second at higher resolution and higher details. SLI/CrossFire is pretty much necessary for surround-gaming setups or other ultra high-resolution setups. Recently, it seems that CrossFire scales better in multiple-GPU setups than Nvidia’s offerings. Also, it should be noted that twin GPU cards (two GPUs on one physical card) work by being SLI’d/CrossFire’d out-of-the-box, so all the caveats of dual-card apply to high-end dual-GPU boards as well.

For Nvidia SLI:

Factory Factory posted:

SLI requires the same GPU (e.g. GF110), the same number of active SMs/SMXs (i.e. CUDA cores) and ROPs, the same memory bus bitwidth, and the same amount of memory.

Examples:
    GeForce 460 768MB + GeForce 460 (current SKU)
  • Same GPU: No (GF104 + GF114)
  • Same CUDA cores: Yes (336)
  • Same memory bus: Yes (192 bit)
  • Same memory amount: No (768MB + 1GB)
  • SLI: No
    GeForce 570 + GeForce 560 Ti-448
  • Same GPU: Yes (GF110)
  • Same CUDA cores: No (480 + 448)
  • Same memory bus: Yes (320 bit)
  • Same memory amount: Yes (1.25GB)
  • SLI: No
    GeForce 460 1024MB + GeForce 460 768MB
  • Same GPU: Yes (GF104)
  • Same CUDA cores: Yes (336)
  • Same memory bus: No (256 bit + 192 bit)
  • Same memory amount: No (1GB + 768MB)
  • SLI: No
    GeForce 580 1.5GB + GeForce 580 3GB
  • Same GPU: Yes (GF110)
  • Same CUDA cores: Yes (512)
  • Same memory bus: Yes (384 bit)
  • Same memory amount: No (1.5GB vs. 3GB)
  • SLI: No
However, you can use the Coolbits software package to soft-off the features of the better GPU until both are at the lowest common denominator. This is not an automatic thing like it is with AMD, and Nvidia does not recommend doing it. This probably is because Coolbits hasn't been updated since 2004.
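The matching rules in the quote can be written down as a simple field-by-field comparison. This is just a sketch of those rules: the `Card` class and `sli_mismatches` function are invented for illustration, the specs are copied from the examples above, and ROP counts are left out because the examples don't list them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    gpu: str         # GPU chip, e.g. "GF110"
    cuda_cores: int  # enabled CUDA cores
    bus_bits: int    # memory bus width in bits
    memory_mb: int   # memory amount in MB


def sli_mismatches(a: Card, b: Card) -> list:
    """List which of the quoted SLI requirements two cards fail to share."""
    checks = [
        ("GPU", a.gpu, b.gpu),
        ("CUDA cores", a.cuda_cores, b.cuda_cores),
        ("memory bus", a.bus_bits, b.bus_bits),
        ("memory amount", a.memory_mb, b.memory_mb),
    ]
    return [label for label, x, y in checks if x != y]


# Specs taken from the examples in the quote above.
gtx580_15 = Card("GeForce 580 1.5GB", "GF110", 512, 384, 1536)
gtx580_3 = Card("GeForce 580 3GB", "GF110", 512, 384, 3072)
gtx570 = Card("GeForce 570", "GF110", 480, 320, 1280)
gtx560ti448 = Card("GeForce 560 Ti-448", "GF110", 448, 320, 1280)

print(sli_mismatches(gtx580_15, gtx580_3))   # only the memory amount differs
print(sli_mismatches(gtx570, gtx560ti448))   # only the CUDA core count differs
```

An empty list means the pair passes every check encoded here; anything else names the reason SLI won't work out of the box.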



Modern graphics cards utilize PCI Express as their interface bus. PCI Express is a high-speed serial bus utilizing differential signaling to deliver high-bandwidth at low cost to consumers while maintaining compatibility with the PCI software architecture.

PCI Express links are defined in “lanes”, with a single lane being “x1”. Links come in widths of x1, x2, x4, x8, x16, x32, etc. All compliant PCI Express devices must be able to “down-train” their links, meaning that even if a card is physically x16, it must at least link up with the OS even if it only has a x1 electrical connection.

PCI Express has several generations: 1.0, 2.0 and 3.0. 1.0 cards are all but obsolete, with PCI Express 2.0 being predominant. Currently, even a x8 PCI Express 2.0 link has been shown to deliver enough bandwidth for modern games at a 1080p resolution. Consumer CPUs are beginning to ship with PCI Express 3.0 controllers, which nearly doubles the effective bandwidth of PCI Express 2.0, but this is certainly not a requirement.

As of this OP, there is a very minor difference in performance between x8/x16, and I believe it’s not worth considering (just don’t go below x8 2.0). Post-Nehalem (Core i series) Intel CPUs have the PCI Express RC integrated into the CPU, and are capable of providing either a single x16 link, or 2 x8 links. Ivy Bridge and beyond have a greater flexibility in splitting links. Only the highest of high-end cards are slowed by more than a frame per second by even an x8 PCIe 2.0 link (equivalent to x4 3.0), and the worst such performance drop is less than 5%.
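If you want to check the "x8 2.0 is equivalent to x4 3.0" claim yourself, the arithmetic is short. The sketch below uses the per-lane signalling rates and line encodings of each generation; it ignores packet and protocol overhead, so treat the results as upper bounds. The names `PCIE_GENERATIONS` and `link_bandwidth_mb_s` are just for this example.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    "1.0": (2.5, 8 / 10),     # 8b/10b encoding
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_mb_s(generation: str, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in MB/s."""
    gigatransfers, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return gigatransfers * 1000 * efficiency / 8 * lanes


for gen, lanes in (("2.0", 16), ("2.0", 8), ("3.0", 4)):
    print(f"PCIe {gen} x{lanes}: {link_bandwidth_mb_s(gen, lanes):.0f} MB/s")
```

A 2.0 x8 link works out to about 4000 MB/s each way and a 3.0 x4 link to about 3938 MB/s, which is why the two are treated as equivalent. The more efficient encoding is also why 3.0 "nearly doubles" 2.0 even though the raw rate only goes from 5 to 8 GT/s.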



The GPU is useless without drivers. These drivers interface with the hardware itself, reading/writing from registers, dispatching workloads, etc. They have to maintain compatibility with thousands of programs, meet a stiff set of deadlines, and support a huge variety of hardware. Drivers can have a massive effect on performance, and it’s not uncommon for a driver release that delivers a 10% boost in performance to come out a few days after a large title launches. This is especially true for users of SLI/CrossFire setups; a new driver can enable multi-GPU scaling where there was none before.

You should always keep an eye out for the latest driver updates from your GPU manufacturer (no need to go to the board maker’s website). If there’s a new AAA-title on its way out though, you may want to take a look at the “Beta” section of the website, and try installing those drivers. They are labeled Beta, but not in the sense of being system-destroying monsters. They can be a good option if you find yourself needing some more FPS for a title that just launched, and you can’t wait for the stable-drivers.

You may see ‘WHQL-qualified’, which ostensibly means it was qualified by the Windows Hardware Qualification Labs. This generally equates to ‘stable’, but no need to sweat it if your drivers aren’t WHQL.



The 3 listed teams are the guys that make the actual GPU chips. Other companies package these chips onto cards and sell them, often based on a reference design from the GPU manufacturer. These guys are called ‘add-in-board partners’ and include names familiar to you such as eVGA, XFX, Sapphire, MSI, etc. They’ll take the GPU, throw in memory, add a cooler, design the power regulation, and sell it to you on a nice PCIe card.

The GPUs will be the same, but each vendor will differ in cosmetics, factory overclocking, bundled goodies, and warranty. In this goon’s opinion, warranty is the most important thing to consider when shopping for a card. Some companies will offer lifetime warranties on certain models (eVGA cards with an SKU ending with -AR, for instance), others a limited warranty. It’s also nice to see if they have a US-based (or whatever country you are in) RMA center.

Intel won’t have add-in-board partners, obviously, as their GPU is with the CPU.

There are other companies that design and build GPUs, like PowerVR and Matrox, but they specialize outside the consumer PC GPU market. (XGI Volari )



Much like a CPU, you can overclock a GPU for a “free” increase in performance. Much like a CPU, this increases heat generation and the load on the power delivery system, and it can cause instability.

Almost every GPU manufacturer has a software tool you can run from Windows to set the clocks to whatever you desire. AIB vendors will also sell pre-overclocked versions, where they’ve cherry-picked GPUs and verified their stability at higher clock speeds.



How much power should your power supply be able to deliver? Just use Newegg’s PSU calculator.

In terms of actually getting power to the GPU, there are two ways. First, the PCIe slot itself can provide up to 75W of power to the card, enough for a low-end discrete GPU. For video cards which need more power than that, there are connectors that come directly from the computer’s power supply, called PCIe power cables.

PCIe power cables come in three flavors:
  • 6-pin connector: provides 75W of extra power to the card
  • 8-pin connector: provides 150W of extra power to the card
  • 6+2-pin connector: provides 150W, like an 8-pin connector, but compatible with both 6-pin and 8-pin ports
Generally, you need to plug in a connector to each socket the GPU has. Sometimes cards will provide an extra socket, or will offer both a 6-pin and an 8-pin socket in case your PSU only has one type available. In such cases, consult the card’s manual to see what you need plugged in. Be sure to get a power supply that has enough PCIe connectors for your GPU(s).
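The wattage figures above add up in the obvious way. Here is a minimal sketch of that sum; `max_board_power` is a made-up helper, and the result is only the ceiling the connectors allow, not what a given card actually draws.

```python
SLOT_WATTS = 75  # what the PCIe slot itself can supply

# Extra power per PCIe power cable, as listed above.
# A 6+2-pin cable is counted here as if plugged into an 8-pin socket.
CONNECTOR_WATTS = {"6-pin": 75, "8-pin": 150, "6+2-pin": 150}


def max_board_power(connectors):
    """Upper bound on card power in watts: slot plus each plugged-in cable."""
    return SLOT_WATTS + sum(CONNECTOR_WATTS[c] for c in connectors)


print(max_board_power([]))                  # slot only
print(max_board_power(["6-pin", "6-pin"]))  # two 6-pin cables
print(max_board_power(["6-pin", "8-pin"]))  # one of each
```

That gives 75W for a slot-powered card, 225W with two 6-pin cables, and 300W with a 6-pin plus an 8-pin. Your power supply still has to be able to deliver that on top of everything else in the system.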

Unless you want to nerd out about voltage rails, thermal design power, and Joule’s Law, that’s really all you need to know.

Side note: if your power supply doesn’t have enough PCIe connectors, yet you’re sure it can supply enough power for your GPU, there are inexpensive adapters available for 4-pin Molex connectors. Be sure you know what you’re doing, though, so you don’t cause an overcurrent on your power supply.



I’ll try not to get too spergy. Basically, in the DOS days, software had extremely low-level access to hardware, making performance-conscious programmers very happy. Newer OSes, such as Windows, threatened this, because in the brave-new world of desktop computing, security/stability were of high concern, and hardware access had to be abstracted.

DirectX is a set of APIs intended to serve as the bridge between hardware and software. The GPU vendors write their graphics drivers which implement DirectX functions, and software programmers use DirectX functions in their code. This is an absolute necessity, considering how many different hardware combinations are out there. Contrast this to consoles, where programmers only have to design for one specific set of hardware.

DirectX versions represent versions of the API, and newer versions introduce slick new features. We’re currently on 11.1, though some blame consoles (designed around the DirectX 9.0c-era) for limiting the number of titles where programmers invest time in adding new features. Since 9.0c, some legacy parts have been deprecated (DirectInput, etc.), and new graphics features have been added (new Shader Model version, Tessellation, etc.)

This really shouldn’t be much of a concern. The new cards we recommend are all DirectX 11-aware, and even the “last-generation” are DirectX 11 compliant. Windows 8 will ship with DirectX 11.1 out of the box.



OpenGL is just another graphics API, an alternative to DirectX. It, however, enjoys greater cross-platform support (Linux, OS X, non-x86 architectures, etc.). Games written using OpenGL renderers, or having one available, are more likely to run cross-platform. Blizzard titles like Warcraft III or WoW ran happily under Linux using WINE because they could utilize OpenGL.

All the cards we recommend are OpenGL 4.2-compatible, and cards as far back as the GeForce 400 series or Radeon HD 5000 support OpenGL 4.



On Linux, your primary determining factor should be driver support. If you feel strongly about binary vs. open-source drivers, or have a specific application in mind, you should choose accordingly. Historically, Nvidia binary-drivers have been very solid, with ATI lagging behind somewhat. Recently, AMD has been contributing to open-source efforts, and has made up some of the disparity between the two.

On OS X, the choices are made a little more difficult. First and foremost, you’re stuck with whatever you get on Apple portables or Mac Minis. It’s soldered on the board, and Apple has offered both brands in the past. Currently, AMD’s got the contract.

On Mac Pros, it gets a little more complicated. Graphics cards have on-board what is known as a ‘Video BIOS’. This is pure x86 machine code that is executed very early in the startup sequence, that supports INT 10h/15h calls. Obviously, PowerPC-based Macs won’t know what the gently caress to do with x86 code. x86-based Macs utilize EFI-based BIOS systems, meaning that legacy BIOS ROMs may not work. (PC BIOS based on EFI generally will happily execute both Legacy and EFI ROMs).

So, for a Mac where you CAN change the graphics card, I’d check the Hackintosh Thread to see what folks have been using.

movax fucked around with this message at Jul 19, 2013 around 00:41


movax
Aug 30, 2008





The GPU market is a mix of old and new architectures. Sometimes this means “last-gen products still for sale,” and sometimes this means “last-gen products re-badged as new products.” Both AMD and Nvidia use these strategies liberally in an intentionally confusing manner, so it can be difficult to know exactly what GPU you’re buying, especially in the laptop market. Dick move, guys.

All three big companies offer compelling options for varying price points and performance levels. Generally, AMD and Nvidia battle over the performance crown while AMD and Intel duke it out on the low end.

Nvidia’s current-gen cards are the GeForce 600 series, based on its Kepler architecture. It was preceded by the Fermi-based GeForce 500 series, which is still for sale.

AMD’s current-gen cards are the Radeon HD 7000 series, based on its GCN architecture but including older VLIW4 and VLIW5 architecture GPUs, as well. It was preceded by the VLIW4-based Radeon HD 6000 series, which is still for sale at the lower end.

Intel’s current generation of GPUs is called HD Graphics, of which the HD 2500 and HD 4000 on Ivy Bridge are the newest revision.



Stop. You can’t directly compare execution hardware from one core to another except in very limited ways. The most useful information for knowing what to buy is real-world benchmarks. Countless times in computer history, theoretically better hardware has performed worse than older, supposedly weaker parts. Pick GPUs based on how many actual frames per second they get in the games you want to play at the resolution you are going to play at.

Below, a lot of that information is pre-digested into recommendations. If you want to find it for yourself, review sites like AnandTech, Tech Report, and [H]ard|OCP will tell you how hardware performs in real-world tasks.



These refer to the semiconductor fabrication process the chip was built on. Generally, this number refers to the smallest feature size on the chip, typically a transistor. Smaller processes allow a greater number of transistors to fit into a given area, and advances in process technology generally result in some degree of power-savings as well.
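For a feel of how much a shrink buys you, here is a back-of-the-envelope sketch. It assumes ideal scaling, where the area of a feature shrinks with the square of the process number; real processes don't hit this exactly, so the result is a rough ceiling rather than a real figure. The 40nm starting point is the node the previous generation of GPUs was built on, and `ideal_density_gain` is a made-up name.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Transistors-per-area gain under ideal scaling (area ~ feature size squared)."""
    return (old_nm / new_nm) ** 2


# 40nm (previous generation) down to 28nm (current generation).
print(f"{ideal_density_gain(40, 28):.2f}x")
```

Under that idealized assumption, going from 40nm to 28nm roughly doubles the number of transistors that fit in the same area.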

As with any new process, there can be production issues. Lower than expected yields on the 28nm process have somewhat limited the availability of high-end GPUs.



The current generation is a bit of a sea change in the AMD vs. Nvidia fight. Each has had enough time to steal each other’s best ideas from the past few years, and their GPU architectures are growing more and more similar. If you’re buying here, it will probably come down to simple price/performance.

But in case you do have more specific needs, let’s look at their comparative strengths and weaknesses:



Nvidia has the single most powerful single-GPU card, the GeForce GTX 680, and it runs cooler and quieter than AMD’s flagship Radeon HD 7970 while using less electricity. Both cards give a good gaming experience on a high-end 2560x1440 resolution monitor. However, the 7970 has more headroom for overclocking, giving the cards nearly identical performance when tweaked to the hilt. The 680 still uses less power in that case, though.



AMD’s CrossFire scales better than Nvidia’s SLI, giving AMD the edge when you match up otherwise close-performing cards. However, CrossFire is a bit buggier than SLI, and it sometimes takes AMD longer than it takes Nvidia to support newly-launched games with full multi-card performance.



Competition here is fierce, and rarely do AMD and Nvidia both have cards that occupy exactly the same point on the price/performance curve. AMD historically has more wins, and its offerings are usually superior in the under-$200 market. Not that Nvidia’s cards don’t perform, but they cost more than their AMD equivalents. Above $200, it’s generally easy to find an increase in performance for every extra $25, including factory-overclocked cards.

What specific games you play may factor into this. While AMD and Nvidia’s cards tend to hold the same general level of performance across most games, some games strongly favor one company’s architecture or the other. If you play a ton of one of these games, that may change your price/performance calculus.



AMD has the single most powerful video card that doesn’t require a connector from the power supply, the Radeon HD 7750, making it the best graphics upgrade to a mass-market desktop that doesn’t require replacing the power supply as well. Nvidia has done quite a bit to improve its power consumption relative to previous generations, but until there are more GeForce 600 series cards, AMD still wins everywhere but at the tippity-top of the market.



AMD’s Eyefinity is more mature than Nvidia’s Surround. AMD cards support up to six monitors and can array them pretty much however you like in portrait or landscape orientation for a “single” display. Any AMD card with a DisplayPort connector can drive at least three monitors, and display daisy-chaining or a DisplayPort MST hub can increase that to five or six (up to three per DisplayPort and/or two DVI/HDMI screens) if the GPU supports it.

Nvidia’s Surround is more limited, offering at most four screens in a 3+1 arrangement (game-capable array plus utility screen). Surround has only limited hardware support - only the GeForce GTX 670/680 can currently run all four displays off of one GPU. All 500-series cards can drive a maximum of two monitors per GPU (which means you either need SLI or an expensive dual-GPU GeForce GTX 590 for Surround).

As well, Nvidia has rather esoteric rules for clocking its 600-series cards with multiple monitors; short version: single monitor, identical dual monitor (DVI/HDMI only) or Surround is full idle clocks, and any non-Surround use of the DisplayPort raises power consumption and fan noise.

Both AMD and Nvidia provide tools for bezel correction, moving the taskbar around to different screens, and managing windows.



Nvidia’s lead here has been eaten up by AMD, and the two companies are pretty much neck and neck in terms of visual quality, bugs, and API penetration. AMD is better situated looking forward, as its S3D technology is more broadly compatible, but as it stands today this benefit is largely theoretical. S3D is so demanding that you will likely be looking to upgrade around the time this factor really starts making a difference.



AMD has done a lot to increase GPGPU performance with the HD 7000 series, and Nvidia has intentionally let it slip with GeForce 600. Performance-wise, the current offerings are now pretty similar. However, Nvidia still has a huge ace in the hole: its CUDA API has much more market penetration than non-proprietary platforms like OpenCL. There is a significant amount of GPGPU software that only works on CUDA, including many 3D, video, and audio engines and plug-ins for professional content creation. See Post 5 for more details on GPGPU.



It’s not a big factor, but Nvidia has PhysX for lots of physical simulation shinies like glass in Mirror’s Edge, cloth in Metro 2033, or goop in American McGee’s Alice. The list of games that use PhysX is incredibly short, but it looks really pretty when it’s there.



One of the non-gaming functions of a video card is to accelerate the decoding or encoding of digital video. Modern video-codecs such as MPEG-4 AVC (aka H.264) require considerable horsepower to decode, and with its growing prevalence (YouTube, Blu-Rays, etc.), a GPU can help a weaker CPU decode video.

The GPU can also assist in encoding video, a common task for people who often transcode content from discs or downloads for their tablet or smartphone. Modern CPUs with plenty of cores do well here, but some GPU encoders blow them out of the water speed-wise.

While both AMD and Nvidia offerings do very well at decoding (assuming the software support is there!), Nvidia’s transcode engine just blows - it’s fast but low-quality. AMD’s Southern Islands parts have a video transcode engine faster than a high-end CPU (though not as fast as Intel’s QuickSync), and the image quality is excellent. Well, hypothetically; there’s no software support yet.



AMD and Intel compete in the low-end graphics space, where people don’t have as much need for gaming oomph or where price is the deciding factor, and the purchaser wants only as much performance as the least money will get. Relatedly, this is also where AMD and Intel create GPUs which try to do as much as possible within a small power budget, enabling good battery life and good thermals in mobile computers. While technically Nvidia competes in this space, too, they really don’t have any compelling offerings for low-end notebooks. Tablets? Sure. Mid-range and high-end? Absolutely. But at the low-end of laptops, AMD and Intel are the big deals.

Intel GPUs are not speed demons. They are intended to be functional, power-efficient, and as useful as can be given minimal cost of production. AMD follows this strategy, as well, with its netbook APUs (like the E350), but it differentiates itself with higher graphics performance in its higher-end laptop APUs. However, along with AMD graphics comes an AMD CPU, and AMD’s CPUs are not very good compared to Intel’s. If gaming is your top priority, however, AMD APUs are the better balance in terms of frames per second and eye candy.

Let’s take a look at strengths and weaknesses:



AMD wins here, hands down. The Radeon HD 6000 graphics on its A-series APUs are just more powerful than Intel’s offerings. Intel offerings are not performance-oriented; they are “good enough” in the sense that you can hop on WoW for auction house or play a Source engine game like Left4Dead or run StarCraft 2 on low detail. With Intel, you can run it. It takes AMD to look pretty, too.



At the low end, how much CPU speed you need factors in heavily: if you want the oomph of Intel cores, Intel graphics are essentially “free,” and an AMD (or Nvidia) chip is an add-on that adds cost and drains battery for performance. If you’re okay with the lesser performance of an APU’s cores, the same amount of money gets you significantly better graphics performance (both alone and when there’s an additional cheap chip). So price/performance is a question of priorities. For general performance and non-gaming tasks, Intel is king. For pure gaming performance, the crown goes to AMD.

At the high end, AMD doesn’t have competitive CPUs, and you can’t do CrossFire with an APU’s graphics, so all that goes out the window. You’ll pair an Intel CPU with either AMD or Nvidia graphics, and the AMD/Nvidia discussion above applies instead.



There’s no use generalizing. Check reviews for the specific model.



Mixed and situational, as the number of monitors supported is limited by the number of connectors the laptop manufacturer is willing to provide. Intel graphics top out at two displays on Sandy Bridge (HD 2000/3000), but Ivy Bridge (HD 2500/4000) supports three. Laptops with AMD graphics support full six-display Eyefinity, but only if the laptop has at least two DisplayPort connectors and you can put your hands on elusive MST hubs. Most laptops sold currently only have an internal display, an HDMI connector, and maybe a VGA connector. AMD, Intel HD 2500/4000, and Kepler-based Nvidia 600-series can run all of these at once. Intel HD 2000/3000 and all Fermi Nvidia GPUs can only run two of these at once.



3D Blu-Ray? Everyone can do it (except for Intel’s Pentium and Celeron chips - stick to the Core series). 3D gaming? Intel cannot. Not that a low-end AMD APU will provide any kind of acceptable S3D gaming experience, so really, at the low end, nobody can do it.



Let’s be clear, you won’t be solving GPGPU problems quickly on a laptop, especially a low-end one. Except for expensive mobile workstations, the idea is more that you can run the programs at all for development purposes. With that said, AMD supports OpenCL 1.2, Intel HD 2500/4000 support OpenCL 1.1, and Intel HD 2000/3000 do not support GPGPU. If you need CUDA, that may be a compelling reason to go for a low-end Nvidia GPU.



Until AMD gets Southern Islands’ video transcoder some software support, Intel is the hands-down winner for transcoding thanks to QuickSync. For just watching video, any GPU will do.



Intel is bad at drivers. Though they now perform game validation, Intel’s release schedule has been described as erratic. AMD does actual support for its drivers in the sense of reacting to game releases and fixing bugs in a timely manner.



Nvidia’s current offerings are the GeForce 600 series (Kepler architecture) and GeForce 500 series (Fermi architecture). The 500 series is on its way out as new 600 series offerings are released. In the consumer space, the Kepler architecture trades GPGPU performance for gaming performance by optimizing the shader pipeline for rendering tasks rather than all-purpose compute.

Nvidia’s model numbers have three core digits and a variety of text affixes. For example, take the GeForce GTX 560 Ti. For SLI, you have to match the GPU configuration (except for the frequency) exactly - number of cores, memory bus width, RAM amount. This means that if you’re not buying two identical cards to start with, you will have to be careful to buy an appropriate second GPU.



The family refers to the general target sector. GeForce is the family name for consumer graphics, Quadro is for professional graphics, and Tesla is for high-performance computing (HPC). Quadro and Tesla have their own naming conventions, so the rest of this is only relevant to the GeForce family.

The market refers to the segment within the sector which the card is targeting. There are a lot of letters that can go here, but the most common are GTX (enthusiast), GTS (mainstream), and GT (entry-level). The market tag might be omitted altogether. The only thing you really need to know is that only GTX cards have SLI connectors.

The generation is pretty much the model year. In desktop offerings, it generally aligns pretty well with the specific architecture/process size used across the entire line, but not always. The current generation is the 600 series (Kepler + Fermi Lite), which was preceded by the 500 series (Fermi Lite, a refresh of Fermi).

The model and variant together specify the particulars of the card’s GPU and memory configuration. For example, the GeForce GTX 560, GeForce GTX 560M (the M is for mobile), the GeForce GTX 560 Ti, and the GeForce GTX 560 Ti 448 Core (or 560 Ti-448) all perform very differently. More common variants are Ti (Titanium, better), SE (worse), and LE (worse). Thankfully, a -60 will always be a better card than any -50, regardless of variant, and a -70 will always be better than a -60, regardless of variant.*

Mobile parts (with an M suffix on the model) cannot be directly compared to desktop parts. A GeForce GTX 560M has a core/clock configuration similar to a desktop GeForce GTX 550 Ti. A GeForce GTX 580M/675M is similar to a desktop GeForce GTX 560 Ti.
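If it helps to see the naming scheme spelled out mechanically, here is a rough Python sketch that splits a GeForce name into the family/market/generation/model/variant pieces described above. The `GeForceName` type and `parse_geforce` function are made up for illustration, and the pattern only covers the GTX/GTS/GT market tags mentioned here, so treat it as a toy rather than a complete decoder.

```python
import re
from typing import NamedTuple, Optional


class GeForceName(NamedTuple):
    family: str             # "GeForce"
    market: Optional[str]   # "GTX", "GTS", "GT", or None if omitted
    generation: int         # 500, 600, ...
    model: int              # 60 for a "-60" part
    mobile: bool            # True if the model carries an M suffix
    variant: Optional[str]  # "Ti", "SE", "LE", "Ti 448 Core", ...


# Hypothetical pattern: only handles the market tags named in the post.
_PATTERN = re.compile(
    r"^(?P<family>GeForce)\s+"
    r"(?:(?P<market>GTX|GTS|GT)\s+)?"
    r"(?P<gen>\d)(?P<model>\d{2})"
    r"(?P<mobile>M)?"
    r"(?:\s+(?P<variant>.+))?$"
)


def parse_geforce(name: str) -> GeForceName:
    """Split a GeForce product name into its naming components."""
    m = _PATTERN.match(name.strip())
    if m is None:
        raise ValueError(f"not a recognised GeForce name: {name!r}")
    return GeForceName(
        family=m["family"],
        market=m["market"],
        generation=int(m["gen"]) * 100,
        model=int(m["model"]),
        mobile=m["mobile"] is not None,
        variant=m["variant"],
    )


print(parse_geforce("GeForce GTX 560 Ti"))
print(parse_geforce("GeForce GT 640M LE"))
```

Note that the parsed fields still tell you nothing about which GPU is actually under the heatsink, which is exactly the problem with the overloaded names.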

The model and variant can be unendingly and confusingly overloaded, especially in the mobile market. Probably the worst offender is the GeForce GT 555M (rebadged as the 635M), which is a name attached to three different GPUs in five different configurations, with performance ranging from “barely above a 550M” to “almost a 560M” and no way to tell which is which based on model name. Other offenders are the GeForce GTX 460 (which has four configurations with only one explicit variant), the GeForce GT 640M LE (two GPUs, at least three configurations), and 630M (includes both a 40nm version and a 28nm version which uses half the power for the same performance). And in the 600 series, desktops get to play, too, with the two-GPU, three-configuration GeForce GT 640.

* Except that one of the three GeForce GT 640 variants is a GT 630 clocked 9% slower.

Just to be complete on the 555M/635M:
  • Nvidia GeForce GT 555M (GF106, 144 core @ 709 MHz, 128 bit GDDR5)
  • Nvidia GeForce GT 555M (GF106, 144 core @ 590 MHz, 192 bit DDR3)
  • Nvidia GeForce GT 555M (GF106, 144 core @ 590 MHz, 128 bit DDR3)
  • Nvidia GeForce GT 555M (GF108, 96 core @ 753 MHz, 128 bit GDDR5)
  • Nvidia GeForce GT 555M (GF116, 144 core @ 525 MHz, 128 bit DDR3)
  • Nvidia GeForce GT 635M (Whatever's left over from the unsold 555M stock)
  • Nvidia GeForce GT 635M (GF116, 144 core @ "up to" 675 MHz, 192 bit GDDR5)
  • Nvidia GeForce GT 635M (GF116, 144 core @ "up to" 675 MHz, 128 bit GDDR5)
  • Nvidia GeForce GT 635M (GF116, 144 core @ "up to" 675 MHz, 192 bit DDR3)
  • Nvidia GeForce GT 635M (GF116, 144 core @ "up to" 675 MHz, 128 bit DDR3)
In summary, gently caress Nvidia marketing.

Parts to care about for gaming:
  • Desktop: GTX 560 (or better), GTX 660 (or better) (ex. 560 Ti-448, 680)
  • Mobile: Midrange Kepler parts (GT 640M through GTX 660M), 28nm Fermi parts (some configs of GT 630M, GT 640M LE). High-end Kepler not yet released; wait for it.



AMD’s current offerings are the Radeon HD 7000 series (flagshipped by the Southern Islands family of GPUs and Graphics Core Next architecture) and Radeon HD 6000 series (flagshipped by the Northern Islands family and VLIW4 architecture). The 6000 series is on its way out as stock is depleted, to be replaced by 7000 series offerings. However, the HD 7000 series continues to use rebadged Northern Islands parts at the lower end, and even a few Evergreen (HD 5000 series) parts. This leads to inconsistent features within product generations. If you’re buying based on features, not just performance, shop carefully and read reviews.

AMD’s model numbers have four core digits. For CrossFire, you only need to match the first two digits of the model number (below, the generation and market). Ideally the cards would be identical, but it’s not necessary for them to be so. Any mismatch in configuration will go to the lowest common denominator - e.g. a CF pair of a 1GB 6850 and a 2GB 6870 will perform like a pair of differently-clocked 1GB 6850s - to avoid exacerbating problems like micro-stutter.



The family refers to the general target sector. Radeon is the family name for consumer graphics, and FirePro (formerly FireGL) is for professional graphics. Nobody cares about FirePro, so the rest of this is only relevant to the Radeon family.

The generation is pretty much the model year. In desktop offerings, it aligns with the latest architecture revision being used on the top two or three market segments, which is also often smattered about the lower-end offerings. The current generation is the HD 7000 series (Southern Islands et al.), which was preceded by the HD 6000 series (Northern Islands et al.).

The market refers to the segment within the sector the card targets. These are single digits from 2 to 9, where 2 is “low-end netbook” like the Radeon HD 6250, and 9 is a top-end enthusiast gaming GPU, like the Radeon HD 7970. Higher market numbers always mean better performance within a generation. With rare exception, a part with market n performs like last generation’s n+1. E.g. the Radeon HD 7850 matches or moderately outperforms a Radeon HD 6950 (while also using less electricity and having new features).

The model specifies the particulars of the card’s GPU and memory configuration. Generally, the larger the number, the higher the performance. A model of 70 generally implies a fully-functional part, e.g. Radeon HD 6970, where a 50 or 30 implies the same GPU with fewer cores enabled and lower clock rates.

The model may also include a single-letter suffix: M denotes a mobile part, and G and D denote an APU-integrated part (G for mobile, D for desktop). G2 and D2 imply a CrossFire pair of an APU and a discrete GPU. M parts are not market-comparable with desktop parts - a Radeon HD 6990M is similar to a desktop Radeon HD 6870 in configuration and performance. G and D parts on desktop and mobile APUs try to have performance similar to similarly-named desktop and mobile parts; the equivalences aren’t exact, but they’re pretty close. G2 and D2 parts are an exception - they typically perform better than similarly-numbered single-GPU parts.
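As a rough illustration of the AMD scheme, the Python sketch below splits a Radeon HD model number into generation, market, model, and suffix, then applies the "first two digits must match" CrossFire rule of thumb from above. The names `RadeonName`, `parse_radeon`, and `crossfire_compatible` are invented for this example, and the check deliberately ignores everything else that matters in practice (drivers, APU pairings, mobile parts).

```python
import re
from typing import NamedTuple


class RadeonName(NamedTuple):
    generation: int  # first digit: 7 for the HD 7000 series
    market: int      # second digit: 2 (low-end) through 9 (enthusiast)
    model: int       # last two digits: 70, 50, 30, ...
    suffix: str      # "", "M", "G", "D", "G2", or "D2"


# Hypothetical pattern; accepts "Radeon HD 6870", "HD 6870", or just "6870".
_PATTERN = re.compile(r"^(?:Radeon\s+)?(?:HD\s+)?(\d)(\d)(\d{2})(M|G2|D2|G|D)?$")


def parse_radeon(name: str) -> RadeonName:
    """Split a Radeon HD model number into its naming components."""
    m = _PATTERN.match(name.strip())
    if m is None:
        raise ValueError(f"not a recognised Radeon HD name: {name!r}")
    gen, market, model, suffix = m.groups()
    return RadeonName(int(gen), int(market), int(model), suffix or "")


def crossfire_compatible(a: str, b: str) -> bool:
    """Rule of thumb from the post: generation and market digits must match."""
    pa, pb = parse_radeon(a), parse_radeon(b)
    return (pa.generation, pa.market) == (pb.generation, pb.market)


print(parse_radeon("Radeon HD 6990M"))
print(crossfire_compatible("HD 6850", "HD 6870"))  # same generation and market
print(crossfire_compatible("HD 6850", "HD 6950"))  # different market digit
```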

Parts to care about for gaming:
  • Desktop: HD 6770, HD 68xx, HD 7750 and above
  • Mobile: APU parts (G and G2) for low-end, 69xxM, 77xxM through 79xxM.



Intel only has a handful of GPUs right now, all part of the Intel HD Graphics family, so this is blissfully easy: bigger number = better. Including the imminent Ivy Bridge releases, we have, in increasing order of power:
  • Intel HD Graphics (Ironlake, 2010)
  • Intel HD Graphics 2000 (Sandy Bridge, 2011), a.k.a. HD 2000
  • HD 2500 (Ivy Bridge, 2012)
  • HD 3000 (Sandy Bridge, 2011)
  • HD 4000 (Ivy Bridge, 2012)
HD 2500 and HD 4000 are DirectX 11 integrated GPUs found on Ivy Bridge (3rd-generation Core) processors. HD 4000 performs about 20-40% better than the DirectX 10.1 HD 3000 GPU found on top-end and mobile Sandy Bridge chips. Be aware that Pentium and Celeron chips currently have hardware like HD 2000, but with a large number of features disabled, including QuickSync, video decode acceleration, WiDi support, and 3D video playback. That is, everything that relies on the GPU’s video engine.

As these are IGPs, they compete most directly with the Radeon parts in AMD’s APUs. AMD APUs generally have lower CPU performance but better graphics performance, making your choice between them a trade-off.

You’ll need a chipset that supports integrated video for these to work, however. Irritatingly enough, at the launch of Sandy Bridge, the P67 chipset would let you overclock, but not use the IGP, and some Z68 motherboards omitted video connectors. Luckily, the announced 7-series chipsets don’t pull this bullshit again, and pretty much every H77, Z75, and Z77 board has the correct hookups.

Hardware engineer trivia: System manufacturers, especially on the mobile side, like Intel IGPs because of the huge variety of interface options available. The GPU connects to the PCH over FDI, and as an example, the Ibex Peak-M (5-series, HD Graphics/Ironlake) could handle analog video (RGBHV), LVDS (integrated flat panel, i.e. laptop), integrated DisplayPort (over CPU PEG), three DP/HDMI/DVI outputs and SDVO on top of that. Not all at the same time, but huge flexibility.

Parts to care about for gaming:
  • HD 4000, found on all mobile Core processors and the following desktop parts: i5-3475S, i5-3570K, i7-3770(S/T/K)

So what gaming GPU do I want already?

To find a graphics card with the appropriate amount of oomph, you need to ask “What games will I be playing?” and “What is the resolution of the monitor I will be gaming on?” If you’re playing only 2D indies or titles off GOG.com, you don’t need as much power as if you were playing Rift, Metro 2033, Battlefield 3, and Shogun 2 (all famously intensive games) all the time.

If you aren’t looking to game, then pretty much any video card, including integrated graphics, will run a desktop, do Hulu/Netflix, surf the web, etc. at up to 2560x1600 resolution. $25 of hardware can do 1080p video playback. Don’t stress out about having a super powerful video card if you don’t need one.

If you want Stereoscopic 3D gaming, you will need to double your frames per second. This means getting a much beefier card than you would otherwise need for your screen resolution, lowering visual details significantly, and/or purchasing an SLI/CrossFire pair of GPUs.

~1 Megapixel: 1280x720 (720p), 1280x800, 1366x768

These are popular resolutions for inexpensive HDTVs and most consumer laptops. Not very demanding!
    Budget
  • Mobile: Intel HD 4000; AMD Radeon 64xx-65xx(G), 75xx;
  • Desktop: AMD Radeon 6570
    Performance
  • Mobile: AMD Radeon 66xx/76xx; Nvidia GeForce 540M/630M, 555M/635M, 640M
  • Desktop: AMD Radeon 6670
~1.5 Megapixel: 1600x900, 1680x1050

This step up comes on many larger entry-level entertainment laptops and mid-range 18-22” desktop monitors.
    Budget
  • Mobile: AMD Radeon 67xxM; Nvidia GeForce 640M, 650M
  • Desktop: AMD Radeon 5770/6770, Nvidia GeForce 550 Ti (if no more than $15 more expensive than 5770)
    Performance
  • Mobile: AMD Radeon 68xxM, 77xxM; Nvidia GeForce 560M, 660M
  • Desktop: AMD Radeon 7770, 6850; Nvidia GeForce 460 (if cheaper than 6850)
  • Special mention: AMD Radeon 7750 (for desktops with low-wattage PSUs)
~2 Megapixel: 1920x1080 (1080p), 1920x1200

The most popular resolutions for good HDTVs, new monitors, and high-end gaming, entertainment, and productivity laptops. These are the resolutions of the goon-favorite Dell UltraSharp U231x and U241x.
    Budget - 1GB of VRAM
  • Mobile: AMD Radeon 69xxM, 78xxM; Nvidia GeForce 560M, 660M, good versions of the 555M
  • Desktop: AMD Radeon 6850, 6870; Nvidia GeForce 560

    Performance - >1GB of VRAM starts becoming useful around the Radeon 7870/560 Ti-448 tier
  • Mobile: AMD Radeon 79xxM; Nvidia GeForce 570M/670M, 580M/675M
  • Desktop: AMD Radeon 7850, 7870; Nvidia GeForce 560 Ti, 560 Ti-448
~4 Megapixel: 2560x1440, 2560x1600

The resolution of ultra-high-end 27”+ displays, including 27” iMacs. Starting to come down in price, and may become more common on mobile devices in the future (which hopefully trickles down to desktop displays).

all: 1GB of VRAM at the minimum. More may be necessary depending on title, especially if you want to enable MSAA or use high-res texture packs.
  • Desktop: At least AMD Radeon 7970, 7850 CF; or at least Nvidia GeForce 680, 560 Ti-448 SLI

Multi-monitor (Eyefinity and Nvidia Surround)

If you have a lot of screens, you’ll need a lot of graphics oomph to run them. This is the realm of kilobucks. Remember that we’ll all think your penis is really small if you post here with your multi-SLI setups!

The Widescreen Gaming Forum is a great resource for checking compatibility of games with multimonitor setups.

all: Probably need in excess of 1.5GB of VRAM, but if you’re looking for full details and antialiasing with a multi-monitor setup, get the 2GB+ versions.
    3x1920x1080 (6.2 MP)
  • Desktop: At least AMD Radeon 7970, 7850 CF; or at least Nvidia GeForce 680, 560 Ti-448 SLI
    6x1920x1080 (12.4 MP) or 3x2560x1440 (11 MP)
  • Desktop: At least AMD Radeon 7970 CF, consider triple-CF; or at least Nvidia GeForce 680 SLI (3x2560 only)
    Nvidia 3D Surround or Eyefinity 3D (3x1920x1080@120Hz)
  • Desktop: At least Radeon 7870 CF, 7950/70 CF better; or at least Nvidia GeForce 570 SLI, 670/680 SLI or a 690 better



I’m not gaming, but I need horsepower for my workstation

Each software package and workload likes slightly different things, and that makes it hard to give general suggestions. If there isn’t a certified hardware list, you can always ask the software company for recommendations. Nvidia also has pre-purchase support for Quadro and Tesla hardware which you can take advantage of.

If you have to have Goon advice, though, go ahead and post in the thread. Tell us what you’re doing, what software you’re using or want to use, and your budget. If you’re looking for advice on a full workstation, you might prefer the system building sticky thread.

As of this OP, it seems Adobe GPU acceleration still has a thing for Nvidia cards. CS6 adds OpenCL support, but only for a short list of cards (see Gunjin’s post below); CS4 through CS5.5 only support CUDA.

Gunjin posted:

The only cards that do GPU acceleration with OpenCL instead of CUDA in CS6 are the HD 6750M and HD 6770M (1gb vRAM versions), and then only on OSX 10.7.x. Everyone else needs to use a CUDA card still.

Note that there is a way to "hack" an unsupported card into Adobe CS, so if you have a Mac Pro with a 5770 or 5870 on OSX 10.7.x you could theoretically add support for one of those cards, but I'd be wary of doing that in a production environment where stability is a primary concern.

I am an HTPC neckbeard, and I want to use a custom decoder for my animes

Using madVR or EVR-CP? Dealing with 3:2 pulldown? Is one dropped frame per five minutes one too many? Well, there are probably more informative forums out there for you, but to crib from AnandTech: AMD Radeon HD 6570 or Intel HD 4000 (be sure the system has DDR3-1600 CL9 RAM or faster) give the best results for little cost up to 1080p60.

Of course, there’s always the option of shaving your neckbeard and using your powerful quad-core 32nm CPU to watch your anime, but where’s the fun in that?

I want multi-monitor support, but just for non-3D stuff

As long as the motherboard has the correct ports (DVI, HDMI, or DisplayPort), IGPs generally support two or three monitors, and you can use them at the same time as a discrete card. Intel HD 2500 and HD 4000 support up to three displays, and HD 2000/3000 supports two. AMD APU IGPs support three displays with Eyefinity support.

Some models of Radeon HD 7770 have a DVI output, a full-size DP output, and two mini-DP outputs for four simultaneous displays per dual-slot card. Some models of HD 6790 add a second DVI output for five simultaneous displays. More generally, any Radeon HD 5000, 6000, or 7000 card with a DisplayPort plug will support at least three monitors (2 HDMI/DVI/VGA + 1 per DP port), all with Eyefinity support. If you need to plug in a third+ DVI/HDMI display and you only have DisplayPort available, you will need an active DisplayPort adapter.

Hypothetically, with a card that has DisplayPort 1.2, you can run up to four independent 1920x1200 displays off a single DisplayPort plug using a Multi-Stream Transport (MST) hub, or swap two at a time for up to two 2560x displays. However, these have been “just a few months away from market” since the launch of the Radeon HD 6000 series. AMD has promised them for reals for this summer, though.

Finally, if you only need to show low-motion content like a webpage or terminal window, or if you need to add an external display to a laptop with no more available plugs, you can use a USB video adapter.


movax fucked around with this message at May 23, 2012 around 04:04

movax
Aug 30, 2008





Agreed wrote a great guide for overclocking newer Nvidia cards later in this very thread; check it out!

Work in progress, check out the Overclocking Megathread in the interim!



Let’s just get this out of the way: the GPU is one of the worst computer components in terms of how long it “lasts” before it gets outclassed. CPUs have grown so powerful that newer models focus more on increasing number of cores or power efficiency. RAM has gotten dirt-cheap. But GPUs...with GPUs, there is always something new around the corner. Never try to future-proof a GPU.

This being said, just because a new-generation card launched doesn’t automatically make yours useless. After all, what do you think the developers who made your favourite games used in their dev process? Sure, the new card may push your games to an extra 10FPS, but you can certainly get playable FPS out of your card (within reason, of course; like a CPU, at some point your GPU will just be too-drat-old).



The software that runs a video game’s world is called an “engine.” Game engines run everything about the game - graphics, logic, AI, sound, extensions for physics, etc. Engines are complex, and rather than develop one in-house for each game, many game devs will license an engine from another development team. One of the most popular current engines is Unreal Engine 3, which has versions for PCs, game consoles, iPads, Android phones, and even Adobe Flash. You have probably played a UE3 game, such as the hotly anticipated blockbuster sequel to a popular series, 50 Cent: Blood on the Sand. Also some Batman games and something about a sheep herder fighting the Grim Reaper in space, I dunno.

Unreal Engine isn’t the only player, either; CryEngine 3 (e.g. Crysis 2), id Tech 5 (e.g. Rage), Dunia Engine (e.g. Far Cry 2), and Source (e.g. Team Fortress 2) are also out there turning code into pixels and noise, as well as many other, lesser-known engines.

Anyway, the point is that games using the same engine will generally perform about the same with equal graphics settings. Engines are continuously upgraded with new features and eye candy, so performance can vary wildly depending on the options you can set, but with the same settings, performance between different games on the same engine should be similar.



Let’s get to fiddling with things to make the images pretty or make them come faster! And let’s be scientific! With the free program FRAPS, you can get an on-screen indication of your current FPS. Of course, a lot of games have built-in FPS counters as well, usually accessible via console (like BF3).

What does this mean to you? It means that if you are patient, you have an easy way of seeing what settings make a huge difference. Measure your FPS with, say, anti-aliasing at 4x FSAA. Then turn off AA and check your FPS again. What, it went up by like 10FPS? Looks like you found a good setting to play with to tweak your performance.

An important note on the FRAPS counter, though: it counts when the video card is told to render a new frame, not when frames are actually being displayed on the monitor. Video cards contain frame timing logic to smooth out hitching and stutter, and they may delay the display of frames which rendered very quickly vs. the average frame time. Therefore, using FRAPS to time framerate isn’t perfect.
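If your tool of choice can give you per-frame render times rather than just a live counter, you can do the before/after comparison a bit more rigorously. The Python sketch below assumes you already have a list of frame times in milliseconds (the numbers here are invented, not real benchmark data) and reports both the average FPS and the FPS implied by the slowest 1% of frames, since hitching shows up in the latter long before it moves the average.

```python
from statistics import mean


def average_fps(frame_times_ms):
    """Average frames per second over the whole run."""
    return 1000.0 / mean(frame_times_ms)


def worst_case_fps(frame_times_ms, fraction=0.01):
    """FPS implied by the slowest `fraction` of frames (at least one frame)."""
    ordered = sorted(frame_times_ms, reverse=True)
    count = max(1, int(len(ordered) * fraction))
    return 1000.0 / mean(ordered[:count])


# Hypothetical runs: same scene, AA on vs. AA off.
with_aa = [25.0] * 99 + [50.0]   # mostly 25 ms frames, one 50 ms hitch
without_aa = [20.0] * 100        # steady 20 ms frames

for label, run in (("4x AA", with_aa), ("no AA", without_aa)):
    print(f"{label}: avg {average_fps(run):.1f} FPS, "
          f"slowest 1% {worst_case_fps(run):.1f} FPS")
```

The same caveat applies as with the on-screen counter: these are render-side timings, not what actually reached the monitor.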

Textures
Without textures, your polygon models would be nothing more than a collection of wiremesh...polygons. Textures are all about video memory. One of the advantages PCs have over the current crop of consoles is an amazing amount of video memory available to hold high-res textures (if we’d get high-res textures more often).

On an older card with not as much VRAM, texture detail is a setting you’ll want to start turning down. Some tools like eVGA Precision will report how much video memory is in use at a given time, helping you identify if you need to touch this.

Anti-Aliasing
Anti-aliasing reduces the “jaggies” that appear when the video card tries to render a detail that is smaller than a pixel, like power lines in the distance. In a perfect world, we would have antialiasing all the time, but many antialiasing algorithms have significant impacts on video memory usage and overall performance.

Common Types:
  • FSAA/SSAA (full-scene/super-sampling): This renders the scene at a much larger resolution and then scales it down. Very high quality, but the hit to performance and video memory is EXTREME.
  • MSAA (multi-sample): checks the geometry for edges and renders edges multiple times before averaging them. Good quality, but textures don’t get sampled multiple times, so transparent textures (like used to fill in a chain-link fence) will remain aliased. Most common AA mode in use today.
  • FXAA, MLAA, AAA (Analytical): These modes are shader-based antialiasing filters; they operate on the scene late in the render pathway. They all work a little differently, but they all give a good increase in image quality for almost no impact on performance. Upside: they work on everything on the screen, including transparent textures. Downside: they can affect edges that really should stay sharp, making everything look a tiny bit soft. In fact, if the game isn’t programmed properly, they can even blur out UI text, which blows. But when they work, they’re great.
  • TrSSAA, AAA (Adaptive): Nvidia and AMD’s implementations, respectively, of supersampling antialiasing that only acts on transparent textures to cover the gaps in MSAA.
You often see antialiasing modes given a rating of “2x” or “4x” or “8x.” This refers to how much oversampling is performed, and a higher oversampling rate means a higher-quality anti-jaggies effect. 4x MSAA is generally the most popular setting to balance image quality and performance.
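To put a number on why supersampling is so expensive, here is a back-of-the-envelope Python sketch. It assumes "Nx" means N samples per output pixel and counts only a single colour buffer at 4 bytes per pixel; real games also keep depth buffers, multiple render targets, and so on, so actual memory use is higher. The function name is made up for this example.

```python
def ssaa_render_cost(width, height, samples, bytes_per_pixel=4):
    """Return (pixels shaded per frame, colour buffer size in MiB).

    Assumption: one colour buffer only, `samples` shaded samples per pixel.
    """
    pixels = width * height * samples
    return pixels, pixels * bytes_per_pixel / (1024 ** 2)


for samples in (1, 2, 4, 8):
    pixels, mib = ssaa_render_cost(1920, 1080, samples)
    print(f"{samples}x: {pixels / 1e6:5.2f} million pixels, {mib:6.1f} MiB")
```

At 4x, a 1080p screen means shading as many pixels as a single ~8 megapixel display, which is why MSAA and the shader-based filters exist.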

You can mix the shader-based filters with MSAA, as well, in some games. For example, Batman: Arkham City allows you to set MSAA + FXAA, and Metro 2033 allows you to set MSAA + AAA. Adaptive AA and TrSSAA are also frequently paired with MSAA, when available.

As a side note, the higher your screen’s pixel density, the less likely you are to notice jaggies in the first place. This can be affected by how far away you sit from your screen, as well.

Killing off AA (or lowering it) is a great first step in stealing back FPS. Or switching AA modes to one that works well with your particular game/engine/GPU.

Texture filtering
Rendering textures that are far away or not facing you straight on is computationally expensive. To counter this, a technique called Mip-mapping is used to generate less-detailed textures that are faster to render from a distance. However, generating mipmaps can reduce detail and create artifacts. To counter these artifacts, a few techniques are used:

Bilinear filtering generates a gradient between pixels in the mipmap instead of having hard boundaries. However, bilinear filtering still leaves sharp, noticeable borders between the detail levels of the mipmap.

Trilinear filtering interpolates the borders between mipmap detail levels to reduce those sharp edges, making the transition between detail levels smooth.

Anisotropic filtering adjusts for the effects of perspective when you look at a flat surface from an angle. It creates mipmaps with more detail in places that will be rendered closer to the viewer, so that the same number of detail levels can show higher image quality. Includes levels from 2x to 16x to indicate how many mipmap levels are used.

Anisotropic filtering (AF) is no longer a significant burden on GPUs, and generally can be cranked to full without a significant impact on performance or memory use. If you use a high-res texture pack, the impact becomes a little greater, however.
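For the curious, the mipmap, bilinear, and trilinear steps described above can be sketched in a few lines of Python. This is a CPU-side toy on a grid of floats, not how a GPU's texture units are actually implemented; the function names are invented, and it assumes non-negative texel coordinates inside the texture.

```python
def mip_chain(width, height):
    """Sizes of each mipmap level, halving until 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels


def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def bilinear(texture, u, v):
    """Sample a 2D grid of floats at fractional texel coordinates (u, v)."""
    x0, y0 = int(u), int(v)
    x1 = min(x0 + 1, len(texture[0]) - 1)  # clamp at the texture edge
    y1 = min(y0 + 1, len(texture) - 1)
    fx, fy = u - x0, v - y0
    top = lerp(texture[y0][x0], texture[y0][x1], fx)
    bottom = lerp(texture[y1][x0], texture[y1][x1], fx)
    return lerp(top, bottom, fy)


def trilinear(fine, coarse, u, v, blend):
    """Blend bilinear samples from two adjacent mipmap levels.

    `coarse` is assumed to be half the size of `fine`, so coordinates halve.
    """
    return lerp(bilinear(fine, u, v), bilinear(coarse, u / 2, v / 2), blend)


print(mip_chain(256, 256))
fine = [[0.0, 1.0], [1.0, 2.0]]
coarse = [[4.0]]
print(bilinear(fine, 0.5, 0.5))
print(trilinear(fine, coarse, 0.5, 0.5, 0.5))
```

Bilinear smooths within one detail level; trilinear adds the extra blend between two levels, which is what hides the visible seams.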

Shadows


Ambient Occlusion
A fancy, recent feature. Basically, in computer graphics, this is an attempt to model the real-life behaviour of light when it strikes an object, whilst factoring in non-reflective surfaces and surfaces that may normally be hidden from view. It’s like only the shadow parts of radiosity, providing a good approximation of the phenomenon.

You’ll find this in game settings generally as SSAO or HBAO. These are GPU-intensive effects, and in my personal opinion, they can often be rather subtle and so difficult to notice. In other-my personal opinion, the subtle effect adds greatly to the realism of 3D shapes. But this should be near the top of your list on things to turn off for a gain in FPS, as AO will hit your GPU core and shaders hard.

Tessellation
See also: Nvidia’s Q&A page on what tessellation means in video games.

In general, tessellation is using shapes to create a surface, with no gaps between shapes. In computer graphics, tessellation is the process of subdividing a surface into more complex geometry. Combine tessellation with displacement mapping, and you have a way to render fine geometry detail to a model using only a coarse model and a texture. This can be used to add detail to a character model or prop, or it can be used to easily create patterned surfaces like tiles or cobblestones.
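
To make that concrete, here's a minimal sketch in Python. It is not how GPU tessellation hardware actually works (that uses tessellation factors and dedicated stages); it just shows the basic idea of splitting a coarse triangle into finer ones and then nudging the new vertices with a height lookup. All the function names are hypothetical, and displacing straight along the z axis is a simplification of displacing along the surface normal.

```python
def midpoint(a, b):
    """Point halfway between two vertices given as (x, y, z) tuples."""
    return tuple((p + q) / 2.0 for p, q in zip(a, b))

def subdivide(triangles):
    """Split every triangle into four by connecting its edge midpoints."""
    result = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        result.extend([(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)])
    return result

def tessellate(triangles, levels):
    """Apply the subdivision step a given number of times."""
    for _ in range(levels):
        triangles = subdivide(triangles)
    return triangles

def displace(triangles, height_at):
    """Raise each vertex by a height sampled from a displacement function."""
    return [
        tuple((x, y, z + height_at(x, y)) for x, y, z in tri)
        for tri in triangles
    ]

coarse = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
fine = tessellate(coarse, 2)  # each level multiplies the triangle count by 4
bumpy = displace(fine, lambda x, y: 0.1 * ((x * 8) % 2))  # stand-in height map
```

The triangle count grows by a factor of four per level, which is a decent intuition for why cranking tessellation gets expensive so quickly.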

Tessellation looks great! Unfortunately, it’s also a real performance hog. Like ambient occlusion, it’s going to be one of those settings you turn down first to increase performance, though some people might prefer to turn off antialiasing before reducing tessellation.

Draw Distance
In a nutshell, draw distance is the distance of the furthest object that will be rendered. At the edges of the draw distance, you might get “pop-in,” where objects suddenly appear. This might be obscured by fog or by having the objects fade in, but it’s distracting. Further stuff being rendered means more stuff being rendered, and that means more load on the GPU and its memory, so shortening the draw distance (if that’s an option) can increase performance. Increasing it costs performance, but makes things look better.

Options like detail distance or grass distance in some games work the same way, just for specific objects or levels of detail. Back when Oblivion came out, just turning down draw distance slightly could be the difference between a slideshow and buttery-smooth for elderly GPUs.


Your monitor refreshes at a single, fixed rate. Unless the video card renders frames EXACTLY that fast, there’s a chance that the image may be drawn to screen before it’s completely finished updating to the newest render. This causes screen tearing, where parts of different frames from different points in time end up drawn in the same refresh of the screen.

Vertical synchronization is an option to eliminate screen tearing. It forces the graphics card to pause after rendering an image until that image has been displayed on the screen. This eliminates tearing, but it increases lag (the time difference between game events and what is displayed on screen). It also locks your framerate to a number that divides evenly into your monitor’s refresh rate, so that a video card capable of 50 FPS would be locked into 30 FPS on a 60 Hz screen. So it’s a trade-off between better image quality and increased lag/decreased FPS.
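
The 50-to-30 FPS drop falls out of simple arithmetic. The sketch below assumes plain double-buffered V-Sync, where every frame has to wait for the next refresh before it can be shown, so a frame that takes longer than one refresh interval occupies two, and so on. The function name is made up for illustration, and triple buffering or adaptive schemes behave differently.

```python
import math
from fractions import Fraction

def vsync_locked_fps(render_fps, refresh_hz=60):
    """Effective frame rate under simple double-buffered V-Sync.

    Each frame is held for a whole number of refresh intervals, so the
    result is always the refresh rate divided by an integer.
    """
    refreshes_per_frame = math.ceil(Fraction(refresh_hz) / Fraction(render_fps))
    return Fraction(refresh_hz) / refreshes_per_frame

for fps in (75, 60, 50, 25):
    print(fps, "FPS unsynced ->", float(vsync_locked_fps(fps)), "FPS with V-Sync")
```

A card that can manage 50 FPS needs 1.2 refresh intervals per frame, which rounds up to 2, hence 30 FPS on a 60 Hz screen.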

If you have a recent motherboard that allows you to use the integrated graphics on an Intel Sandy/Ivy Bridge processor or AMD APU, you might be able to use Lucid’s Virtu and MVP software to enable vsync without increased lag. If you use that tech, you would skip the V-Sync settings in-game. More on Lucid Virtu and MVP in post five.

You may also see this referred to as “double buffering” or “triple buffering,” depending on how it’s implemented. If you’re pushing against your VRAM limit, V-Sync/triple buffering may not be possible, and you may have to settle for double buffering or even no anti-tearing.

Field of View
The field of view affects the shape of your cone of vision. Wider fields of view mean you can see further around you, but objects at the edge of the screen may become distorted. Narrow fields of view make objects in front of you large and prominent, but reduce your peripheral vision.

Adjusting the field of view is important when you are dealing with screens that have different aspect ratios. Since 16:9, 16:10, 4:3, and 5:4 ratio monitors are all fairly common now, most games will handle FOV automatically using one of a few techniques. For the curious, the most common are:
  • Hor+ scaling: Keeps a fixed vertical field of view, and widens the horizontal FOV on wider screens so that objects in the screen’s center look the same across all aspect ratios. As in “horizontal plus.”
  • Vert- scaling: Keeps a fixed horizontal field of view, and shrinks the vertical FOV on wider screens. Not very popular these days, but it still crops up occasionally. It’s a major pain for super-wide Eyefinity/Surround setups.
  • Anamorphic scaling: The field of view is fixed for a particular aspect ratio, and black bars are added around the image to fit it on different-aspect screens. Also known as letterboxing; it’s common with movies but relatively rare in games.
  • Pixel-based scaling: Objects are rendered at a fixed size in pixels, and if more pixels are available, the amount of objects you can see at once is increased. This used to be really popular in RTS games like the Command and Conquer series (more resolution = more battlefield seen at once), but more recent games have switched to Hor+ to eliminate the competitive advantage of a higher-resolution monitor.

The field of view scaling type can have gameplay implications. In StarCraft 2, the field of view is Hor+ scaled; this means that a 1920x1200 (16:10) monitor will actually display less of the battlefield than a 1920x1080 (16:9) monitor despite having more pixels. On the positive side, some FPS players like to manually increase their FOV (when possible) to increase situational awareness.
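
The StarCraft 2 example above can be checked with a bit of trigonometry. This sketch assumes a simple perspective camera and a 60-degree vertical FOV, which is an arbitrary number picked for illustration and not StarCraft 2's real setting; the function name is also made up. Under Hor+ scaling the vertical FOV stays fixed and the horizontal FOV follows from the aspect ratio.

```python
import math

def horizontal_fov_deg(vertical_fov_deg, width, height):
    """Horizontal FOV implied by a fixed vertical FOV and an aspect ratio (Hor+)."""
    half_vertical = math.radians(vertical_fov_deg) / 2.0
    half_horizontal = math.atan(math.tan(half_vertical) * width / height)
    return math.degrees(2.0 * half_horizontal)

ASSUMED_VERTICAL_FOV = 60.0  # illustrative value, not taken from any game

fov_16_9 = horizontal_fov_deg(ASSUMED_VERTICAL_FOV, 1920, 1080)
fov_16_10 = horizontal_fov_deg(ASSUMED_VERTICAL_FOV, 1920, 1200)
print(round(fov_16_9, 1), round(fov_16_10, 1))
```

With those assumptions the 16:9 screen ends up with roughly 91 degrees of horizontal view and the 16:10 screen with roughly 85, even though the 16:10 panel has more pixels.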

You’re most likely to come up against FOV issues with either an Eyefinity/Surround setup or if you play an older game from the time when everyone had 4:3 monitors and aspect scaling wasn’t implemented. You’ll likely have to look up where in the config files to make changes to adjust the resolution, UI, and field of view if proper support is not there out of the box. An excellent resource is the Wide-Screen Gaming Forum.

Particle Effects
Particle effects are how games create fire, sparks, magic, running water, smoke, hair, and other “fuzzy” effects. They can look amazingly pretty, but systems with lower-end video hardware might not be able to sustain playable framerates when large amounts of particles are continuously created. If you’ve ever had a smoke grenade bring your FPS to single digits in a shooter, you know what I mean.

If performance is a problem, especially in multiplayer games, turn down the number of particles. That will free up both CPU and GPU cycles for other things.

PhysX
Speaking of particles, Nvidia PhysX is something that may come up in a few games you play. PhysX is a physics engine which leverages an Nvidia GPU’s CUDA cores for extra eye candy, like cloth or fluid simulation, particle physics (for having a ton more with a lower impact on the CPU), and improved destructible environment elements.

PhysX is great for eye candy, when it’s available, but the CUDA power it’s using comes at the expense of shader/core performance for rendering graphics. Like ambient occlusion, PhysX is one of those settings that adds eye candy and realism, but is a prime target for turning off if you need more frames per second.

Some games allow you to enable PhysX even when you have an AMD GPU and no dedicated card for PhysX. If you do this, performance will tank unbelievably. Turn it back off.

Motion Blur
By default, a computer-rendered image is made with the photographic equivalent of an infinitely fast shutter speed. Motion blur effects simulate longer shutter speeds and make the illusion of motion smoother when objects are moving at high speeds and/or low framerates (and in this context, 30 FPS counts as low). Motion blur is used most prominently in racing games, though it shows up in other genres with fast motion, including FPS and action games.

There are a number of ways motion blur can be applied in-game. Some games will offer a few choices, if you can configure blur at all: no blur, camera-based blur, and object-based motion blur.

No blur is self-explanatory. Camera-based blur only applies motion blur if the game’s viewpoint itself is moving quickly, and it has a fairly low impact on performance; however, objects moving quickly past a stationary camera will not be blurred. Object-based blur is more complex to render and has a greater performance impact (shader/core-dependent), but it provides the most “real camera”-like motion blurring and image quality.

You may also opt to disable motion blur for gameplay reasons. For example, Team Fortress 2 includes the option for object-based blurring, but competitive players may prefer a momentarily sharp image over a more realistic smear.

Coming soon: Using NV/AMD Tweak Tools to really get under the hood (once Agreed gets tired of being a new dad he could write this!)

movax fucked around with this message at Jul 19, 2013 around 00:43

movax
Aug 30, 2008





Somewhere, in a Dungeons and Dragons game, some engineers looked at each other and realized that when a GPU wasn’t busy gaming, it wasn’t good for much else. They had this piece of silicon capable of pushing millions of pixels per second that was idling away while their owners watched midget pornography.

“But wait...what if we had some kind of software APIs that could aid our users in watching midget pornography?”

And thusly GPU acceleration was born. It started limited, with both makers introducing hardware video acceleration, but rapidly grew. Nvidia led to market with CUDA, an API that allowed programmers to leverage the ridiculous amount of silicon in the graphics card for their tasks.

Traditional desktop CPUs aren’t too hot at parallel computations. For the longest time, we could run one single core at a time, rapidly switching between tasks, at many GHz. Then we started tacking on some more cores. Your average GPU on the other hand, gives you hundreds of processing units that are eagerly awaiting data to munch on. Very useful for scientific research, accelerating multimedia tasks, and physics computations for games. And speaking of scientific research, help goons cure cancer!

(movax is a lazy rear end and needs to write more here)

movax fucked around with this message at May 23, 2012 around 03:59

movax
Aug 30, 2008





CPU architectures are generally monolithic and serial: there are very few redundant functional units that make up a core, but they are very large and complex; and there is a single instruction decoder per core. They are optimized for generality, i.e. the ability to do as many tasks as possible as well as possible. In the consumer space, CPUs can generally only address one or two discrete computing tasks (“threads”) per core.

At first, GPU architectures were similarly constructed, with the exception that they sacrificed generality for extremely fast calculation of specific functions. At first, this was basic stuff like “draw a square” or “draw a circle,” because you have to walk before you can run. Once the consumer 3D era began, it was mostly mapping textures to geometry.

As the demand for increased performance grew, manufacturers found it easier to add redundant functional units rather than design enormous monolithic chips. Perhaps most iconic of this trend was 3DFX’s Scan-Line Interleave, or SLI, which used two physical video cards working together to nearly double (in theory) render performance. This worked out extremely well, because when you get down to brass tacks, the job of a graphics card is to calculate the color of a single pixel many, many times. Since the pixels don’t depend on each other for much of the rendering process, you’re free to calculate two pixels at once, as long as you have the hardware.
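
Here's a toy illustration of why that split works, written in Python. It is only a sketch: `shade_pixel` is a made-up stand-in for whatever the card computes per pixel, and the "cards" are just loops. The point is that because each pixel depends only on its own coordinates, handing alternating scanlines to different renderers produces exactly the same frame as doing it all on one.

```python
def shade_pixel(x, y):
    """Hypothetical per-pixel calculation; depends only on the pixel's position."""
    return (x * 31 + y * 17) % 256

def render_scanlines(width, height, card_index, card_count):
    """Render only the scanlines assigned to one card (every card_count-th line)."""
    return {
        y: [shade_pixel(x, y) for x in range(width)]
        for y in range(card_index, height, card_count)
    }

def render_interleaved(width, height, card_count=2):
    """Combine the scanlines from every card back into one full frame."""
    frame = {}
    for card in range(card_count):
        frame.update(render_scanlines(width, height, card, card_count))
    return [frame[y] for y in range(height)]

single_card = render_interleaved(8, 6, card_count=1)
two_cards = render_interleaved(8, 6, card_count=2)
```

Each card only does half the work, and nothing needs to be passed between them until the lines are stitched together for display.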

Then Pixar happened. Their RenderMan 3D graphics software included the ability to use microprograms called “shaders” to add effects to images. Shaders let an artist specify multiple textures, changes to geometry, and diverse lighting conditions, all in a single render pass. With the release of Toy Story in 1995, their power was thrust into the limelight, and shaders became the big thing every 3D digital artist wanted.

Nvidia got started on a shader-optimized architecture practically immediately, because four years after Toy Story, they had the first ever shader-based consumer graphics card on the market, the GeForce 256, with hardware-accelerated, fixed-function transform and lighting shaders. ATI followed suit with the Radeon in 2000. This was the time of DirectX 7.0, the first era of GPUs.

Over the next decade, GPU architectures evolved into parallel groups of dedicated but programmable shader hardware, and from there into massively parallel functional blocks of general-purpose processors.


AMD tech slide showing the architecture changes from fixed-function shaders, to programmable dedicated shaders, to highly parallel multi-purpose compute units.

The programmability and increasing complexity of shaders allowed programmers to hook into the GPU for general-purpose, non-graphics computing. The massively parallel hardware could handle some types of work much, much faster than a CPU. And now this general-purpose GPU (GPGPU) computing has become a goal unto itself.

A final note: This section would have been incredibly hard(er) to write without the work of the folks at AnandTech, who put a lot of effort into translating engineering and marketing materials for non-engineers, which one of us is. Please get in the habit of reading their site so I don’t have to keep updating this post.



Fermi
Nvidia’s Fermi architecture, released in 2010, was designed from the ground up to be a GPGPU architecture. Nvidia’s first DirectX 11 architecture, Fermi has two major variations: GF100, a multipurpose compute/graphics chip, and GF104, a lower-powered, graphics-oriented design aimed at consumers. Additional variations (GF106 and GF108) occupied lower-tier market segments.

At the highest level, both GF100 and GF104 consisted of multiple Streaming Multiprocessors (SMs). Each SM had its own task scheduling hardware, memory interface, cache memory, texture units, interconnect to other SMs and VRAM, and a group of CUDA execution cores. This design allowed for simple die harvesting - a manufacturing flaw in an SM could be sidestepped by disabling the SM entirely. Within each SM, the CUDA cores were driven at double the clock rate of the rest of the hardware to improve shader performance.


Block diagram for a GF104 Streaming Multiprocessor with 48 CUDA cores.

The main differences between GF100 and GF104 were how many CUDA cores were present on the GPU and how many SMs they were organized into. GF100’s 512 CUDA cores were organized into 16 SMs of 32 cores each. The consumer graphics work that GF104 was optimized for needed fewer thread scheduling resources, so its 384 CUDA cores were organized into eight 48-core SMs. GF106 and GF108 were organized like GF104, albeit with fewer SMs: 4 for GF106, and 2 for GF108.

Fermi also kinda sucked. GF100 was an enormous, expensive-to-manufacture chip with 3.2 billion transistors, and yields were not great. The GPUs were extremely power-hungry, as well, and so were extremely hot-running, requiring very loud cooling. In fact, there was never a consumer GF100- or GF104-based card that had a fully-enabled GPU. The flagship GeForce GTX 480 (GF100) had only 15 of 16 SMs enabled, because otherwise the chip could not be cooled sufficiently. And the poor yields on GF104 meant that there never was a consumer-marketed version of the chip with all 8 SMs enabled; the closest was the 7-SM, 336-core GeForce GTX 460 (and even some of those had a quarter of the memory controllers disabled, too).
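
The core counts quoted above are just SMs multiplied by cores per SM, which also shows what die harvesting costs you. A small Python sketch using only the numbers from this post (the class and method names are made up for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FermiChip:
    name: str
    sm_count: int       # SMs physically present on the die
    cores_per_sm: int   # CUDA cores in each SM

    def cuda_cores(self, enabled_sms: Optional[int] = None) -> int:
        """CUDA cores available with the given number of SMs switched on."""
        sms = self.sm_count if enabled_sms is None else enabled_sms
        if not 0 <= sms <= self.sm_count:
            raise ValueError("cannot enable more SMs than the die has")
        return sms * self.cores_per_sm

GF100 = FermiChip("GF100", sm_count=16, cores_per_sm=32)
GF104 = FermiChip("GF104", sm_count=8, cores_per_sm=48)

print("GF100 full die:", GF100.cuda_cores())
print("GTX 480 (15 of 16 SMs):", GF100.cuda_cores(enabled_sms=15))
print("GF104 full die:", GF104.cuda_cores())
print("GTX 460 (7 of 8 SMs):", GF104.cuda_cores(enabled_sms=7))
```

Losing one SM costs GF100 about 6% of its cores but costs GF104 12.5%, since its SMs are fewer and fatter.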

Fermi was a hot mess. Luckily, it didn’t last long - between delays getting it to market and the planned revision coming out on schedule, it was replaced the same year it came out.

Fermi Lite
Nvidia quickly revised Fermi for better power consumption, cutting down the number of transistors by about 6%, and released GF110 in the flagship GeForce GTX 580 near the end of 2010. It’s still Fermi, but with heat down, the clocks could be driven higher, and fully-enabled parts have been released.

As said, GF110 powers the GeForce GTX 580, with 16 fully-enabled SMs (512 CUDA cores) and 10% higher core/shader clock rates than the GeForce 480, all for a slightly lower TDP. GF1x4 finally has a fully-enabled part, as well, with the GeForce GTX 560 Ti (GF114); the GeForce 560 kept the 460’s 7-of-8-SM GPU in a rare case of sensible and consistent naming from Nvidia. Other variations were GF116 (4 SMs), GF118 (2 SMs), and GF119 (1 SM).

Kepler - current-gen
This year has seen the release of Nvidia’s Kepler, a more significant revision of Fermi. While still Fermi at heart - Streaming Multiprocessors are still the fundamental unit of the GPU, though they’re now called “SMX” - Kepler takes advantage of a manufacturing process shrink to trade the double-clocked CUDA cores for the complete doubling up of most of the hardware in each SMX, compared to Fermi SMs. Kepler then clocks the entire SMX very high, about 30% higher than the stock clocks of the GTX 580.

That clock speed is also variable. Nvidia has implemented automatic overclocking of their Kepler GPUs, allowing them to overclock themselves when the workload demands it and the TDP allows it, similar to Intel Turbo Boost.


A Kepler GK104 SMX with 192 CUDA cores.

Kepler also has a significantly re-engineered memory controller, which is capable of running at a 6 GHz clock rate, massively increasing the memory bandwidth available to the GPU. While GDDR5 memory itself has been able to clock that high and higher for quite some time, building a memory controller that can reliably run at that speed is a major achievement.

To go with these massive speed improvements is a move to the PCI Express 3.0 bus, doubling the maximum bandwidth between the GPU and the rest of the system.
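
For anyone who wants to see where the "doubling" comes from, here's a back-of-the-envelope calculation in Python. The per-lane transfer rates and line encodings are the published PCI Express figures rather than anything from this post, the result is theoretical bandwidth in one direction before protocol overhead, and the names are made up for illustration.

```python
# generation: (transfer rate in GT/s per lane, payload bits, total bits on the wire)
PCIE_GENERATIONS = {
    2: (5.0, 8, 10),     # PCIe 2.0 uses 8b/10b encoding
    3: (8.0, 128, 130),  # PCIe 3.0 uses 128b/130b encoding
}

def pcie_bandwidth_gb_per_s(generation, lanes=16):
    """Theoretical one-direction bandwidth in GB/s for a link of the given width."""
    rate_gt, payload_bits, total_bits = PCIE_GENERATIONS[generation]
    usable_gbit_per_lane = rate_gt * payload_bits / total_bits
    return usable_gbit_per_lane / 8.0 * lanes

gen2 = pcie_bandwidth_gb_per_s(2)
gen3 = pcie_bandwidth_gb_per_s(3)
print(f"PCIe 2.0 x16: {gen2:.2f} GB/s, PCIe 3.0 x16: {gen3:.2f} GB/s")
```

The raw signalling rate only goes up by 60%, but the leaner encoding makes up most of the rest, so the usable figure lands just shy of a true 2x.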

As a result of all these architectural innovations, the Kepler flagship card, the GeForce GTX 680, is a real beast, with 1,536 Kepler CUDA cores (equivalent to 768 Fermi cores) running at over 1 GHz. It’s the fastest single-GPU card on the market, bar none (as of this post!). While most cards have a fairly low overclocking headroom, allowing the Radeon 7970 to catch up when tweaked, the GeForce 680 still delivers best-of-the-best performance with a significantly lower thermal envelope than a 7970. Extremely well-binned GTX 680s can reach a core clock of 2 GHz, which Zotac is planning to sell as an ultra-premium SKU.

And you know what’s scary? The GTX 680 is GK104, with only 8 SMXs. It’s the cut-down, consumer-graphics-oriented Kepler. Big Kepler is still forthcoming, and if the performance gap between GK100 and GK104 is similar to that between GF110 and GF114, then drat. It may never hit the consumer market as a fully-enabled part, but just the idea of a hypothetical 16 SMX, 2,048-core massively-parallel GPU running at over 1 GHz should make anyone pop a trio of nerdboners.


The block diagram for Nvidia’s GK104 GPU, based on the Kepler microarchitecture, with eight SMXs (1,536 CUDA cores).

Being a graphics-oriented part, however, the less-robust thread scheduling of GK104 compared to GF110 (even more so than GF114 vs. GF110) leaves the newer part with inconsistent and often inferior GPGPU performance. While some benchmarks inherently favor Nvidia’s CUDA core architecture specifics, the GK104 on a GeForce 680 almost always performs worse than a GF110, sometimes worse than a GF114 (ouch), and usually worse than AMD’s direct competitor, the Radeon 7970 (double ouch). And if that weren’t bad enough, Kepler’s differences from Fermi are large enough that many 3rd party CUDA applications no longer function correctly, and this will continue to be the case until those programs are rewritten. This was the price that Nvidia paid for GK104’s high level of efficiency and thermal performance.



VLIW: Evergreen (VLIW5) and Northern Islands (VLIW4)
In 2009, AMD released the Evergreen (VLIW5) architecture, the first DirectX 11 GPU design, with Cypress, the GPU used in the Radeon HD 5800 series. This release significantly beat Nvidia’s Fermi (the GeForce 400 series) to market, and Fermi - hot mess that it was - generally offered poorer price/performance and power consumption once it finally was released.

VLIW4 and VLIW5 are graphics-optimized designs based on the “very long instruction word” high-level processor architecture, once nearly ubiquitous for GPUs since the introduction of shaders. Compared to the execution cores you might find in an Intel or AMD CPU, VLIW designs have very little task scheduling hardware. Instead of giving the processor the burden of optimizing the execution order of instructions by itself, VLIW puts that work on the software compiler, which automatically groups multiple, independent, small instructions into one - wait for it - very long instruction. These instructions can then be dispatched to highly-parallel execution resources without needing large and complex scheduling units.
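
A toy version of what the compiler is doing might look like the Python below. This is a sketch, not AMD's actual shader compiler: it greedily packs instructions into bundles of up to five slots (the "5" in VLIW5), and only lets an instruction into a bundle if everything it depends on finished in an earlier bundle. The instruction names and the `pack_vliw_bundles` function are invented for illustration.

```python
def pack_vliw_bundles(instructions, width=5):
    """Greedily group independent instructions into bundles of at most `width`.

    `instructions` is a list of (name, set_of_dependency_names) in program order.
    """
    bundles = []
    done = set()
    pending = list(instructions)
    while pending:
        bundle = []
        for name, deps in pending:
            # Only results from earlier bundles count, so everything placed
            # in the same bundle is independent of everything else in it.
            if len(bundle) < width and deps <= done:
                bundle.append(name)
        if not bundle:
            raise ValueError("circular or unsatisfiable dependency")
        done.update(bundle)
        pending = [(n, d) for n, d in pending if n not in done]
        bundles.append(bundle)
    return bundles

# Graphics-like work: lots of independent math, then a couple of combining steps.
parallel_work = [
    ("r", set()), ("g", set()), ("b", set()), ("a", set()),
    ("sum", {"r", "g"}), ("out", {"sum"}),
]
# A dependency chain: every step needs the previous result.
serial_work = [("s1", set()), ("s2", {"s1"}), ("s3", {"s2"}), ("s4", {"s3"})]

print(pack_vliw_bundles(parallel_work))
print(pack_vliw_bundles(serial_work))
```

The independent work packs four instructions into the first bundle, while the dependency chain ends up with one instruction per bundle and the other slots sitting idle. That second case is the kind of workload where VLIW designs struggle.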

Building on this efficient instruction-level parallelization, AMD paired its VLIW architectures with large arrays of relatively simple SIMD execution units - units which could receive a single instruction and apply it to multiple pieces of data in one clock cycle, i.e. SIMD - Single Instruction, Multiple Data. (On the CPU, MMX and SSEx are instruction sets that try to bring some SIMD to the x86 world.) All GPUs work on the same principle, but VLIW4/5 turned this kind of parallelism up to 11.

The combination resulted in an architecture with fantastic performance for graphics rendering tasks while keeping overall production costs and heat production reasonable. The architecture was also incredibly scalable: because the stream processors were arranged in discrete groups (Compute Units, or “CUs”) and were all fed by relatively simple scheduling hardware, it was straightforward to scale the GPU to all sorts of different sizes, both by design and by harvesting slightly flawed parts.


Block diagram for Cedar, a.k.a. the Radeon HD 5450 (VLIW5). The SIMD engines contain 80 stream processors (40 each).


Block diagram for Cayman, a.k.a. the Radeon HD 6970 (VLIW4). The SIMD engines contain 1,536 stream processors (128 each).

Near the end of 2010, AMD followed up on Evergreen with the Northern Islands family (VLIW4 architecture), flagshipped by the Radeon 6970 using the Cayman GPU. VLIW4 was a small evolution, architecturally, its major innovation being the elimination of one rarely-used stage in the execution pathways from VLIW5, allowing more room on a same-sized chip for general-purpose stream processors.

Compared to Nvidia’s Fermi, however, VLIW architectures have a glaring weakness: while they handle highly-parallel and well-defined workloads (like graphics rendering or brute-forcing passwords) very well, VLIW does very poorly on less-threaded tasks. This means that using a VLIW5/VLIW4-based GPU for general-purpose (i.e. GPGPU) computing yields much lower performance than Fermi. And in today’s market, there’s a lot more money in GPGPU than in graphics.

Graphics Core Next (GCN) and Southern Islands - Current Gen
GCN, released first as the Radeon HD 7970 (using the Tahiti GPU), was developed to address VLIW’s issues with GPGPU computing. At a basic level, the SIMD units stay the same, and they are still organized into groups of CUs, but the task scheduling hardware that feeds the CUs is very different.

To understand the difference, a bit of vocabulary:
  • An instruction is a single command sent through the processor to an execution unit. Software is not directly aware of instructions. Add, move, and compare are all examples of instructions.
  • A thread is a logical stream of instructions, of which the operating system is aware and can prioritize over other threads. For example, a game programmer may decide to put sound processing and enemy AI in different threads. Then these threads may be executed alternately or, on a dual-core processor, at the same time, rather than forcing sound processing to always wait for AI processing to finish.

Both VLIW4/5 and GCN are highly-parallel architectures, but in different ways. VLIW is parallel at the instruction level; the GPU tries to execute any similar instructions simultaneously, regardless of how they are threaded. This causes inconsistent performance, because the processor will execute as many instructions at once as it can, regardless of any logical priority they may have as threads. Therefore, the speed of computing individual threads may be quite slow at some times, e.g. when threads contain few instructions that can be run in parallel, and very fast at others. This inconsistency is undesirable and frustrating for software developers.

GCN is parallel at the thread level. Where VLIW4 would spread one thread across all four SIMD units in each CU, GCN runs four threads per CU at a quarter the rate per thread. When all of the threads are similar and instructions are parallelizable, as in graphics work, the instructions-per-clock rate is identical to VLIW4. When the ability to parallelize instructions varies, the worst-case and best-case performance are nowhere near as inconsistent. As such, GPGPU workloads work much better on GCN than they did on VLIW4, and GCN’s compute performance makes AMD truly competitive with Nvidia in this space.
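
A very crude way to picture the difference is the toy model below, in Python. It is an assumption-heavy simplification, not a simulation of either chip: it treats a CU as four issue slots per clock, says VLIW4 can only fill them with independent instructions from a single thread, and says GCN fills them with one instruction each from up to four different threads. The function names are invented for illustration.

```python
SLOTS_PER_CU = 4  # the four SIMD units per CU described above

def vliw4_instructions_per_clock(independent_instructions_in_thread):
    """VLIW4-style: one thread spread across all four units.

    Slots are only filled by instructions from that one thread that do
    not depend on each other.
    """
    return min(SLOTS_PER_CU, independent_instructions_in_thread)

def gcn_instructions_per_clock(ready_threads):
    """GCN-style: each unit runs its own thread, one instruction at a time."""
    return min(SLOTS_PER_CU, ready_threads)

# Graphics-like case: plenty of independent work in every thread.
graphics = (vliw4_instructions_per_clock(4), gcn_instructions_per_clock(4))

# Compute-like case: each thread is a dependency chain, but there are many threads.
compute = (vliw4_instructions_per_clock(1), gcn_instructions_per_clock(4))

print("graphics-like:", graphics)
print("compute-like:", compute)
```

In the graphics-like case both come out the same, matching the identical instructions-per-clock claim above. In the compute-like case the VLIW4 model drops to a quarter of its peak while the GCN model stays full as long as there are enough threads to go around.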


A not-to-scale comparison of how GCN schedules threads (“wavefronts”) to SIMD units/Stream Processors vs. VLIW4 (Cayman, far right).


Block diagram of the 2,048 Stream Processor Tahiti GPU (Radeon 7970), with the parts that really changed from Cayman obscured by logos.

GCN’s other major innovation was a jump to a new semiconductor process, 28nm (from 40nm). This allowed the Tahiti chip to be no larger than the VLIW4 Cayman chip despite having a third more stream processors and additional scheduling hardware. Add to that the smaller process’s decreased power consumption and faster voltage rise times, and GCN can be clocked up quite a bit higher than previous-generation GPUs. However, Nvidia’s Kepler did even better still with the 28nm transition, at least in the consumer space, which takes some of the wind out of the sails of this particular accomplishment.

As a final note: Though Evergreen and Northern Islands are no longer AMD’s flagship lines, the VLIW4 and VLIW5 architectures still have places in AMD’s consumer lineup. They are still used for lower-end mainstream GPUs in desktops and laptops, including on AMD’s APUs. Though the lineup mixed architectures to a small degree during the transition from VLIW5 to VLIW4, AMD has now made it an explicit strategy to use multiple GPU architectures to serve different markets.

Nvidia and AMD wrap-up
A lot didn’t get covered here, frankly. The full-GPU block diagrams show off all sorts of features dangling off the main processor that I didn’t touch on, like memory, the display controllers, CrossFire/SLI controllers, and, when applicable, fixed-function video encode/decode engines. (Feel free to post if you have questions though!)

But what did get covered is the core architecture of some pretty impressive highly-parallel processors from two major semiconductor design firms. And over time, those core architectures have gotten more and more similar. Both Team Red and Team Green are jockeying for the advantage in execution and details of their designs, rather than competing with wildly different architectures.

It’s hard to say what things will look like going forward. GCN has not yet rolled out to professional products, and Kepler is barely even out in the consumer space. The big money is in high-performance and professional computing right now, and both major players have solid GPGPU architectures. Could it be that future GPU architectures will simply try to do the same things just a bit better, leaving major changes for other parts of the card or the system? Or is something in the pipeline that will change the game once again?

Nvidia and AMD know, but they’ll only tell us so much. Their design cycle takes four or five years, and they are beginning work now on what will be 2016’s flagship GPUs. We’ll take a look at their published roadmaps below.


Now that we’re done with the big kids’ chat, it’s time to look at Intel’s HD Graphics 4000:


Well, that was nice. See ya.

... Oh, fine.

Intel’s HD Graphics is one part trying to get the most out of a tiny power envelope and transistor budget, and one part a trip back in time ten years. Rather than using oodles of generalized parallel execution resources, as many things as possible on Intel’s HD Graphics are fixed-function. Threads are only dispatched to compute units once everything possible has been done to them in small, efficient, specialized functional blocks.

Strategically, this makes perfect sense. Intel is fantastic at manufacturing, CPU design, and all sorts of things; but it’s playing catch-up in GPU design. It hasn’t had two decades of experience building graphics-optimized designs like AMD or Nvidia has, nor has it faced such strict thermal and power limits as those companies have habitually worked with until relatively recently. And yet it’s trying to build up expertise to eventually offer graphics suitable for everything from mobile phones to ISV-certified professional graphics (e.g. the IGP on Xeon processors).

But when you get down to it, Intel’s GPU architecture is not very parallel, and so it evokes old GPU designs. It contains only one each of units that most performance-oriented GPUs contain several or even dozens of, like texture units, tessellators, and rasterizers; even many entry-level GPUs have at least two of each.

Factory Factory did an awesome job parsing engineer speak!

movax fucked around with this message at May 23, 2012 around 03:39

movax
Aug 30, 2008



paste #5

Factory Factory
Mar 19, 2010

I can do sex. It's just alien sex.


Hi, I'm Factory Factory, your co-OP for this thread. I was the technical writer to Movax's engineer. That's not a euphemism.

E: Quoting this:

movax posted:

Shadows

Factory Factory fucked around with this message at May 10, 2012 around 23:22

Factory Factory
Mar 19, 2010

I can do sex. It's just alien sex.


Reserved so I have a place for news and content.

You may now poo poo up this thread.

Factory Factory fucked around with this message at May 10, 2012 around 23:16

movax
Aug 30, 2008



I got really excited and prematurely spooged poo poo Post...SWSP, please fix

e: post any and all feedback, seriously, and will make edits! Sorry for not publicly previewing

movax fucked around with this message at May 10, 2012 around 23:50

Star War Sex Parrot
Oct 2, 2003




Holy mother of god.

unpronounceable
Apr 4, 2010

Bad Maki! No more fantasizing about guys.

movax posted:

The generation is pretty much the model year. In desktop offerings, it aligns with the latest architecture revision being used on the top two or three market segments, which is also often smattered about the lower-end offerings. The current generation is the HD 7000 series (Southern Islands et al.), which was preceded by the HD 600 series (Northern Islands et al.).

I'll read it more thoroughly later, but here's the first typo I saw.

Ramadu
Aug 25, 2004

THAT'S our O-line?


I'm looking at getting either 2 2312hm's or 2 2412m's and have posted a few times in the monitor thread but I think this question belongs here. I currently have a 570gtx card and was wondering if that will be enough to power 2 of those monitors. I'd like to play in 1080p on one monitor while still being able to work on the other one. Will this card be enough or should I begin thinking about getting a 670 (which I already was after I stop being indecisive about monitors)?

movax
Aug 30, 2008



Ramadu posted:

I'm looking at getting either 2 2312hm's or 2 2412m's and have posted a few times in the monitor thread but I think this question belongs here. I currently have a 570gtx card and was wondering if that will be enough to power 2 of those monitors. I'd like to play in 1080p on one monitor while still being able to work on the other one. Will this card be enough or should I begin thinking about getting a 670 (which I already was after I stop being indecisive about monitors)?

If you just want to game on one monitor, the 570 GTX is plenty. If you want to do surround-gaming, in post 2, Factory Factory goes over some of the GPU options suggested for multi-monitor gaming setups.

Ramadu
Aug 25, 2004

THAT'S our O-line?


movax posted:

If you just want to game on one monitor, the 570 GTX is plenty. If you want to do surround-gaming, in post 2, Factory Factory goes over some of the GPU options suggested for multi-monitor gaming setups.

Ok, I'm kinda computer stupid so I didn't know if I was supposed to combine the resolutions of the monitors and pick a card based on that or what. Thanks for the quick response.

Star War Sex Parrot
Oct 2, 2003




I'd like to keep card recommendation discussion in the stickied parts-picking thread so as not to fracture discussion, keeping this thread for architecture, industry, and upcoming product chat.

real_scud
Sep 5, 2002

One of these days these elbows are gonna walk all over you


Star War Sex Parrot posted:

Holy mother of god.
Yep, but good god is it full of really good useful information. Thanks movax and Factory Factory!

Alereon
Feb 6, 2004

For me but LEFTHANDED

I just pulled the trigger on an EVGA Geforce GTX 670 2GB for $399.99 at Newegg. This is the first time in my life I've bought a videocard on launch day, or an nVidia videocard ever. I picked this particular card because I needed something shorter than 10" and it seemed slightly better than the other nearly-stock cards.

text editor
Jan 8, 2007



Mostly saving this spot for some spergy unixposting, but for substance for now I'll just add that the level of graphics support on Linux and BSD has changed a bit due to the requirements of Kernel Mode Setting by newer Intel, AMD, and Nvidia cards. For basic desktop usage, Intel is considered to have some of the best support. The Sandy Bridge driver has made some huge improvements, and Phoronix has been posting what feels like every day for a year now about how much work Intel was putting into the Ivy Bridge/HD 4000 series.

movax
Aug 30, 2008



text editor posted:

Mostly saving this spot for some spergy unixposting, but for substance for now I'll just add that the level of graphics support on Linux and BSD has changed a bit due to the requirements of Kernel Mode Setting by newer Intel, AMD, and Nvidia cards. For basic desktop usage, Intel is considered to have some of the best support. The Sandy Bridge driver has made some huge improvements, and Phoronix has been posting what feels like every day for a year now about how much work Intel was putting into the Ivy Bridge/HD 4000 series.

This would be awesome for the OP. I haven't been running Linux much outside of VM these days, and when I was last using it on the desktop, Nvidia binary-blobs were the best things available.

Fatal
Jul 29, 2004

I'm gunna kill you BITCH!!!


This thread reminds me of that one time I wrote the weed mega thread. It was huge, but not this huge. Great job, bar raised.

eggyolk
Nov 8, 2007

NO FAT CHICKS
WOOOOOOOOO!
(so lonely)


Amazing OPs. But just to be an rear end in a top hat...

Can we get any info on workstation graphics cards? I was tasked with putting together a CAD PC for my last job (which never happened sadly) and got lost researching the Quadro and Firepro cards. I know most of their differences are drivers, but the newer architecture seems to be able to do some really amazing things with regards to in-software rendering. Plus they always sport crazy numbers that are fun to nerd out on. I spent way too much time comparing the FirePro 4800 to the Quadro 600 in each program and application, configuring builds with absurd core counts.

Maxwell Adams
Oct 21, 2000

T E E F S

The antialiasing section could use a few words about SMAA. It's fast, it looks great, and you can inject it into most games.

Nierbo
Dec 4, 2010

Go for the take him down!

This is happening with my HIS HD5750



I put in a support ticket after having to sign up to some website. It's about a year old but I bought it new on eBay from an online computer store. Will they still replace it if I can't provide a receipt?

Factory Factory
Mar 19, 2010

I can do sex. It's just alien sex.


You don't have a PayPal receipt or copy of the auction page that you can dig up?

Nierbo
Dec 4, 2010

Go for the take him down!

Yes you're right, I just found it.
"Date Ordered: Monday 14 March, 2011"

Guess I'm forking out for a new card right?

doczoid
Mar 14, 2012

by Y Kant Ozma Post


I (and many others) would be very interested in finding out which GPUs are crippled in terms of OGL/OCL/CUDA performance.

The consensus is that all cards since the GTX 285 are double precision floating point crippled to the point where they perform worse in GL than an 8800GTX and I can confirm that I have tried and sold a GTX 480 and a GTX 570 due to abysmal GL performance.

Top Quark
Aug 2, 2010

"Going where no man has gone before."

movax posted:

Price/performance chart: http://www.craftools.com/gpu/ (Goon-made, I believe!)

Yay my little graph project made it in! I still have plans to work on this but I've been crazy busy recently (moving to a new country and all).

A note: I recently broke the scraping script thingy so the prices are a bit outdated. Also I'm moving to PassMark scores rather than average FPS, as it's too hard to normalise the benchmarks across cards. I will also be adding support for The Tech Report's 99th percentile benchmarks.
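For anyone wondering why a 99th percentile number is worth adding next to average FPS, here is a rough sketch. It is not the chart's actual code; the function name and the frame times are made up for illustration, and it uses a simple nearest-rank percentile, which may differ from how any given review site computes theirs.

```python
def percentile_frame_time(frame_times_ms, pct=99):
    """Nearest-rank percentile: the frame time that pct% of frames meet or beat."""
    ordered = sorted(frame_times_ms)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical capture: 98 smooth frames plus 2 long stutters.
frame_times = [16.7] * 98 + [50.0] * 2

average_ms = sum(frame_times) / len(frame_times)
p99_ms = percentile_frame_time(frame_times, 99)

print(f"average frame time: {average_ms:.1f} ms")
print(f"99th percentile frame time: {p99_ms:.1f} ms")
```

The average barely moves when a couple of frames hitch, while the 99th percentile figure lands right on the stutter, which is the sort of thing average FPS hides.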

Also I looked into converting the thing to a dynamic PNG so you can link it here but holy god it looks complicated and I'm not THAT good a coder. I'll keep seeing if I can throw something together though.

lllllllllllllllllll
Feb 28, 2010

Now the scene's lighting is perfect!


Great OP, OP. In a nutshell, what's the difference between the 5xx and 6xx family of NVidia's cards (energy saving?)? I currently have an i7 2600 and an NVidia 550, which is not an ideal combination. But currently I don't really need a new card anyway. What's the next affordable thing that is more better than the 560? Or can't anyone say at this point? Sorry for the stupid/unanswerable question.

\/ Thanks a million and sorry for the oversight.

lllllllllllllllllll fucked around with this message at May 11, 2012 around 12:37

Factory Factory
Mar 19, 2010

I can do sex. It's just alien sex.


Practically, it's 1) power savings and 2) CUDA/GPGPU performance. The consumer 600 series will just be worse at CUDA than the 500 series. Otherwise, you can divide the CUDA core count on a 600 series card by 2 and get a pretty good first approximation of how it relates to a 500-series card in GPU-bound scenarios.
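As a quick illustration of that divide-by-two rule of thumb, here is a minimal sketch. The core counts are the published spec-sheet figures for each card, not numbers from this thread, and the function name is made up. It ignores clock speeds, memory bandwidth, and everything else, so treat the output as a first approximation only.

```python
# Published CUDA core counts (spec-sheet figures, assumed here for illustration).
KEPLER_CORES = {"GTX 680": 1536, "GTX 670": 1344}
FERMI_CORES = {"GTX 580": 512, "GTX 570": 480}


def fermi_equivalent_cores(kepler_cuda_cores):
    """Rule of thumb from the post: halve a 600-series core count."""
    return kepler_cuda_cores / 2


for kepler_name, fermi_name in [("GTX 680", "GTX 580"), ("GTX 670", "GTX 570")]:
    equivalent = fermi_equivalent_cores(KEPLER_CORES[kepler_name])
    ratio = equivalent / FERMI_CORES[fermi_name]
    print(f"{kepler_name} ~ {equivalent:.0f} 500-series cores, "
          f"about {ratio:.1f}x a {fermi_name}")
```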

As for the purchasing questions, there's a price/performance chart linked at the top of the OP that will tell you everything you need to know to answer your question. It's also quoted in the post directly above yours. That said, SWSP said he wants to keep this thread more on info and less on buying assistance, so let's try to keep that kind of question in the system building sticky.

Nierbo posted:

Yes you're right, I just found it.
"Date Ordered: Monday 14 March, 2011"

Guess I'm forking out for a new card right?

They come with a two year warranty?

Factory Factory fucked around with this message at May 11, 2012 around 11:26

Verizian
Dec 18, 2004
The spiky one.

Can we talk about non-gpu bits of a videocard here too because I've got a few questions?

  • Why do drivers suck so much?

    Mainly an AMD thing from my experience with OpenGL loving up for idTech5 and the Windows 8 CP drivers requiring manual registry editing.
    Also heard some horror stories about nVidia drivers too but no first hand knowledge.

  • Aftermarket cooling and replacing stock TIM.

    Is the thermal goop really that bad and why shouldn't you just replace it with a dot of Arctic Silver?
    How do you pick a good aftermarket cooler apart from searching "Aftermarket cooling MSI 6870" then trawling opinionated blog and forum posts that often contradict each other?

  • Multiple displays and non-standard resolution.

    Are there any problems with running multiple displays at different resolutions and using them for gaming or content creation? Do you still just multiply X*Y then add them together for each screen to get the total number of pixels?
    If so, for 8-10MP total display area would you aim for a single 7970, 680, 670 or is CF/SLI required after 6MP?

Nierbo
Dec 4, 2010

Go for the take him down!

Factory Factory posted:

They come with a two year warranty?
It doesn't say. I always thought stuff only came with a one year warranty. I guess they're only like 60 bucks now anyway for a new one. Not as bad as I thought. It's enough for TF2.

Great OPs by the way.

Factory Factory
Mar 19, 2010

I can do sex. It's just alien sex.


HIS cards come with a two year warranty. The confusion was in you possibly miscounting the years.
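To spell out the year counting, here is a small sketch. The order date is the one quoted above and the "today" date is roughly when these posts were made. It assumes the warranty clock starts on the order date, which may not match the actual terms, so check with the manufacturer.

```python
from datetime import date


def warranty_end(purchase, years):
    """Last day of coverage, assuming the warranty runs from the purchase date."""
    try:
        return purchase.replace(year=purchase.year + years)
    except ValueError:
        # Purchased on Feb 29 and the target year is not a leap year.
        return purchase.replace(year=purchase.year + years, day=28)


def in_warranty(purchase, years, today):
    return today <= warranty_end(purchase, years)


ordered = date(2011, 3, 14)   # "Date Ordered: Monday 14 March, 2011"
today = date(2012, 5, 11)     # approximate date of this discussion

print(in_warranty(ordered, 1, today))  # one-year assumption
print(in_warranty(ordered, 2, today))  # two-year warranty
```

With a one-year term the card would have lapsed in March 2012, but under a two-year term it is covered until March 2013.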

Factory Factory
Mar 19, 2010

I can do sex. It's just alien sex.


Verizian posted:

  • Why do drivers suck so much?

    Mainly an AMD thing from my experience with OpenGL loving up for idTech5 and the Windows 8 CP drivers requiring manual registry editing.
    Also heard some horror stories about nVidia drivers too but no first hand knowledge.

Everybody's drivers suck, just at different times. Intel's Windows drivers sucked until a few months ago, now they merely sip. Nvidia has constant problems, like a massive Shogun 2 performance bug for high resolutions and 3D Vision generally being a slightly bigger horror than AMD's stereoscopic 3D. AMD sucks at Rage and is filled with small bugs and performance issues that take them a bit longer than Nvidia to iron out because they don't give game devs a bunch of free hardware in return for marketing and early access for driver optimizations.

quote:

  • Aftermarket cooling and replacing stock TIM.

    Is the thermal goop really that bad and why shouldn't you just replace it with a dot of Arctic Silver?
    How do you pick a good aftermarket cooler apart from searching "Aftermarket cooling MSI 6870" then trawling opinionated blog and forum posts that often contradict each other?

With current gen cards, you will not apply better TIM than the manufacturer can, period. Get the cooler you want the first time around if you want TIM perfection.

If you have to have aftermarket cooling, Arctic Cooling makes pretty much the only good coolers.

quote:

  • Multiple displays and non-standard resolution.

    Are there any problems with running multiple displays at different resolutions and using them for gaming or content creation? Do you still just multiply X*Y then add them together for each screen to get the total number of pixels?
    If so, for 8-10MP total display area would you aim for a single 7970, 680, 670 or is CF/SLI required after 6MP?

For gaming (Eyefinity and Surround), the monitors must have the same resolution. For non-full-screen multi-monitor, the resolutions may be different; however, if they are, most video cards will clock up to "low 3D" clocks, which raises idle noise and power consumption significantly. You can sum up height * width resolutions for how many megapixels your card is pushing out, but that's only relevant to workload for gaming.
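The width * height sum is easy to do by hand, but as a quick sketch (the function name and the example monitor layouts are just illustrations, not anything from the OP):

```python
def total_megapixels(displays):
    """Sum width * height over every display and return the total in megapixels."""
    return sum(width * height for width, height in displays) / 1_000_000


# Three matched 1080p panels, as you would use for Eyefinity or Surround.
surround_1080p = [(1920, 1080)] * 3

# A mismatched desktop pair for non-gaming use.
mixed_desktop = [(1920, 1200), (1920, 1080)]

print(f"3x 1080p surround: {total_megapixels(surround_1080p):.2f} MP")
print(f"1920x1200 + 1920x1080: {total_megapixels(mixed_desktop):.2f} MP")
```

Keep in mind the total only tells you about gaming workload when the game is actually rendering across all of those pixels. A game on one screen with a desktop on the other only loads the card for the one screen.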

Suggestions for full detail/AA are in post 2, after going over model numbers. AnandTech also benchmarks at 1920x1080x3 surround, and according to its numbers, if you are willing to drop AA and often some detail, you can get 40+ FPS with a GeForce 680. For three 2560-wide displays or such, you will definitely need more horsepower for a high-detail gaming experience.

E: Quote != edit

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast


Factory Factory posted:

For gaming (Eyefinity and Surround), the monitors must have the same resolution.

I don't think this is true - but it will force them all to the lowest common resolution. The native res doesn't stop you trying. (Although that would be silly).

Gunjin
Apr 27, 2004

Om nom nom

Movax posted:

As of this OP, it seems Adobe GPU acceleration is finally based on OpenCL in CS6 and will work with any reasonably beefy discrete card. CS4 through CS5.5 only support CUDA.

The only cards that do GPU acceleration with OpenCL instead of CUDA in CS6 are the HD 6750M and HD 6770M (1 GB VRAM versions), and then only on OSX 10.7.x. Everyone else still needs to use a CUDA card.


Note that there is a way to "hack" an unsupported card into Adobe CS, so if you have a Mac Pro with a 5770 or 5870 on OSX 10.7.x you could theoretically add support for one of those cards, but I'd be wary of doing that in a production environment where stability is a primary concern.

Athropos
May 4, 2004


Very very impressive OP. Holy crap.

kuddles
Jul 16, 2006

Like a fist wrapped in blood...

Yeah, really good OP there. I can't really think of anything to add.

I was skeptical in the AMD thread of the review that made the Asus GTX 670 DirectCU II TOP out to be essentially a cheaper, quieter and cooler 680, but another review came out with the same results. I guess it was a blessing in disguise that I couldn't find a GTX 680 anywhere because now I'm set on picking up one of these beauties.

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down


Dogen, I highly advise you pick up an Asus GTX 670 DirectCU, overclock the crap out of it, and supply me with a very affordable GTX 580 to end my torment in Metro 2033. We can haggle on power supply needs. This is professional advice, I am a professional.

If as your avatar might suggest you like guitar stuff, I have some amazing gear we could work out as a trade... Think on it

spasticColon
Sep 22, 2004

In loving memory of Donald Pleasance

I ordered a HD7850 Tuesday and Amazon still hasn't shipped it. I guess that's what I get for selecting the free shipping.

The main reason I got one is that I saw in benchmarks the HD7850 runs Skyrim and BF3 a lot better than the 560Ti. Is there a specific reason for this?

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down


spasticColon posted:

I ordered a HD7850 Tuesday and Amazon still hasn't shipped it. I guess that's what I get for selecting the free shipping.

The main reason I got one is that I saw in benchmarks the HD7850 runs Skyrim and BF3 a lot better than the 560Ti. Is there a specific reason for this?

Yes and no - nVidia had issues with Skyrim from launch. There was a driver update that netted me a solid 30-40% performance increase in Skyrim on my GTX 580. So it could be driver related. Or, it could be that AMD/ATI's architecture is better for what Skyrim does than nVidia's. Or, it could be that AMD/ATI's drivers knock it outta the park on that engine, while nVidia's still struggle some by comparison.

For what it's worth, this is probably the answer to nearly every possible iteration of the question: "Why does ________ perform better than the competition in game __________?"

Barring known incompatibilities like the 560Ti's weird issue with Battlefield 3 and black textures despite attempts from the devs and nVidia's driver team to fix them, it usually just comes down to peculiar, idiosyncratic things that are difficult to pin down as being specifically because of this or that. Exception might be very high resolution gaming, or 3x/4x GPU gaming, where AMD/ATI's greater VRAM and multi-GPU scaling give them a distinct edge over nVidia for known reasons.
