Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

movax posted:

Not sure actually, that's an interesting question. From a hardware perspective, I could see the GPUs recognizing that there is a special SLI bridge present and changing the BARs they request appropriately.
BAR size is generally set in the GPU VBIOS nowadays, and it can't be changed after POST. Also, BARs on GPUs are barely used except to program the DMA controllers that actually do the work, because otherwise you'd spend way too much CPU time interacting with the GPU. So every card ends up containing basically the same data set in SLI/CF.

(important note: one PCI device can have more than one BAR, and there are fun alignment restrictions with BARs that may cause the actual physical address space consumed to be much greater than what you expect versus the sum of the size of the BARs)
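
if you want to poke at this yourself on Linux, sysfs exposes the BARs directly--each line of the resource file is start/end/flags for one BAR, so size is just end - start + 1. rough, untested sketch (the device path is made up, substitute your own GPU's BDF):

code:
// print BAR sizes for a (hypothetical) GPU at 0000:01:00.0 -- untested sketch
#include <cstdio>
#include <cinttypes>

int main() {
    // each line of this sysfs file is "start end flags" in hex, one entry per BAR/ROM
    FILE* f = fopen("/sys/bus/pci/devices/0000:01:00.0/resource", "r");
    if (!f) { perror("fopen"); return 1; }
    uint64_t start, end, flags;
    int bar = 0;
    while (fscanf(f, "%" SCNx64 " %" SCNx64 " %" SCNx64, &start, &end, &flags) == 3) {
        if (end > start)  // unused BARs read back as all zeros
            printf("BAR %d: %llu MiB\n", bar, (unsigned long long)((end - start + 1) >> 20));
        ++bar;
    }
    fclose(f);
    return 0;
}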

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Jan posted:

What the hell is a TDR?
timeout detection and recovery, in Vista and later. aka "your GPU or the GPU driver has done something bad, so we reset everything" plus a pop-up message telling you so
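
from the CUDA side, a TDR usually shows up as the watchdog killing your kernel. hypothetical sketch of what that looks like--a deliberately long-running kernel on a WDDM display GPU typically comes back as a launch timeout and the device gets reset out from under you:

code:
#include <cstdio>
#include <cuda_runtime.h>

// burns time until the WDDM watchdog (default ~2 seconds) kills the kernel -- hypothetical demo
__global__ void spin(long long cycles, int *out) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
    *out = 1;  // never reached if the TDR fires first
}

int main() {
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    spin<<<1, 1>>>(1LL << 40, d_out);          // way longer than 2 seconds
    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));   // typically cudaErrorLaunchTimeout under WDDM
    cudaFree(d_out);
    return 0;
}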

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

In essence, this is Intel fighting back against Nvidia's Tegra 3 and other ARM-based SoCs, extended to attack Nvidia's core business. Though Intel is very dominant in x86 laptop-through-server processors, in the broader category of "compute silicon" it has a lot of competition: from ARM and its licensees on CPUs and SoCs (as well as IBM in consoles), from AMD and Nvidia on graphics and HPC (and, to a lesser extent, PowerVR et al.), from AMD on the increasingly popular good-enough multimedia CPUs/APUs/SoCs/whatevers, and from Samsung on NAND. Intel is big and dominant, but it won't stay that way through size alone; it's in the top spot of a competitive oligopoly, and it can be dethroned by someone else doing better.
The interesting thing will be whether Intel tries to make a workstation IGP. That market is based 100% on driver quality and marketing agreements (there's a reason NV has owned 90% of it for a decade), but if you wanted to hurt NV, cheap WS IGPs would be the way to do it. Bringing down Quadro margins would put a serious damper on NV's bottom line, and the stock price would take a serious beating accordingly.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

You mean like the Intel HD Graphics P4000 (PDF) found on Intel Xeon E3 v2 CPUs, with 13 ISV certifications?

Or, for that matter, the AMD FirePro APU?
Something that could actually take on the mid-range Quadro boards, not just the low-end NVS cards. For example, the Quadro 2000 is a $400 card based around a GF106--if Intel starts taking those kinds of margins away from NV, it would be ugly.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Athropos posted:

So in a nutshell, could a Titan be more powerful than my SLI 680s?
if you're doing DP, yes. if you're doing gaming, no. some of it is a major improvement for CUDA (e.g., CUDA streams are actually usable and predictable on GK110, instead of being some weird incantation you do to try to get whatever extra performance happens to show up, as on every previous chip), and some of it is stuff that might become usable (dynamic work creation).

the real point of this card, as far as I can tell, is not gaming; it's something academics can buy to start developing on GK110 without having to spend $4k per board, while not giving them something they could buy in any significant volume for deployments. it makes a lot of sense to do it that way, too.
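
for the streams point: the idea is that on GK110 you can issue independent work on separate streams and have copies and kernels actually overlap the way the API always implied they should, instead of silently serializing. rough sketch (error checking omitted, my_kernel is just a stand-in):

code:
#include <cuda_runtime.h>

__global__ void my_kernel(float *data, int n) {   // stand-in for real work
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20, nstreams = 4;
    const size_t chunk = n / nstreams * sizeof(float);
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));   // pinned host memory so the copies can be async
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t streams[nstreams];
    for (int s = 0; s < nstreams; ++s) cudaStreamCreate(&streams[s]);

    // each stream copies its chunk in, runs the kernel, copies it back;
    // on GK110 these actually overlap instead of piling up behind one queue
    for (int s = 0; s < nstreams; ++s) {
        size_t off = s * (n / nstreams);
        cudaMemcpyAsync(d + off, h + off, chunk, cudaMemcpyHostToDevice, streams[s]);
        my_kernel<<<(n / nstreams + 255) / 256, 256, 0, streams[s]>>>(d + off, n / nstreams);
        cudaMemcpyAsync(h + off, d + off, chunk, cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < nstreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}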

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

All of the GeForce 600 series is based on GPGPU-gimped high-level architecture. The GeForce 680's GK104 is more closely related to the 560 Ti than it is to the GeForce 580. The only compute-optimized Kepler card is Titan, based on the GK110 GPU.
that doesn't make GK104 bad at compute (except for DP), just makes it different to write code for

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

The specifics are generally trade secrets, but in general, it's:
This is all true and accurate, but it misses the most important point:

0. Certification. Apps are tested with specific workstation cards and workstation drivers, and if you find a bug you can get someone to actually care and probably fix it within a few weeks. That generally will not happen with consumer cards. Essentially, it's insurance. Even if you're not doing something that gets a huge speedup from workstation-only functionality, workstation cards are worth buying when the cost of a showstopper driver bug is significantly more than the price disparity.

Now, if you're doing GPU compute work for any length of time and not using ECC, you're going to have a bad time. But 3D stuff is generally more forgiving.
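
side note: if you want to check whether the board you're actually deploying on has ECC turned on, the CUDA runtime will tell you. quick sketch:

code:
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int d = 0; d < ndev; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // ECCEnabled is 1 on Tesla/Quadro parts with ECC switched on, 0 on consumer boards
        printf("device %d (%s): ECC %s\n", d, prop.name, prop.ECCEnabled ? "on" : "off");
    }
    return 0;
}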

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Un-l337-Pork posted:

Am I missing something about these charts (http://www.tomshardware.com/reviews/geforce-gtx-780-performance-review,3516-28.html) or does the 7970GE smoke the poo poo out of Titan for OpenCL performance?

Or is it just that nobody really uses OpenCL at the moment?

HalloKitty posted:

There's a reason all bitcoin mining rigs sport AMD GPUs.
There are basically two users of high-end GPU compute: HPC and bitcoin mining. In HPC, you need things like ECC, very fast double precision, tons of memory bandwidth, good tools, integration with languages like Fortran, BLAS/FFT libraries, etc. NV GPUs and CUDA have all of those things in spades; AMD cards and OpenCL generally do not. For bitcoin mining (or other crypto in general), you need lots of integer math and a barrel shifter, and AMD cards provide this simply through different architectural tradeoffs. That doesn't mean the AMD cards would be faster in a lot of HPC workloads, because of pipeline differences; the crypto workloads are very easy to optimize to max out an architecture, so the extra ALUs on the AMD arch actually get used instead of sitting idle due to stalls or the inability to provide enough independent work.

In HPC, nobody uses OpenCL because CUDA support is so much better on NV cards and AMD cards aren't in the running for deployments. Outside of HPC, OpenCL certainly hasn't been widely adopted outside of bitcoin; but then, that's true of GPU compute in general (primarily because people don't have many workloads that actually benefit from it outside of games, and games have their own GPU work to deal with). The perception that OpenCL is the up-and-coming thing is due to bitcoin and the weird notoriety of that particular use case, but it's as stagnant as OpenGL ever was in the Longs Peak era.

However, the idea that you can run the same OpenCL benchmark and get some meaningful performance metric across a range of architectures is total bullshit. CUDA was never designed to be a performance portable programming model, OpenCL did less than nothing to improve on the CUDA programming model (actually made it worse in many ways), and as a result you simply can't write one kernel and expect it to run well on X different architectures (at least, not with anything more complicated than SAXPY). It would be very easy to take a real workload and tweak it very, very slightly to run best on one architecture or another for no obvious reason. Graphics suffers from this as well, but to a dramatically more limited extent for a lot of reasons.
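
for reference, SAXPY is about the ceiling of what ports cleanly--it's bandwidth-bound and has basically no interesting scheduling decisions, so any architecture runs it fine. the CUDA version is about ten lines (sketch):

code:
#include <cuda_runtime.h>

// y = a*x + y -- the canonical "runs fine everywhere" kernel
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    // (initialize x and y with cudaMemcpy in a real program)
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    cudaFree(x); cudaFree(y);
    return 0;
}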

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Agreed posted:

nVidia is gonna get so much richer because of this. They're licensing... seriously? It's brilliant and a little bit crazy and I love it, haha, talk about a bold move.
Realistically, there are only two potential licensees. Neither one nets a ton of money.

Apple's an interesting possibility and would net a fair amount of money (it would probably kill Imagination in the process). Apple builds their own CPUs, has no problem building relatively enormous mobile chips, and doesn't care about margins nearly as much as other vendors, so it could take a next-gen Tegra part relatively early and put a low-clocked enormous die in a tablet. A5X in the iPad 3 was 2x the size of Tegra 3 because Apple can make the SoC cost back on the tablet (there's no separate SoC vendor that has to make money plus a final OEM that has to make money). However, this isn't a ton of money, as Imagination's total revenue was ~$200M last year, and that's with owning every iPhone and iPad sold (along with a bunch of other processors).

Samsung is more difficult to understand because it doesn't build its own CPUs--it takes stock ARM cores. If Samsung continues to do that, why wouldn't they just use Tegra? Licensing a GPU core wouldn't net them anything meaningful there versus buying Tegra that I can see.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

PC LOAD LETTER posted:

Only big development in hardware that maybe applies here for a while is going to be AMD hUMA I think. What is MS supposed to be doing for DX12?
I've read all the HSA stuff released thus far--I don't see any way it can possibly work with discrete GPUs. also, considering half of it is basically "hey, WDDM is bad, we should do this magical thing instead that is less bad in this entirely hand-wavy way," yeah, not happening.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

Does anyone actually feel hyped by the 290X?
You mean you're not wetting yourself over 3D positional audio and a proprietary 3D API only used by specific games? What, was 1998 not good enough?

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Rahu X posted:

As for what I gathered from this video, it seems that all parties aren't particularly thrilled at the existence of Mantle itself, but rather at the possibility of Mantle ushering in changes to APIs to allow more low-level access in general.
WDDM is designed for a really specific problem--3D accelerated desktop UIs--and all of its tradeoffs are built around that. This is why you can do things like "games generally don't crash when I alt-tab" and "I can see window previews when I alt-tab" and "badly behaved drivers don't cause BSODs." It's also why you get other things like "command buffer submission to the GPU takes forever" and "compute APIs are always going to be second-class citizens." WDDM was designed for NV40/G70 class hardware ten years ago, and it shows. If you remember back in the proverbial day, there was a proposal for WDDM 2.0 that was spectacularly unrealistic, like "all hardware must support instruction-level preemption" unrealistic (to my knowledge, no GPU supports instruction level preemption). MS finally added support for any sort of preemption in WDDM 1.2 (Win8), but they haven't done anything to address things like buffer queue overhead (not since they fixed something completely horrible in Vista with something less horrible in Win7), GPU page faulting, or shared memory machines.

The thing I'm most curious about with Mantle is how it will work alongside WDDM, because upon reflection and discussion with some similarly knowledgeable folks, none of us can figure out how you could get WDDM interoperability except in one of two ways:

1. a large static memory carve-out at boot and a second Mantle-specific device node, rendering into a D3D surface
2. only run on a platform that has a GPU with reasonable preemption (at least per-wavefront) and an AMD IOMMU

Of course, they could ship Mantle in a separate driver that blatantly circumvents WDDM and that they never attempt to get WHQL'd, but that seems unrealistic.

If you look at the HSA slides from Hot Chips, the driver they propose is definitely a response to the stagnancy of WDDM, but it's also mired in some unrealistic stuff (the idea that you can return from a GPU operation to a suspended user-mode process without entering the kernel is nonsense) and some pointless stuff; a standardized pushbuffer format was tried by MS briefly in the DX5/6 timeframe, I think, and it was a travesty that vendors all rebelled against.

(i know a lot about driver models, i should really write my own sometime)

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Byolante posted:

If they really care about DP compute stuff, why aren't they buying Teslas or something?
you only buy Teslas if you care about multi-year reliability, service, ECC, things like that. by the time most of these consumer cards break, a slightly newer/faster chip will be out, and upgrading will pay for itself (assuming a boom).

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Agreed posted:

Unrelated note, so it looks like I may be getting an Oculus Rift soon as a gift from a good friend who unfortunately gets motion sickness and headaches from the technology. I've been wanting to test the technology for a long time, really excited to have that become a real possibility. Not set in stone, depends on some other factors, but I will definitely be happy to answer any questions about it if anyone is curious enough to ask but not curious enough to drop $300 on an in-development product. I am mega stoked, I guess I need to get the Doom 3 re-release since that's one of the games that apparently is perfectly tuned for it. :cry: / :holy:
I got one pretty recently. I may have teared up a little bit the first time I used it, because it was the first time I'd ever seen 3D (eye muscle disorder as a kid, lost normal stereo vision as a result). The resolution is fantastically bad, the lack of lateral tracking is incredibly annoying, and there are very few apps, and yet I'm still almost completely sure that they're going to become the first new category of hardware required for PC gaming since 3D accelerators. It honestly reminds me of the first time I saw a Glide app.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
re: HPC workloads: no, NVIDIA generally runs the table on AMD. better ECC, better compilers, better tools (Allinea DDT and TotalView just work), infinitely better libraries, FORTRAN support. the last two are what seal the deal. AMD is a non-entity in that market aside from the occasional Linpack stunt machine; it's a two-horse race between Phi and Tesla. Phi will probably win come Knights Landing on QPI, though.
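
"infinitely better libraries" means stuff like this just existing and working: an FFT on the GPU via cuFFT is a handful of lines. untested sketch, link against -lcufft:

code:
#include <cufft.h>
#include <cuda_runtime.h>

int main() {
    const int n = 1 << 16;
    cufftComplex *data;
    cudaMalloc(&data, n * sizeof(cufftComplex));
    // (fill `data` with your signal here)

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);           // single 1D complex-to-complex transform
    cufftExecC2C(plan, data, data, CUFFT_FORWARD); // in-place forward FFT

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}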

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
For those of you interested in Mantle, this thread on Beyond3D may be interesting (it's primarily about draw call performance and CPU work dispatch versus GPU work dispatch). Andrew Lauritzen is a graphics researcher at Intel who knows more about shadowing systems than maybe anyone else and an all-around Very Smart Dude. Several other industry developers are in there, too.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Agreed posted:

Man, this thread rules. I thought this post was a little funny.
That is pretty much a textbook example of why standards don't mean nearly as much as people claim, especially when it comes to GPUs.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Agreed posted:

(parenthetical thought bubble two: AMD and nVidia need to drop the bullshit and get together on hammering out a compliant methodology for fixing D3D's issues. OpenGL has way, way fewer issues, but the last OpenGL game to try to do anything cool was RAGE, and it launched to, deservedly, no real acclaim, and had probably the worst lighting and object texture issues of any game I've played that was made after Doom 3.)
they can't; Microsoft has to, and it has shown no interest in doing so.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Agreed posted:

So what's the consumer equivalent of torches and pitchforks to Microsoft's castle? :mad:
(wooh long post)

Let's be real--even though OGL is doing some stuff, it still sucks. There's pretty much nothing revolutionary going on; it's minor improvements at best, and further evidence that nobody has a clue what the actual successor to GL will be (or more specifically, the successor to the classic GL pipeline model).

If you put your Remembering Hats on and think back to 2007 or 2008, Larrabee actually tried to do something about this. We can mock Larrabee all we want ("it's what happens when software guys design hardware" is what I heard a lot at the time, and it's pretty true), but I will give them props for trying something to get beyond the standard pipeline. (if you don't actually know much about Larrabee, go read everything linked here)

Larrabee failed for two reasons:

1. it sucked tremendously at being a D3D/OGL device. A big part of this was their own naivete ("pfft we don't need a ROP, we'll do it in software"). I think Forsyth in his talk at Stanford mentions that they have a large number of different pipelines implemented for different apps that all have different performance, with no way to tell a priori which way would be fastest. The software costs would be astronomical.

2. it didn't get a console win. lots of reasons for this (Intel being largely indifferent, everybody being really gunshy about exotic architectures after Cell, really high power consumption), but Larrabee only had a chance at doing something interesting if it could get a console win in order to gain widespread developer interest/acceptance/traction.

okay, I keep talking about "doing something interesting." what am I talking about? if you go back and look at the graphics pipeline throughout history, it hasn't changed all that much. sure, we've added pixel, vertex, geometry, and now compute shaders in both D3D and OGL. there's tessellation too. lots of new stuff! but there's still fundamentally a pretty straightforward pipeline that ends at the ROP. you can insert interesting things in various places (and people definitely do, see the siggraph papers any given year), but nobody is able to build any sort of arbitrary pipelines with reasonable efficiency. Larrabee's goal was to be able to throw all of the existing model out the window and let sufficiently smart developers implement whatever. want some sparse voxel octree nonsense? sure. micropolys everywhere? also okay. something really weird? yeah fine.

(for more on this, read Kayvon Fatahalian's dissertation or his SIGGRAPH talk. actually, just read everything he writes. he's one of Pat Hanrahan's former PhD students, like the one guy that invented CUDA at NVIDIA and the other guy that was one of the main GCN architects, and he is ludicrously smart.)

similarly: in the GPU compute realm, nobody's figured out anything to do with the ROP. it is a vestigial weird graphics thing that has no interface that looks anything like a normal programming language and nobody knows how to expose it. if somebody did figure out how to expose it, you could probably end up writing reasonable graphics pipelines in CUDA/OCL/something else. but nobody has, and now that CUDA is purely focused on HPC and OpenCL is focused (insofar as OpenCL has ever been focused at all) on mobile, I don't know that anyone will. (well OCL won't, the CPU and DSP guys won't let the GPU guys enable specialized hardware that they can't emulate quickly)
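
to make the ROP point concrete: if you want ROP-style blending from a compute kernel today, you end up hand-rolling the read-modify-write with atomics, something like the sketch below (hypothetical packed-RGBA8 saturating add). it's both ugly and way slower than the fixed-function path:

code:
#include <cuda_runtime.h>

// emulate ROP-style saturating-add blending on a packed RGBA8 pixel with atomicCAS;
// the hardware ROP does this for free per fragment, a compute kernel has to spin
__device__ void blend_add(unsigned int *pixel, unsigned int src) {
    unsigned int old = *pixel, assumed;
    do {
        assumed = old;
        unsigned int blended = 0;
        for (int c = 0; c < 32; c += 8) {                  // per-channel saturating add
            unsigned int s = min(((assumed >> c) & 0xFFu) + ((src >> c) & 0xFFu), 255u);
            blended |= s << c;
        }
        old = atomicCAS(pixel, assumed, blended);          // retry if another thread raced us
    } while (old != assumed);
}

__global__ void splat(unsigned int *fb, int width, int n, const int2 *pts, unsigned int color) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) blend_add(&fb[pts[i].y * width + pts[i].x], color);
}

int main() {
    const int w = 256, h = 256, n = 1 << 16;
    unsigned int *fb;
    int2 *pts;
    cudaMalloc(&fb, w * h * sizeof(unsigned int));
    cudaMalloc(&pts, n * sizeof(int2));
    cudaMemset(fb, 0, w * h * sizeof(unsigned int));
    cudaMemset(pts, 0, n * sizeof(int2));   // every splat hits (0,0) -- worst-case contention
    splat<<<(n + 255) / 256, 256>>>(fb, w, n, pts, 0x01010101u);
    cudaDeviceSynchronize();
    cudaFree(fb);
    cudaFree(pts);
    return 0;
}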

obviously, Microsoft could improve their driver model without addressing these issues (as Mantle tries to do), but there's little reason for them to do so. despite developers whining to the contrary, their driver model is largely fine for its intended use, and it's not clear that existing hardware could support anything better.

so if MS doesn't care right now, who's left? Apple could do something Mantle-like on iOS, but they won't until they ship only Rogue-based platforms for a long time (also not sure if it even makes sense there). I doubt they really care on desktop based on their rate of GL version adoption. maybe Android at some point.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Agreed posted:

Read everything this poster says and everything linked and become enlightened or just take it as "Intel's coming up, but basically it's going to be those three for the foreseeable future yeah."
lmao, thanks Agreed. I thought I posted in here more over the past year. apparently you're trying to make me write long posts, oh well.

so okay, a brief overview of the GPU market. what does it take to build a GPU?

- the right hardware architects. these people are employed overwhelmingly at one of four companies: NVIDIA, AMD, Intel, and Qualcomm (which acquired ATI's old mobile GPU division back in the day and has picked up most of the fleeing hardware people from AMD in the past year). there are other players too, like Imagination (PowerVR) and ARM (Mali). these people can be and are poached from one another, and that's really the only way to acquire them besides training your own. successful GPUs are a ludicrous collection of special-purpose units, and knowing the right ways to build those units can only be learned from what's come before. the bottom line is that there just aren't that many of these people.

- the right software people. all GPUs are obviously very dependent on drivers, so for (let's say) a desktop GPU you need essentially three classes of people: the extremely low-level people that can deal with the chip directly, the slightly higher-level people that can implement the kernel-mode interface with WDDM, and everyone else that is required to implement the APIs that sit on top of that. the third class in particular is also very hard to find people for, because OpenGL and Direct3D are really complicated standards with eighty billion corner cases. this also means both driver people and compiler people; keep in mind that developing a good optimizing compiler for a GPU is one of the hardest compiler tasks imaginable because GPU ISAs are so complex. (both AMD and NV GPUs are effectively variable register count, where using more registers results in lower potential parallelism for potentially higher per-warp/wavefront performance. have fun making that tradeoff automatically!)

- emulators of some sort. these chips are big, and you need some way to run things before tapeout. all those software people won't be able to do much without them, either.

- access to a fab, specifically TSMC or Intel on the latest process (GloFo is way behind and probably unusable).

you need 40-50 hardware people and 100+ software people, probably three years for both (for your first chip if you're starting from scratch). let's assume each person gets $150k (realistically they cost a lot more than that but whatever). you're at ~70M before you even have a chance at making money. add another $5M for emulators of some form (those things ain't cheap).

okay, you've spent $75M and are now ready to tape out. get another $30-50M (tapeout costs are already insane for 28nm at TSMC and only getting more expensive!), and now you have a product that, assuming it actually works, you can sell, assuming you are able to sell into OEMs, convince game devs to actually test with it so the experience isn't trash out of the gate, solve various compatibility woes, and never run out of money at any point during this time. oh, and you better have a follow-up in 18 months that is significantly better, or you're out of business and this was all pointless.

just to make things worse, you have much bigger companies doing exactly the same things, and your only hope is that you somehow offer better perf/$ and perf/W. there exists no differentiating feature in this market anymore, so those are your metrics. meanwhile, the market continues to get smaller (IGPs are getting better, you're not going to compete with Intel on that), so you have to target only the very high-end, a small and extremely discriminating market.

I've vastly underestimated the time and resource requirements here, probably by at least a factor of two, but you get the point. despite how much we talk about them, for the vast majority of the market, desktop GPUs are a commodity. it's a race to the bottom, and there's very little money to be made. HPC (the compute market) is a high-margin, low-volume market that leverages a lot of the same tech, so that helps, but it's not that much overall.

now, there's traditionally a wildcard in this equation: mobile. that's another post altogether, but it doesn't change the desktop GPU equation too much. (fun fact: I can't think of any GPU that eventually went on sale and was started from scratch after 2000, which is when what is now Mali originated)

edit: fun fact number two: I think it's literally impossible to build a GPU without violating patents from at least NVIDIA, AMD, and probably Intel. they don't go after each other because mutually assured destruction, but they will smash/acquire any new player if it looks to be anything at all interesting.

Professor Science fucked around with this message at 10:07 on Dec 31, 2013

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

Only catch was that some shader effects would have to be rewritten (so, e.g., the tech demo didn't have lightsaber effects).
that's a pretty significant caveat--anything that makes the asset creation process more taxing is going to be viewed with huge skepticism because of the potential to inflate production costs.

edit: LucasArts used to have a pretty serious engine department, Marco Salvi (I think he was either graphics or engine lead on Heavenly Sword back in the proverbial day) worked there around that time. he's now one of the advanced rendering folks at Intel, publishing lots of papers about cool tricks you can do with Gen. (and before you poo-poo that it's Intel and they don't care about graphics, Tomas Akenine-Moller is one of the guys in that department, and he wrote Real-Time Rendering, a book that literally everyone has read.)

Professor Science fucked around with this message at 00:53 on Jan 21, 2014

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

beejay posted:

I would agree but BF4 is a pretty big name and they did it. Developers put PhysX in games. It probably won't be widespread for a while if ever but I think it will catch on decently.
they did it because repi has been pushing for a new 3D API for years and years and years and AMD probably paid a lot of money (that's how developer relations works in the game industry, PhysX is the same)

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Phuzun posted:

There is a MemtestCL that works on the gpu memory.
https://simtk.org/home/memtest/
the associated paper is interesting in that they found lots of soft errors independent of temperature or overclocking, but overclocking definitely exacerbated the problem.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

It might be a Big-Maxwell-only feature, since it's compute-oriented. The idea behind it is that there are some lightly-threaded tasks that are rear end-slow on a GPU, but PCIe being what it is, it's slower to send those back to the CPU than just process them locally on a wimpy ARM core.
I think you can trace this idea back several years to one article written during one of JHH's GTC presentations and it's metastasized since. it never made sense and it continues to make no sense because memory latency.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Alereon posted:

... capabilities that are disabled on gaming cards costs a huge number of transistors that could either not be spent (lowering costs and raising clock headroom), or could be spent on things that do improve gaming performance.
just wanted to point out one thing here: GK110's TDP is 20W higher than GK104's, despite having 50% more memory and 2x the transistors. I actually think the extra gig of GDDR5 makes up the bulk of that disparity, but the thing to keep in mind is that the vast majority of those extra transistors in GK110 are just *off*. the FP64 stuff doesn't even get powered on, which means there's no static leakage (and leakage is a huge chunk of your power costs at 28nm). it's not like they're giving up 3.5B transistors that could be powered and actively doing something useful for an app.

edit: also read this http://www.highperformancegraphics.org/previous/www_2012/media/Hot3D/HPG2012_Hot3D_NVIDIA.pdf

Professor Science fucked around with this message at 23:57 on Apr 10, 2014

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
re: PCIe--bandwidth-wise, it's not a big deal for games because even if you double or quadruple PCIe speeds you're still an order of magnitude off from GDDR5 so falling off the fast path is still catastrophic. the reason to move away from PCIe in some spaces is that for serious compute, the latency is horrific (multi-microsecond, when for some exotic interconnects like Aries or the one in Blue Gene you're talking maybe <1us for a packet to reach a remote node), which directly impacts strong scaling, which (along with how much power/cooling you have) directly determines how big your cluster can be. if you've got something better between your GPU and your interconnect, you might be able to get better latency, which means bigger clusters with your processors.

also PCIe doesn't have cache coherence which is a Big Deal for compute, but that's more of a programming model concern than a "poo poo goes fast" concern.
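
you can see the latency-versus-bandwidth split yourself with a trivial timing loop: a tiny copy costs microseconds no matter what, a big copy is bandwidth-bound. rough sketch with CUDA events:

code:
#include <cstdio>
#include <cuda_runtime.h>

// time a host->device copy of `bytes` with CUDA events (pinned memory, so it's a fair test)
static float time_h2d(void *dst, const void *src, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpyAsync(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t big = 256 << 20;   // 256 MiB
    void *h, *d;
    cudaMallocHost(&h, big);
    cudaMalloc(&d, big);
    // small copy: dominated by PCIe/driver latency (microseconds even for 4 bytes)
    printf("4 B    : %.3f ms\n", time_h2d(d, h, 4));
    // big copy: dominated by PCIe bandwidth (a few GB/s on PCIe 3.0 x16)
    printf("256 MiB: %.3f ms\n", time_h2d(d, h, big));
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}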

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
A is NV, B is Intel, C1 is Intel Linux, and C2 is Intel Windows.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
oops, B is AMD. (I wish I ingested something, maybe I'm just getting sick)

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

deimos posted:

There are, but programmers have to program for it, thus my comment. Moving cache coherency and transparently moving multi-core algorithms to the GPU will be a boon.
uh, unified memory definitely does not imply transparently moving multicore algorithms to the GPU

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

deimos posted:

Err that'll teach me to try to simplify things without re-reading what I type, and this might be wrong, but... Doesn't cache coherency make it easier to write multi thread/core algorithms that can offload to GPU?

Maybe transparent isn't the word to use.
easier, sort of--the algorithm itself may be easier to get up and running in the first place, but that's not any sort of guarantee that it will actually run well (and getting it to run well will likely require as much or almost as much work as is done in large applications today in order to make them amenable to GPUs). keep in mind that most applications you care about that have big obvious data-parallel workloads are also games and therefore have a lot of graphics work to put on the GPU in the first place. outside of games and specific niche markets (very big Photoshop operations, serious video editing), there's probably not a real market for desktop compute (games will always be a weird edge case, where traditional D3D/GL workloads trade off with DirectCompute/GL compute workloads for runtime).

HPC will obviously continue down the accelerator/coprocessor path, and there is increasing adoption in mobile.
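
to be concrete about what unified/coherent memory does and doesn't buy you: cudaMallocManaged means you stop writing explicit copies, but the offload itself is still explicit--you still write the kernel, launch it, pick a config, and do all the tuning. sketch:

code:
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *data;
    // one pointer, visible to both CPU and GPU -- no explicit cudaMemcpy needed
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = float(i);

    // ...but the offload is still explicit: you write the kernel, you launch it,
    // and making it run *well* is the same tuning work as before
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}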

Professor Science fucked around with this message at 08:28 on May 24, 2014

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
cough I remember the last time somebody thought a loud cooler or high temps was a feature.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

DrDork posted:

Also, a lot of the people who would care about such high compute power are going to be financial or scientific applications, where the lack of ECC memory on the Z is going to be a problem. I just struggle to see who, exactly, the card was aimed at.
CUDA developers. CUDA won in that market, and you don't need ECC if you're just building apps and testing perf before you deploy on a Tesla cluster.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Arzachel posted:

Why wouldn't you go for one or two Titans/Blacks then?
having two GPUs behind a PLX switch noticeably improves CUDA P2P behavior
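
in CUDA terms, whether P2P is even available depends on topology, and sitting behind the same PLX switch is what makes the check below pass (and perform well). sketch for a hypothetical two-GPU box:

code:
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);   // can device 0 read/write device 1's memory?
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    printf("0->1: %d, 1->0: %d\n", can01, can10);

    if (can01 && can10) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);    // now cudaMemcpyPeer / direct loads skip the host bounce
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
    }
    return 0;
}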

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

GPU PhysX uses CUDA to do the physics processing, and there is indeed a context switch involved. The context switch's performance impact is so great that you can improve average and minimum framerates more with a low-power dedicated PhysX card than with an SLI pairing.

For OpenCL... eh. Some games use DirectCompute, though - IIRC, Civ 5 uses DirectCompute to unpack compressed textures, rather than putting that on the CPU. E: Oh, Tomb Raider's TressFX does hair calculations through DirectCompute, too.
AFAIK (and I'm pretty well informed), no game uses OpenCL (at least on Windows, presumably also on Linux; maybe something on Mac uses it via a core system API?), and the only thing besides PhysX that uses CUDA is Just Cause 2, for its water. Lots and lots of stuff uses DirectCompute, although usually as a fancy rendering pass ("sweet rear end lighting model that can't be implemented as a pixel shader with any sort of efficiency") rather than what you would generally think of as a compute workload.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Factory Factory posted:

On the Team Green side, a research paper for scientific GPU computing let slip some expected performance-per-watt figures for Nvidia's roadmap. We're apparently about to see a Big Kepler refresh as GK210, and Big Maxwell GM200 should arrive by the end of the year (at least in Teslas). After that, Big Pascal, GP100, is slated for the beginning of 2016. The FLOPS/watt statistic in the paper suggests that Big Pascal will have three times the performance per watt of current Big Kepler.
I wouldn't read anything into this. Looks like academics are extrapolating based on JHH's presentations at GTC, which are about as reliable as throwing darts at the wall.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

DrDork posted:

There was a big to-do about it because one of the ways NVidia managed to get ahead was (reportedly, anyhow) basically a massive amount of IP infringement on 3DFX's technologies, like SLI. When they started to go belly-up, NVidia bought them and adroitly avoided any risk of having to deal with a protracted court case.
3dfx died in 2000, long before NV had any sort of SLI support (introduced in NV4x in 2004, I think). 3dfx died because they were incredibly late to hardware T&L, thought Glide would become a permanent monopoly and didn't anticipate D3D/GL correctly, but most importantly because of the disastrous acquisition of STB and the attempt to become a vertically-integrated GPU supplier.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
yeah I'm buying at least a 970 the instant they're in stock anywhere. if the spirit moves me, then I'll just buy a 980 :unsmith:

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
do we know when these are going on sale? haven't seen any at the usual haunts and didn't see any mention in reviews.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
970s and 980s are showing up on Amazon--just ordered a 970. 2-5 week shipping or out of stock on all, though.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
Whoops, just saw the "eVGA ACX cooler is bad" link so now I'm trying to cancel my order of that and get the ASUS one. We'll see how well that works...
