Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Cygni posted:

I imagine Nvidia is as fully aware of the manufacturing constraints and future performance needs as AMD is.

Bingo.

The problem in general is that all of this adds latency. It's still more performant to have giant monolithic dies if you can; the more slices you cut your compute power into, the worse the issues get.

Also, I think AMD is going to have even more trouble because their uarch has a lot more inter-engine interaction (and cache-coherency needs, although that might be software-configurable), whereas NVIDIA is using relatively "dumb" hardware. For NVIDIA, having your SMX engines living on different dies isn't much of an issue. I think AMD has some pretty fundamental scalability issues with GCN: every time they have tried going past Hawaii size so far it's been a trainwreck, and I keep reading about how the uarch just isn't designed for more than 4 shader engines or something (can't evaluate that without more detail).

NVIDIA is going with 4-die packages in their paper. With a 100mm^2 die that only leaves you with Polaris 10-plus-a-bit. You still need some fairly big dies if you want high-end gaming performance.
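(As a side note on why anyone puts up with the chiplet latency hit at all: small dies yield dramatically better. A back-of-envelope sketch in Python using a simple Poisson defect model; the defect density is made up, so the numbers are illustrative only.)

```python
import math

def poisson_yield(die_area_mm2, defects_per_cm2=0.2):
    """Fraction of good dies under a naive Poisson defect model.
    defects_per_cm2 is an illustrative guess, not a real fab figure."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)

# One 400mm^2 monolithic die vs four 100mm^2 dies (same total area)
print(f"400mm^2 monolithic yield: {poisson_yield(400):.1%}")  # ~44.9%
print(f"100mm^2 chiplet yield:    {poisson_yield(100):.1%}")  # ~81.9%
# Four good small dies are far more likely than one good big die,
# which is the cost argument for MCM; the latency is the price you pay.
```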

Paul MaudDib fucked around with this message at 23:12 on Jul 7, 2017


wargames
Mar 16, 2008

official yospos cat censor
AMD's GPU division needs a blank sheet for Navi and to forget Hawaii ever existed.

EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!

NewFatMike posted:

Yeah, I mean a TR/Epyc chip is really just big in general, but most of the cost from those is still silicon, right? The metal for the pins, package, and substrate are all still relatively cheap, right? A big rear end package that's still cheap and performs really well is still a desirable thing.

If it turned out the Raven Ridge (or next generation) APUs were larger packages than usual (not likely since they still have to fit on AM4 mobos and cooling solutions) as long as they still hit price/performance/power targets the world keeps spinning.

What'd be really neat is if at some point we were able to customize CPU/GPU CCX combinations. If we got to work inside the TR4 footprint but could get some area based combination of CCXs, I'd go bananas.

I wonder how repurposable a TR4 socket is; could it in theory accept a socketable GPU? Imagine AMD selling EPYC 4-socket boards with Zen2, where the sockets accept either a Navi cluster or a Zen2 cluster. I could see the PCB being insanely thick and thus expensive, but you could get a lot of flexibility out of something like that, especially if AMD could deliver future socket compatibility. It makes buying into the AMD GPU ecosystem easier if you've bought into AMD's CPUs at all, and they could still provide PCIe slots for backwards compatibility or even further scalability.

Paul MaudDib posted:

Bingo.

The problem in general is that all of this adds latency. It's still more performant to have giant monolithic dies if you can; the more slices you cut your compute power into, the worse the issues get.

Also, I think AMD is going to have even more trouble because their uarch has a lot more inter-engine interaction (and cache-coherency needs, although that might be software-configurable), whereas NVIDIA is using relatively "dumb" hardware. For NVIDIA, having your SMX engines living on different dies isn't much of an issue. I think AMD has some pretty fundamental scalability issues with GCN: every time they have tried going past Hawaii size so far it's been a trainwreck, and I keep reading about how the uarch just isn't designed for more than 4 shader engines or something (can't evaluate that without more detail).

NVIDIA is going with 4-die packages in their paper. With a 100mm^2 die that only leaves you with Polaris 10-plus-a-bit. You still need some fairly big dies if you want high-end gaming performance.

Honestly this seems to be a thing for AMD; they've never had a big die that did well. I don't really see Hawaii as a big die. It's cutting it close, but I tend to think of big dies as around 480-600mm². AMD's successes in GPUs were also predicated on Nvidia resting on the Tesla uarch too long and Fermi being a disaster (oh god, why does that sound like GCN and Vega). Guess this makes AMD's Navi their Kepler, and maybe Navi2 their Maxwell :v:. See ya guys in 4 years!

PC LOAD LETTER
May 23, 2005
WTF?!

NewFatMike posted:

Don't you think they might target APU GPU dies and move up from that like the Zen CCX?
It'd be really cool if they did, since they could pull off a very high performance APU if they wanted to, but manufacturing costs would be a big problem. Such an APU would have to be relatively high priced to make it work financially. The market for that sort of thing might end up being fairly small.

Cygni posted:

GPUs are theoretically more suited to multiple die packages too due to the parallel workload.
Yes, absolutely. The issue will be bandwidth, not latency, for making multiple GPU dies work together. I don't actually know exactly what bandwidth numbers would be needed to make it work, but if Epyc is any indicator it looks to me like they'll be able to put north of 100GB/s (not Gb) per die into making it work. That seems fairly respectable to me.
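(For what it's worth, the napkin math behind that guess, assuming an Epyc-style topology where each of the 4 dies links directly to the other 3. The per-link figure is an assumption based on reported Epyc numbers at DDR4-2666, not an official spec.)

```python
links_per_die = 3       # Epyc MCM: each die connects to the other three
gb_s_per_link = 42.6    # assumed bidirectional GB/s per die-to-die link
print(f"Aggregate per-die fabric bandwidth: ~{links_per_die * gb_s_per_link:.0f} GB/s")
# ~128 GB/s, i.e. "north of 100GB/s per die" as guessed above
```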

Cygni posted:

Epyc's latency for going between the dies is fairly atrocious, and AMD's way of addressing that was essentially working around that reality (just like Intel did with the Pentium D).
Have the inter-die, MCM, and off-package latency numbers been released? I thought AMD was still being cagey there, and that is what will really matter more for Epyc. These are the best numbers I know of still. Inter-CCX latency is known and is indeed higher than it should be, but it still comes off looking a heck of a lot better than the Pentium D (which apparently was more than double the inter-core latency of the Opteron of the time... that was legit an 'oh poo poo' reaction by Intel, putting out a product like that). Given the way most software seems to do OK with it by default on Ryzen, or needs very little work to work well with it, maybe it'd be reasonable to say AMD did a decent enough job there rather than comparing it to the shitshow that was Pentium D.

My understanding with a ccNUMA setup like AMD appears to be using is that so long as latency on each inter-core/die/package bus hovers around what you'd expect from main system RAM, then you'd be OK. Then it just comes down to bandwidth and the number of hops necessary between dies, and it seems like they've done an OK job on bandwidth and minimizing hops on Epyc with the sheer number of buses going on there. Proof will be in the pudding, so to speak, but nothing about it looks actually bad so far.

Cygni posted:

I mean heck, the concept you're describing for Navi is basically VSA-100.
Pretty much. I had a V5 5500 back in the day too. The concept of stitching multiple GPUs/dies together to make them work like one big machine is certainly nothing new, but the approach AMD is taking is new for the GPU world.

Cygni posted:

The problem Navi faces is the same one AMD has faced for years: Nvidia isn't just sitting around waiting for them to catch up. Volta is already at the reticle limit of a 12nm process, so I imagine Nvidia is as fully aware of the manufacturing constraints and future performance needs as AMD is.
AMD's problem to me is one more of execution. They have great ideas but they seem to have huge problems implementing them properly or getting them out on time. But fundamentally, nothing about what will be used in Navi to connect the GPU dies should differ all that much from what is being used in Epyc/TR. So if Epyc/TR turn out fine, I don't think it's unreasonable to assume Navi will be solid. Even if the actual GPU die itself 'only' gets around midrange-ish performance, 2-4 of them together with near-perfect scaling should be a whole lot of performance, even vs a huge-die NV product.
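(A toy model of that argument, with invented numbers, just to show the shape of it: how much scaling efficiency can you lose before N midrange dies stop beating one monolithic flagship?)

```python
per_die = 1.0   # one midrange die = 1 unit of performance
big_die = 2.2   # assume a monolithic flagship is ~2.2x a midrange die

for n in (2, 4):
    for eff in (1.00, 0.90, 0.75):
        total = per_die * n * eff
        verdict = "beats" if total > big_die else "loses to"
        print(f"{n} dies @ {eff:.0%} scaling = {total:.2f} -> {verdict} the big die")
# Even at 75% scaling, 4 midrange dies comfortably out-muscle the big die;
# 2 dies need near-perfect scaling and a weak flagship to win.
```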

GRINDCORE MEGGIDO
Feb 28, 1985


If RTG don't sort their poo poo out and nVidia launch an MCM GPU before Navi, it'll just be cosmically sad.

EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!
AMD's execution comes down to a sharp disconnect between marketing and engineering departments, lovely marketing, and not enough money for marketing and engineering combined. If Ryzen and Epyc do well, you'll see the changes in marketing first, and then maybe in products around 2019-2021.

Also apparently AMD is going on a hiring bonanza for marketing and engineering positions, so uh, guess things are in fact going well?

wargames
Mar 16, 2008

official yospos cat censor

FaustianQ posted:

AMD's execution comes down to a sharp disconnect between marketing and engineering departments, lovely marketing, and not enough money for marketing and engineering combined. If Ryzen and Epyc do well, you'll see the changes in marketing first, and then maybe in products around 2019-2021.

Also apparently AMD is going on a hiring bonanza for marketing and engineering positions, so uh, guess things are in fact going well?

With stock prices up and Ryzen generating enough cash to keep the creditors off their back, they can turn the rest of the cash into much-needed company investments.

PC LOAD LETTER
May 23, 2005
WTF?!

GRINDCORE MEGGIDO posted:

If RTG don't sort their poo poo out and nVidia launch an MCM GPU before Navi, it'll just be cosmically sad.
Navi is supposedly a late 2018/early 2019 product and they've been working on it for years. We're only now hearing about NV getting in on the multi-GPU-die + MCM approach, but it's quite possible they've been working on it quietly for some time as well, so who knows when it's supposed to be out.

FaustianQ posted:

not enough money for marketing and engineering combined.
I don't think marketing has any real say in what gets developed there or how it gets done either. Even back in the K8 days, when AMD was doing well financially and all, there were numerous rumors of problems at AMD. I think they ended up scrapping entirely whatever K9 or K10 was supposed to be originally, and that is why they ended up doing what amounted to revisions of K8 for longer than they should've.

I really don't know why AMD keeps having these issues (execution has been a problem for years and years there), but whenever it comes down to do or die they usually seem able to pull something off. To me that is strongly suggestive of a management issue and not an engineering or marketing one. Upper management has seen a whole lot of turnover there, so maybe we won't see a repeat of past behaviors. Have to wait n' see....

EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!

PC LOAD LETTER posted:

I don't think marketing has any real say in what gets developed there or how it gets done either. Even back in the K8 days, when AMD was doing well financially and all, there were numerous rumors of problems at AMD. I think they ended up scrapping entirely whatever K9 or K10 was supposed to be originally, and that is why they ended up doing what amounted to revisions of K8 for longer than they should've.

I really don't know why AMD keeps having these issues (execution has been a problem for years and years there), but whenever it comes down to do or die they usually seem able to pull something off. To me that is strongly suggestive of a management issue and not an engineering or marketing one. Upper management has seen a whole lot of turnover there, so maybe we won't see a repeat of past behaviors. Have to wait n' see....

Maybe I was misunderstanding your argument, but I was thinking about marketing signing checks the engineers can't cash. Stuff like the 1.5GHz 4096SP Vega 10XTX @ 225W vs current reality, or the overpromises on Barcelona and Bulldozer (performance, release date, etc.). Sometimes it's a legit fab problem (glares at GloFo's 45nm and 32nm processes) that's out of AMD's hands, though.

But yea I can definitely see how past management was a problem, esp. w/r/t Bulldozer for instance. As soon as they got back the internals on that, they should have kept it exclusively as a server option and pushed forward on iterations of K8 until Zen was ready (based on release dates, it's likely that as soon as they got the stuff back on Bulldozer, plans for Zen were in motion), because holy lol, Bulldozer was godawful until like Steamroller, and that was just getting back to parity.

PC LOAD LETTER
May 23, 2005
WTF?!

FaustianQ posted:

but I was thinking about marketing signing checks the engineers can't cash. Stuff like the 1.5GHz 4096SP Vega 10XTX @ 225W vs current reality, or the overpromises on Barcelona and Bulldozer (performance, release date, etc.).
I don't think marketing gets to say that stuff without the VIPs' go-ahead, so I'd attribute those issues to leadership too, rather than marketing per se. If marketing is saying stuff like that and the VIPs/management aren't aware of what they're planning to say beforehand, then that would be catastrophic incompetence from both their marketing and their VIPs/management, to me.

I don't work at AMD, so I have no firsthand knowledge of how they do things there, but every place I've ever worked before, or ever heard of, management had firm control over the marketing stooges, so that is why I'm looking at things that way.

FaustianQ posted:

But yea I can definitely see how past management was a problem, esp. w/r/t Bulldozer for instance.
I think with Zen AMD might finally be able to shake off the remaining taint of Ruiz on the CPU side of things so maybe they'll at least stay competitive. Current upper management does seem to be more focused on the practical side of things to me.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Geekbench seems ridiculously useless. I've been looking up my current CPU on it to compare against these supposed TR 1950X results, and within the results there are huge disparities, like a presumably stock-clocked 5820K returning higher results than severely overclocked ones. (Or at least the app seems to be royally stupid about recording clock speeds.)

incoherent
Apr 24, 2004

01010100011010000111001
00110100101101100011011
000110010101110010

wargames posted:

With stock prices up and Ryzen generating enough cash to keep the creditors off their back, they can turn the rest of the cash into much-needed company investments.

Or, ya know, pay a dividend. Ever.

Canned Sunshine
Nov 20, 2005

CAUTION: POST QUALITY UNDER CONSTRUCTION



incoherent posted:

Or, ya know, pay a dividend. Ever.

Dividends are for companies in good financial situations. That's not AMD.

fishmech
Jul 16, 2006

by VideoGames
Salad Prong

SourKraut posted:

Dividends are for companies in good financial situations. That's not AMD.

AMD doesn't appear to have ever paid a dividend, at least since January 13, 1978, when Google starts tracking their stock. Surely you wouldn't count them as never having been in a good financial situation since 1978?

Canned Sunshine
Nov 20, 2005

CAUTION: POST QUALITY UNDER CONSTRUCTION



fishmech posted:

AMD doesn't appear to have ever paid a dividend, at least since January 13, 1978, when Google starts tracking their stock. Surely you wouldn't count them as never having been in a good financial situation since 1978?

We'd have to know what their cash and liquidity situation has been during that time. Companies shouldn't pay dividends just because they happen to be in the black. The closest I'd imagine they came to being in a good position to do so was in the mid-2000s, and instead they bought ATI, which is worth a whole separate discussion. There are plenty of much more stable companies that have never paid dividends, and I think most would rather see AMD survive than try to appease some armchair stock investors.

Anime Schoolgirl
Nov 28, 2002

FaustianQ posted:

I'm now wondering if it'd be possible to MCM two Raven Ridge dies for the TR4 platform. 8C/16T, 1408SP iGPU, 150W, quad-channel memory.
Put that poo poo in mini-STX (and 7mm motherboard thickness) and we're in business

EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!

FaustianQ posted:

TR4 platform

A whole department of Engineers just leaped from a building, I hope you're happy.

EmpyreanFlux fucked around with this message at 02:45 on Jul 9, 2017

SwissArmyDruid
Feb 14, 2014

by sebmojo
ASRock didn't want them anyways; if they weren't licking their chops at the idea of that, they clearly weren't cut out for ASRock's mad science to begin with.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
So about the MCM latencies of TR/EPYC: I suppose it would have been better to have the memory controller as a separate entity on the IF and have the CCX groups be autonomous?

TheCoach
Mar 11, 2014
So I got myself an R7 1700, loosened the timings a bit, and the RAM is running @ 3200 (it wasn't even on the QVL). The system is rock solid; everything is fiiiiiiine.
The GF got my i5 4570 system to finally replace her E8200, which was beyond inadequate.

NewFatMike
Jun 11, 2015

I am also extremely happy with my R7 1700. It is A Good Chip.

Fingers crossed Raven Ridge is also good.

PC LOAD LETTER
May 23, 2005
WTF?!

Combat Pretzel posted:

So about the MCM latencies of TR/EPYC: I suppose it would have been better to have the memory controller as a separate entity on the IF and have the CCX groups be autonomous?
Maaaybe if they were doing a server-specific arch, but then they wouldn't be able to use a "1 die" strategy like they can now, which allows them to scavenge dies that aren't suitable for server chips and put them into either TR or Ryzen as needed. There are probably virtually no wasted dies at this point, so manufacturing losses are greatly minimized.

I'm not a CPU designer but I think the approach you're talking about makes more sense if there are more hops between dies OR if there are scaling issues with high(er) core counts with their current approach (I have no idea if there are).

Cygni
Nov 12, 2005

raring to post

Combat Pretzel posted:

So about the MCM latencies of TR/EPYC: I suppose it would have been better to have the memory controller as a separate entity on the IF and have the CCX groups be autonomous?

My understanding from the ServeTheHome coverage is that hitting infinity fabric is a bad thing, so adding that to every single memory call probably wouldn't be great for latency, I imagine.

As it is, 2-socket Epyc has four options for where stuff is in memory:

Local to the core (great latency, great bandwidth),
One hop on IF away through another core on the package (worse latency, same bandwidth),
One hop on IF away through a core on the other package (worse latency, worse bandwidth),
Two hops on IF away through a core on the other package (worst latency, worse bandwidth).

An on-package memory controller would basically cut the top and bottom cases off and make all the calls the middle two. If NUMA-aware OSes didn't exist, that might be worth it. But with NUMA, and most calls being the first two options, the current solution on Epyc is probably much more efficient.

But I'm just an armchair dork, that could all be wrong.
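(To put that armchair reasoning in numbers, here's a toy expected-latency model of the four cases above; the latencies are illustrative guesses, not measurements.)

```python
cases_ns = {
    "local":                90,   # local to the die
    "on-package hop":      130,   # one IF hop, same package
    "cross-socket hop":    200,   # one IF hop to the other package
    "cross-socket 2 hops": 250,   # two IF hops
}

def expected_latency(mix):
    """mix maps case name -> fraction of memory accesses."""
    return sum(cases_ns[c] * f for c, f in mix.items())

# A NUMA-aware OS keeps most traffic in the first two cases...
numa_aware = {"local": 0.80, "on-package hop": 0.15,
              "cross-socket hop": 0.04, "cross-socket 2 hops": 0.01}
# ...versus a naive even spread across all four.
naive = {c: 0.25 for c in cases_ns}

print(f"NUMA-aware mix: ~{expected_latency(numa_aware):.0f} ns average")  # ~102 ns
print(f"Naive mix:      ~{expected_latency(naive):.0f} ns average")       # ~168 ns
```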

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
From all I've read, it sounds like cross-CCX L3 cache accesses are the royal pain in the rear end, and probably one cause of the funny latencies. With the memory controller, maybe with some cache of its own, as a separate entity on the IF, and cross-CCX L3 accesses disabled, things might be less bad? It just sounds sorta idiotic to access L3 cache on another CCX at memory speeds. Might as well just hit the memory directly.

I'm also just armchairing.
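(Roughly what that complaint looks like in numbers; these are ballparks from public benchmarks, not official figures.)

```python
# Illustrative latency tiers for the "why bother with remote L3" argument.
latency_ns = {
    "local L3 (same CCX)":    11,   # ~40 cycles at ~3.5GHz
    "cross-CCX L3 (over IF)": 110,
    "DRAM":                    90,
}
for tier, ns in latency_ns.items():
    print(f"{tier:>24}: ~{ns} ns")
# If a remote L3 hit really lands near (or past) DRAM latency, it buys you
# almost nothing over just going to memory, which is the complaint above.
```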

Cygni posted:

If NUMA-aware OSes didn't exist, that might be worth it. But with NUMA, and most calls being the first two options, the current solution on Epyc is probably much more efficient.
NUMA is nice for workloads that can wait a while. But for things like gaming, where you need to get frames out as fast as possible, if for some reason there's no scheduling capacity on the CCX that handles the memory region most of the thread's poo poo is allocated in, there's a problem.

I suppose it doesn't matter so much for the current bunch of Ryzens, since, as you say, the bandwidth is high between a pair of CCXs, but I'm eyeing TR, and it sounds like I don't want it if gaming is part of the workload for it.

Combat Pretzel fucked around with this message at 00:19 on Jul 10, 2017

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
I'm curious if the current version of the Infinity Fabric PHY can support PCIe Gen4 speeds (8GHz, 16GT/s), because it would be pretty impressive if they could jump right into the Gen4 ecosystem. There are a few new features they'd have to implement to get to the Gen4 spec, of course.

Have there been any blurbs on how fast the IF links can go? I've seen things like "512GB/s for GPUs" but it doesn't say the width.

wargames
Mar 16, 2008

official yospos cat censor

priznat posted:

I'm curious if the current version of the Infinity Fabric PHY can support PCIe Gen4 speeds (8GHz, 16GT/s), because it would be pretty impressive if they could jump right into the Gen4 ecosystem. There are a few new features they'd have to implement to get to the Gen4 spec, of course.

Have there been any blurbs on how fast the IF links can go? I've seen things like "512GB/s for GPUs" but it doesn't say the width.

It doesn't need to; remember, AM4 is supposed to be supported for 5 years.

Sinestro
Oct 31, 2010

The perfect day needs the perfect set of wheels.

priznat posted:

I'm curious if the current version of the Infinity Fabric PHY can support PCIe Gen4 speeds (8GHz, 16GT/s), because it would be pretty impressive if they could jump right into the Gen4 ecosystem. There are a few new features they'd have to implement to get to the Gen4 spec, of course.

Have there been any blurbs on how fast the IF links can go? I've seen things like "512GB/s for GPUs" but it doesn't say the width.

It's 32 bytes wide, so it's exactly that fast. 16GT/s * 32 bytes/T = 512 GB/s.
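(Spelled out, since the units trip people up:)

```python
transfers_per_s = 16e9     # PCIe Gen4 signaling: 16 GT/s
bytes_per_transfer = 32    # 32-byte-wide link
print(transfers_per_s * bytes_per_transfer / 1e9, "GB/s")  # 512.0 GB/s
```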

Cygni
Nov 12, 2005

raring to post

Combat Pretzel posted:

NUMA is nice for workloads that can wait a while. But for things like gaming, where you need to get frames out as fast as possible, if for some reason there's no scheduling capacity on the CCX that handles the memory region most of the thread's poo poo is allocated in, there's a problem.

I suppose it doesn't matter so much for the current bunch of Ryzens, since, as you say, the bandwidth is high between a pair of CCXs, but I'm eyeing TR, and it sounds like I don't want it if gaming is part of the workload for it.

Yeah, honestly, I don't expect TR single-client gaming performance to be that notable. There are barely any games that even use more than 4 threads, let alone 32 threads, hence the R5 1600X and R7 1800X being basically identical in games. So I imagine a NUMA-aware OS will just stick all the processes/data on one die+memory bank and call it a day... so it's all back to clockspeeds as per usual. The rumored top-of-the-line TR 1950X is 3.4GHz base, so I imagine gaming will be basically identical to other, much cheaper mainstream Ryzen chips in that clock range.

Multi-client or virtualized stuff, and of course rendering/content production, will probably be TR's strong suit. For just straight gaming, though, the Coffee Lake stuff coming next month will probably be the hot ticket until Cannonlake/Zen2.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Oh, I have things to occupy all 16 cores once in a while. What I'm more concerned about is when there will be situations where NUMA is going to be a problem, i.e. a mismatch between the core a thread is running on and where its memory allocations are. At a certain point, depending on memory pressure, allocations will start to cross NUMA memory regions. If the IF bandwidth between CCX pairs is indeed as much worse as speculated, this is going to be noticeable in some form.

On the other hand, if IF is 32 bytes wide and can run at PCIe speeds, as said a couple of posts above, why the hell are there even bandwidth issues that depend on RAM speed?

Combat Pretzel fucked around with this message at 11:23 on Jul 10, 2017

PC LOAD LETTER
May 23, 2005
WTF?!
It's a latency issue, not a bandwidth issue, with inter-CCX data transfers.

inkwell
Dec 9, 2005

Combat Pretzel posted:

On the other hand, if IF is 32 bytes wide and can run at PCIe speeds, as said a couple of posts above, why the hell are there even bandwidth issues that depend on RAM speed?

I'm sure I'm out of my depth here, but as I understand it, the Infinity Fabric memory-speed bottleneck is more latency than throughput.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
So uh, if it's super high bandwidth, why does it get hosed over by the memory clock? Sounds like its data transfers are gated by the memory controller? If so, why?! I guess I kind of fail to see why it gets punished so hard versus Intel's ring bus.

Combat Pretzel fucked around with this message at 16:50 on Jul 10, 2017

wargames
Mar 16, 2008

official yospos cat censor

Combat Pretzel posted:

So uh, if it's super high bandwidth, why does it get hosed over by the memory clock? Sounds like its data transfers are gated by the memory controller? If so, why?! I guess I kind of fail to see why it gets punished so hard versus Intel's ring bus.

Because IF runs at half the RAM's effective (DDR) speed, i.e. at the actual memory clock. That's a hard-wired setting in Zen1, so if you increase RAM speed you increase IF speed.
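(In other words, assuming that hard 1:1 coupling between memory clock and fabric clock is right:)

```python
# DDR ratings are in megatransfers/s; the actual memory clock -- and thus
# the Zen1 fabric clock -- is half of that.
for ddr_rating in (2133, 2666, 3200):
    print(f"DDR4-{ddr_rating} -> memory/IF clock ~{ddr_rating // 2} MHz")
```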

Combat Pretzel posted:

Oh, I have things to occupy all 16 cores once in a while. What I'm more concerned about is when there will be situations where NUMA is going to be a problem, i.e. a mismatch between the core a thread is running on and where its memory allocations are. At a certain point, depending on memory pressure, allocations will start to cross NUMA memory regions. If the IF bandwidth between CCX pairs is indeed as much worse as speculated, this is going to be noticeable in some form.

On the other hand, if IF is 32 bytes wide and can run at PCIe speeds, as said a couple of posts above, why the hell are there even bandwidth issues that depend on RAM speed?

Also, Zen1 doesn't use NUMA for cross-CCX traffic; it uses a new scheduler that Microsoft and AMD came up with to account for the slightly higher cross-CCX latency.

inkwell
Dec 9, 2005

Combat Pretzel posted:

So uh, if it's super high bandwidth, why does it get hosed over by the memory clock? Sounds like its data transfers are gated by the memory controller? If so, why?! I guess I kind of fail to see why it gets punished so hard versus Intel's ring bus.

I suspect this was a design choice to enable them to easily scale it up and down (multi-die Threadripper/Epyc packages, and symmetrically disabling cores for lower-end Ryzen parts) and PROBABLY to make it easier to maintain coherence while doing so, or some crap, but tbh I really have no idea what I'm talking about at this point.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

wargames posted:

It doesn't need to; remember, AM4 is supposed to be supported for 5 years.

What does "supported" mean? Warranty? BIOS updates? Some of their chips will use it? All of their chips will use it?

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!

wargames posted:

Also, Zen1 doesn't use NUMA for cross-CCX traffic; it uses a new scheduler that Microsoft and AMD came up with to account for the slightly higher cross-CCX latency.
The NUMA stuff is speculation about Threadripper and EPYC.

wargames
Mar 16, 2008

official yospos cat censor

Subjunctive posted:

What does "supported" mean? Warranty? BIOS updates? Some of their chips will use it? All of their chips will use it?

That in 5 years Zen2 or Zen3/whatever will come out on AM4/AM4+.

Combat Pretzel posted:

The NUMA stuff is speculation about Threadripper and EPYC.

But Threadripper is single-socket, so NUMA doesn't come into play; EPYC will require NUMA and the special AMD scheduler because of the two sockets and cross-CCX traffic.

Cygni
Nov 12, 2005

raring to post

TR will likely be dealing with NUMA, because we have 2 separate memory controllers, each with a dual-channel connection to their local DIMM banks, and a higher-latency connection via IF between them. It would be advantageous for the OS to know that.

Someone asked AT's editor about it:

https://twitter.com/RyanSmithAT/status/870598439993720832
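(If you'd rather not trust the scheduler, you can pin a process to one die's cores yourself so its allocations stay in that die's local DIMM bank. A minimal Linux sketch; the node-0 CPU IDs are a made-up example, and the real topology should be read from /sys or libnuma.)

```python
import os

# Assumed layout: first 8 cores plus their SMT siblings belong to NUMA node 0.
node0_cpus = set(range(0, 8)) | set(range(16, 24))
os.sched_setaffinity(0, node0_cpus)   # 0 = this process (Linux only)
print("Now running on CPUs:", sorted(os.sched_getaffinity(0)))
```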

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
It's a single socket, but things look awfully like it's a Frankensocket (two smashed together). So yeah, NUMA.


Sinestro
Oct 31, 2010

The perfect day needs the perfect set of wheels.
The issue with IF as implemented in Zen 1 is that the memory controller also handles generating the clock for the bus, and that clock is at the same rate as the one that goes out to the DDR4 PHY. So in the most pessimistic case of DDR4-2133, it's limited to just 68GB/s, and in the perfect case of DDR4-3200 you're still only getting 102GB/s. Hopefully future revisions will fix that... frankly inexplicable design choice.
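(Reproducing those numbers, assuming the 32-byte-wide link from earlier and counting both directions:)

```python
def if_bandwidth_gb_s(ddr_rating):
    fabric_clk_hz = ddr_rating / 2 * 1e6   # e.g. DDR4-3200 -> 1600 MHz
    return fabric_clk_hz * 32 * 2 / 1e9    # 32 bytes wide, both directions

for rating in (2133, 3200):
    print(f"DDR4-{rating}: ~{if_bandwidth_gb_s(rating):.0f} GB/s")
# DDR4-2133: ~68 GB/s; DDR4-3200: ~102 GB/s, matching the post above
```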
