EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!
You know an interesting and kind of depressing thought, but maybe TR exists solely because AMD doesn't think they'll get enough orders for EPYC and are trying to recoup costs? I remember way back that there was no hint whatsoever of a HEDT platform for AMD; HEDT and Threadripper were very, very recent, and there is indication TR4 is minimally repurposed from EPYC.

isndl
May 2, 2012
I WON A CONTEST IN TG AND ALL I GOT WAS THIS CUSTOM TITLE

FaustianQ posted:

You know an interesting and kind of depressing thought, but maybe TR exists solely because AMD doesn't think they'll get enough orders for EPYC and are trying to recoup costs? I remember way back that there was no hint whatsoever of a HEDT platform for AMD; HEDT and Threadripper were very, very recent, and there is indication TR4 is minimally repurposed from EPYC.

Maybe the yields were so staggeringly high that they decided to just break them out into another product line?

Cygni
Nov 12, 2005

raring to post

It seems AMD planned from the start to use MCMs with Zen, so I imagine 2 and 4 way configs were always on the table and planned for. How they were targeted, branded, and priced was likely decided much further down the road, as it would be highly dependent on what sort of competition they were facing at roll out (something they would be unable to peg when design started out in 2012).

Basically I doubt 2 way was a last second idea. That doesn't mean it will be a good consumer product, we will just have to wait and see on that.

Kazinsal
Dec 13, 2011



FaustianQ posted:

You know an interesting and kind of depressing thought, but maybe TR exists solely because AMD doesn't think they'll get enough orders for EPYC and are trying to recoup costs?

I can't really explain my answer because of NDAs but this is definitely not the case.

Canned Sunshine
Nov 20, 2005

CAUTION: POST QUALITY UNDER CONSTRUCTION



Combat Pretzel posted:

So there's actually no Threadripper rocking it like it's 1998X?

Nope, it's even better. We all get to be happy with the 1950s like the GOP God intended.

Nam Taf
Jun 25, 2005

I am Fat Man, hear me roar!

Cygni posted:

Not sure if this has been posted before but wccftech posted a leaked lineup for TR:



This triggers me because it makes sense until the 1950/X which should be the 1960/X

It'd then follow the 19x0 where x is the 2nd digit of the core count. I mean it could be far better but at least it'd make sense.

Right now it's so close but then goes full AMD :allears:

Edit: missed the 1930X loving everything up.
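
For what it's worth, the scheme described above (19x0, where x is the second digit of the core count) is easy to write down. This is only a sketch: the function name `proposed_model` and the restriction to 10-19 cores are assumptions made for the example, and only the 16-core case (1960/X) is actually stated in the post.

```python
def proposed_model(cores: int, suffix: str = "X") -> str:
    """Return the '19x0' name, where x is the second digit of the core count."""
    # Assumption: the scheme only makes sense for two-digit counts from 10 to 19.
    if not 10 <= cores <= 19:
        raise ValueError("scheme only covers core counts from 10 to 19")
    return f"19{cores % 10}0{suffix}"


if __name__ == "__main__":
    for cores in (12, 16):
        print(cores, "cores ->", proposed_model(cores))
```

Under this rule the 16-core part comes out as 1960X rather than 1950X, which is the inconsistency being complained about.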

EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!
I dunno, that list seems unrealistic because it supposes that, say, a 4+4 and a 3+3 would work nicely with each other, and it seems to have one too many SKUs. Like, what even is the point of the 1920? It's 200 MHz slower, but it's not like it won't clock right up there, and expecting people not to OC on a HEDT platform is weird. It's also the only core count with 3 SKUs.

wargames
Mar 16, 2008

official yospos cat censor

Nam Taf posted:

This triggers me because it makes sense until the 1950/X which should be the 1960/X

It'd then follow the 19x0 where x is the 2nd digit of the core count. I mean it could be far better but at least it'd make sense.

Right now it's so close but then goes full AMD :allears:

Edit: missed the 1930X loving everything up.

What about the 1910? Basically it's all bad names.

Sidesaddle Cavalry
Mar 15, 2013

Oh Boy Desert Map

FaustianQ posted:

You know an interesting and kind of depressing thought, but maybe TR exists solely because AMD doesn't think they'll get enough orders for EPYC and are trying to recoup costs? I remember way back that there was no hint whatsoever of a HEDT platform for AMD; HEDT and Threadripper were very, very recent, and there is indication TR4 is minimally repurposed from EPYC.

For what it's worth, AMD applied for the trademarks Ryzen and Threadripper in the same session towards the end of last year. It got a hilarious reaction from this thread about many things, from AMD marketing to 3d fantasy females on video cards again.

Scarecow
May 20, 2008

3200mhz RAM is literally the Devil. Literally.
Lipstick Apathy
Not to mention in the last month or so there have been reports that AMD's yields have been amazing

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Scarecow posted:

Not to mention in the last month or so there have been reports that AMD's yields have been amazing

By all accounts AMD's Ryzen yields are spectacular. It sounds like they're better than predicted, and they were predicted to be good in the first place. IIRC it was something absurd like more than 90% of dies are usable in some market segment.

AMD has managed to make a very affordable and a very scalable architecture which is just what they need right now. Once they get a new stepping and fix the few virtualization/NPT bugs it's good, memory stability has improved a lot with the new microcode from what I've been reading, still needs expensive RAM but pretty much works with appropriate memory. Zeppelin v2 is going to be pretty fearsome, there absolutely has to be a bottleneck given how quickly it hits a wall and with a little more clock speed it'll be even better.

Unless that's just the limit of GloFo's process. Surprisingly, GloFo has not hosed it up this time, whereas the GPUs... not sure if that says good things about Raja's high-level uarch calls over the last 4 years. Can AMD please get Jim Keller to design a GPU? (j/k he's too busy diving in a pool of money at Tesla)

Fat Polaris would have been great, bumping it from the roadmap for Q4 2016 Vega was such a terrible mistake. It might have given them a little more runway on making Vega work. To be fair it's not entirely his fault, execs at AMD thought that discrete GPUs were going away before he took over and really undersupported RTG, so I'm sure it's been an uphill battle, but drat, the outcome has not been good. Vega really needs to pull a rabbit out of the hat even just for short-term relevance, based on the Vega FE performance. It would really help if they could use the "faster than 1080" marketing line, so they're going to push it to the limit again.

Paul MaudDib fucked around with this message at 04:49 on Jul 7, 2017

FuturePastNow
May 19, 2014


Cygni posted:

Not sure if this has been posted before but wccftech posted a leaked lineup for TR:



Hmm, as a budget minded builder that fastest 12/24 model looks most tempting.

SamDabbers
May 26, 2003



Paul MaudDib posted:

Once they get a new stepping and fix the few virtualization/NPT bugs it's good,

As it turns out, the IOMMU + NPT bug is in KVM, not the silicon or microcode. Passing through the same GPU with NPT enabled under Xen does not exhibit the performance degradation. It has been acknowledged on the KVM mailing list, but a fix is still pending.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

SamDabbers posted:

As it turns out, the IOMMU + NPT bug is in KVM, not the silicon or microcode. Passing through the same GPU with NPT enabled under Xen does not exhibit the performance degradation. It has been acknowledged on the KVM mailing list, but a fix is still pending.

Oh wow, fantastic. Virtualization was not a good thing to have bugs in.

That's good because most of the homelab applications I can think of involve virtualization.

incoherent
Apr 24, 2004

01010100011010000111001
00110100101101100011011
000110010101110010
Could they throw out a last min 18 core threadripper or has that left the station for this cycle?

SamDabbers
May 26, 2003



incoherent posted:

Could they throw out a last min 18 core threadripper or has that left the station for this cycle?

Considering that Threadripper is physically two 8-core Ryzen dies on the same package, 16 cores and 32 threads is the limit. The EPYC chips have four of the 8-core Ryzen dies on the same package, but that's not socket-compatible with Threadripper.
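
The arithmetic behind that limit, as a small sketch. The constants come straight from the post (8 cores per die, two threads per core as implied by 16 cores / 32 threads); the function names are made up for illustration.

```python
from typing import Tuple

CORES_PER_DIE = 8      # each Ryzen die has 8 cores, per the post above
THREADS_PER_CORE = 2   # implied by "16 cores and 32 threads"


def package_limits(dies: int) -> Tuple[int, int]:
    """Return (max cores, max threads) for a package with the given die count."""
    cores = dies * CORES_PER_DIE
    return cores, cores * THREADS_PER_CORE


def fits(requested_cores: int, dies: int) -> bool:
    """True if the requested core count is possible with that many dies."""
    return requested_cores <= package_limits(dies)[0]


if __name__ == "__main__":
    print("Threadripper (2 dies):", package_limits(2))
    print("EPYC (4 dies):", package_limits(4))
    print("18 cores on 2 dies possible?", fits(18, 2))
```

So an 18-core part would need a third die on the package, or cut-down EPYC silicon.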

Cygni
Nov 12, 2005

raring to post

incoherent posted:

Could they throw out a last min 18 core threadripper or has that left the station for this cycle?

TR is two Ryzen dies on one package, so the best you will get is 16 cores. I doubt X399 boards will support the power delivery for 4 dies, so if you want more than 16 cores, you'll have to go with Epyc.

E: Beaten

Cygni fucked around with this message at 06:10 on Jul 7, 2017

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

incoherent posted:

Could they throw out a last min 18 core threadripper or has that left the station for this cycle?

It's a pair of 8C dies, so no, unless they start using Epyc rejects. Which almost certainly aren't enough volume since it's yielding so well, it would be outright creating a whole new SKU tier. Maybe in 6-12 months when they need to dump some stock as they move to Zeppelin V2.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
Speaking of Epyc, I could swear this page just went live a day or two ago.. I had checked Supermicro last week to see if they had any Epyc stuff.

https://www.supermicro.com/products/nfo/AMD_SP3.cfm

The ones with 24 hot swap U.2 NVMe :getin:

EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!

Sidesaddle Cavalry posted:

For what it's worth, AMD applied for the trademarks Ryzen and Threadripper in the same session towards the end of last year. It got a hilarious reaction from this thread about many things, from AMD marketing to 3d fantasy females on video cards again.

I know, I was there. Maybe it confirms more that AMD knew yields would be stupid amazing very early on, and that the effort in developing the TR4 platform would be minimal compared to potential returns. X299 is a fiasco so far, so it's likely AMD hopes to drop TR4 into this as a better price/performance alternative with minimal hassle. I dunno, Vega FE has me thinking negatively about AMD's prospects lately.

I'm still wondering if Raven Ridge can be MCM'ed since both Ryzen and Vega support IF, would a mere MCM allow Vega dies to communicate as one, or would they just automatically XCF themselves?

Scarecow
May 20, 2008

3200mhz RAM is literally the Devil. Literally.
Lipstick Apathy
I would really love to know what's limiting Zen re: clock speeds. Is it the fab process, or is it an architecture limit that would take a redesign of some part of Zen?

Also if the infinity fabric can get faster

Kazinsal
Dec 13, 2011



My dream is for them to get Zen on a process that does 4.5-4.8 GHz depending on lottery, and then do 8 core chips with 16 GB of HBM on die. Infinity Fabric at HBM grade speeds? Yes please.

That is entirely a lofty dream though. My hopes are that Zen+ will be able to reliably hit 4.2 GHz with some extra volts and do 3200+ MHz DDR4 without a hitch. The difference in CPU performance on current Zen between 2400 MHz and 3200 MHz is absolutely astounding.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Kazinsal posted:

My dream is for them to get Zen on a process that does 4.5-4.8 GHz depending on lottery, and then do 8 core chips with 16 GB of HBM on die. Infinity Fabric at HBM grade speeds? Yes please.

CPUs will be connected to external memory for the near future. The latency of VRAM would suck pretty hard, even with HBM. It's what, 100 times higher than DDR4?

The HBM on the package is for the GPU, not the CPU.

BurritoJustice
Oct 9, 2012

Kazinsal posted:

My dream is for them to get Zen on a process that does 4.5-4.8 GHz depending on lottery, and then do 8 core chips with 16 GB of HBM on die. Infinity Fabric at HBM grade speeds? Yes please.

That is entirely a lofty dream though. My hopes are that Zen+ will be able to reliably hit 4.2 GHz with some extra volts and do 3200+ MHz DDR4 without a hitch. The difference in CPU performance on current Zen between 2400 MHz and 3200 MHz is absolutely astounding.

HBM is super high bus width super low clock, it's even further away from viable CPU use than GDDR which is another order of magnitude from DDR4.

Anarchist Mae
Nov 5, 2009

by Reene
Lipstick Apathy

Kazinsal posted:

My dream is for them to get Zen on a process that does 4.5-4.8 GHz depending on lottery, and then do 8 core chips with 16 GB of HBM on die. Infinity Fabric at HBM grade speeds? Yes please.

Also, 16GB? I'm going to need more than that.

IanTheM
May 22, 2007
He came from across the Atlantic. . .

Scarecow posted:

I would really love to know what's limiting Zen re: clock speeds. Is it the fab process, or is it an architecture limit that would take a redesign of some part of Zen?

Also if the infinity fabric can get faster

It really seems like it's the process, which is meant for low power and not really high clocks. A revision of it might be different, but I wonder if they'll bother, or focus on the next shrink instead.

PC LOAD LETTER
May 23, 2005
WTF?!

FaustianQ posted:

I'm still wondering if Raven Ridge can be MCM'ed since both Ryzen and Vega support IF, would a mere MCM allow Vega dies to communicate as one, or would they just automatically XCF themselves?
Probably technically possible and may just flat out work given the way AMD has designed the IF bus as a generic means of connecting various devices across their product line.

Also probably not feasible due to the cost. At least right now anyways. If MCM's/interposers get cheap enough you'll see a change of course.

I remember some leaks on TR/Epyc saying the manufacturing cost of dies and the packaging (i.e. MCM substrate, IHS, testing, etc.) was around $120-140, which is actually pretty low if true. But that's still too high for an APU, since those are probably going to be mostly low cost/low end products destined for business machines, with some mid range priced versions meant for entry level "gaming" systems. At $400 or so for entry level TR/Epyc, AMD should still be able to make a nice profit on those, even after R&D costs are accounted for, so it makes sense on higher end products.
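
A back-of-the-envelope check on those figures, as a sketch only. The $120-140 cost and the roughly $400 entry price are the rumored numbers from the post above, and `gross_margin` is a hypothetical helper. This is gross margin per unit, so R&D, marketing, and channel cuts are not included.

```python
def gross_margin(price: float, unit_cost: float) -> float:
    """Fraction of the selling price left after the per-unit manufacturing cost."""
    return (price - unit_cost) / price


if __name__ == "__main__":
    entry_price = 400.0  # rumored entry-level TR/Epyc price from the post
    for unit_cost in (120.0, 140.0):  # rumored cost range from the post
        margin = gross_margin(entry_price, unit_cost)
        print(f"cost ${unit_cost:.0f}: gross margin {margin:.1%}")
```

The same $120-140 package on a part selling for well under $200 would leave far less room, which is the point about APUs.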

NewFatMike
Jun 11, 2015

It's totally possible that cost scales with IF/interposer connections. Sure there's a floor on the price, but I mean something using 8 CCXs vs maybe 1 CCX + or 4 Vega NCUs is probably going to be a little different price wise.

That's all just kind of thinking out loud, though. Low end pricing has to be something AMD have figured out if Navi is going to be fully modular like Zen, though.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Measly Twerp posted:

Also, 16GB? I'm going to need more than that.

That's just the L3 cache :getin:

But seriously I didn't realize HBM was slower than regular DDR4. Huh!

wargames
Mar 16, 2008

official yospos cat censor

Munkeymon posted:

That's just the L3 cache :getin:

But seriously I didn't realize HBM was slower than regular DDR4. Huh!

Current HBM2 on vega(poo poo card) is 900ish mhz but drops to 500 if you change anything.

EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!

PC LOAD LETTER posted:

I remember some leaks on TR/Epyc saying the manufacturing cost of dies and the packaging (i.e. MCM substrate, IHS, testing, etc.) was around $120-140, which is actually pretty low if true. But that's still too high for an APU, since those are probably going to be mostly low cost/low end products destined for business machines, with some mid range priced versions meant for entry level "gaming" systems. At $400 or so for entry level TR/Epyc, AMD should still be able to make a nice profit on those, even after R&D costs are accounted for, so it makes sense on higher end products.

NewFatMike posted:

It's totally possible that cost scales with IF/interposer connections. Sure there's a floor on the price, but I mean something using 8 CCXs vs maybe 1 CCX + or 4 Vega NCUs is probably going to be a little different price wise.

That's all just kind of thinking out loud, though. Low end pricing has to be something AMD have figured out if Navi is going to be fully modular like Zen, though.

Yeah, thinking on this a bit, this does mean AMD is stuck with a floor of performance to properly recoup cost. Like, each individual block that would make up Navi or Zen2 must be of a minimum performance so that when scaled they provide proper return. I think, IMHO, this points to AMD keeping the 4 core CCX and use 1024SP shader complex for Navi.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Munkeymon posted:

That's just the L3 cache :getin:

But seriously I didn't realize HBM was slower than regular DDR4. Huh!

Throughput and latency are very different things but both can be described as "fast" or "slow". They're "overloaded" in the programming sense and it's really best to just avoid them to avoid confusion. Prefer something like "high-bandwidth" or "low-latency" instead.

(nerd alert)
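
To put rough numbers on the bandwidth half of that distinction, here is a minimal sketch. Peak bandwidth is just bus width times transfer rate; it says nothing about latency. The specific configurations are assumptions for illustration: dual-channel DDR4-3200 (2 x 64-bit), and HBM2 on a 2048-bit bus at the "900ish MHz" mentioned in the thread, double data rate.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gigatransfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * gigatransfers_per_s


if __name__ == "__main__":
    # Assumed example configurations, not measured figures.
    ddr4 = peak_bandwidth_gb_s(bus_width_bits=128, gigatransfers_per_s=3.2)
    hbm2 = peak_bandwidth_gb_s(bus_width_bits=2048, gigatransfers_per_s=2 * 0.9)
    print(f"dual-channel DDR4-3200: {ddr4:.1f} GB/s")
    print(f"2048-bit HBM2 @ 900 MHz: {hbm2:.1f} GB/s")
    print(f"ratio: {hbm2 / ddr4:.1f}x")
```

So the wide, slow-clocked bus wins on throughput by a wide margin, while the time any single access takes is a separate question that this calculation does not touch.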

PC LOAD LETTER
May 23, 2005
WTF?!

NewFatMike posted:

Low end pricing has to be something AMD have figured out if Navi is going to be fully modular like Zen, though.
My WAG is Navi will by default be a mid range targeted product as a single "sweet spot" (200-300mm² I believe is the sweet spot for yields but that is old information) die and they'll scavenge dies or MCM packages as necessary for the low end versions. It's going to be the high end targeted version of Navi that could be really interesting.

If AMD can pull off making 2-4 GPU's on a MCM/interposer work well enough together to fake being a single die big GPU it'd be a big win I think. They'd be able to cover all the product lines with only 1 die from the foundry effectively and still keep production costs down which would be a slick hat trick to pull off. If they manage to pull it off well enough with Epyc/TR I really don't know why they wouldn't be able to with Navi.
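
A rough way to see why several small dies can beat one big die on cost is the simple Poisson yield model, Y = exp(-A * D0). This is a sketch under loud assumptions: the defect density `D0` below is an invented value, not a GloFo or TSMC figure, and the model ignores wafer edge loss, packaging cost, and salvaging partly defective dies.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_gpu(die_area_mm2: float, dies_per_gpu: int,
                         defects_per_cm2: float) -> float:
    """Average silicon area (mm^2) spent per working GPU built from good dies."""
    return dies_per_gpu * die_area_mm2 / poisson_yield(die_area_mm2, defects_per_cm2)


if __name__ == "__main__":
    D0 = 0.1  # defects per cm^2, an assumed value for illustration only
    monolithic = silicon_per_good_gpu(800, 1, D0)  # one 800 mm^2 die
    mcm = silicon_per_good_gpu(200, 4, D0)         # four 200 mm^2 dies per package
    print(f"yield of 800 mm^2 die: {poisson_yield(800, D0):.1%}")
    print(f"yield of 200 mm^2 die: {poisson_yield(200, D0):.1%}")
    print(f"silicon per good GPU: monolithic {monolithic:.0f} mm^2, MCM {mcm:.0f} mm^2")
```

With the same total area, the four-die package throws away less silicon per working product because a single defect only kills a 200 mm² die instead of the whole 800 mm² one.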

Arzachel
May 12, 2012

BurritoJustice posted:

HBM is super high bus width super low clock, it's even further away from viable CPU use than GDDR which is another order of magnitude from DDR4.

Doesn't HBM actually have less latency than GDDR5 since the stacks are on-package?

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Arzachel posted:

Doesn't HBM actually have less latency than GDDR5 since the stacks are on-package?

I raised an eyebrow at that too, I thought it was lower than GDDR5, but I've never bothered to do the math. Maybe it's something about the relatively low clock speeds. Might be lower in *clock cycles* but higher in wall-clock time?

Regardless, either HBM or GDDR5 is still totally unsuitable for CPU usage. CPUs rely on low-latency RAM, whereas GPUs need the bandwidth and use a latency-hiding strategy to tolerate the latency. You run a massive number of threads (let's say 8-64x more than you have cores) which more or less live in L1. Most of them are sitting idle waiting for memory access to complete; when they are ready to execute they are scheduled and executed. So average throughput is fantastic but any given thread has atrocious latency.

Basic design principle of GPUs: with a sufficient number of threads, behavior approaches the theoretical average
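
The latency-hiding idea can be sketched with a toy model: each thread does a burst of compute, then stalls on memory, and the core switches to another ready thread in the meantime. The cycle counts below (10 cycles of work per 400-cycle memory access) are assumptions picked for illustration, not measurements of any real GPU.

```python
import math


def core_utilization(threads: int, compute_cycles: int, latency_cycles: int) -> float:
    """Fraction of time the core is busy when threads alternate compute and memory stalls."""
    return min(1.0, threads * compute_cycles / (compute_cycles + latency_cycles))


def threads_to_hide_latency(compute_cycles: int, latency_cycles: int) -> int:
    """Smallest thread count that keeps the core busy the whole time in this model."""
    return math.ceil((compute_cycles + latency_cycles) / compute_cycles)


if __name__ == "__main__":
    compute, latency = 10, 400  # assumed cycle counts
    for n in (1, 8, 16, 64):
        print(f"{n:3d} threads: utilization {core_utilization(n, compute, latency):.1%}")
    print("threads needed:", threads_to_hide_latency(compute, latency))
```

Each individual thread still waits the full memory latency on every access; only the aggregate throughput improves, which is why the same trick does little for latency-sensitive CPU workloads.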

NewFatMike
Jun 11, 2015

PC LOAD LETTER posted:

My WAG is Navi will by default be a mid range targeted product as a single "sweet spot" (200-300mm² I believe is the sweet spot for yields but that is old information) die and they'll scavenge dies or MCM packages as necessary for the low end versions. It's going to be the high end targeted version of Navi that could be really interesting.

If AMD can pull off making 2-4 GPU's on a MCM/interposer work well enough together to fake being a single die big GPU it'd be a big win I think. They'd be able to cover all the product lines with only 1 die from the foundry effectively and still keep production costs down which would be a slick hat trick to pull off. If they manage to pull it off well enough with Epyc/TR I really don't know why they wouldn't be able to with Navi.

Don't you think they might target APU GPU dies and move up from that like the Zen CCX? That was kind of what I was imagining (although maybe that ~200sqmm neighborhood fits that, but I feel like an APU with that size GPU package would be kinda big).

Cygni
Nov 12, 2005

raring to post

PC LOAD LETTER posted:

My WAG is Navi will by default be a mid range targeted product as a single "sweet spot" (200-300mm² I believe is the sweet spot for yields but that is old information) die and they'll scavenge dies or MCM packages as necessary for the low end versions. It's going to be the high end targeted version of Navi that could be really interesting.

If AMD can pull off making 2-4 GPU's on a MCM/interposer work well enough together to fake being a single die big GPU it'd be a big win I think. They'd be able to cover all the product lines with only 1 die from the foundry effectively and still keep production costs down which would be a slick hat trick to pull off. If they manage to pull it off well enough with Epyc/TR I really don't know why they wouldn't be able to with Navi.

GPUs are theoretically more suited to multiple die packages too due to the parallel workload. Epyc's latency for going between the dies is fairly atrocious, and AMDs way of addressing that was essentially working around that reality (just like Intel did with the Pentium D). GPUs wouldn't deal with that quite as much. I mean heck, the concept you're describing for Navi is basically VSA-100. :v:

The problem Navi faces is the same one AMD has faced for years: Nvidia isn't just sitting around waiting for them to catch up. Volta is already at the reticle limit of a 12nm process, so I imagine Nvidia is as fully aware of the manufacturing constraints and future performance needs as AMD is.

EmpyreanFlux
Mar 1, 2013

The AUDACITY! The IMPUDENCE! The unabated NERVE!

NewFatMike posted:

Don't you think they might target APU GPU dies and move up from that like the Zen CCX? That was kind of what I was imagining (although maybe that ~200sqmm neighborhood fits that, but I feel like an APU with that size GPU package would be kinda big).

This was my impression too, that they'd target something ~80-100mm² and scale from there, as they could then just use a single GPU die for APUs up into Datacenter products. This has the added benefit of massive yields and basically everything being manufactured as something usable, likely nothing would get tossed.

So APUs wouldn't be monolithic anymore either, they'd be MCM'ed and I'm not sure there is a particular disadvantage to this besides higher theoretical cost.

So APU Navi is 1 Zen2 CCX and 1 Navi CCX. A navi low end product is 1 to 2 Navi CCXs, midrange is 3-4 Navi CCX, high end is 4-6 CCX. Datacenter they straight up make a socket for and have something like 6-12 Navi CCX on it and allow customers to scale their workloads and provide for easy upgrades and flexibility.

wargames
Mar 16, 2008

official yospos cat censor

Cygni posted:

GPUs are theoretically more suited to multiple die packages too due to the parallel workload. Epyc's latency for going between the dies is fairly atrocious, and AMDs way of addressing that was essentially working around that reality (just like Intel did with the Pentium D). GPUs wouldn't deal with that quite as much. I mean heck, the concept you're describing for Navi is basically VSA-100. :v:

The problem Navi faces is the same one AMD has faced for years: Nvidia isn't just sitting around waiting for them to catch up. Volta is already at the reticle limit of a 12nm process, so I imagine Nvidia is as fully aware of the manufacturing constraints and future performance needs as AMD is.

But Navi, I think, is planned for 7nm, which was dev'd by IBM/Samsung/GloFo.

NewFatMike
Jun 11, 2015

Cygni posted:

The problem Navi faces is the same one AMD has faced for years: Nvidia isn't just sitting around waiting for them to catch up.

Give it five years and they might like Intel did :v:

That's not really constructive, but maybe Nvidia will learn from Intel's mistakes/GPUs are further from diminishing returns than CPUs

FaustianQ posted:

This was my impression too, that they'd target something ~80-100mm² and scale from there, as they could then just use a single GPU die for APUs up into Datacenter products. This has the added benefit of massive yields and basically everything being manufactured as something usable, likely nothing would get tossed.

So APUs wouldn't be monolithic anymore either, they'd be MCM'ed and I'm not sure there is a particular disadvantage to this besides higher theoretical cost.

So APU Navi is 1 Zen2 CCX and 1 Navi CCX. A navi low end product is 1 to 2 Navi CCXs, midrange is 3-4 Navi CCX, high end is 4-6 CCX. Datacenter they straight up make a socket for and have something like 6-12 Navi CCX on it and allow customers to scale their workloads and provide for easy upgrades and flexibility.

Yeah, I mean TR/Epyc chips are really just big in general, but most of the cost of those is still silicon, right? The metal for the pins, package, and substrate is all still relatively cheap, right? A big rear end package that's still cheap and performs really well is still a desirable thing.

If it turned out the Raven Ridge (or next generation) APUs were larger packages than usual (not likely since they still have to fit on AM4 mobos and cooling solutions) as long as they still hit price/performance/power targets the world keeps spinning.

What'd be really neat is if at some point we were able to customize CPU/GPU CCX combinations. If we got to work inside the TR4 footprint but could get some area based combination of CCXs, I'd go bananas.
