Paul MaudDib
May 2, 2006

"Tell me of your home world, Usul"


Lazer Vampire Jr. posted:

Isn't that the rub though?

That if Intel can't even fart out Tiger Lake as fast as Zen 2 and Zen 3 stuff comes out,

ah yes I hope that Intel can keep up with Renoir production, the chip where 9 months after launch you still can’t even find a product with a 4800U anywhere


Malcolm XML
Aug 8, 2009

I always knew it would end like this.


Paul MaudDib posted:

ah yes I hope that Intel can keep up with Renoir production, the chip where 9 months after launch you still can’t even find a product with a 4800U anywhere

My guess is that they knew renoir was good but had trouble convincing OEMs to make orders for it -- reluctance/intel kickbacks/whatever

It turned out it's really loving good.

Now, for 5000 series the OEMs will put it in high volume products, thereby letting AMD order more

Malcolm XML
Aug 8, 2009

I always knew it would end like this.


The AMD Marketing dude was super happy that Dell made a laptop and I think this is why.

Can't get it in reviewers hands if it's not in a good laptop plus it gives them cred

Paul MaudDib
May 2, 2006

"Tell me of your home world, Usul"


Malcolm XML posted:

My guess is that they knew renoir was good but had trouble convincing OEMs to make orders for it -- reluctance/intel kickbacks/whatever

It turned out it's really loving good.

Now, for 5000 series the OEMs will put it in high volume products, thereby letting AMD order more

XMG has said that AMD can't even fill the orders they allocated months ago.

(allocation is like "order accepted", in other words it's not just super high demand meaning that vendors want more than anticipated, AMD can't even keep up with the quotas they previously agreed to.)

https://www.reddit.com/r/XMG_gg/com...en_in_xmg_core/

I have no doubt that Su's statement that "demand is higher than anticipated" is true but it also doesn't explain their inability to fulfill the pre-existing level of supply. I think there is a lot of contention for 7nm node space at the moment - consoles, renoir, desktop/server, and now a GPU launch using a very large die. That's one of the reasons I kind of think RDNA2 and retail APUs are going to firm up after the Zen3 products launch and a big chunk of that demand transitions to 5nm and opens up space on 7nm.

Paul MaudDib fucked around with this message at 21:27 on Sep 17, 2020

Cygni
Nov 12, 2005

raring to post



Yeah, and TSMC is still capacity constrained, which means AMD could either use their wafers to crank out more small Zen2+3 dies with high yields that go in high margin parts, or they could crank out the bigger Renoir dies with associated lower yields and lower margins. I think it's a strategic decision. Knowing the demand now in retrospect, they probably would have shaved some Navi production and made more Renoir, but it's kinda too late now. Both are going out of production soon, if they aren't already.

e: paul beat me with his edit!

WhyteRyce
Dec 30, 2001



SwissArmyDruid posted:

...what is this bizarre colloquialism? "basting with sardine oil"?

That bitch Carol fuckin' Baskins

mobby_6kl
Aug 9, 2009

"You are the best poster... do not let anyone say otherwise."


Malcolm XML posted:

Read the title and the headings
Still makes no loving sense

Cygni posted:

Here's the AT article:

https://www.anandtech.com/show/1608...e-core-11th-gen

Disappointing IPC gains tbh, I really thought they would get more out of it vs Ice Lake.
Thanks, this is definitely a bit more detailed. I found these power charts to be the most interesting, although they probably don't cover all typical use cases:





In the first test, TG takes quite a bit less total power; in the second, a bit more power and time. Too bad they didn't test it at 28W too, or something in between. It seems like a theoretical 6 or 8 core TG part at lower clocks might be able to match Renoir better here.

ConanTheLibrarian
Aug 13, 2004


dis buch is late

Fallen Rib

evilweasel posted:

I mean this review is long as balls and often goes over my head but I see a lot of actually pretty negative stuff in here about it and their conclusion was kind of ambivalent, so I'm not seeing where you get that:

He moans about per-clock improvements but that's not really the point. Ice Lake had great IPC gains, but its clocks were dog poo poo. Tiger Lake fixes the frequency issue without doing much on the IPC front. It still results in a great single thread performance bump.


That said, as someone in the market for a new PC, I can't see myself waiting til 2022 or whenever it is that Intel finally release a Cove-based desktop CPU on sub-14nm.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.


Paul MaudDib posted:

XMG has said that AMD can't even fill the orders they allocated months ago.

(allocation is like "order accepted", in other words it's not just super high demand meaning that vendors want more than anticipated, AMD can't even keep up with the quotas they previously agreed to.)

https://www.reddit.com/r/XMG_gg/com...en_in_xmg_core/

I have no doubt that Su's statement that "demand is higher than anticipated" is true but it also doesn't explain their inability to fulfill the pre-existing level of supply. I think there is a lot of contention for 7nm node space at the moment - consoles, renoir, desktop/server, and now a GPU launch using a very large die. That's one of the reasons I kind of think RDNA2 and retail APUs are going to firm up after the Zen3 products launch and a big chunk of that demand transitions to 5nm and opens up space on 7nm.

Why in the hell would they dedicate wafers to a known-low volume product??? They are trying to build demand. They might have even deliberately erred low lol

If it was bad they'd be stuck with a fuckload of bad chips

I think Lisa Su's real benefit is she's not loving going to do things to do things

Malcolm XML
Aug 8, 2009

I always knew it would end like this.


mobby_6kl posted:

Still makes no loving sense


It's a reference to Tiger King lmao

SwissArmyDruid
Feb 14, 2014



Malcolm XML posted:

It's a reference to Tiger King lmao

Oh yeah. I let that phenomenon pass me by entirely.

Cygni
Nov 12, 2005

raring to post



https://videocardz.com/newz/intel-r...-roadmap-leaked

Rocket Lake not hitting the streets until March, no X299 refresh for the foreseeable future.

DrDork
Dec 29, 2003
commanding officer of the Army of Dorkness

And the 14nm+++++++++++++++++++++++ train haven't brakes.

Volguus
Mar 3, 2009


So if I want to upgrade my x299 i7-7820X CPU my best bet would be to go with Threadripper? Maybe wait for the next version even? Or just buy i9-10920X now and wait 3 more years to see how things shake out?

Paul MaudDib
May 2, 2006

"Tell me of your home world, Usul"


Volguus posted:

So if I want to upgrade my x299 i7-7820X CPU my best bet would be to go with Threadripper? Maybe wait for the next version even? Or just buy i9-10920X now and wait 3 more years to see how things shake out?

basically the way it shakes out right now is that TR 3000 is in a price class above X299, they start at $1400 (24C/3960X) and the motherboards are $450-600. Or you can drop in a 10980XE for $999 if you can find one.

Depending on what you are doing, Epyc is worth serious consideration as well. It doesn't clock as high as TR can but actually you can get more cores for the same price, plus it has better memory performance, more PCIe lanes, etc. And at the low end if you just want the expansion capability they are willing to sell you lower core count versions that they don't offer as threadrippers - eg you can get a 16C or 8C Epyc and bring the cost down as low as $450. Motherboards run about the same as threadripper, they start around $500 and the ones you want are more like $600. Probably not going to find one with onboard sound but if you have a GPU you can run through that, or use a USB DAC/AMP.

https://www.youtube.com/watch?v=ZV7ooH5BD4w

If you don't need the expansion and just want the cores, get a 3900X or wait and see what the 5900X looks like.

Paul MaudDib fucked around with this message at 02:00 on Oct 7, 2020

D. Ebdrup
Mar 13, 2009



Ian Cutress has an article on DDR5 sub-timings and latencies that's worth reading.

The short of the long is that the memory wall is still a problem, 20 years after SMT was introduced to try and fix it.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.


D. Ebdrup posted:

Ian Cutress has an article on DDR5 sub-timings and latencies that's worth reading.

The short of the long is that the memory wall is still a problem, 20 years after SMT was introduced to try and fix it.

I'll be back after actually reading the article, but I'm assuming that bandwidth continues to increase much more quickly than latency decreases, so SMT continues to deliver increasing benefits as threads spend comparatively more time waiting for data?

Yup, here's hoping we get 4-way hyperthreading on X86 soon.

Twerk from Home fucked around with this message at 16:15 on Oct 9, 2020

D. Ebdrup
Mar 13, 2009



Twerk from Home posted:

I'll be back after actually reading the article, but I'm assuming that bandwidth continues to increase much more quickly than latency decreases, so SMT continues to deliver increasing benefits as threads spend comparatively more time waiting for data?

Yup, here's hoping we get 4-way hyperthreading on X86 soon.
Bandwidth is going to start experiencing asymptotic gain at some point too, as a result of the fixed latency.

POWER has 8-way SMT, and that hasn't helped it.
One saving grace is that according to the PPC64LE entry for the FreeBSD quarterly report that I'm working on, porting to PPC64LE is very easy - so that's something.

D. Ebdrup fucked around with this message at 16:30 on Oct 9, 2020

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull


D. Ebdrup posted:

Bandwidth is going to start experiencing asymptotic gain at some point too, as a result of the fixed latency.

POWER has 8-way SMT, and that hasn't helped it.
One saving grace is that according to the PPC64LE entry for the FreeBSD quarterly report that I'm working on, porting to PPC64LE is very easy - so that's something.

What, in your mind, does the ease of porting software to PPC64LE have to do with memory latency, I'm really curious how you made that leap

D. Ebdrup
Mar 13, 2009



BobHoward posted:

What, in your mind, does the ease of porting software to PPC64LE have to do with memory latency, I'm really curious how you made that leap
That's fair, I guess they weren't really meant to be related - it was more a remark that if we wanted 8-way SMT, we could have it relatively soon and on a little-endian architecture (which amd64/x86_64 also is, meaning there are fewer issues with software higher up the stack). That would be one way to go, since it doesn't seem like AMD or Intel have much interest in anything other than 2-way SMT.

Indiana_Krom
Jun 18, 2007
Net Slacker

SMT is ultimately a method of increasing processor utilization. 4-way or 8-way probably isn't looked at because AMD and Intel are already getting satisfactory utilization of their execution units with 2-way and going higher either doesn't yield any benefits or comes with a complexity/latency trade-off that isn't worth it.

D. Ebdrup
Mar 13, 2009



Indiana_Krom posted:

SMT is ultimately a method of increasing processor utilization. 4-way or 8-way probably isn't looked at because AMD and Intel are already getting satisfactory utilization of their execution units with 2-way and going higher either doesn't yield any benefits or comes with a complexity/latency trade-off that isn't worth it.
It's very very workload dependent, but as an example: gas, oil, nuclear and other simulations tend to favour platforms with very large memory pools and high memory bandwidth - both of which are strengths of the newest POWERNV architecture.

#2 and #3 on TOP500 are based on POWER9, only topped recently by an ARM supercomputer which achieves its memory bandwidth advantage by using HBM2 memory, which offers up to 1TBps of bandwidth.
As a counterpoint, ThunderX2 with 4-way SMT has octo-channel DDR4 and only achieves 256GBps memory bandwidth.

D. Ebdrup fucked around with this message at 13:46 on Oct 10, 2020

PCjr sidecar
Jan 26, 2011

dude, you gotta end it on the rhyme



D. Ebdrup posted:

It's very very workload dependent, but as an example: gas, oil, nuclear and other simulations tend to favour platforms with very large memory pools and high memory bandwidth - both of which are strengths of the newest POWERNV architecture.

#2 and #3 on TOP500 are based on POWER9, only topped recently by an ARM supercomputer which achieves its memory bandwidth advantage by using HBM2 memory, which offers up to 1TBps of bandwidth.
As a counterpoint, ThunderX2 with 4-way SMT has octo-channel DDR4 and only achieves 256GBps memory bandwidth.

Most of the FLOPS and memory bandwidth on #2/3 are on GPUs with HBM.

Balancing bandwidth/flops/latency/capacity for hpc is going to make for an interesting few years.

gradenko_2000
Oct 5, 2010



Lipstick Apathy

What would you do if you had eight-way simultaneous multithreading?

I'll tell you what I'd do, man: two games at the same time, man.

AARP LARPer
Feb 19, 2005




THE DARK SIDE OF SCIENCE BREEDS A WEAPON OF WAR


Buglord

You laugh, but I do this.

WhyteRyce
Dec 30, 2001



Back when I used to play a free Korean MMO I would have loved to have my stall up while playing an actually interesting game

K8.0
Feb 26, 2004

Her Majesty's 56th Regiment of Foot

Yeah I feel like that is a fairly common thing in the modern era. I've done it plenty of times with games with lovely startups. Just leave the fucker running from the second time I launch it until I'm done playing it, and if I feel like playing something else I do. My time is way too valuable to sit through 30 or 90 seconds of loading bullshit dozens of times.

ColTim
Oct 29, 2011


Something I've never quite understood is per core memory bandwidth. Naively I expected that to be bottlenecked by the RAM speed itself, but from testing on my old X99 6800K it looked like the per-core bandwidth topped out around 10-12GB/s, despite having quad-channel DDR4 3200 (with a theoretical bandwidth of ~100GB/s). I think it may have something to do with the cache performance as well; like regardless of the bandwidth of the RAM, each core can only load X cache lines per clock or something...

Things may be a bit different on the smaller client dies (probably higher per-core numbers there) but it definitely threw me for a loop!
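
For reference, the ~100GB/s theoretical figure is just arithmetic: channels times bytes per transfer times transfers per second. A minimal sketch, assuming the standard 64-bit DDR4 channel width; the function name is made up for illustration:

```python
def peak_dram_bandwidth_gbs(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Assumes every channel moves bus_width_bits on every transfer,
    which real workloads never sustain.
    """
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_second = channels * bytes_per_transfer * mega_transfers_per_s * 1e6
    return bytes_per_second / 1e9


if __name__ == "__main__":
    # Quad-channel DDR4-3200, as on the X99 board described above.
    print(f"{peak_dram_bandwidth_gbs(4, 3200):.1f} GB/s")
```

That is 4 x 8 bytes x 3200 MT/s = 102.4 GB/s for the whole socket, so a single core seeing 10-12GB/s is using roughly a tenth of the paper number.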

D. Ebdrup
Mar 13, 2009



It's important to remember that every single bandwidth number you'll ever see is most likely a peak burst value.

It's also not nearly as simple as that because you also have to account for the OS scheduler - since nobody outside of Microsoft knows how the NT scheduler works, I'll have to give an example of something I know (I've studied ULE, but I'm not a scheduler wizard, just a docs committer):
In FreeBSD the scheduler deals in time-shares for a given process, each of which is called a quantum. It is the scheduler's job to distribute them so that the cost of moving a process from one core to another does not outweigh the time it takes for the process's quantum to finish on that core. As you can imagine, that is a constant balancing act with no clear answer, especially when you factor in that you may also want to optimize for CPU power usage if you're on a mobile platform or doing green computing, for example.

All of this also has to be balanced with how interactive you want the system to be. Adjusted (im)properly, FreeBSD can be so busy doing what you tell it to do that it'll take you an entire day just to login to the system on the console - and as I've tried that, I can categorically state that it's less than ideal on a production system, even if the system recovered fully without rebooting.

One way around this is to peg a process to a core (some call this setting process affinity), but that only works if it's single-threaded and I'm not sure that that describes most memory benchmark suites I know.
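
As a rough illustration of pegging a process to a core, here is a sketch using Python's `os.sched_setaffinity`, which as far as I know is only available on Linux (FreeBSD has cpuset(1) for the same job). The helper name is made up, and it simply picks the lowest-numbered core the process is already allowed to run on:

```python
import os


def pin_to_one_core():
    """Pin the calling process to a single core; return (previous_set, core)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("sched_setaffinity is not available on this platform")
    previous = os.sched_getaffinity(0)   # 0 means "the calling process"
    core = min(previous)                 # arbitrary choice: lowest allowed core
    os.sched_setaffinity(0, {core})
    return previous, core


if __name__ == "__main__":
    previous, core = pin_to_one_core()
    print(f"pinned to core {core}, was allowed on {sorted(previous)}")
    # ... run the single-threaded measurement here ...
    os.sched_setaffinity(0, previous)    # restore the original mask
```

Restoring the old mask afterwards keeps the rest of the program from being stuck on one core.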

D. Ebdrup fucked around with this message at 22:58 on Oct 10, 2020

SwissArmyDruid
Feb 14, 2014



I am entirely guilty of playing MTGA while waiting for dungeon queue in FFXIV.

DrDork
Dec 29, 2003
commanding officer of the Army of Dorkness

SwissArmyDruid posted:

I am entirely guilty of playing MTGA while waiting for dungeon queue in FFXIV.

FFXIV basically requires a second monitor / system so you can be gaming while you game.

gently caress you, DPS roulette queues.

Beef
Jul 26, 2004


ColTim posted:

Something I've never quite understood is per core memory bandwidth. Naively I expected that to be bottlenecked by the RAM speed itself, but from testing on my old X99 6800K it looked like the per-core bandwidth topped out around 10-12GB/s, despite having quad-channel DDR4 3200 (with a theoretical bandwidth of ~100GB/s). I think it may have something to do with the cache performance as well; like regardless of the bandwidth of the RAM, each core can only load X cache lines per clock or something...

Things may be a bit different on the smaller client dies (probably higher per-core numbers there) but it definitely threw me for a loop!

Yea that looks low. How did you get to that number? The STREAM benchmark suite is typically pretty good at putting it through its paces.

As already pointed out:
- You need multiple threads on multiple cores to saturate your bandwidth. At least Xeons do. A single load-store buffer cannot hold enough in-flight memory operations, for example.
- If the cache is getting thrashed, it's doing a ton of flushes and prefetches, which is memory bandwidth you won't see unless you use hardware counters. Good memory benchmarks will use uncached memory operations to avoid this.
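
The in-flight limit in the first point can be put into numbers with Little's law: sustained bandwidth is roughly bytes in flight divided by memory latency. A rough sketch; the 10 outstanding cache lines and the 80 ns latency are assumed for illustration, not measured on any chip in this thread, and the function names are made up:

```python
def concurrency_limited_bandwidth_gbs(lines_in_flight, latency_ns, line_bytes=64):
    """Bandwidth one core can sustain with a fixed number of outstanding misses.

    bytes / nanoseconds is numerically the same as GB/s (decimal).
    """
    return lines_in_flight * line_bytes / latency_ns


def lines_needed_for(bandwidth_gbs, latency_ns, line_bytes=64):
    """Cache lines that must be in flight to sustain a target bandwidth."""
    return bandwidth_gbs * latency_ns / line_bytes


if __name__ == "__main__":
    # Assumed: 10 outstanding 64-byte lines per core, 80 ns load-to-use latency.
    print(f"{concurrency_limited_bandwidth_gbs(10, 80):.1f} GB/s per core")
    # Outstanding lines needed to fill a 102.4 GB/s quad-channel DDR4-3200 bus.
    print(f"{lines_needed_for(102.4, 80):.0f} lines in flight")
```

Under those assumptions a single core tops out around 8 GB/s, and it takes on the order of 128 lines in flight to fill the bus, which is why it takes several cores (plus whatever the prefetchers add) to get near the headline number.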

ColTim
Oct 29, 2011


I used stressapptest which would dump out a bandwidth number (pretty sure it's the memory copy speed). From my (limited) understanding a decent chunk of the numbers are affected by things like what instructions are used (e.g. AVX/AVX2 will be faster than scalar ops, or even that whole REP MOVS thing).

Then there's the whole ringbus (and the number of "hops" or whatever) or mesh hierarchy which results in worse per-core bandwidth on the higher core count chips. Like I think Skylake-X is pretty dreadful on that front, at least going off of this.

Beef
Jul 26, 2004


A new experimental architecture: https://arxiv.org/abs/2010.06277
gently caress caches, massive threading.

Coffee Jones
Jul 4, 2004

16 bit? Back when we was kids we only got a single bit on Christmas, as a treat
And we had to share it!


K8.0 posted:

Yeah I feel like that is a fairly common thing in the modern era. I've done it plenty of times with games with lovely startups. Just leave the fucker running from the second time I launch it until I'm done playing it, and if I feel like playing something else I do. My time is way too valuable to sit through 30 or 90 seconds of loading bullshit dozens of times.

Ya, you’d think they’d pack as much loading as they could in those “Powered By Speed Tree” screens

I haven’t looked into this too closely, but the new Xbox is supposed to be able to suspend and load at the drop of a hat. Dump the contents of RAM to disk and reload something else.
Thing is, it’s a console, MS controls that single fixed hardware spec with no 3rd party drivers.

How possible is this kind of per-process suspension on a PC?

(Though as a developer, I can’t imagine working on some code that deals with datetimes or network and the OS says “you’re going into carbonite now, and when you come out, act as if nothing happened, LOL”)

Coffee Jones fucked around with this message at 14:57 on Oct 16, 2020

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.


Coffee Jones posted:

Ya, you’d think they’d pack as much loading as they could in those “Powered By Speed Tree” screens

I haven’t looked into this too closely, but the new Xbox is supposed to be able to suspend and load at the drop of a hat. Dump the contents of RAM to disk and reload something else.
Thing is, it’s a console, MS controls that single fixed hardware spec with no 3rd party drivers.

How possible is this kind of per-process suspension on a PC?

(Though as a developer, I can’t imagine working on some code that deals with datetimes or network and the OS says “you’re going into carbonite now, and when you come out, act as if nothing happened, LOL”)

It's completely possible, that's how Windows hibernating works. The Hiberfil.sys was the same size as your RAM, and you would dump memory to disk before going into hibernation. You could definitely do this to individual processes, it would just be limited by disk space & speed, mostly speed.
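
The freeze half of this (without the dump to disk) already exists on POSIX systems as plain job-control signals. A minimal sketch, assuming Linux or another Unix; the sleeping child is a made-up stand-in for a game, and a frozen process here still occupies RAM, so this is not what Quick Resume or hibernation actually does:

```python
import os
import signal
import subprocess
import sys


def freeze_and_thaw():
    """Stop a child process, resume it, and report whether both steps worked."""
    # Hypothetical stand-in for a long-running game: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        os.kill(child.pid, signal.SIGSTOP)               # freeze: no more CPU time
        _, status = os.waitpid(child.pid, os.WUNTRACED)  # block until it stops
        stopped = os.WIFSTOPPED(status)

        os.kill(child.pid, signal.SIGCONT)               # thaw: picks up where it left off
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        resumed = os.WIFCONTINUED(status)
    finally:
        child.kill()                                     # clean up the stand-in
        child.wait()
    return stopped, resumed


if __name__ == "__main__":
    print(freeze_and_thaw())
```

The child never gets a say in this, which is exactly the "you're going into carbonite now" problem: its clocks and sockets carry on without it.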

Saukkis
May 16, 2003

Unless I'm on the inside curve pointing straight at oncoming traffic the high beams stay on and I laugh at your puny protest flashes.
I am Most Important Man. Most Important Man in the World.

Coffee Jones posted:

How possible is this kind of per-process suspension on a PC?

For Linux there is software called CRIU (Checkpoint/Restore In Userspace) to do that.

DrDork
Dec 29, 2003
commanding officer of the Army of Dorkness

Coffee Jones posted:

(Though as a developer, I can’t imagine working on some code that deals with datetimes or network and the OS says “you’re going into carbonite now, and when you come out, act as if nothing happened, LOL”)

Thing is, I don't think it does ask the network stack to just pretend like nothing's happened. Quick Resume will still (AFAIK) cause network reconnections after you resume like you'd had any other network interruption, and how that's handled depends on the game.

For the rest of it, phones have been doing something similar for years, and emulators can do pretty much exactly the same thing. In that sense it's almost odd that PCs don't really have that ability yet. It'd certainly be a nice option to have for all the people with 1TB+ SSDs but only like 16GB RAM where you don't want to leave stuff running needlessly.

repiv
Aug 13, 2009



College Slice

I don't think the Xbox is doing suspension on a process level, the Xbox One already runs games inside their own virtual machine so it's probably suspending the entire (virtualized) OS instance


Coffee Jones
Jul 4, 2004

16 bit? Back when we was kids we only got a single bit on Christmas, as a treat
And we had to share it!


that makes sense, FWIW, they had Dave Cutler of NT kernel fame working on Xbox.

Anything to prevent Mechwarrior or Twilight hacks where an adversarially crafted save file is the first step to having a softmodded console.
