movax
Aug 30, 2008



karoshi posted:

Games won't support a feature for 0.001% of the market. In the productivity space support might trickle down from the Xeon line, like maybe AI apps that already support AVX512 on the ~cloud~. Or video encoding libraries used by the content providers.

If consoles don't support it (definitely CPU-wise), they won't bother with it.

Also, yes, Zoom virtual backgrounds are the first application I have encountered where I simply cannot run it on a 2600K. 10 years and it's a goddamned virtual background feature on a videoconferencing application.

I am not a graphics guy and I understand they are cross-platform but... points at OpenGL even iGPUs should be able to trivially do that task, right!?!

gradenko_2000
Oct 5, 2010



Lipstick Apathy

If you have an RTX card you can use Nvidia Broadcast so that your GPU does the background replacement regardless of your CPU but... well...

WhyteRyce
Dec 30, 2001



When I found out about the AVX requirement I got a huge chuckle because it took a global pandemic to finally make it a thing that was useful for the average everyday person. Congrats Intel.

And then I chuckled again when I realized a lot of people had laptops that didn't support it.

gradenko_2000
Oct 5, 2010



Lipstick Apathy

WhyteRyce posted:

When I found out about the AVX requirement I got a huge chuckle because it took a global pandemic to finally make it a thing that was useful for the average everyday person. Congrats Intel.

And then I chuckled again when I realized a lot of people had laptops that didn't support it.

This is also why the next generation of Pentiums are getting AVX2 after having them gated behind the Core series for the longest time

It doesn't help though that Zoom's CPU support can be spotty and you still might not get their background to work even if your CPU is supposed to have those instructions. My Broadwell laptop and my Athlon 200GE couldn't do it.
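For anyone who wants to see what their own chip reports, here is a minimal sketch, assuming Linux on x86, where /proc/cpuinfo carries a `flags` line. The helper names `parse_cpu_flags` and `simd_support` are made up for illustration, and a flag being present only means the CPU advertises the instructions, not that any particular application will accept it.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(flags):
    """Map a few SIMD feature names to whether the CPU advertises them."""
    return {name: name in flags for name in ("sse4_2", "avx", "avx2", "avx512f")}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux; other OSes need another source
    if path.exists():
        print(simd_support(parse_cpu_flags(path.read_text())))
```

The parsing is split from the file read so the logic can be tried on pasted text from someone else's machine.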

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING



Epiphyte posted:

What actually uses AVX-512 in home use?

This is rather off to the side, but I learned yesterday from the ARMv9 announcement that there is an emerging standard for vector ops (alternative to Intel's AVX/2/512 and Arm's own Neon) called SVE2 (Scalable Vector Extension 2).

It allows vector ops on data of any width from 128 bits to 2048 bits, in 128 bit increments, on any CPU which implements SVE2, regardless of the native bit-ness of that CPU's vector circuitry. I assume that there are large performance hits for performing ops wider than the native width of your hardware -- as we saw with Zen/Zen+ CPUs which could dispatch AVX2 ops, but took two cycles to do so rather than one.

At the moment there is exactly one CPU in the world which implements this (in its first-generation SVE form): the Fujitsu A64FX, which is used in the Fugaku supercomputer. But in about 18 months it will be supported by all new Arm CPUs, and it would be cool if everyone settled on this rather than a never-ending series of vendor-specific extensions. (Lol.)

repiv
Aug 13, 2009



I think the idea with SVE is the application queries the natural vector width of the hardware and works around that, rather than hard-coding a particular width and expecting the hardware to deal with it

In the docs the vector types are defined like "svfloat32_t", which tells you it's a vector of 32-bit floats but not how many there are, because that's unknown until runtime
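A rough sketch of that vector-length-agnostic style, modeled in plain Python rather than real SVE intrinsics. `hardware_lanes` is a stand-in for the runtime width query, and the per-chunk `active` count plays the role of the predicate that masks off the tail. As far as I understand the ACLE intrinsics, calls such as `svcntw` and `svwhilelt_b32` do those jobs in real code, but treat that mapping as my reading rather than gospel.

```python
def vla_add(a, b, hardware_lanes):
    """Add two equal-length lists chunk by chunk, where the chunk size
    (lanes per vector) is only known at run time."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if hardware_lanes < 1:
        raise ValueError("hardware_lanes must be positive")
    out = []
    i = 0
    while i < len(a):
        # Predicate: how many lanes are active in this iteration.
        # Only the final chunk can have fewer than hardware_lanes.
        active = min(hardware_lanes, len(a) - i)
        out.extend(a[i + k] + b[i + k] for k in range(active))
        i += hardware_lanes
    return out
```

The same function gives the same answer with 4, 16, or 64 lanes (128-bit, 512-bit, and 2048-bit vectors of 32-bit floats), which is the whole point: nothing in the loop hard-codes a width.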

mobby_6kl
Aug 9, 2009

"You are the best poster... do not let anyone say otherwise."



Let's do virtual backgrounds with 3D particle movement then

MaxxBot
Oct 6, 2003

you could have clapped

you should have clapped!!


The 11400 is the one actually good RKL SKU but Intel must be very uninterested in selling it since they didn't give it to any reviewers.

https://www.youtube.com/watch?v=upGjxnGaJeI

Paul MaudDib
May 2, 2006

"Tell me of your home world, Usul"


the pissing and moaning about AVX-512 is going to look bad in retrospect in a couple years. AMD is implementing it on Zen4 next year, meaning it'll finally be available in all product segments on both brands.

The problem has always been "why write codepaths for hardware that doesn't exist": previously it's only been in servers, which is why you only saw HPC get written around it. It got added to laptops about 18 months ago, but only on quad-core ultrabooks, which aren't exactly where you do tons of heavy vector math, and only Intel at that. This is the first time it's been available on the desktop outside the Skylake-X HEDT processors which were lol and had a ton of performance gotchas that no longer exist on the new implementations.

GPUs can't fully replace AVX, for example nobody has ever been able to port x264 or x265 with their heavily branching codepaths, instead everyone in that segment has been forced to use hardware ASIC/SIP accelerator cores which up until recently had significantly worse quality. Even today a deep motion search (eg veryslow) is still better than even the best NVENC cores (which approximate "medium" quality motion search). Which isn't to say that NVENC is bad but there are certainly workloads where you can't just drop in a GPU and call it a day.

sending it off to a GPU also adds a lot of latency, which can be bad for something like inferencing where the inference is part of some larger computation. like, I don't know, maybe if you wanted to have a game where each unit runs an inference to decide what they should be doing. maybe if you are making a lot of runs of it, you can batch it to amortize it across a lot of units of computation, but maybe not. and these use-cases aren't particularly a problem anymore with downclocking since that really no longer exists on ice lake or rocket lake. there is a reason that VNNI instructions were a specific focus to get added in AVX-512.
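To put the batching point in numbers, here is a back-of-the-envelope model. It assumes a fixed round-trip latency per GPU dispatch plus a per-item compute cost; the function names and every figure in the usage note are invented for illustration, not measurements.

```python
import math


def per_item_cost_us(transfer_latency_us, compute_us_per_item, batch_size):
    """Average cost per item when one fixed transfer latency is shared by a batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return transfer_latency_us / batch_size + compute_us_per_item


def break_even_batch(transfer_latency_us, gpu_us_per_item, cpu_us_per_item):
    """Smallest batch size at which the GPU's per-item cost drops below the CPU's.

    Returns None if the GPU is not faster per item even before the transfer.
    """
    saving = cpu_us_per_item - gpu_us_per_item
    if saving <= 0:
        return None
    return math.floor(transfer_latency_us / saving) + 1
```

With a made-up 100 us round trip, 1 us per item on the GPU and 5 us per item on the CPU, `break_even_batch(100.0, 1.0, 5.0)` says you need 26 items per dispatch before the GPU wins. If each unit needs its answer before the next step can run, you never get to batch that deep and the CPU path stays ahead.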

love the armchair experts (linus included) who think they know better than the experts at AMD and Intel who decided to write it and implement the instruction set. Even ARM, who had the opportunity to do it from scratch, is still doing NEON and SVE, because vector math is just very useful to have, as long as there's not a bunch of performance gotchas to using it.

Paul MaudDib fucked around with this message at 21:31 on Mar 31, 2021

Paul MaudDib
May 2, 2006

"Tell me of your home world, Usul"


movax posted:

If consoles don't support it (definitely CPU-wise), they won't bother with it.

Also, yes, Zoom virtual backgrounds are the first application I have encountered where I simply cannot run it on a 2600K. 10 years and it's a goddamned virtual background feature on a videoconferencing application.

I am not a graphics guy and I understand they are cross-platform but... points at OpenGL even iGPUs should be able to trivially do that task, right!?!

you could trivially use a "virtual camera" which outputs some other video stream or some game as a virtual webcam, yes.

figuring out where your head is in realtime, as you move, is the difficult part of the problem here, not the compositing. and that task is much more akin to something like a video encoding motion search than a compositing task. And again nobody has ever built a version of x264 or x265 motion search that works well for GPU architectures, everyone uses fixed-function hardware accelerators if they want to encode on a GPU, but it is very amenable to AVX acceleration.
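For a feel of what an encoder-style motion search does, here is a toy exhaustive block search using sum of absolute differences (SAD) as the cost. This is only a sketch of the general technique, not Zoom's segmentation and not how x264 or x265 actually search (real encoders use much smarter search patterns and hand-tuned SIMD); frames are plain lists of lists and all names are illustrative.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(block_a, block_b)
        for x, y in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(ref, cur, top, left, size, radius):
    """Find where the block at (top, left) in `cur` best matches in `ref`.

    Tries every offset within +/- radius and returns (dy, dx, cost) for the
    lowest SAD, or None if no candidate fits inside the frame.
    """
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, get_block(ref, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

The inner `sad` is the same small subtract, absolute value, and add repeated over a row of pixels, which is why that part maps so well onto wide vector instructions, while the candidate loop around it is where the branchy decisions live.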

it's probably valid to point out they should have written an SSE fallback (assuming SSE did what they needed, AVX and AVX2 and AVX-512 have all added new instruction types that go above and beyond just vector width) but I don't think anyone notable really cared about it as a feature until 12 months ago. maybe streamers but it was always inferior to a greenscreen or a streaming camera with depth-of-field (at which point it's trivial, just composite over the areas where the depth of field is higher). it was probably a "nobody is going to use this anyway, why should we bother spending any time on it" and then lol
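The fallback idea boils down to runtime dispatch: pick the best codepath the CPU supports out of the ones the app actually ships. A minimal sketch, assuming feature flags spelled the way Linux reports them; the preference order and the name `pick_codepath` are illustrative and say nothing about how Zoom is really built.

```python
# Most capable first; "scalar" is the implicit last resort.
PREFERENCE = ("avx512f", "avx2", "avx", "sse2")


def pick_codepath(cpu_flags, shipped_paths):
    """Return the most preferred path that the CPU supports AND the app ships."""
    for flag in PREFERENCE:
        if flag in cpu_flags and flag in shipped_paths:
            return flag
    return "scalar"
```

On a Sandy Bridge-like flag set (AVX but no AVX2), an app that ships only an AVX2 path lands on `"scalar"`, or simply refuses to run if nobody wrote one, while an app that also ships an SSE2 path still gets vector code.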

Paul MaudDib fucked around with this message at 21:21 on Mar 31, 2021

BaronVanAwesome
Sep 11, 2001

I will never learn the secrets of "Increased fake female boar sp..."

Never say never, buddy.
Now you know.
Now we all know.


Kazinsal posted:

So not much point in stepping up from my 8700K then, unless I want to spend $wtf on an 11900KF. Kinda disappointing.

priznat posted:

Yah it’s pretty good, what I got too. No rush to upgrade! It might go through 3 gpu generations in my system by the time I replace it!

repiv posted:

Same, I'll probably end up hanging on to this 8700K until DDR5 matures

What up 8700K forever gang

I'm also waiting for DDR5 and will make this 8700 into a bangin Plex server one day

SCheeseman
Apr 23, 2003



I've heard AVX-512 has some use cases for emulation software, nothing specific though.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

BaronVanAwesome posted:

What up 8700K forever gang

I'm also waiting for DDR5 and will make this 8700 into a bangin Plex server one day

Going from a 2500K I am used to long lived machines. My 2500K is now my unraid/plex machine, until it dies!

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull


Paul MaudDib posted:

love the armchair experts (linus included) who think they know better than the experts at AMD and Intel who decided to write it and implement the instruction set. Even ARM, who had the opportunity to do it from scratch, is still doing NEON and SVE, because vector math is just very useful to have, as long as there's not a bunch of performance gotchas to using it.

love it when an armchair expert tries to armchair-expert a dude who once worked for Transmeta on their (admittedly unusual) x86 compatible CPU

Torvalds has many faults, but you won't find many people better positioned to critique AVX512. His notorious "I hope AVX512 dies a painful death" rant was more or less immediately followed up by him admitting it was biased and performative, but he also had several actually interesting things to say about why AVX512 might not have been the greatest choice.

IMO: 512-bit was clearly a good idea in its original context, which was an ISA extension designed for a special narrow market, HPC. (Yes, that's right, Larrabee was for HPC first and foremost - the GPU thing was a side project that the team was enthusiastic about but management wasn't.)

But when it came time to push the Larrabee work into the mainstream x86 ISA, it's possible Intel should've reduced vector width. It's one thing to devote massive resources to SIMD when assuming the workload is nearly all SIMD, because HPC, but it's another when the applications are incredibly varied and relatively few can use SIMD.

Paul MaudDib
May 2, 2006

"Tell me of your home world, Usul"


BobHoward posted:

love it when an armchair expert tries to armchair-expert a dude who once worked for Transmeta on their (admittedly unusual) x86 compatible CPU

Torvalds has many faults, but you won't find many people better positioned to critique AVX512. His notorious "I hope AVX512 dies a painful death" rant was more or less immediately followed up by him admitting it was biased and performative, but he also had several actually interesting things to say about why AVX512 might not have been the greatest choice.

IMO: 512-bit was clearly a good idea in its original context, which was an ISA extension designed for a special narrow market, HPC. (Yes, that's right, Larrabee was for HPC first and foremost - the GPU thing was a side project that the team was enthusiastic about but management wasn't.)

But when it came time to push the Larrabee work into the mainstream x86 ISA, it's possible Intel should've reduced vector width. It's one thing to devote massive resources to SIMD when assuming the workload is nearly all SIMD, because HPC, but it's another when the applications are incredibly varied and relatively few can use SIMD.

and see there's nothing wrong with the point that maybe 512b is too wide (implementing it in 2 cycles is fine) but that's not the nuance he made in his original post, or that everyone cites him over. What people cite him on is "AVX-512 = bad", not "dual 512-bits is too wide, but the instructions are a step forward in many respects and furthermore..."

it's linus, he's a complete shithead in general (the "oh I'm just blunt, it's just the way I am! maybe you're just too thin-skinned!" is the same thing every toxic engineer/manager always says), but with internet culture the way it is, the soundbite is all that matters. if he didn't think AVX-512 was a mistake he shouldn't have (true to his usual form) brashly stated exactly that in exactly as many words.

again, if AVX-512 was a mistake then AMD wouldn't be going ahead and implementing it too. They saw all the feedback and design flaws with the early implementations and went ahead and pursued it anyway. because it's worth pursuing in general, even if maybe you don't go for 1024 bits worth of vectors and you keep it so that it doesn't have to downclock.

but oh I guess linus worked on a failed processor that one time, he's smarter than Jim Keller and Lisa Su right? the people that have billions of dollars of revenue riding on these design decisions, they don't know what they're talking about!

linus is a self-declared "filesystems guy" too and yet he did the 'ZFS is a meme and nobody should use it, use BTRFS instead' thing too (just ignore the "not ready for production, may cause data loss" on half the features). What's that law about "when the news mis-reports some topic that you know about, you chuckle, but they're just as likely to mis-report on other topics and you don't realize it because you don't know about that topic"? Well, anyone who's used ZFS in production or knows the state of btrfs knows that Linus doesn't know what he was talking about there, and maybe it should give you pause when he opines on other things he thinks he's an expert on. He is a project manager, he is an engineer who works on kernel code, those are the things you should listen to him on.

not that any of this makes rocket lake good in general, it's obvious that AVX isn't the advantage Intel needs here, but it's going to be on both platforms next year whether people here like it or not, and we really should be moving past the stage where we have to care about whatever hyperbolic thing falls out of Linus's mouth this week, unless it's kernel-related. it doesn't matter what he thinks about global warming, it doesn't matter what he thinks about AVX-512, it's happening regardless of what he thinks.

anyway, I'm not armchair-experting anything, I'm deferring to the experts who think it's worth sinking a lot of money and silicon into implementing. Linus is the one who is making a positive claim that AVX-512 is benchmarkeetering. I strongly doubt Lisa Su is doing it just for a couple benchmark wins if it's not going to be something that actually sells processors.

Paul MaudDib fucked around with this message at 23:56 on Mar 31, 2021

Kazinsal
Dec 13, 2011






BobHoward posted:

love it when an armchair expert tries to armchair-expert a dude who once worked for Transmeta on their (admittedly unusual) x86 compatible CPU

man you dissed an intel product in the intel thread, that's the fuckin bat signal for paul to blow a big team blue load all over the thread that we'll be cleaning out of nooks and crannies for days

(unironically would love to hear more about the innards of transmeta's CPUs if you're allowed to talk about it. might make a good discussion for the non-Intel non-AMD thread if I can get around to finishing an OP for it)

Paul MaudDib
May 2, 2006

"Tell me of your home world, Usul"


Kazinsal posted:

man you dissed an intel product in the intel thread, that's the fuckin bat signal for paul to blow a big team blue load all over the thread that we'll be cleaning out of nooks and crannies for days

lol, who is defending an intel product? I'm defending an amd product here! The Intel one is kinda trash, but it seems likely AMD is going to do it better next year.

AMD presumably thinks so too, seeing as they invested a lot of money to do it. Think Lisa Su is the type to waste money on winning a few benchmarks, if it's not going to be something that actually sells processors?

anyway, if you don't want to read the forums then please don't, i wouldn't want that on my conscience

Paul MaudDib fucked around with this message at 23:57 on Mar 31, 2021

redeyes
Sep 14, 2002
I LOVE THE WHITE STRIPES!

Intel are cheaper and available so whatever.

LRADIKAL
Jun 10, 2001
$10


Fun Shoe

Paul MaudDib posted:

lol, who is defending an intel product? I'm defending an amd product here! The Intel one is kinda trash, but it seems likely AMD is going to do it better next year.

AMD presumably thinks so too, seeing as they invested a lot of money to do it. Think Lisa Su is the type to waste money on winning a few benchmarks, if it's not going to be something that actually sells processors?

anyway, if you don't want to read the forums then please don't, i wouldn't want that on my conscience

Despite all the interesting things you know, your predictable rants and boring, pedantic walls of text make these threads worse. Your desire to be correct outweighs the positives of your knowledge.

canyoneer
Sep 13, 2005


I only have canyoneyes for you


Linus Torvalds, who's that? Is he trying to be like the Tech Tips guy?

Palladium
May 8, 2012


MaxxBot posted:

The 11400 is the one actually good RKL SKU but Intel must be very uninterested in selling it since they didn't give it to any reviewers.

https://www.youtube.com/watch?v=upGjxnGaJeI

Now that B560 mobos also have fully unlocked memory OCing, the 11400F is a screaming deal for games. I certainly would take that over a Ryzen 3600.

WhyteRyce
Dec 30, 2001



Linus is a smart guy who has done way more than I ever could but I've worked with enough engineers like him that I will on principle never agree to any argument made that attempts to appeal to his authority.

All of a sudden I'm transported back to some conference room having the most aggravating conversations

WhyteRyce fucked around with this message at 15:51 on Apr 1, 2021

Khorne
May 1, 2002

Goonstone Champ x2

Paul is ultimately correct about AVX512.

AVX512 sucked because of all the dumb decisions around its emerging standard and implementation.

AVX512, and vector extensions in general, are very useful and it's great we are going to see a unified standard across the stack and from both companies.

WhyteRyce posted:

Linus is a smart guy who has done way more than I ever could but I've worked with enough engineers like him that I will on principle never agree to any argument made that attempts to appeal to his authority.

All of a sudden I'm transported back to some conference room having the most aggravating conversations

There are lots of people who are talented in one domain and way more confident than they should be when speaking about things only partially within their domain. It does not help that the human brain abstracts away and reduces complexity so something like "after updating this complex fluid dynamics simulation to the new version it is not outputting data when using this custom module that used to work" gets answered with "it just uses the Navier-Stokes equation so compare the code to that" by a top-of-their-field PhD physicist who is an alleged co-author of the software. Clearly it's an integration issue to anyone with the tiniest bit of software engineering experience and "check the physics equations which haven't changed in the code" is not a helpful thing to say to a clueless grad student attempting to get help.

Communication and correctly identifying what's going on are both real hard and are huge, constant problems in all aspects of life. I'm fairly sure we even suck at communicating directly with ourselves.

Khorne fucked around with this message at 19:47 on Apr 1, 2021

Icept
Jul 11, 2001



Khorne posted:

"it just uses the navier-stokes equation so compare the code to that" by a top-of-their-field phd physicist who is an alleged co-author of the software.

I've found that the key to dealing with these people or anyone who uses the phrase "it's just ..." is to put them solely in charge of fixing it.

Either they're right, and the thing gets fixed, or they have to adjust their attitude and start collaborating.

BlankSystemDaemon
Mar 13, 2009

System Access Node Not Found



Linus probably isn't the best person to talk about this, since he's an OS kernel developer.
It's extremely dubious whether AVX*, MMX, or even SSE is of use in a kernel, since almost everything you're doing is a matter of short runs where the latency is more important than throughput - and for the specific instructions with AVX*, MMX or SSE that you're trying to use, it will end up adding more time than just doing a regular calculation using the ALU or FPU.

Anything SIMD or vector-like can be done in userspace, and as an added bonus, this makes it easy to set the affinity so that those processes never get moved off the CPU by the scheduler.

SwissArmyDruid
Feb 14, 2014



BlankSystemDaemon posted:

Linus probably isn't the best person to talk about this, since he's an OS kernel developer.
It's extremely dubious whether AVX*, MMX, or even SSE is of use in a kernel, since almost everything you're doing is a matter of short runs where the latency is more important than throughput - and for the specific instructions with AVX*, MMX or SSE that you're trying to use, it will end up adding more time than just doing a regular calculation using the ALU or FPU.

Anything SIMD or vector-like can be done in userspace, and as an added bonus, this makes it easy to set the affinity so that those processes never get moved off the CPU by the scheduler.

Not emptyquoting. Besides, it's not like Linus is saying that Linux won't support AVX instructions, is he? As long as the operating system itself stays the hell out of the way of what is actually being done on said OS, everything's gravy. I don't think anyone ever expected Linux to use AVX for the kernel internals, after all.

ConanTheLibrarian
Aug 13, 2004


dis buch is late

Fallen Rib

His point was more around the alternative uses that AVX512 silicon could be put to. It's a fair question to ask, and Apple's approach of lots of execution units shows how it can pay off in a way that benefits multiple workload types.

shrike82
Jun 11, 2005



Aren't GPUs better for a lot of the use cases that AVX-512 was designed for? i guess there's the case where the CPU<->GPU travel time is too expensive for a real-time application but i wonder how often that's a bottleneck.

Updates for CUDA also seem a lot "cleaner" than the labyrinth of mapping Intel CPU to AVX support and coding specific paths for them

lurksion
Mar 21, 2013


Intel's math libraries use AVX512 and apparently it does bring good benefits to data science work.

SCheeseman posted:

I've heard AVX-512 has some use cases for emulation software, nothing specific though.

Apparently it will be making it into modern emulators e.g. yuzu, rpcs3, etc and will be quite useful for preserving accuracy with less slowdown?
https://www.reddit.com/r/emulation/comments/lzfpz5/what_are_the_implications_of_avx512_for_emulation/

And if AMD's implementing it it might not get orphaned like TSX on rpcs3

lurksion fucked around with this message at 00:00 on Apr 2, 2021

shrike82
Jun 11, 2005



lurksion posted:

Intel's math libraries use AVX512 and apparently it does bring good benefits to data science work.

Apparently it will be making it into modern emulators e.g. yuzu, rpcs3, etc and will be quite useful for preserving accuracy with less slowdown?
https://www.reddit.com/r/emulation/comments/lzfpz5/what_are_the_implications_of_avx512_for_emulation/

And if AMD's implementing it it might not get orphaned like TSX on rpcs3

that thread kinda highlights the various issues with it -

quote:

This is the issue with AVX-512; it's really a large family of loosely related instructions and should've been rolled out in smaller waves e.g. AVX-512A, 512B, etc. or even given different names. For example, BF16 is part of the AVX-512 suite despite seeing very bespoke implementations.

Instead, we have the current patchwork quilt of AVX-512 instruction support, and due to Intel's broken roadmap, we have quad-core laptop CPUs which support more AVX-512 instructions than their desktop and server chips. It's not immediately obvious what CPU supports what; you need to consult the lookup table, where you see that Cooper Lake (Skylake) supports BF16 while Ice Lake (Sunny Cove) does not...but Cooper Lake, which is newer than Ice Lake, is missing IFMA, VBMI and 4FMAPs, which Ice Lake (Sunny Cove) has....

It's a goddamn mess.
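The "consult the lookup table" problem can be sketched as a set comparison. The table below is illustrative only: it holds just a few of the subsets named in the quoted post for two of the CPUs it mentions, it is not a complete or authoritative support matrix, and the helper names are made up.

```python
# Illustrative only: a handful of AVX-512 subsets, not a full support matrix.
SUPPORT = {
    "Cooper Lake": {"AVX512F", "AVX512_BF16"},
    "Ice Lake": {"AVX512F", "AVX512_IFMA", "AVX512_VBMI"},
}


def missing_features(cpu, required):
    """List which required subsets the given CPU lacks (all of them if unknown)."""
    return sorted(set(required) - SUPPORT.get(cpu, set()))


def cpus_supporting(required):
    """List the CPUs in the table that cover every required subset."""
    return sorted(cpu for cpu, have in SUPPORT.items() if set(required) <= have)
```

Even with two entries the mess shows up: code that wants both BF16 and IFMA gets an empty list back from `cpus_supporting`, so "supports AVX-512" on its own tells a developer very little.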

quote:

Yeah, speaking off the record from professional experience: for our particular vector workloads on the hardware we happen to use it's faster to disable AVX512 because the change in thermals causes clock throttling that leads to a net performance reduction.

This is in a latency-critical system with a large number of processes though, and it's the other processes that create the net result.

So I'm hopeful for the future where AVX512 support has matured such that even this sort of case isn't a consideration. And it's easy to believe that the extensions are already a big win for suitable workloads.

quote:

Oh, most important. Rocket Lake successor, Alder Lake, is rumoured to NOT have AVX-512 support. And I have no idea about Meteor Lake. So, AVX-512 has a non-zero risk of being orphaned on desktop (This is actually the second time that Intel tried to introduce AVX-512 to consumers if you count the ill fated 10nm Cannonlake). Intel already announced AMX (Advanced Matrix Extensions) for the server Sapphire Rapids, and the HEDT line based on it will surely have it, too, in the same way that AVX-512 was supported on it while it took years in desktop.

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull


Kazinsal posted:

man you dissed an intel product in the intel thread, that's the fuckin bat signal for paul to blow a big team blue load all over the thread that we'll be cleaning out of nooks and crannies for days

(unironically would love to hear more about the innards of transmeta's CPUs if you're allowed to talk about it. might make a good discussion for the non-Intel non-AMD thread if I can get around to finishing an OP for it)

If "allowed to talk about it" means you think I worked for Transmeta, just to be clear, I did not.

I don't know a ton about Transmeta's architecture, other than it was a VLIW machine. They relied on their "Code Morphing System," a JIT, to translate x86 code to this proprietary VLIW ISA. The combo of CPU and low level firmware functioned like a real x86 - the native ISA wasn't documented, and iirc they took steps to prevent you from even trying to run native code yourself.

Despite the protection, I recall people had some success at reverse engineering the native ISA.

SwissArmyDruid posted:

Not emptyquoting. Besides, it's not like Linus is saying that Linux won't support AVX instructions, is he? As long as the operating system itself stays the hell out of the way of what is actually being done on said OS, everything's gravy. I don't think anyone ever expected Linux to use AVX for the kernel internals, after all.

~Technically~ the OS does have to support AVX - the scheduler has to save and restore its registers when context switching.

I think you're not allowed to use AVX registers inside the kernel, since that permits them to avoid saving/restoring context for every system call. AVX registers hold enough data that it's an important optimization (syscalls need to be very low latency).
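Some quick arithmetic on how much data that is. This counts only the architectural vector registers on x86-64 (16 XMM of 16 bytes, 16 YMM of 32 bytes, 32 ZMM of 64 bytes plus 8 opmask registers of 8 bytes) and ignores x87 state, control registers, and whatever header the save area carries, so treat the totals as a lower bound.

```python
# (register count, bytes per register) for each x86-64 vector register file.
REGISTER_FILES = {
    "SSE": [(16, 16)],               # XMM0-XMM15
    "AVX": [(16, 32)],               # YMM0-YMM15
    "AVX-512": [(32, 64), (8, 8)],   # ZMM0-ZMM31 plus opmask k0-k7
}


def state_bytes(isa):
    """Bytes of vector register state that a save/restore has to move."""
    return sum(count * size for count, size in REGISTER_FILES[isa])


if __name__ == "__main__":
    for isa in REGISTER_FILES:
        print(isa, state_bytes(isa), "bytes")
```

That works out to 256 bytes for SSE, 512 for AVX, and 2112 for AVX-512, so skipping the save on every syscall avoids moving about 2 KB each way on an AVX-512 machine.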

hobbesmaster
Jan 28, 2008



BobHoward posted:

If "allowed to talk about it" means you think I worked for Transmeta, just to be clear, I did not.

I don't know a ton about Transmeta's architecture, other than it was a VLIW machine. They relied on their "Code Morphing System," a JIT, to translate x86 code to this proprietary VLIW ISA. The combo of CPU and low level firmware functioned like a real x86 - the native ISA wasn't documented, and iirc they took steps to prevent you from even trying to run native code yourself.

Despite the protection, I recall people had some success at reverse engineering the native ISA.


~Technically~ the OS does have to support AVX - the scheduler has to save and restore its registers when context switching.

I think you're not allowed to use AVX registers inside the kernel, since that permits them to avoid saving/restoring context for every system call. AVX registers hold enough data that it's an important optimization (syscalls need to be very low latency).

Technically you are allowed but you need a drat good reason. https://yarchive.net/comp/linux/kernel_fp.html
Crypto code is about it iirc

repiv
Aug 13, 2009



BobHoward posted:

I think you're not allowed to use AVX registers inside the kernel, since that permits them to avoid saving/restoring context for every system call. AVX registers hold enough data that it's an important optimization (syscalls need to be very low latency).

The main syscall handler doesn't preserve the state of any floating point or SIMD registers yeah, but you are allowed to use SIMD/FP instructions in the kernel. It just needs to be wrapped in code that pushes/pops that state manually.

I know there's a bunch of AVX crypto code in the kernel and I think some of the software RAID stuff uses it as well.
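The save/restore bracket can be modeled with a toy example. This is Python standing in for kernel C, so nothing here is real kernel code: `ToyCpu` and `simd_section` are invented names, and the list is a stand-in for the user process's vector registers. On x86 Linux the real bracket is the `kernel_fpu_begin()` / `kernel_fpu_end()` pair.

```python
from contextlib import contextmanager


class ToyCpu:
    """Stand-in for a CPU whose vector registers belong to a user process."""

    def __init__(self):
        self.vector_regs = [0, 0, 0, 0]


@contextmanager
def simd_section(cpu):
    """Save the user's vector state, let the caller clobber it, then restore it."""
    saved = list(cpu.vector_regs)       # "push" the user-space state
    try:
        yield cpu
    finally:
        cpu.vector_regs[:] = saved      # "pop" it back, even if the body raised
```

Inside `with simd_section(cpu):` the code is free to trash the registers, and the process gets its own values back afterwards. Paying for that copy only in the few places that want SIMD, rather than on every syscall, is the trade being described.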

e: oops had this page open for a while and didn't refresh

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE


shrike82 posted:

Aren't GPUs better for a lot of the use cases that AVX-512 was designed for? i guess there's the case where the CPU<->GPU travel time is too expensive for a real-time application but i wonder how often that's a bottleneck.

Updates for CUDA also seem a lot "cleaner" than the labyrinth of mapping Intel CPU to AVX support and coding specific paths for them

I've seen AVX512 used in image/video processing (resizing, bitdepth conversion, gamma curves, tone mapping, all the usual matrix math stuff) but the performance wasn't that impressive, at least not on the Skylake-X system it was developed on. IIRC it was like 30% faster than AVX2? You got like twice as many pixels per clock cycle as with AVX/AVX2 on paper in many cases, but it definitely didn't get twice as fast in practice. I don't write this kind of stuff myself, I just hang out with people who do, so this is hearsay and take it as you will.
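One way to square "twice the pixels per clock on paper" with "about 30% faster" is an Amdahl-style estimate. This is my own simplification, not anything the people who wrote that code measured: it assumes some fraction of the runtime scales perfectly with vector width and the rest (memory traffic, scalar setup, any clock drop) does not scale at all.

```python
def overall_speedup(vector_fraction, vector_speedup):
    """Amdahl's law: total speedup when only part of the runtime gets faster."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)


def implied_fraction(observed_speedup, vector_speedup):
    """Invert Amdahl's law: what fraction must have scaled to see this speedup?"""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / vector_speedup)


if __name__ == "__main__":
    # Hearsay figure from the post: ~1.3x overall from a 2x wider vector unit.
    print(round(implied_fraction(1.3, 2.0), 3))
```

Under those assumptions a 1.3x gain from doubling the width means a bit under half of the original runtime was actually sitting in width-limited vector math.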

Of course this sort of stuff can be done on a GPU as well but these image operations are usually part of some bigger processing pipeline and it gets obnoxious to transfer the image back and forth between CPU and GPU for each pipeline step depending on how it's implemented, and a lot of filters aren't written for GPU processing, so there's a lot of value in doing these things on the CPU still.

e: here's a spooky mix of C++ templates, C preprocessor macros and avx512 intrinsics if anyone is curious

TheFluff fucked around with this message at 02:37 on Apr 2, 2021

gradenko_2000
Oct 5, 2010



Lipstick Apathy

https://www.youtube.com/watch?v=oaB1WuFUAtw

Some perspective here from Dr Ian Cutress on how Rocket Lake might be regarded as a win for Intel: it demonstrates that their design teams still have the chops to do new designs, since "backporting" the 10nm cores to 14nm is not an easy thing to do, regardless of the actual performance. They're also going to have to learn to do this sort of thing more often, given that their plans involve working with fabs beyond just their own.

I don't know if I really buy that reasoning, mostly because A. Rocket Lake was already late/delayed in the first place, and B. the reason Intel has to learn to work with other fabs is that they've had a hell of a time moving on from 14nm. I thought the argument was interesting, though.

Nomyth
Mar 15, 2013

And if a Nyto get a attitude
Pop it like it's hot
Pop it like it's hot
Pop it like it's hot


Everything's a win when you're a huge megacorp with too much inertia. If it wasn't a win, heads would be rolling

Cygni
Nov 12, 2005

raring to post



They are clearly at the absolute edge of what 14nm can give them, so I agree: I don't think the architecture design team is really to blame, honestly.

The design folks made Cannon Lake intending it to launch on the original 10nm in 2015. That first iteration of 10nm pretty much completely failed as a node, so whoops! Eat poo poo Cannon Lake, thanks for all the work design team, the thing is broken from day 1 thanks to manufacturing! Then they made Ice Lake and Tiger Lake, both of which are good performers architecturally but on 10nm V2, which, while at least functional, never hit the targets it was supposed to hit. Ice Lake was supposed to launch in 2016... it still hasn't launched on server. Insane, and likely a result of the yields never ramping enough to make massive server dies profitable like planned. It also never hit the frequencies I believe they intended. So while better, still pretty much a huge letdown by manufacturing.

I personally think Rocket Lake exists mostly because the design team would otherwise be sittin on their rear end, cause the manufacturing side is half a decade behind. So now you get "codesign", because Intel has realized that going all in and betting on the manufacturing folks to deliver is a bad idea in a world where each manufacturing improvement is going to get harder and harder and riskier and riskier.

Pretty much the worst thing that I think you can lay at the feet of design is spectre/meltdown, but that would have been considerably mitigated if the nodes had actually gotten poo poo out on the intended schedule. Instead, they spent 6 years grafting additional cores to Skylake and doing band-aid fixes on a design that was supposed to have been replaced years ago.

(i likely don't know what i'm talking about, so take this whole thing as just a web forum rant)

gradenko_2000
Oct 5, 2010



Lipstick Apathy

https://www.youtube.com/watch?v=LYdHTSQxdCM

Gamers Nexus has a review up of the i5-11400, the non-overclockable Rocket Lake six-core, and it comes really dang close to a 5600X despite being over a hundred bucks cheaper

or, put another way, is significantly faster than a Ryzen 5 3600 on top of being 20-40 bucks cheaper

Ika
Dec 30, 2004
Pure insanity



gradenko_2000 posted:

https://www.youtube.com/watch?v=LYdHTSQxdCM

Gamers Nexus has a review up of the i5-11400, the non-overclockable Rocket Lake six-core, and it comes really dang close to a 5600X despite being over a hundred bucks cheaper

or, put another way, is significantly faster than a Ryzen 5 3600 on top of being 20-40 bucks cheaper

Interesting. Over in euro land the i7 11700 non K is already available for the same price as a 5600X as well. Wonder if prices will start to fall soon.

Fantastic Foreskin
Jan 6, 2013

A golden helix streaked skyward from the Helvault. A thunderous explosion shattered the silver monolith and Avacyn emerged, free from her prison at last.



As someone who only has man-on-the-street level knowledge of chip fab, can someone explain to me what exactly it means for a node/process to fail, and how one does it for 5 years straight?
