movax
Aug 30, 2008

karoshi posted:

Games won't support a feature for 0.001% of the market. In the productivity space support might trickle down from the Xeon line, like maybe AI apps that already support AVX512 on the ~cloud~. Or video encoding libraries used by the content providers.

If consoles don’t support it (definitely CPU-wise), they won’t bother with it.

Also, yes, Zoom virtual backgrounds are the first application I have encountered that I simply cannot run on my 2600K. 10 years, and it's a goddamned virtual background feature on a videoconferencing application.

I am not a graphics guy and I understand they are cross-platform, but... *points at OpenGL* ...even iGPUs should be able to do that task trivially, right!?


gradenko_2000
Oct 5, 2010

HELL SERPENT
Lipstick Apathy
If you have an RTX card you can use Nvidia Broadcast so that your GPU does the background replacement regardless of your CPU but... well...

WhyteRyce
Dec 30, 2001

When I found out about the AVX requirement I got a huge chuckle because it took a global pandemic to finally make it a thing that was useful for the average everyday person. Congrats Intel.

And then I chuckled again when I realized a lot of people had laptops that didn't support it.

gradenko_2000
Oct 5, 2010

HELL SERPENT
Lipstick Apathy

WhyteRyce posted:

When I found out about the AVX requirement I got a huge chuckle because it took a global pandemic to finally make it a thing that was useful for the average everyday person. Congrats Intel.

And then I chuckled again when I realized a lot of people had laptops that didn't support it.

This is also why the next generation of Pentiums is getting AVX2, after it was gated behind the Core series for the longest time.

It doesn't help, though, that Zoom's CPU support can be spotty: you still might not get their backgrounds to work even if your CPU is supposed to have those instructions. My Broadwell laptop and my Athlon 200GE couldn't do it.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Epiphyte posted:

What actually uses AVX-512 in home use?

This is rather off to the side, but I learned yesterday from the ARMv9 announcement that there is an emerging standard for vector ops (an alternative to Intel's AVX/2/512 and Arm's own Neon) called SVE2 (Scalable Vector Extension 2).

It allows vector ops on data of any width from 128 bits to 2048 bits, in 128-bit increments, on any CPU which implements SVE2, regardless of the native width of that CPU's vector circuitry. I assume there are large performance hits for performing ops wider than the native width of your hardware - as we saw with Zen/Zen+ CPUs, which could dispatch AVX2 ops but took two cycles to do so rather than one.

At the moment there is exactly one CPU in the world which implements this: the Fujitsu A64FX, which is used in the Fugaku supercomputer. But in about 18 months it will be supported by all new Arm CPUs, and it would be cool if everyone settled on this rather than a never-ending series of vendor-specific extensions. (Lol.)

repiv
Aug 13, 2009

I think the idea with SVE is that the application queries the natural vector width of the hardware and works with that, rather than hard-coding a particular width and expecting the hardware to deal with it

In the docs the vector types are defined like "svfloat32_t", which tells you it's a vector of 32-bit floats but not how many there are, because that's unknown until runtime
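
To make that concrete, here's a minimal sketch of a vector-length-agnostic loop using the Arm C Language Extensions for SVE - hypothetical function, but the point is that the same binary runs unchanged whether the hardware vectors are 128 or 2048 bits wide:

code:

#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

/* Scale an array of floats by a constant. svcntw() asks the hardware
 * at runtime how many 32-bit lanes a vector holds; nothing in the
 * source code fixes the vector width. */
void scale_f32(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i += svcntw()) {
        /* the predicate masks off lanes past the end of the array */
        svbool_t pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);
        svfloat32_t v = svld1_f32(pg, src + i);
        svst1_f32(pg, dst + i, svmul_n_f32_x(pg, v, k));
    }
}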

mobby_6kl
Aug 9, 2009

by Fluffdaddy

Let's do virtual backgrounds with 3D particle movement then

MaxxBot
Oct 6, 2003

you could have clapped

you should have clapped!!
The 11400 is the one actually good RKL SKU but Intel must be very uninterested in selling it since they didn't give it to any reviewers.

https://www.youtube.com/watch?v=upGjxnGaJeI

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
the pissing and moaning about AVX-512 is going to look bad in retrospect in a couple years. AMD is implementing it on Zen4 next year, meaning it'll finally be available in all product segments on both brands.

The problem has always been "why write codepaths for hardware that doesn't exist": previously it was only in servers, which is why you only saw HPC code get written around it. It got added to laptops about 18 months ago, but only on quad-core ultrabooks, which aren't exactly where you do tons of heavy vector math, and only on Intel at that. This is the first time it's been available on the desktop outside the Skylake-X HEDT processors, which were lol and had a ton of performance gotchas that no longer exist on the new implementations.

GPUs can't fully replace AVX. For example, nobody has ever been able to port x264 or x265, with their heavily branching codepaths, to GPUs; instead everyone in that segment has been forced to use hardware ASIC/SIP accelerator cores, which up until recently had significantly worse quality. Even today a deep motion search (e.g. veryslow) is still better than even the best NVENC cores (which approximate "medium"-quality motion search). Which isn't to say that NVENC is bad, but there are certainly workloads where you can't just drop in a GPU and call it a day.
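
for reference, the hot loop in a motion search is mostly sum-of-absolute-differences over pixel blocks, which maps beautifully onto wide vectors. a rough sketch (hypothetical helper, not x264's actual code; assumes 64-pixel rows and AVX-512BW):

code:

#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* SAD between a candidate block and a reference block:
 * _mm512_sad_epu8 handles 64 pixels of absolute differences per
 * instruction, accumulating into 8 partial 64-bit sums. */
uint64_t block_sad64(const uint8_t *cur, const uint8_t *ref,
                     size_t stride, size_t rows)
{
    __m512i acc = _mm512_setzero_si512();
    for (size_t y = 0; y < rows; y++) {
        __m512i a = _mm512_loadu_si512(cur + y * stride);
        __m512i b = _mm512_loadu_si512(ref + y * stride);
        acc = _mm512_add_epi64(acc, _mm512_sad_epu8(a, b));
    }
    return _mm512_reduce_add_epi64(acc); /* total over the block */
}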

sending it off to a GPU also adds a lot of latency, which can be bad for something like inferencing where the inference is part of some larger computation. like, I don't know, maybe if you wanted a game where each unit runs an inference to decide what it should be doing. if you are making a lot of runs, maybe you can batch them to amortize the cost across a lot of units of computation, but maybe not. and downclocking isn't particularly a problem for these use-cases anymore, since it really no longer exists on Ice Lake or Rocket Lake. there is a reason VNNI instructions were a specific focus to get added to AVX-512.
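
for the curious, VNNI collapses the multiply/widen/accumulate dance of int8 inference into one instruction. a hedged sketch (not any particular library's kernel; assumes n is a multiple of 64 and -mavx512vnni):

code:

#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* int8 dot product, the core op of quantized inference.
 * _mm512_dpbusd_epi32 multiplies 64 unsigned-by-signed byte pairs
 * and accumulates into 16 int32 lanes in a single instruction. */
int32_t dot_u8s8(const uint8_t *a, const int8_t *b, size_t n)
{
    __m512i acc = _mm512_setzero_si512();
    for (size_t i = 0; i < n; i += 64)
        acc = _mm512_dpbusd_epi32(acc,
                                  _mm512_loadu_si512(a + i),
                                  _mm512_loadu_si512(b + i));
    return _mm512_reduce_add_epi32(acc); /* horizontal sum */
}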

love the armchair experts (linus included) who think they know better than the experts at AMD and Intel who decided to write it and implement the instruction set. Even ARM, who had the opportunity to do it from scratch, is still doing NEON and SVE, because vector math is just very useful to have, as long as there's not a bunch of performance gotchas to using it.

Paul MaudDib fucked around with this message at 22:31 on Mar 31, 2021

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

movax posted:

If consoles don’t support it (definitely CPU-wise), they won’t bother with it.

Also, yes, Zoom virtual backgrounds are the first application I have encountered that I simply cannot run on my 2600K. 10 years, and it's a goddamned virtual background feature on a videoconferencing application.

I am not a graphics guy and I understand they are cross-platform, but... *points at OpenGL* ...even iGPUs should be able to do that task trivially, right!?

you could trivially use a "virtual camera" which outputs some other video stream or some game as a virtual webcam, yes.

figuring out where your head is in realtime, as you move, is the difficult part of the problem here, not the compositing. and that task is much more akin to a video-encoding motion search than to a compositing task. And again, nobody has ever built a version of x264 or x265 motion search that works well on GPU architectures; everyone uses fixed-function hardware accelerators if they want to encode on a GPU. But the task is very amenable to AVX acceleration.

it's probably valid to point out they should have written an SSE fallback (assuming SSE did what they needed; AVX, AVX2, and AVX-512 have all added new instruction types that go above and beyond just vector width), but I don't think anyone notable really cared about it as a feature until 12 months ago. maybe streamers, but it was always inferior to a greenscreen or a streaming camera with depth sensing (at which point it's trivial, just composite over the areas where the depth is greater). it was probably a "nobody is going to use this anyway, why should we bother spending any time on it" and then lol
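
the dispatch side of a fallback is the easy part, which makes it funnier. a minimal sketch (GCC/Clang builtin; the segment_* kernels are hypothetical stand-ins, not Zoom's actual code):

code:

#include <stdio.h>

/* stub kernels; real versions would be compiled per-ISA */
static void segment_avx512(void) { puts("AVX-512 path"); }
static void segment_avx2(void)   { puts("AVX2 path"); }
static void segment_sse2(void)   { puts("SSE2 path (x86-64 baseline)"); }

int main(void)
{
    /* __builtin_cpu_supports runs CPUID once and caches the result */
    if (__builtin_cpu_supports("avx512f"))
        segment_avx512();
    else if (__builtin_cpu_supports("avx2"))
        segment_avx2();
    else
        segment_sse2(); /* every x86-64 chip has SSE2, even a 2600K */
    return 0;
}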

Paul MaudDib fucked around with this message at 22:21 on Mar 31, 2021

BaronVanAwesome
Sep 11, 2001

I will never learn the secrets of "Increased fake female boar sp..."

Never say never, buddy.
Now you know.
Now we all know.

Kazinsal posted:

So not much point in stepping up from my 8700K then, unless I want to spend $wtf on an 11900KF. Kinda disappointing.

priznat posted:

Yah it’s pretty good, what I got too. No rush to upgrade! It might go through 3 gpu generations in my system by the time I replace it!

repiv posted:

Same, I'll probably end up hanging on to this 8700K until DDR5 matures

What up 8700K forever gang

I'm also waiting for DDR5 and will make this 8700 into a bangin Plex server one day

SCheeseman
Apr 23, 2003

I've heard AVX-512 has some use cases for emulation software, nothing specific though.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

BaronVanAwesome posted:

What up 8700K forever gang

I'm also waiting for DDR5 and will make this 8700 into a bangin Plex server one day

Going from a 2500K, I am used to long-lived machines. My 2500K is now my unraid/plex machine, until it dies!

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

Paul MaudDib posted:

love the armchair experts (linus included) who think they know better than the experts at AMD and Intel who decided to write it and implement the instruction set. Even ARM, who had the opportunity to do it from scratch, is still doing NEON and SVE, because vector math is just very useful to have, as long as there's not a bunch of performance gotchas to using it.

love it when an armchair expert tries to armchair-expert a dude who once worked for Transmeta on their (admittedly unusual) x86 compatible CPU

Torvalds has many faults, but you won't find many people better positioned to critique AVX512. His notorious "I hope AVX512 dies a painful death" rant was more or less immediately followed up by him admitting it was biased and performative, but he also had several actually interesting things to say about why AVX512 might not have been the greatest choice.

IMO: 512-bit was clearly a good idea in its original context, which was an ISA extension designed for a special narrow market, HPC. (Yes, that's right, Larrabee was for HPC first and foremost - the GPU thing was a side project that the team was enthusiastic about but management wasn't.)

But when it came time to push the Larrabee work into the mainstream x86 ISA, it's possible Intel should've reduced vector width. It's one thing to devote massive resources to SIMD when assuming the workload is nearly all SIMD, because HPC, but it's another when the applications are incredibly varied and relatively few can use SIMD.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

BobHoward posted:

love it when an armchair expert tries to armchair-expert a dude who once worked for Transmeta on their (admittedly unusual) x86 compatible CPU

Torvalds has many faults, but you won't find many people better positioned to critique AVX512. His notorious "I hope AVX512 dies a painful death" rant was more or less immediately followed up by him admitting it was biased and performative, but he also had several actually interesting things to say about why AVX512 might not have been the greatest choice.

IMO: 512-bit was clearly a good idea in its original context, which was an ISA extension designed for a special narrow market, HPC. (Yes, that's right, Larrabee was for HPC first and foremost - the GPU thing was a side project that the team was enthusiastic about but management wasn't.)

But when it came time to push the Larrabee work into the mainstream x86 ISA, it's possible Intel should've reduced vector width. It's one thing to devote massive resources to SIMD when assuming the workload is nearly all SIMD, because HPC, but it's another when the applications are incredibly varied and relatively few can use SIMD.

and see, there's nothing wrong with the point that maybe 512b is too wide (implementing it over 2 cycles is fine), but that's not the nuanced point he made in his original post, or the one everyone cites him for. What people cite him on is "AVX-512 = bad", not "dual 512-bit units are too wide, but the instructions are a step forward in many respects, and furthermore..."

it's linus, he's a complete shithead in general (the "oh, I'm just blunt, it's just the way I am! maybe you're just too thin-skinned!" routine is the same thing every toxic engineer/manager says), but with internet culture the way it is, the soundbite is all that matters. if he didn't think AVX-512 was a mistake, he shouldn't have (true to his usual form) brashly stated exactly that in exactly as many words.

again, if AVX-512 was a mistake then AMD wouldn't be going ahead and implementing it too. They saw all the feedback and design flaws with the early implementations and went ahead and pursued it anyway. because it's worth pursuing in general, even if maybe you don't go for 1024 bits worth of vectors and you keep it so that it doesn't have to downclock.

but oh I guess linus worked on a failed processor that one time, he's smarter than Jim Keller and Lisa Su right? the people that have billions of dollars of revenue riding on these design decisions, they don't know what they're talking about!

linus is a self-declared "filesystems guy" and yet he also did the 'ZFS is a meme and nobody should use it, use btrfs instead' thing (just ignore the "not ready for production, may cause data loss" warnings on half of btrfs's features). What's that law - the Gell-Mann amnesia effect? - where, when the news mis-reports some topic you know about, you chuckle, but they're just as likely to mis-report on other topics, and you don't realize it because you don't know those topics? Well, anyone who's used ZFS in production or knows the state of btrfs knows that Linus didn't know what he was talking about there, and maybe that should give you pause when he opines on other things he thinks he's an expert on. He is a project manager and an engineer who works on kernel code; those are the things you should listen to him on.

not that any of this makes rocket lake good in general, it's obvious that AVX isn't the advantage Intel needs here, but it's going to be on both platforms next year whether people here like it or not, and we really should be moving past the stage where we have to care about whatever hyperbolic thing falls out of Linus's mouth this week, unless it's kernel-related. it doesn't matter what he thinks about global warming, it doesn't matter what he thinks about AVX-512, it's happening regardless of what he thinks.

anyway, I'm not armchair-experting anything, I'm deferring to the experts who think it's worth sinking a lot of money and silicon into implementing. Linus is the one making a positive claim, that AVX-512 is benchmarketeering. I strongly doubt Lisa Su is doing it just for a couple of benchmark wins if it's not going to be something that actually sells processors.

Paul MaudDib fucked around with this message at 00:56 on Apr 1, 2021

Kazinsal
Dec 13, 2011



BobHoward posted:

love it when an armchair expert tries to armchair-expert a dude who once worked for Transmeta on their (admittedly unusual) x86 compatible CPU

man you dissed an intel product in the intel thread, that's the fuckin bat signal for paul to blow a big team blue load all over the thread that we'll be cleaning out of nooks and crannies for days

(unironically would love to hear more about the innards of transmeta's CPUs if you're allowed to talk about it. might make a good discussion for the non-Intel non-AMD thread if I can get around to finishing an OP for it)

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Kazinsal posted:

man you dissed an intel product in the intel thread, that's the fuckin bat signal for paul to blow a big team blue load all over the thread that we'll be cleaning out of nooks and crannies for days

lol, who is defending an intel product? I'm defending an amd product here! The Intel one is kinda trash, but it seems likely AMD is going to do it better next year.

AMD presumably thinks so too, seeing as they invested a lot of money to do it. Think Lisa Su is the type to waste money on winning a few benchmarks, if it's not going to be something that actually sells processors?

anyway, if you don't want to read the forums then please don't, i wouldn't want that on my conscience

Paul MaudDib fucked around with this message at 00:57 on Apr 1, 2021

redeyes
Sep 14, 2002

by Fluffdaddy
Intel are cheaper and available so whatever.

LRADIKAL
Jun 10, 2001

Fun Shoe

Paul MaudDib posted:

lol, who is defending an intel product? I'm defending an amd product here! The Intel one is kinda trash, but it seems likely AMD is going to do it better next year.

AMD presumably thinks so too, seeing as they invested a lot of money to do it. Think Lisa Su is the type to waste money on winning a few benchmarks, if it's not going to be something that actually sells processors?

anyway, if you don't want to read the forums then please don't, i wouldn't want that on my conscience

Despite all the interesting things you know, your predictable rants and boring, pedantic walls of text make these threads worse. Your desire to be correct outweighs the positives of your knowledge.

canyoneer
Sep 13, 2005


I only have canyoneyes for you
Linus Torvalds, who's that? Is he trying to be like the Tech Tips guy?
:v:

Palladium
May 8, 2012

Very Good
✔️✔️✔️✔️

MaxxBot posted:

The 11400 is the one actually good RKL SKU but Intel must be very uninterested in selling it since they didn't give it to any reviewers.

https://www.youtube.com/watch?v=upGjxnGaJeI

Now that B560 mobos also have fully unlocked memory OCing, the 11400F is a screaming deal for games. I'd certainly take it over a Ryzen 3600.

WhyteRyce
Dec 30, 2001

Linus is a smart guy who has done way more than I ever could, but I've worked with enough engineers like him that, on principle, I will never accept any argument that appeals to his authority.

All of a sudden I'm transported back to some conference room having the most aggravating conversations

WhyteRyce fucked around with this message at 16:51 on Apr 1, 2021

Khorne
May 1, 2002
Paul is ultimately correct about AVX512.

AVX512 sucked because of all the dumb decisions around how the standard emerged and how it was implemented.

AVX512, and vector extensions in general, are very useful and it's great we are going to see a unified standard across the stack and from both companies.

WhyteRyce posted:

Linus is a smart guy who has done way more than I ever could, but I've worked with enough engineers like him that, on principle, I will never accept any argument that appeals to his authority.

All of a sudden I'm transported back to some conference room having the most aggravating conversations
There are lots of people who are talented in one domain and way more confident than they should be when speaking about things only partially within it. It doesn't help that the human brain abstracts away and reduces complexity, so something like "after updating this complex fluid dynamics simulation to the new version, it is not outputting data when using this custom module that used to work" gets answered with "it just uses the Navier-Stokes equations, so compare the code to that" by a top-of-their-field PhD physicist who is an alleged co-author of the software. To anyone with the tiniest bit of software engineering experience it's clearly an integration issue, and "check the physics equations, which haven't changed in the code" is not a helpful thing to say to a clueless grad student trying to get help.

Communication and correctly identifying what's going on are both real hard and are huge, constant problems in all aspects of life. I'm fairly sure we even suck at communicating directly with ourselves.

Khorne fucked around with this message at 20:47 on Apr 1, 2021

Icept
Jul 11, 2001

Khorne posted:

"it just uses the navier-stokes equation so compare the code to that" by a top-of-their-field phd physicist who is an alleged co-author of the software.

I've found that the key to dealing with these people or anyone who uses the phrase "it's just ..." is to put them solely in charge of fixing it.

Either they're right, and the thing gets fixed, or they have to adjust their attitude and start collaborating.

BlankSystemDaemon
Mar 13, 2009



Linus probably isn't the best person to talk about this, since he's an OS kernel developer.
It's extremely dubious whether AVX*, MMX, or even SSE is of any use in a kernel, since almost everything you're doing there is short runs where latency matters more than throughput - and for the specific AVX*, MMX, or SSE instructions you'd want to use, the overhead ends up adding more time than just doing a regular calculation on the ALU or FPU.

Anything SIMD or vector-like can be done in userspace, and as an added bonus, this makes it easy to set the affinity so that those processes never get moved off the CPU by the scheduler.
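
pinning is a one-liner on Linux, something like this (minimal sketch; the core number is arbitrary):

code:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set); /* pin to logical CPU 2 */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    /* ...run the SIMD-heavy userspace work here... */
    return 0;
}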

SwissArmyDruid
Feb 14, 2014

by sebmojo

BlankSystemDaemon posted:

Linus probably isn't the best person to talk about this, since he's an OS kernel developer.
It's extremely dubious whether AVX*, MMX, or even SSE is of any use in a kernel, since almost everything you're doing there is short runs where latency matters more than throughput - and for the specific AVX*, MMX, or SSE instructions you'd want to use, the overhead ends up adding more time than just doing a regular calculation on the ALU or FPU.

Anything SIMD or vector-like can be done in userspace, and as an added bonus, this makes it easy to set the affinity so that those processes never get moved off the CPU by the scheduler.

Not emptyquoting. Besides, it's not like Linus is saying that Linux won't support AVX instructions, is he? As long as the operating system itself stays the hell out of the way of what is actually being done on said OS, everything's gravy. I don't think anyone ever expected Linux to use AVX for the kernel internals, after all.

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib
His point was more around the alternative uses that AVX512 silicon could be put to. It's a fair question to ask, and Apple's approach of lots of execution units shows how it can pay off in a way that benefits multiple workload types.

shrike82
Jun 11, 2005

Aren't GPUs better for a lot of the use cases that AVX-512 was designed for? i guess there's the case where the CPU<->GPU travel time is too expensive for a real-time application but i wonder how often that's a bottleneck.

Updates to CUDA also seem a lot "cleaner" than the labyrinth of mapping Intel CPUs to AVX support levels and coding specific paths for them

lurksion
Mar 21, 2013
Intel's math libraries use AVX512, and apparently it brings real benefits to data science work.

SCheeseman posted:

I've heard AVX-512 has some use cases for emulation software, nothing specific though.
Apparently it's making its way into modern emulators (e.g. yuzu, rpcs3) and will be quite useful for preserving accuracy with less slowdown?
https://www.reddit.com/r/emulation/comments/lzfpz5/what_are_the_implications_of_avx512_for_emulation/

And if AMD's implementing it, it might not get orphaned like TSX was on rpcs3

lurksion fucked around with this message at 01:00 on Apr 2, 2021

shrike82
Jun 11, 2005

lurksion posted:

Intel's math libraries use AVX512, and apparently it brings real benefits to data science work.

Apparently it's making its way into modern emulators (e.g. yuzu, rpcs3) and will be quite useful for preserving accuracy with less slowdown?
https://www.reddit.com/r/emulation/comments/lzfpz5/what_are_the_implications_of_avx512_for_emulation/

And if AMD's implementing it, it might not get orphaned like TSX was on rpcs3

that thread kinda highlights the various issues with it -

quote:

This is the issue with AVX-512; it's really a large family of loosely related instructions and should've been rolled out in smaller waves e.g. AVX-512A, 512B, etc. or even given different names. For example, BF16 is part of the AVX-512 suite despite seeing very bespoke implementations.

Instead, we have the current patchwork quilt of AVX-512 instruction support, and due to Intel's broken roadmap, we have quad-core laptop CPUs which support more AVX-512 instructions than their desktop and server chips. It's not immediately obvious what CPU supports what; you need to consult the lookup table, where you see that Cooper Lake (Skylake) supports BF16 while Ice Lake (Sunny Cove) does not...but Cooper Lake, which is newer than Ice Lake, is missing IFMA, VBMI and 4FMAPS, which Ice Lake (Sunny Cove) has....

It's a goddamn mess.
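
(you can see the quilt for yourself by probing each subset individually - a quick sketch using the GCC/Clang builtin, which only accepts string literals, so every subset needs its own call:)

code:

#include <stdio.h>

int main(void)
{
    printf("avx512f     %d\n", __builtin_cpu_supports("avx512f"));
    printf("avx512bw    %d\n", __builtin_cpu_supports("avx512bw"));
    printf("avx512ifma  %d\n", __builtin_cpu_supports("avx512ifma"));
    printf("avx512vbmi  %d\n", __builtin_cpu_supports("avx512vbmi"));
    printf("avx512vnni  %d\n", __builtin_cpu_supports("avx512vnni"));
    return 0;
}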

quote:

Yeah, speaking off the record from professional experience: for our particular vector workloads on the hardware we happen to use it’s faster to disable AVX512 because the change in thermals causes clock throttling that leads to a net performance reduction.

This is in a latency-critical system with a large number of processes though, and it’s the other processes that create the net result.

So I’m hopeful for the future where AVX512 support has matured such that even this sort of case isn’t a consideration. And it’s easy to believe that the extensions are already a big win for suitable workloads.

quote:

Oh, most important: Rocket Lake's successor, Alder Lake, is rumoured to NOT have AVX-512 support. And I have no idea about Meteor Lake. So AVX-512 has a non-zero risk of being orphaned on desktop (this is actually the second time Intel has tried to introduce AVX-512 to consumers, if you count the ill-fated 10nm Cannon Lake). Intel already announced AMX (Advanced Matrix Extensions) for the server part Sapphire Rapids, and the HEDT line based on it will surely have it too, in the same way that AVX-512 was supported on HEDT while it took years to reach desktop.

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

Kazinsal posted:

man you dissed an intel product in the intel thread, that's the fuckin bat signal for paul to blow a big team blue load all over the thread that we'll be cleaning out of nooks and crannies for days

(unironically would love to hear more about the innards of transmeta's CPUs if you're allowed to talk about it. might make a good discussion for the non-Intel non-AMD thread if I can get around to finishing an OP for it)

If "allowed to talk about it" means you think I worked for Transmeta, just to be clear, I did not.

I don't know a ton about Transmeta's architecture, other than it was a VLIW machine. They relied on their "Code Morphing System," a JIT, to translate x86 code to this proprietary VLIW ISA. The combo of CPU and low level firmware functioned like a real x86 - the native ISA wasn't documented, and iirc they took steps to prevent you from even trying to run native code yourself.

Despite the protection, I recall people had some success at reverse engineering the native ISA.

SwissArmyDruid posted:

Not emptyquoting. Besides, it's not like Linus is saying that Linux won't support AVX instructions, is he? As long as the operating system itself stays the hell out of the way of what is actually being done on said OS, everything's gravy. I don't think anyone ever expected Linux to use AVX for the kernel internals, after all.

~Technically~ the OS does have to support AVX - the scheduler has to save and restore its registers when context switching.

I think you're not allowed to use AVX registers inside the kernel, since that lets the kernel avoid saving/restoring that state on every system call. The AVX registers hold enough data that it's an important optimization (syscalls need to be very low latency).

hobbesmaster
Jan 28, 2008

BobHoward posted:

If "allowed to talk about it" means you think I worked for Transmeta, just to be clear, I did not.

I don't know a ton about Transmeta's architecture, other than it was a VLIW machine. They relied on their "Code Morphing System," a JIT, to translate x86 code to this proprietary VLIW ISA. The combo of CPU and low level firmware functioned like a real x86 - the native ISA wasn't documented, and iirc they took steps to prevent you from even trying to run native code yourself.

Despite the protection, I recall people had some success at reverse engineering the native ISA.


~Technically~ the OS does have to support AVX - the scheduler has to save and restore its registers when context switching.

I think you're not allowed to use AVX registers inside the kernel, since that lets the kernel avoid saving/restoring that state on every system call. The AVX registers hold enough data that it's an important optimization (syscalls need to be very low latency).

Technically you are allowed, but you need a drat good reason: https://yarchive.net/comp/linux/kernel_fp.html
Crypto code is about the only case, iirc

repiv
Aug 13, 2009

BobHoward posted:

I think you're not allowed to use AVX registers inside the kernel, since that lets the kernel avoid saving/restoring that state on every system call. The AVX registers hold enough data that it's an important optimization (syscalls need to be very low latency).

Yeah, the main syscall handler doesn't preserve the state of any floating-point or SIMD registers, but you are allowed to use SIMD/FP instructions in the kernel. They just need to be wrapped in code that pushes/pops that state manually.

I know there's a bunch of AVX crypto code in the kernel and I think some of the software RAID stuff uses it as well.
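
the wrapper on x86 is kernel_fpu_begin()/kernel_fpu_end(). a hedged sketch of the pattern (the XOR helper is a hypothetical stand-in; the real examples live in arch/x86/crypto):

code:

#include <linux/types.h>
#include <asm/fpu/api.h>

/* stand-in body; a real version would use AVX instructions */
static void xor_block_avx(u8 *dst, const u8 *src, size_t len)
{
    while (len--)
        *dst++ ^= *src++;
}

void xor_block(u8 *dst, const u8 *src, size_t len)
{
    kernel_fpu_begin();  /* disables preemption, saves FPU/SIMD state */
    xor_block_avx(dst, src, len);
    kernel_fpu_end();    /* restores state, re-enables preemption */
}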

e: oops had this page open for a while and didn't refresh

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

shrike82 posted:

Aren't GPUs better for a lot of the use cases that AVX-512 was designed for? i guess there's the case where the CPU<->GPU travel time is too expensive for a real-time application but i wonder how often that's a bottleneck.

Updates for CUDA also seem a lot "cleaner" than the labyrinth of mapping Intel CPU to AVX support and coding specific paths for them
I've seen AVX512 used in image/video processing (resizing, bit-depth conversion, gamma curves, tone mapping, all the usual matrix math stuff), but the performance wasn't that impressive, at least not on the Skylake-X system it was developed on. IIRC it was like 30% faster than AVX2? On paper you got twice as many pixels per clock cycle as with AVX/AVX2 in many cases, but it definitely didn't come out twice as fast in practice. I don't write this kind of stuff myself, I just hang out with people who do, so this is hearsay; take it as you will.

Of course this sort of stuff can be done on a GPU as well, but these image operations are usually part of some bigger processing pipeline, and depending on how it's implemented it gets obnoxious to transfer the image back and forth between CPU and GPU for each pipeline step. A lot of filters aren't written for GPU processing either, so there's still a lot of value in doing these things on the CPU.
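
for flavor, the kind of kernel in question looks something like this (hedged sketch, not the code I actually saw; 8-bit to normalized float, 16 pixels per iteration, assumes n divisible by 16 and -mavx512f):

code:

#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Bit-depth conversion: widen 16 bytes to 16 int32 lanes, convert
 * to float, and scale into [0, 1]. */
void u8_to_f32(float *dst, const uint8_t *src, size_t n)
{
    const __m512 scale = _mm512_set1_ps(1.0f / 255.0f);
    for (size_t i = 0; i < n; i += 16) {
        __m128i bytes = _mm_loadu_si128((const __m128i *)(src + i));
        __m512i ints  = _mm512_cvtepu8_epi32(bytes);
        __m512  pix   = _mm512_cvtepi32_ps(ints);
        _mm512_storeu_ps(dst + i, _mm512_mul_ps(pix, scale));
    }
}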

e: here's a spooky mix of C++ templates, C preprocessor macros and avx512 intrinsics if anyone is curious

TheFluff fucked around with this message at 03:37 on Apr 2, 2021

gradenko_2000
Oct 5, 2010

HELL SERPENT
Lipstick Apathy
https://www.youtube.com/watch?v=oaB1WuFUAtw

some perspective here from Dr. Ian Cutress on how Rocket Lake might be regarded as a win for Intel regardless of the actual performance: doing the "backport" of the 10nm cores to 14nm is not an easy thing to do, so it demonstrates that their design teams still have the chops to do new designs. And they're going to have to learn to do this sort of thing more often, given that their plans involve working with other fabs beyond just their own.

I don't know if I really buy that reasoning, mostly because A) Rocket Lake was already late/delayed in the first place, and B) the reason Intel has to learn to work with other fabs is that they've had a hell of a time moving on from 14nm. But I thought the argument was interesting.

Sidesaddle Cavalry
Mar 15, 2013

Oh Boy Desert Map
Everything's a win when you're a huge megacorp with too much inertia. If it wasn't a win, heads would be rolling

Cygni
Nov 12, 2005

raring to post

They are clearly at the absolute edge of what 14nm can give them, so I agree that the architecture design team isn't really to blame, honestly.

The design folks made Cannon Lake intending for it to launch on the original 10nm in 2015. That first iteration of 10nm pretty much completely failed as a node, so whoops! Eat poo poo, Cannon Lake; thanks for all the work, design team, the thing is broken from day 1 thanks to manufacturing! Then they made Ice Lake and Tiger Lake, both of which are good performers architecturally, but on 10nm V2, which, while at least functional, never hit the targets it was supposed to hit. Ice Lake was supposed to launch in 2016... it still hasn't launched on server. Insane, and likely a result of the yields never ramping enough to make massive server dies profitable as planned. It also never hit the frequencies I believe they intended. So while better, still pretty much a huge letdown by manufacturing.

I personally think Rocket Lake exists mostly because the design team would otherwise be sittin on their rear end, cause the manufacturing side is half a decade behind. So now you get "codesign", because Intel has realized that going all in and betting on the manufacturing folks to deliver is a bad idea in a world where each manufacturing improvement is going to get harder and harder and riskier and riskier.

Pretty much the worst thing I think you can lay at the feet of design is Spectre/Meltdown, and even that would seem considerably mitigated if the nodes had actually gotten poo poo out on the intended schedule. Instead, they spent 6 years grafting additional cores onto Skylake and doing band-aid fixes on a design that was supposed to have been replaced years ago.

(i likely don't know what i'm talking about, so take this whole thing as just a web forum rant)

gradenko_2000
Oct 5, 2010

HELL SERPENT
Lipstick Apathy
https://www.youtube.com/watch?v=LYdHTSQxdCM

Gamers Nexus has a review up of the i5-11400, the non-overclockable Rocket Lake six-core, and it comes really dang close to a 5600X despite being over a hundred bucks cheaper

or, put another way, it's significantly faster than a Ryzen 5 3600 on top of being 20-40 bucks cheaper

Ika
Dec 30, 2004
Pure insanity

gradenko_2000 posted:

https://www.youtube.com/watch?v=LYdHTSQxdCM

Gamers Nexus has a review up of the i5-11400, the non-overclockable Rocket Lake six-core, and it comes really dang close to a 5600X despite being over a hundred bucks cheaper

or, put another way, it's significantly faster than a Ryzen 5 3600 on top of being 20-40 bucks cheaper

Interesting. Over in euro land the i7-11700 (non-K) is already available for the same price as a 5600X as well. Wonder if prices will start to fall soon.


Fantastic Foreskin
Jan 6, 2013

A golden helix streaked skyward from the Helvault. A thunderous explosion shattered the silver monolith and Avacyn emerged, free from her prison at last.

As someone who only has man-on-the-street level knowledge of chip fab, can someone explain to me what exactly it means for a node/process to fail, and how one does it for 5 years straight?
