FlapYoJacks
Feb 12, 2009
Because the x86 instruction set sucks.

Arzachel
May 12, 2012

ConanTheLibrarian posted:

I was surprised by this and looked up some performance comparisons. It's crazy how close A13 is to Intel and AMD's desktop CPUs. Does anyone have any insights regarding how they've squeezed that much performance out of ARM cores (especially considering its clocks are way lower than desktop CPUs)? E.g. a more efficient ISA?

Infinite money and being able to optimize around a very narrow TDP range and core count. I don't think the ISA matters much, especially since Apple are cracking ARM instructions into micro ops.

BlankSystemDaemon
Mar 13, 2009



NewFatMike posted:

Apple also makes extremely performant ARM cores, and having locked down the hardware side, they have a distinct advantage in moving to another CPU architecture.

ConanTheLibrarian posted:

I was surprised by this and looked up some performance comparisons. It's crazy how close A13 is to Intel and AMD's desktop CPUs. Does anyone have any insights regarding how they've squeezed that much performance out of ARM cores (especially considering its clocks are way lower than desktop CPUs)? E.g. a more efficient ISA?
Apple recently lost quite a few of their silicon people, so I wouldn't expect them to stay ahead. Many of those former Apple employees are at NUVIA now.
Whether that'll translate to an HPC ARM core, we'll see - but I suspect that's the plan.

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib
Seems like ARM in the data centre would be a lot more viable if the CPU cores performed like the A13's though.

repiv
Aug 13, 2009

ConanTheLibrarian posted:

I was surprised by this and looked up some performance comparisons. It's crazy how close A13 is to Intel and AMD's desktop CPUs. Does anyone have any insights regarding how they've squeezed that much performance out of ARM cores (especially considering its clocks are way lower than desktop CPUs)? E.g. a more efficient ISA?

I found Anandtech's comparisons and there's a pretty obvious hole in them - they use the default, lowest-common-denominator compiler target for their SPEC builds which means it only uses ancient SSE2 instructions, no AVX, no AVX512, not even SSE4.

They use those results to draw the comparison that the A13 outperforms a Xeon 8176 in single threaded performance but the gigantic SIMD units on the Xeon are barely being utilized.

Show me the A13 keeping up with Skylake in a heavily optimized SIMD workload and I'll be more impressed :v:
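
To make the compiler-baseline point concrete, here's a small sketch (assuming GCC or Clang; the file name and flags are purely illustrative) of how the same trivial loop vectorizes differently depending on the -march target, which is exactly what a default SPEC build leaves on the table:

```c
/* saxpy.c -- the compiler auto-vectorizes this loop; the target flag decides how wide.
 *
 *   cc -O3 saxpy.c                        # baseline x86-64: SSE2 only (128-bit)
 *   cc -O3 -march=haswell saxpy.c         # AVX2 (256-bit)
 *   cc -O3 -march=skylake-avx512 saxpy.c  # AVX-512 (512-bit), where supported
 */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```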

repiv fucked around with this message at 18:21 on Apr 28, 2020

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib
Ok, that's interesting. I was wondering if Apple had their own homebrew SIMD instructions; seems not. Still, a lot of applications wouldn't make use of wide instructions, and I'd guess there could be a lot of other tasks that are hardware-accelerated in x64 but not ARM.

repiv
Aug 13, 2009

Apple does have SIMD but it's the usual 128bit NEON instruction set common to ARM chips, same width as old school SSE on desktop. Zen2 and consumer Intel have 256bit units and Intel's big chips are up to 512bit.
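
For a sense of what those widths mean at the instruction level, here's a sketch in intrinsics terms (the function names are mine, and each branch obviously has to be built for its own target):

```c
/* Floats processed per single vector add, by target. */
#if defined(__aarch64__)
  #include <arm_neon.h>
  /* 128-bit NEON: 4 floats per op -- same width as classic SSE */
  float32x4_t vec_add(float32x4_t a, float32x4_t b) { return vaddq_f32(a, b); }
#elif defined(__AVX512F__)
  #include <immintrin.h>
  /* 512-bit AVX-512 (big Xeons): 16 floats per op */
  __m512 vec_add(__m512 a, __m512 b) { return _mm512_add_ps(a, b); }
#elif defined(__AVX__)
  #include <immintrin.h>
  /* 256-bit AVX/AVX2 (Zen 2, consumer Intel): 8 floats per op */
  __m256 vec_add(__m256 a, __m256 b) { return _mm256_add_ps(a, b); }
#endif
```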

movax
Aug 30, 2008

BobHoward posted:

I'm not a real expert on the topic, but I did do some experimental implementation work once on a Forward Error Correction (FEC) decoder for long haul fiber 100G networking. It was experimental in that my starting point was working ASIC source code, and I was asked to see if it was possible to port it to work at full 100G rate in FPGAs, for Reasons. I didn't succeed, the original design was too dependent on things ASICs do way better than FPGAs, and it was deemed not important enough to spend more effort on.

So, with that not-a-real-expert caveat, for 10G, I would not be surprised if short haul SFP gets away with no FEC required to make the physical layer reliable, while 10Gbase-T likely needs some. And FEC adds latency.

What was the original ASIC RTL on, 28 nm? You'd basically have to reach to get the fattest SerDes / transceivers on a given FPGA family to get to a point where you could keep up with that traffic at maybe a ~200 MHz clock rate.
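
(Back-of-the-envelope for why the clock rate is the crux: 100 Gbit/s at a 200 MHz fabric clock is 100e9 / 200e6 = 500 bits per cycle, so the datapath has to be on the order of 512 bits wide just to keep pace, before the FEC math itself.)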

D. Ebdrup posted:

ARM's Neoverse design, which underlies the Graviton2 chip, is making huge waves in the server market too.
The related Morello chip is the CPU being used for the capabilities-based CheriBSD, a soft-fork of FreeBSD (meaning code goes back to FreeBSD regularly) made by Cambridge.
CHERI is notable for being the way to mitigate many of the foibles of C and C++ software, by applying hardware-based capabilities.

Not quite Graviton2 related, but I recently learned about the AWS Nitro card / accelerator and that seems like cool kit. I had been bouncing around a project idea for like the past 10 years of putting an LPC/SPI snoop module in to monitor system firmware for unknown changes, and I guess when you have as many servers as AWS and can homegrow everything, a custom network controller, supervisor IC and crypto accelerator is a no-brainer.

gradenko_2000 posted:

is there such a thing as an ARM desktop CPU that you could buy and build into a PC, like an Intel Core CPU?

If you mean an ARM core actually integrated into the CPU, unless things have changed radically (and they probably have), we all have traces of Star Fox for the SNES running in our PCHs. IIRC, the ME firmware runs on a Synopsys ARC, which is the Argonaut RISC Core, which traces its heritage back to Jez San, Argonaut Games, and the SuperFX chip for the Super Nintendo. Thanks, trivia brain.

CFox
Nov 9, 2005

repiv posted:

I found Anandtech's comparisons and there's a pretty obvious hole in them - they use the default, lowest-common-denominator compiler target for their SPEC builds which means it only uses ancient SSE2 instructions, no AVX, no AVX512, not even SSE4.

They use those results to draw the comparison that the A13 outperforms a Xeon 8176 in single threaded performance but the gigantic SIMD units on the Xeon are barely being utilized.

Show me the A13 keeping up with Skylake in a heavily optimized SIMD workload and I'll be more impressed :v:

That's interesting; this is the first time I've heard about this in all of the A13 vs. desktop chip comparisons. What are some common desktop applications that really take advantage of AVX/SSE4? I could see a future where all consumer devices run ARM while servers stay on x86, but maybe I'm missing something big (besides (legacy) application support for ARM, of course) where keeping x86 makes more sense for desktop/laptop.

movax
Aug 30, 2008

CFox posted:

That's interesting; this is the first time I've heard about this in all of the A13 vs. desktop chip comparisons. What are some common desktop applications that really take advantage of AVX/SSE4? I could see a future where all consumer devices run ARM while servers stay on x86, but maybe I'm missing something big (besides (legacy) application support for ARM, of course) where keeping x86 makes more sense for desktop/laptop.

I mean, the biggest thing is legacy code — the Windows, Office, etc. codebase and APIs have been x86 for what, nearly 3 decades now? While parts of those applications are being re-written, there's absolutely core functionality (Excel still duplicates Lotus 1-2-3 bugs intentionally) that would likely require significant rewrite / revalidation to move. I've never hosed around with the ARM-based Office apps, but I do know the web apps suck and macOS Office still isn't at feature parity with Windows, even after they vastly improved it.

OTOH, engineering software will never move off x86 — not a significant part of the market, I know, but EDA tools and computation (things still running FORTRAN kernels) are stuck there. It could fit into your model where you access them remotely via an ARM thin client to an x86 server, though.

Cygni
Nov 12, 2005

raring to post

In Apple's favor is that none of this discussion matters at all to the target market of a MacBook Air replacement with even better battery life and, as a bonus, your entire iOS back catalog of apps built in.

The rub is that they basically get everything from Intel at cost, so switching to their own arch (or AMD) doesn't necessarily help their bottom line. That sweetheart deal is dependent on Apple remaining a closed Intel shop and will end for all of their products the moment Apple announces an ARM MBA, so it is a big step for Apple to make. It will require going all in. And Intel has openly flexed at Apple in the past with things like the original Ultrabook initiative, so both have signaled they are willing to go to war.

I assume they are also hesitant because of memories of getting stuck on their own proprietary, dying architecture while the rest of the market clobbered them. But like everyone has been saying for 5+ years now, I think it is a matter of "when" not "if" Apple makes the jump, and a question of how many re$ource$ Apple is willing to throw at the problem.

BlankSystemDaemon
Mar 13, 2009



movax posted:

Not quite Graviton2 related, but I recently learned about the AWS Nitro card / accelerator and that seems like cool kit. I had been bouncing around a project idea for like the past 10 years of putting an LPC/SPI snoop module in to monitor system firmware for unknown changes, and I guess when you have as many servers as AWS and can homegrow everything, a custom network controller, supervisor IC and crypto accelerator is a no-brainer.
Chelsio sticks the 100Gbps T6 crypto accelerator on quite a few NICs nowadays, so it's not exactly an unknown even outside the biggest butt provider.
I believe it's how Netflix is able to serve ~200Gbps of TLS encrypted video streams per server, using FreeBSD with the ccr(4) driver on a few NUMA domains.

movax
Aug 30, 2008

D. Ebdrup posted:

Chelsio sticks the 100Gbps T6 crypto accelerator on quite a few NICs nowadays, so it's not exactly an unknown even outside the biggest butt provider.
I believe it's how Netflix is able to serve ~200Gbps of TLS encrypted video streams per server, using FreeBSD with the ccr(4) driver on a few NUMA domains.

Huh, for whatever reason I thought AWS had totally rolled their own, but in retrospect, what you said makes a lot more sense. Or, even if they did, I think Chelsio licenses the Terminator as a SIP core that Amazon could have rolled into a single-chip solution if they wanted to.

karoshi
Nov 4, 2008

"Can somebody mspaint eyes on the steaming packages? TIA" yeah well fuck you too buddy, this is the best you're gonna get. Is this even "work-safe"? Let's find out!

repiv posted:

Apple does have SIMD but it's the usual 128bit NEON instruction set common to ARM chips, same width as old school SSE on desktop. Zen2 and consumer Intel have 256bit units and Intel's big chips are up to 512bit.

How good is ARM's variable-length SIMD extension (SVE)? It should enable silicon vendors to tailor their SIMD performance (SIMD ALU width) to their market without requiring software changes. Has anybody gone full HPC with those extensions yet?
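
A minimal sketch of that vector-length-agnostic idea, using the SVE ACLE intrinsics from arm_sve.h (the wrapper function and names are mine, purely illustrative): the same binary runs on anything from 128-bit to 2048-bit SVE hardware, because the loop asks the chip how many lanes it has.

```c
#include <arm_sve.h>
#include <stdint.h>

/* dst[i] = b[i] + a[i] * s, without ever hardcoding a vector width */
void scale_add(float *dst, const float *a, const float *b,
               float s, int64_t n)
{
    for (int64_t i = 0; i < n; i += svcntw()) {     /* lanes per pass, per-chip */
        svbool_t    pg = svwhilelt_b32_s64(i, n);   /* predicate masks the tail */
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, svmla_n_f32_x(pg, vb, va, s));
    }
}
```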

BlankSystemDaemon
Mar 13, 2009



movax posted:

Huh, for whatever reason I thought AWS had totally rolled their own but in retrospect, what you said makes a lot more sense. Or, even if they did, I think Chelsio licenses the Terminator as a SIP core that Amazon could have rolled into a single-chip solution if they wanted too.
I mean, there's a lot of space in between our little corner of the internet and the hyperscalers of the world - it's no surprise they're doing all kinds of odd things that we'd never get the chance to even play with if it wasn't for their initiatives to standardize on various platforms, which they'll eventually be replacing with newer gear.
It's already starting to show up from Google et al, so in a number of years there's gonna be a lot of retired gear showing up used. It'll be interesting to see who gets their grubby little mitts on it. I'm certainly gonna try, but it'll almost certainly be out of my price league.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

D. Ebdrup posted:

Chelsio sticks the 100Gbps T6 crypto accelerator on quite a few NICs nowadays, so it's not exactly an unknown even outside the biggest butt provider.
I believe it's how Netflix is able to serve ~200Gbps of TLS encrypted video streams per server, using FreeBSD with the ccr(4) driver on a few NUMA domains.

The 200G result was with encryption on the CPU.

Unrelated to that:
https://twitter.com/whataintinside/status/1255027517787901952?s=21

taqueso
Mar 8, 2004


:911:
:wookie: :thermidor: :wookie:
:dehumanize:

:pirate::hf::tinfoil:
needs audio

BlankSystemDaemon
Mar 13, 2009



PCjr sidecar posted:

The 200G result was with encryption on the CPU.

Unrelated to that:
https://twitter.com/whataintinside/status/1255027517787901952?s=21
Oh, I just checked the commit logs, and you're right that KTLS (the in-kernel TLS implementation Netflix made and upstreamed) doesn't make use of hardware offload via crypto(4) - but I think that might be coming down the line with the massive update to the crypto(4) driver and framework that's going into HEAD.
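
For anyone curious what in-kernel TLS looks like from the application side: FreeBSD's KTLS is normally driven through OpenSSL rather than called directly, so as an illustration of the concept here's the analogous Linux kTLS socket setup (a sketch; the fallback defines cover older userspace headers). Once the keys are handed over, plain write()/sendfile() on the socket goes out encrypted, which is the whole trick behind sendfile-based TLS serving:

```c
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <string.h>

#ifndef SOL_TLS
#define SOL_TLS 282   /* not exposed by some older libc headers */
#endif
#ifndef TCP_ULP
#define TCP_ULP 31    /* ditto */
#endif

/* Hand an already-negotiated AES-128-GCM TLS 1.2 session to the kernel
 * for the transmit path; key material comes from the userspace handshake
 * (e.g. OpenSSL). Returns 0 on success, -1 on error. */
int enable_ktls_tx(int sock,
                   const unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE],
                   const unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE],
                   const unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE],
                   const unsigned char seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE])
{
    if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;

    struct tls12_crypto_info_aes_gcm_128 ci = {0};
    ci.info.version     = TLS_1_2_VERSION;
    ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(ci.key,     key,  sizeof(ci.key));
    memcpy(ci.iv,      iv,   sizeof(ci.iv));
    memcpy(ci.salt,    salt, sizeof(ci.salt));
    memcpy(ci.rec_seq, seq,  sizeof(ci.rec_seq));

    /* From here on the kernel encrypts ordinary write()/sendfile() data. */
    return setsockopt(sock, SOL_TLS, TLS_TX, &ci, sizeof(ci));
}
```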

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
The Comet Lake lineup has leaked; the embargo date is April 30 at 6 AM Pacific.



No real surprises IMO - they basically maintained the current i3/i5/i7/i9 price tiers, apart from somewhat deeper discounts on the F and KF variants than before. The i3 line continues to be garbage, the i5s fall somewhere between "clocked too low to decisively beat the 3600" and "too expensive compared to the 3600", and the i9 lineup is priced above the 3900X while you still get two fewer cores.

The 10700F looks like the winner there. $298 for an 8C/16T part with 4.6 GHz all-core turbo isn't bad; that's a nice part for gaming.

Khorne
May 1, 2002

Paul MaudDib posted:

No real surprises IMO - they basically maintained the current i3/i5/i7/i9 price tiers, apart from somewhat deeper discounts on the F and KF variants than before. The i3 line continues to be garbage, the i5s fall somewhere between "clocked too low to decisively beat the 3600" and "too expensive compared to the 3600", and the i9 lineup is priced above the 3900X while you still get two fewer cores.

I burst out laughing because they all have hyperthreading now. That particular artificial market segmentation tactic was dirty, and I'm glad some actual competition fixed that practice.

taqueso
Mar 8, 2004


:911:
:wookie: :thermidor: :wookie:
:dehumanize:

:pirate::hf::tinfoil:
I think you'll find that Intel's incredible R&D teams have revolutionized computing again and their hard work allows all processor markets access to this valuable technology which was invented by Intel the leader in processor innovations.

snickothemule
Jul 11, 2016

wretched single ply might as well use my socks
That G-5900 and 5920, 58W and $42, $52, priced to match the 3000G? Do they still shift those parts?

Also interesting that it's 125W or 65W; curious as to what they'll really consume. 10 cores at 4.8 has to be thirsty, surely.

Palladium
May 8, 2012

Very Good
✔️✔️✔️✔️

snickothemule posted:

That G-5900 and 5920, 58W and $42, $52, priced to match the 3000G? Do they still shift those parts?

Also interesting that it's 125W or 65W; curious as to what they'll really consume. 10 cores at 4.8 has to be thirsty, surely.

The only parts in that list standing a chance of drawing at/below their rated TDP while sustaining their all-core boost are the 10400 and below. Those "65W" 8C/10C chips would be drawing north of 150W unless power-limited in the BIOS.

Cygni
Nov 12, 2005

raring to post

10400F vs 3600 will be very interesting at that price, too. Will also be interesting to see if Intel has the 14nm manufacturing capacity to actually bring all of these to market quickly, or if they end up mostly going to SIs and very rarely showing up in boxed form like a lot of the CLR parts.

Ihmemies
Oct 6, 2012

Do these require new motherboards or will they work with the same old motherboards meant for 8 and 9 series?

They're the same 14nm++++ chips anyways...

Cygni
Nov 12, 2005

raring to post

Ihmemies posted:

Do these require new motherboards or will they work with the same old motherboards meant for 8 and 9 series?

They're the same 14nm++++ chips anyways...

New socket. For the last decade, Intel has given you 2 generations per socket and that's it. So you can probably expect to get Rocket Lake on this same socket next year, and that's it. To be fair, after next year, both AMD and Intel will be forced onto new sockets for DDR5.

gradenko_2000
Oct 5, 2010

HELL SERPENT
Lipstick Apathy
The low-end pricing seems... not great? 64 USD for a 4.0 GHz 2c/4t Pentium or 122 USD for a 4c/8t Core i3 is going to be tough to justify against the Athlon 3000G, the 1600 AF, the 3100/3300X, and the 2400G / 3400G.

eames
May 9, 2009

Cygni posted:

New socket. For the last decade, Intel has given you 2 generations per socket and that's it. So you can probably expect to get Rocket Lake on this same socket next year, and that's it. To be fair, after next year, both AMD and Intel will be forced onto new sockets for DDR5.

Yeah, there are rumors about a 7nm Meteor Lake floating around already, with a new socket again (LGA1700). I don’t think the launch Z370 boards would be able to handle the new -K parts; I expect those power consumption numbers to be quite up there.

EoRaptor
Sep 13, 2003



gradenko_2000 posted:

The low-end pricing seems... not great? 64 USD for a 4.0 GHz 2c/4t Pentium or 122 USD for a 4c/8t Core i3 is going to be tough to justify against the Athlon 3000G, the 1600 AF, the 3100/3300X, and the 2400G / 3400G.

Intel offers a lot of side benefits like motherboard designs, marketing dollars, etc. to OEMs that make these prices pretty meaningless. You’ll find plenty of examples of entire computers that are somehow only double the list price of the CPU, and it’s because Intel kicks so much back.

Fantastic Foreskin
Jan 6, 2013

A golden helix streaked skyward from the Helvault. A thunderous explosion shattered the silver monolith and Avacyn emerged, free from her prison at last.

Palladium posted:

The only parts in that list standing a chance of drawing at/below their rated TDP while sustaining their all-core boost are the 10400 and below. Those "65W" 8C/10C chips would be drawing north of 150W unless power-limited in the BIOS.

Intel only rates TDP at base clocks anyways. By and large, these numbers don't say anything at all about power consumption.

Cygni
Nov 12, 2005

raring to post

The OEM partners are starting to put the pages up and stuff for tomorrow's (likely fairly boring) Comet Lake launch. Biostar is apparently going to try to get back into higher-end boards, but still has some excellently bad copywriting. Huge image:

https://www.biostar.com.tw/event/z490/img/Z490%20Series.jpg

Some fav portions:

If i pay more, can i get them over protected instead?

I hate it when my motherboard dilates time.

The everyday temp scale everyone uses for VRMs, Kelvin.

Another excellent graph scale.

So much to love in this image. I think i like the misaligned text on the monitor the most.

Cygni
Nov 12, 2005

raring to post



the sata plugs... what

gradenko_2000
Oct 5, 2010

HELL SERPENT
Lipstick Apathy
the z490 is the high-end board, right? is there gonna be an equivalent to the h310?

Crunchy Black
Oct 24, 2017

by Athanatos
Unless the 'rona gets way worse in China and continues to interrupt supply chains, count on an ARM MacBook this year.

/year of linux on the desktop, etc.

trilobite terror
Oct 20, 2007
BUT MY LIVELIHOOD DEPENDS ON THE FORUMS!

Crunchy Black posted:

Unless the 'rona gets way worse in China and continues to interrupt supply chains, count on an ARM MacBook this year.

/year of linux on the desktop, etc.

Tbh, the thing spurring me to get a PC build done now rather than later is concern for the economy cratering and loving up supply chain/retail or similar (I honestly haven’t a clue, but states are starting mass layoffs now in a way that we hadn’t even seen in March as everybody starts running out of $$$$ to make payroll and poo poo).

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

repiv posted:

I found Anandtech's comparisons and there's a pretty obvious hole in them - they use the default, lowest-common-denominator compiler target for their SPEC builds which means it only uses ancient SSE2 instructions, no AVX, no AVX512, not even SSE4.

They use those results to draw the comparison that the A13 outperforms a Xeon 8176 in single threaded performance but the gigantic SIMD units on the Xeon are barely being utilized.

Show me the A13 keeping up with Skylake in a heavily optimized SIMD workload and I'll be more impressed :v:

I'm on record elsewhere in the forums as thinking that ARM Macs are most likely a bit further away than this year. I believe that to do it, they'll need to have a plan to quickly transition the entire product line, including the Mac Pro (just as they did when it was PowerPC -> Intel). Unless they're OK with the (i)Mac Pro being a loss leader in a big way, it's hard to imagine how they decide it's worth it. On those products, Intel and AMD get to amortize their non-recurring engineering costs over orders of magnitude more volume than Apple will ever have for "Pro" desktop Macs, so even though Apple is paying Intel way over the marginal cost of production on workstation chips, they probably would do even worse with in-house chips.

That said, who knows, maybe they've decided it's time to spend some of that cash mountain. If they do, I want to point out some things which counter your argument about SIMD performance.

The first is that AVX512 doesn't matter all that much. There's a reason why Intel goes so far as to power gate a bunch of the hardware and turn the clock up when full-width instructions aren't in use, even though this mode-switching creates nasty startup/shutdown performance penalties. Programs which only use SSE2 are vastly more common than programs which use AVX512. (furthermore, most SSE2 programs don't even use it for SIMD, just as a saner way of doing scalar FP than x87)

Second is that Apple provides the Accelerate framework, a library which provides a ton of the algorithms SIMD is most frequently used for. If your application can get what it needs from Accelerate, you don't have to worry about hand-coding with intrinsics or ASM, or optimizing for the specific processor in the user's machine - Apple did it for you. Any software which uses this will, upon being ported to ARM, automatically get the benefits of Apple's optimization work for ARM. (Possibly even unported software - ISTR that back in 2005, PowerPC apps which called Accelerate got the benefit of invoking a native x86 backend when run under emulation on an Intel Mac.)
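
(As a concrete sketch of what that looks like from the app side - the vDSP_vsmul call below is real Accelerate/vDSP, while the little wrapper is mine: the caller never mentions SSE, AVX, or NEON, so Apple is free to retarget the backend per-CPU.)

```c
/* Build on macOS with: cc gain.c -framework Accelerate */
#include <Accelerate/Accelerate.h>

/* out[i] = in[i] * gain, dispatched to whatever SIMD the machine has */
void apply_gain(const float *in, float *out, float gain, vDSP_Length n)
{
    vDSP_vsmul(in, 1, &gain, out, 1, n);
}
```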

Third is that Apple is one of the few owners of an ARM architectural license, and is already known to be using the privileges it confers to add custom ARM ISA extensions to their Axx series chips for purposes like accelerating neural network code. That plus Accelerate as a frontend (to avoid needing to give apps direct access to custom, nonstandardized, and likely undocumented instructions) means that even if you think standard ARM SIMD is inherently terrible, Apple isn't completely limited by it.

Fourth is that, to be honest, AVX512 has (mostly) proven to be a wet fart outside OS-provided libraries like Accelerate. As I understand it, adoption is much worse than AVX256, which in turn is much worse than SSE. Most people who want really high performance on the type of math it's good at tend to push it out to GPGPU, and Apple will have plenty of options there (since they're designing their own GPUs too). AVX512's existence is more about Intel trying to keep the x86 ISA front and center (because Intel wants everything to be about x86) rather than it being the best solution for everything it tries to address.

(Also, Intel being Intel, AVX512 adoption got hurt by Intel dribbling out new AVX instructions piecemeal across many years, and instead of committing to support being cumulative, tried to use feature disable bits on many of them in their insane market segmentation game. In order to attract use outside of libraries like Accelerate, a SIMD ISA extension should be as close to universally supported as possible. Needing to figure out a maze of feature detection and fallback paths makes it so much harder for ISVs to deploy to random end user machines, so they don't bother unless there's a really compelling reason.)

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
I mean, why not just put an A13 on the motherboard? What’s the unit production cost on an ARM chip, even a ridiculously fat/wide one like the A13? Maybe 30 bucks?

Hell, put it on an add-in card for older systems. Apple can do cryptographic signing so that it won’t boot without talking to a TPM containing their signature or some poo poo.

Think of it as the return of the 286 add on card.

eames
May 9, 2009

Modern Macs already boot from an Apple designed ARM chip. One could make an argument that the Intel CPUs in Macs are already co-processors. :)

The co-processor approach is interesting to fantasize about, particularly with small x86-64 chips as co-processors that only spin up to natively run legacy code. This would retain backwards compatibility while teaching users to avoid the old, "inefficient" (primarily in battery life) architecture.
I imagine the current suppliers wouldn't be too pleased about this, but there are others that wouldn't mind selling a few cheap, small, low-power x86-64 cores as custom chips.

eames fucked around with this message at 09:58 on Apr 30, 2020

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

BobHoward posted:

I'm on record elsewhere in the forums as thinking that ARM Macs are most likely a bit further away than this year. I believe that to do it, they'll need to have a plan to quickly transition the entire product line, including the Mac Pro (just as they did when it was PowerPC -> Intel). Unless they're OK with the (i)Mac Pro being a loss leader in a big way, it's hard to imagine how they decide it's worth it. On those products, Intel and AMD get to amortize their non-recurring engineering costs over orders of magnitude more volume than Apple will ever have for "Pro" desktop Macs, so even though Apple is paying Intel way over the marginal cost of production on workstation chips, they probably would do even worse with in-house chips.

That said, who knows, maybe they've decided it's time to spend some of that cash mountain. If they do, I want to point out some things which counter your argument about SIMD performance.

The first is that AVX512 doesn't matter all that much. There's a reason why Intel goes so far as to power gate a bunch of the hardware and turn the clock up when full-width instructions aren't in use, even though this mode-switching creates nasty startup/shutdown performance penalties. Programs which only use SSE2 are vastly more common than programs which use AVX512. (furthermore, most SSE2 programs don't even use it for SIMD, just as a saner way of doing scalar FP than x87)

Second is that Apple provides the Accelerate framework, a library which provides a ton of the algorithms SIMD is most frequently used for. If your application can get what it needs from Accelerate, you don't have to worry about hand-coding with intrinsics or ASM, or optimizing for the specific processor in the user's machine - Apple did it for you. Any software which uses this will, upon being ported to ARM, automatically get the benefits of Apple's optimization work for ARM. (Possibly even unported software - ISTR that back in 2005, PowerPC apps which called Accelerate got the benefit of invoking a native x86 backend when run under emulation on an Intel Mac.)

Third is that Apple is one of the few owners of an ARM architectural license, and is already known to be using the privileges it confers to add custom ARM ISA extensions to their Axx series chips for purposes like accelerating neural network code. That plus Accelerate as a frontend (to avoid needing to give apps direct access to custom, nonstandardized, and likely undocumented instructions) means that even if you think standard ARM SIMD is inherently terrible, Apple isn't completely limited by it.

Fourth is that, to be honest, AVX512 has (mostly) proven to be a wet fart outside OS-provided libraries like Accelerate. As I understand it, adoption is much worse than AVX256, which in turn is much worse than SSE. Most people who want really high performance on the type of math it's good at tend to push it out to GPGPU, and Apple will have plenty of options there (since they're designing their own GPUs too). AVX512's existence is more about Intel trying to keep the x86 ISA front and center (because Intel wants everything to be about x86) rather than it being the best solution for everything it tries to address.

(Also, Intel being Intel, AVX512 adoption got hurt by Intel dribbling out new AVX instructions piecemeal across many years, and instead of committing to support being cumulative, tried to use feature disable bits on many of them in their insane market segmentation game. In order to attract use outside of libraries like Accelerate, a SIMD ISA extension should be as close to universally supported as possible. Needing to figure out a maze of feature detection and fallback paths makes it so much harder for ISVs to deploy to random end user machines, so they don't bother unless there's a really compelling reason.)

Tbf it’s less market segmentation and more a desperate attempt by the ISA team to bolt any (mostly ML) perf tweaks onto the 14nm SKL core after the process team missed 10nm by 3 years.

Mr.Radar
Nov 5, 2005

You guys aren't going to believe this, but that guy is our games teacher.
Buildzoid is livestreaming a teardown of a Z490 board on his Twitch channel.
