mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Epiphyte posted:

What actually uses AVX-512 in home use?

This is rather off to the side, but I learned yesterday from the ARMv9 announcement that there is an emerging standard for vector ops (an alternative to Intel's AVX/2/512 and Arm's own Neon) called SVE2 (Scalable Vector Extension 2).

It allows vector ops on data of any width from 128 bits to 2048 bits, in 128-bit increments, on any CPU which implements SVE2, regardless of the native width of that CPU's vector circuitry. I assume there are performance hits for performing ops wider than the native width of your hardware -- much as Zen/Zen+ CPUs could execute 256-bit AVX2 ops, but did so by splitting each one into two 128-bit operations internally.
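
For the curious, here's roughly what "vector-length agnostic" looks like in practice -- a hand-wavy C sketch using the Arm SVE ACLE intrinsics (the same programming model SVE2 extends). I have not run this on real SVE hardware, so treat it as illustrative only:

/* Build on an SVE-capable toolchain, e.g.: gcc -O2 -march=armv8-a+sve vla_add.c */
#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

/* c[i] = a[i] + b[i], without ever hard-coding the vector width. */
void vla_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i += svcntw()) {                  /* svcntw() = number of 32-bit lanes on this CPU */
        svbool_t pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);  /* predicate masks off the tail */
        svfloat32_t va = svld1_f32(pg, &a[i]);
        svfloat32_t vb = svld1_f32(pg, &b[i]);
        svst1_f32(pg, &c[i], svadd_f32_m(pg, va, vb));
    }
}

The point is that nothing in the loop bakes in a vector width: svcntw() asks the hardware how many lanes it has, and the same binary runs on a 128-bit or 512-bit implementation.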

At the moment the closest thing in actual silicon is the Fujitsu A64FX (which implements the original SVE rather than SVE2), used in the Fugaku supercomputer. But in about 18 months it will be supported by all new Arm CPUs, and it would be cool if everyone settled on this rather than a never-ending series of vendor-specific extensions. (Lol.)

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

gradenko_2000 posted:

That's a six-core/12-thread part that still turbos up to 3.2 GHz even with the T-model low-TDP variant. It's honestly a lot of CPU to be doling out for driving spreadsheets

Before I became an addict and went ham on BOINC, the T series was my drug of choice for a handful of upgrade cycles (i3-6100T was my last before I caught the fever for MORE CORE). I haven't paid any attention in a couple years, but I'm sure they're still pretty great as efficient workhorses. No one ever talks about them.

Edit: Just looked up the i3-10100T. The only issue I see is the ridiculously small amount of cache, but I guess that's how the market segmentation crumbles.

mdxi fucked around with this message at 05:50 on Apr 9, 2021

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Bofast posted:

Somewhere in the world, the one person running spreadsheets in Linux on a PS3 just felt a shiver down their spine :D

That's not me, but I did once boot Linux on my Dreamcast. I just wanted to be able to say that I had run it on a SuperH CPU, because I've always been a sucker for weird architectures.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Perplx posted:

It's weird that it's easy to slap a desktop cpu in a laptop, but making a new high wattage power brick is a total non starter and you have to use 2 off the shelf bricks.

I'm not an EE, but my first guess would be that there are limits to the power you can push through a passively-cooled (and, in fact, sealed-in-plastic) transformer before it starts setting everything on fire.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

repiv posted:

Do we know if, for example, an 8C8c CPU will show up as 16 physical cores that can all be scheduled on simultaneously, or are the big and little cores mutually exclusive?

I think ARM big.LITTLE can work either way, with most designs being mutually exclusive but some (like Apple's) exposing all the cores at once.

In ARM big.LITTLE machines, this is both a hardware and a software issue, and there are three possibilities:

https://en.wikipedia.org/wiki/ARM_big.LITTLE#Run-state_migration

I think all ARM hardware has implemented HMP (your first option, and the most "complete" view of a heterogeneous processor) for a while now.

On the software side, it's all been a non-issue for years on Linux -- driven by the use cases of phones and Chromebooks. But after watching the Windows scheduler cripple perfectly ordinary x86 CPUs because what do you mean a computer can have more than 4 cores, AMD?, I have no doubt Microsoft will balls-up this rather more complex issue in some hilarious way.
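
If you want to see what the kernel actually knows about a heterogeneous part, the capacity-aware scheduler's view is exposed in sysfs. A quick sketch (it assumes the cpu_capacity files exist, which is mostly true on Arm big.LITTLE platforms; on anything else it just prints nothing):

#include <stdio.h>

int main(void)
{
    char path[128];

    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpu_capacity", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                                  /* no more CPUs, or no capacity info at all */
        int capacity = 0;
        if (fscanf(f, "%d", &capacity) == 1)
            printf("cpu%d relative capacity: %d\n", cpu, capacity);
        fclose(f);
    }
    return 0;
}

On a typical 4+4 phone SoC you should see two distinct capacity values, and those numbers are what the scheduler weighs when it decides where to place work.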

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

I'm all for motherboards being mechanically and electrically simplified, and very much in favor of getting rid of the 2-part 24-pin power connector and its huge wad of wires. Please bring on the single 12V standard (and force AMD to use it too).

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Nomyth posted:

Wait, so the motherboards are getting simpler? I heard some complaining earlier about how all the voltage conversion was getting pushed down onto motherboard OEMs to get 3.3V and other rails

You can view it either way I suppose, but to be honest I was thinking slightly out of context anyway. My experience with single-voltage mobos is in large datacenters, where everything that a machine doesn't need gets thrown off the board, so it's physically simpler. But of course, now that I'm thinking about it, that won't be true of DIY mobos.

There is a lot of stuff on DIY mobos that I'd be happy to ditch, but that's completely subjective.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

DrDork posted:

And both of them are suffering from the reality that the x86_64 platform is hemmed in by the need to support ancient poo poo Windows.

FTFY, except that both versions actually say the exact same thing.

Mac and iOS users don't have to give a poo poo about legacy, because Apple has and will put in the work on the system side to make migrations close to painless, or even invisible where possible.

Linux users (and here I really mean "datacenter operators", who are the silent and invisible 80,000 pound gorillas in the room -- individual Linux users as a slice of x86_64 users are but noise in anyone's sales numbers) don't give too much of a poo poo because the kernel and compilers will be ported to anything halfway interesting; it's just a question of how fast and completely it can happen due to documentation quality. Hyperscalers have been writing their own drivers, etc., for a long time, when needed. They're functionally able to jump ship as soon as something with a better ROI actually comes along... but x86 has been the broad-spectrum ROI king of the hill for a long time.

It's really Microsoft who are now pinned to x86_64 (and vice versa) because of MS's refusal to commit to moving their ecosystem forward in a meaningful, cohesive, top-to-bottom way, as Apple has. I think their longstanding (and formerly justified) hubris that the CPU market would cater to them has led to them being caught rather flat-footed in a world where that market might actually be volatile with respect to architecture, and software portability is more than just a topic for research papers from the 1970s.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Rinkles posted:

do i understand correctly, that the new intel cpus should be price competitive with AMD's (i'm looking at the $200-300 tier), but motherboard costs are likely to more than make up for any savings?

Issues of processor features and relative performance aside, and focusing just on the motherboard part of the question, Zen 3/Ryzen 5000 is the end of the line for socket AM4 motherboards. And there is only the rumored/presumptive stacked cache variant left in the Ryzen 5000 series. AM4 was supposed to reach EOL in 2020, but we all know how that went.

When Zen 4 (which I assume will be Ryzen 6000, but maybe it won't, depending on which rumor-mongers you believe, because tech journalism is now a double ouroboros of reddit and twitter making GBS threads into each other's mouths forever) arrives, it'll be on the AM5 socket, and everybody will have to buy new motherboards no matter who their CPU vendor of choice is for the upcoming generation.

And I don't believe AMD has made any commitments as to the longevity of AM5. There is no known-safe, long-term play right now.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Sidesaddle Cavalry posted:

Excessive moore's-law-still-alive-cpu-boomer smugness ITT

Didn't you hear? Pat Gelsinger saved Moore's law by adding the transistors on stacked chips, but only dividing by the area of the floor layer.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Twerk from Home posted:

What voltages have people been clocking 7nm AMD at?

Since the 3000 series introduced the ability to set PPT (Package Power Tracking, the socket power limit) in the BIOS (or Ryzen Master if you're on Windows), I think most people don't bother with setting voltages by hand anymore. You just tell the CPU how many watts it is allowed to use, and the power/frequency/thermal management algorithm constantly adjusts each core's clock to what it can push within that envelope.

You can literally watch a chip speed up when its intake air gets cooler, opening up more thermal overhead. It's rad.
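
If you want to watch it happen from a terminal instead of a GUI monitor, a dumb little poller over cpufreq sysfs is enough. This assumes the standard Linux cpufreq interface; the exact driver and governor details vary by platform:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq";

    for (;;) {
        FILE *f = fopen(path, "r");
        if (!f) {
            perror("scaling_cur_freq");
            return 1;
        }
        long khz = 0;
        if (fscanf(f, "%ld", &khz) == 1)
            printf("core0: %.2f GHz\n", khz / 1e6);   /* value is reported in kHz */
        fclose(f);
        sleep(1);                                      /* sample once a second */
    }
}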

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING


ASRock would never design something so inelegant and wonky.

They would have implemented it as an interposer/daughterboard/riser which attaches to the CPU socket on your existing motherboard, acting primarily as a PCI and power passthru. Daughterboard has a steel backplate and a system of standoff/buttresses supporting it, which attach to the case at the ITX layout mobo screw holes. CPU, cooler, and RAM socket into the daughterboard.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

BlankSystemDaemon posted:

Whom you know personally, and who's also your uncle? :v:

There was a really nice and interesting conversation going on here before you decided to wade in and BOFH it up.

If you've got countervailing technical or historical information, then say so.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

My fave PC from that era was an Intergraph TD4 that I picked up used on Ebay. Dual PPros running at 90MHz.

It was a great coding station because playing my poo poo-quality, stolen anime MP3s would eat 70% of one CPU, leaving the other one free to run Emacs and render websites in Netscape (which, back then, were all sets of giant tables with the default bevelled cell borders turned off, because DIV hadn't been invented yet).

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Hasturtium posted:

That falls under the purview of the Alder Lake Thread Director, an integrated microcontroller whose entire responsibility is ensuring the right workloads go to the correct chips. It requires integration with the OS scheduler - to my knowledge it’s only supported by Windows 11 and the Linux 5.18 kernel. Anandtech has a pretty good write up on Alder Lake that goes into more detail.

5.18 might be when Alder Lake support went in, but heterogeneous multiprocessing has been a well-solved problem in the mainline Linux kernel -- and iOS -- for years. As usual, it's just Windows that needs to play catch-up.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Twerk from Home posted:

Do you have any idea if Android is using mainline kernel thread scheduling across its widely deployed big.LITTLE uarches, or if it tends to use non-upstreamed patches, or it varies by manufacturer? Android phones at this point tend to have 3 different types of core, which is pretty wild.

I know that Android supported big.LITTLE (which was the first "real", commercial implementation of something like this) before mainline Linux did. But I don't know the exact history of how it got folded into mainline, or how the two have maintained parity since then.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Rinkles posted:

Is there a most likely bottleneck when I'm playing a game, the framerate's uncapped but GPU utilization is low, and each CPU core isn't close to being maxed out (based on afterburner)? Specifically it's happening with some emulated PS3 games. RPCS3 is CPU intensive, so I'm pretty sure it's not on the GPU side. Something some extra L2 cache would help with?

I know this has been said many times in many threads in sh/sc, but the PS3's architecture is a fundamentally poor fit for emulation on pretty much any "normal" CPU. It's not just the usual slowness of emulation, where you implement a (for example) Z80 in software, run that on a host CPU, and then run Z80 code on top of it. A Z80 isn't an x86 CPU, but it shares a lot of structural commonalities with one.

The PS3's Cell, on the other hand, is more like a single-core PowerPC with seven coprocessors hanging off of it, each of which is something like an AVX unit, but also something like a shader engine, but also something like a DSP. But really there's eight of them, but really there's not because one is walled off by/for the system software. And these coprocessors can act independently, or they can be chained together in serial.

All this because Ken Kutaragi had a giant boner for "elegant" architecture and ignored that the graveyard of CPU design is littered with elegant hardware designs which relied on a really smart compiler to take up the slack of dealing with the weirdness and/or ideological purity of the hardware engineers -- a thing that never really pans out the way over-confident designers want it to.

So to successfully emulate a Cell, you have to correctly implement this odd design, which maps poorly onto commodity hardware, and then you've got to cope with the fact that the compiler was never as good as Sony wanted it to be, and the PS3 largely underperformed the expectations that were set for it because of this. And then there's the fact that games are notoriously full of dirty hacks and cheats, resulting in emulators needing to be what's called "bug-for-bug compatible" with specific games which relied on discovered quirks of the physical hardware, which were likely not documented (on either end; either the hacks, or the discovered quirks).

It's just a perfect storm for lovely emulation performance.

TL;DR I have no idea if L2 will help; I just want people to accept that PS3 emulation will probably keep being worse than they wish it were, for longer than they expect.

P.S. The best thing the PS3 did was help ensure that the PS4 and PS5 are based on commodity hardware. Sony loves to unlearn painful lessons and slip back into Dunning-Krugerspace though, so eventually we're guaranteed to see a PlayStation built on top of another esoteric, possibly-homegrown platform.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Kibner posted:

Since Zen 2 (the Ryzen 3<xxx> series)

FTFY. Not to be pedantic, but only because the Zen (core name) / Ryzen (CPU product name) thing has gotten increasingly confusing as time has gone on. Especially now that they aren't guaranteed to move upward in lockstep in the mobile and embedded product lines.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

I was just reading an article about Pat Gelsinger admitting that Intel was gonna axe more business lines. And that reminded me that I was doing some deliveries today, and as I came over a hill I saw 6 or 7 very large cranes (not tower cranes, but the tracked kind with two-segment booms) in the mid distance.

Turns out they were all at Intel Fab 11X, which appears to be under heavy reconstruction/reconfiguration. Either that or it's just being taken apart, but that seems improbable since Wikipedia says it was upgraded to 14nm two years ago. But then again, it used to be (and, according to Intel's website, still is, lol) where Optane happens.

Edit: It's being overhauled for Foveros: https://www.hpcwire.com/2021/05/03/intel-invests-3-5-billion-in-new-mexico-fab-to-focus-on-foveros/

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

BlankSystemDaemon posted:

The problem is, if you're doing CPU heavy workloads on a server, it's almost certainly either multithreaded or it benefits heavily from CPU caches (or, more likely, both), meaning you can't move the threads around as that causes the high-performance cache-line to be invalidated.

Also, modifications to the scheduler to add the heuristics to properly take advantage of the energy savings of using heterogenus multi-processing would add many thousands of lines of code to the hot path in the scheduler, likely offsetting any benefit gained in terms of added cputime used by the server during execution.

And if you're edging into HPC (even the weird, DIY-grade pseudo-HPC that I do) you very quickly learn to take direct control of things in a lot of ways that are contrary to general, modern OS use.

  • Don't even get the scheduler involved so far as your application is concerned. The rest of this list is mostly how you do that (there's a minimal pinning sketch after the list).
  • Is your application not FPU-bound? Then restrict it to running exactly as many instances as you have threads of execution.
  • Is it FPU-bound (AKA AVX)? Now you're down to running as many copies as you have physical cores
  • Oh, but is it multi-core aware within a single process? Then your work-queue manager needs to consider that and only run as many concurrent processes as can be supported on the machine
  • Don't enable swap at all. It's better to have a process OOM-killed and restarted from a checkpoint than to have many processes grind to a halt due to paging
  • Do you have a program which actively uses L3 rather than having things passively end up there due to management by the CPU/OS? Better know how much that program wants, and don't oversubscribe your cache either, because that gets nasty real fast (looking at you, Rosetta suite, wanting 4MB of L3 and running 1/3 as fast when you can't have it all to yourself)
  • Are you also doing GPU compute? Remember to leave a thread available for the process which is feeding the GPU
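
The pinning sketch I mentioned at the top of the list looks roughly like this -- a hypothetical, stripped-down launcher with do_work() standing in for your actual compute kernel:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void do_work(void)
{
    /* stand-in for your actual compute kernel */
}

int main(int argc, char **argv)
{
    int ncores = (argc > 1) ? atoi(argv[1]) : 4;   /* e.g. physical core count for FPU-bound jobs */

    for (int core = 0; core < ncores; core++) {
        if (fork() == 0) {                         /* child: pin itself to exactly one core */
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(core, &set);
            if (sched_setaffinity(0, sizeof(set), &set) != 0) {
                perror("sched_setaffinity");
                _exit(1);
            }
            do_work();
            _exit(0);
        }
    }
    while (wait(NULL) > 0)                         /* parent: reap the workers */
        ;
    return 0;
}

A real work-queue manager does the equivalent with a lot more ceremony; the point is just that the general-purpose scheduler never gets to make placement decisions for you.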

So yeah, heavy-duty compute is a fundamentally different regime than human-scale, user-task-oriented compute. I believe heterogeneous systems are fantastic for computing machinery which will be used by people. I also believe it'll be a minute before we work out how to make them a good and efficient fit for more backend things, even if we ignore HPC.

As a much more concise example, think about trying to set up meaningful auto-scaling rules for a K8S pod whose CPUs are not homogenous.

AMD's answer at the moment is to make Epycs in two variants: CPUs with full-fat cores for peak performance, and CPUs with smaller "C" cores for things that benefit from density rather than per-core speed. The only difference between the two is physical core size, due to the amount of L3 included, and so far there are no products which blend the two. I don't know what Intel's plan is.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Dr. Video Games 0031 posted:

I'm not sure what faster memory does for you in productivity workloads though.

"Productivity" is a bullshit word that reviewers use to cover anything that isn't "running a video game demo", so the answer is everything from "nothing at all" to "a whole lot", depending.

What I can tell you from my experience is that sci/eng workloads tend to be about throughput (because you're doing the same operations on big piles of data), and memory speeds matter. Once or twice I have done BIOS upgrades but forgotten to reset RAM speeds (from stock 2133MT/s to my RAM's rated 3200MT/s). That translated to a 10-15% reduction in workrate over about 24h.
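
If you want to see the effect without waiting 24 hours, a STREAM-style triad is the classic demonstration. This toy version (rough sizes and timing, not a calibrated benchmark) will track your memory speed far more than your core clocks:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 25)   /* 32M doubles (256 MiB) per array -- far bigger than any cache */

int main(void)
{
    double *a = malloc((size_t)N * sizeof *a);
    double *b = malloc((size_t)N * sizeof *b);
    double *c = malloc((size_t)N * sizeof *c);
    if (!a || !b || !c)
        return 1;

    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];          /* triad: two loads + one store per element */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gib  = 3.0 * N * sizeof(double) / (1024.0 * 1024.0 * 1024.0);
    printf("triad: %.2f GiB moved in %.3f s = %.2f GiB/s (spot check: %.1f)\n",
           gib, secs, gib / secs, a[N / 2]);

    free(a); free(b); free(c);
    return 0;
}

Run it at 2133 and again at 3200 and the GiB/s number moves in roughly the proportion you'd expect, which is exactly what happened to my work rate.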

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

redeyes posted:

Whats odd is i swear on my Win 10 box, the E cores always seem to be loaded with browser processes, and the P cores only seem to get used when Im doing something that is acutally something. On Win 11, it seemed opposite. I dunno which is better. Also i notice that the hyperthreaded virtual cores are almost never loaded where as before with some of the older Intels, they were constantly.

Browsers have a strong preference for getting things rendered as quickly as possible, because people get pissy when browsers are slow. Makes sense to me that you'd throw that at a P core. Are you sure that the causal relationship there is "Win 10 vs Win 11", and not "older browser version vs newer version which might be P/E-core-aware and making explicit scheduling/affinity requests"?

As for SMT, don't think of it as a "real" core and a "virtual" core. It's all real, marketing obfuscation aside. Each physical core is two fully pipelined and independently-schedulable ALUs, accompanied by a single FPU that can only be scheduled by one of the ALUs at any given time. (Or possibly, is only ever usable by one of the pair of ALUs?)

A lot of tasks which were formerly well-suited to just using an ALU can now be handed to an E core, which takes less power -- and in the best case lets a whole P core stay idled. Sounds like things are working as intended.
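
Side note: if you want to see exactly which "virtual" cores are really SMT siblings sharing one physical core, Linux (at least) spells it out in sysfs. A sketch, assuming the standard topology layout:

#include <stdio.h>

int main(void)
{
    char path[128], buf[64];

    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                                  /* ran out of CPUs */
        if (fgets(buf, sizeof(buf), f))
            printf("cpu%d shares a physical core with: %s", cpu, buf);  /* buf already ends in \n */
        fclose(f);
    }
    return 0;
}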

mdxi fucked around with this message at 19:36 on Oct 21, 2022

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

VorpalFish posted:

I thought the whole 2x alu sharing 1x fpu was a quirk specific only to AMD Bulldozer cpus, no? Don't think that's the case for Intel HT.

I was papering over some complexity when I said "fully pipelined ALUs". Yes, there are two (or more; now that we're in it I will point out that newer POWER variants can do SMT-8) of everything except FPUs in the current-gen SMT designs.

But I also wanted to be succinct and try to make things easier to grapple with, so I chose to elide discussion of the fetch/decode units, etc., and just say "ALU", since that's the functional unit doing most of the work.

You can check this for yourself by doing some simple tests with runs of the Stockfish chess engine (which does not use floating-point math, and will scale up to the available number of ALUs) versus runs of OpenFOAM CFD (which is very much FPU-bound and will scale to the number of FPU/vector units on a die).
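
If anyone wants to actually run that comparison, the Stockfish half is easy to script -- something like the below, which assumes a stockfish binary on your PATH and the usual bench arguments of hash size followed by thread count (check your build's help output if it complains):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int threads[] = { 1, 2, 4, 8, 16, 32 };

    for (size_t i = 0; i < sizeof threads / sizeof threads[0]; i++) {
        char cmd[64];
        snprintf(cmd, sizeof(cmd), "stockfish bench 64 %d", threads[i]);
        printf("=== %s ===\n", cmd);
        if (system(cmd) != 0)                      /* bench prints Nodes/second when it finishes */
            fprintf(stderr, "run with %d threads failed\n", threads[i]);
    }
    return 0;
}

Compare the Nodes/second lines across runs; it should keep scaling past your physical core count, whereas an FPU-bound code like OpenFOAM flattens out once every FPU is busy.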

The construction cores were.... something else. I wasn't paying a lot of attention to PC CPUs during that period, and the little bit of background reading I've done since getting back into it makes them sound like a terrible mistake. A bet that workloads would evolve in a certain way, which (catastrophically) did not pan out.

Intel's HT, which actually predates the construction cores by the better part of a decade, didn't initially work super-awesomely for them either. Over time, Intel refined the approach into something that worked pretty well for most workloads. With the Zen cores, AMD shipped an SMT implementation which (if I remember correctly from reviews) was a little better than what Intel had at the time. At this point, five years on from the release of Ryzen, I think things are pretty much at parity across brands.

mdxi fucked around with this message at 23:14 on Oct 21, 2022

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Potato Salad posted:

why the gently caress is Intel being allowed to slow down its subsidized rollout of domestic fab capacity

how are we not holding them at the point of a bayonet on this

this is a deeply crucial national security vulnerability


Methanar posted:

I just don't know how we got to the point that the overwhelming majority of the global semiconductor manufacturing base was built within artillery range of NK and China. This problem should have been obvious 20 years ago to the military.

:capitalism:

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Methylethylaldehyde posted:

On a modern processor with per core opportunistic clock boosts, keeping the silicon super cool can allow it to boost higher,

As a really good (if very atypical) example of this, I have a machine in my garage which runs 24/7. The CPU is a Ryzen 5950X running in Eco Mode (65W), with a Noctua C14S, in a Fractal Pop Air case. Last night it was down around 19°F outside and my garage was right at freezing. This morning that CPU was running just shy of 4GHz all-core with a Tctl of 53C. In the moment it had effectively infinite thermal overhead and was going as balls-out fast as the wattage limits would allow.

Edit: currently, the ambient temp has warmed up a bit and the workload has shifted to 16 threads loaded instead of 32, and it is now running at 3GHz with Tctl of 67C.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Thunder Bay, the x86 + Movidius VPU SoC -- which turned out to (theoretically) be Arm A53 + VPU instead -- has now been cancelled before ever reaching production.

https://www.phoronix.com/news/Intel-Thunder-Bay-Cancelled

Just mentioning this here because it's so weird to swerve from "we'll make an integrated x86 computer vision solution", to (one supposes) "oops turns out that Arm works better", to nothing.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Clearly it's time for Raja and Phil Harrison to jointly found a startup that will innovate failure in the graphics and gaming spaces. Nobody else has their experience in loving up the things they're supposed to be experts at!

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Craptacular! posted:

People poo poo on Raja

I understand that he was a great engineer, but I think there's overwhelming evidence that he rose to his level of incompetence as a director of engineering teams.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Gordon Moore is dead, so computers can now only gain performance by being larger and consuming more energy.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

wet_goods posted:

Remember when he said “this is the bottom” like three quarters ago

Is Ghostty a financial analyst for Intel?

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Fab 9 is now open, co-located with Fab 11X in New Mexico.

https://www.techpowerup.com/318257/intel-opens-fab-9-foundry-in-new-mexico

Apparently it does something something EMIB Foveros something chiplets etc.?

Fab 11X had previously been the home of Optane (rest in piss), and I legit have no idea what's going on there now.
