|
Question born of lurid curiosity: I have a Core i9-7940x and it's been great. But outside of fixing the AVX clock scaling issue with Rocket Lake, what additional AVX-512 improvements have been made with this new implementation? edit: Also, will AVX-512 optimizations for Rocket Lake at least partially apply to my Skylake-X chip? Hasturtium fucked around with this message at 15:49 on Apr 10, 2021 |
# ¿ Apr 10, 2021 14:27 |
|
Kazinsal posted:e: Sarcasm aside, did anyone ever do an OP for a non-x86 CPU architectures thread? Or is there even really enough interest in having one? As someone who has been lusting after one of Raptor Computing's POWER9 setups for years, that would be welcome. If anybody wants to start talking about the ghosts of SPARC and MIPS, SMT4 and SMT8, and what server-targeted ARM is like, I'd be there.
|
# ¿ Jun 18, 2021 03:06 |
|
Kazinsal posted:Awesome. I started working on an OP a while ago but lost what I had in a power blip so I’ll probably start a new thread this evening and write about the architectures I know about and let other people contribute primers for the ones I don’t. Thank you, link it here. I'd love to find a happy medium between "set up a cluster of Raspberry Pis" and "drop three grand for a POWER setup" that doesn't involve raiding eBay for a closeout server from over a decade ago, or somebody's microATX Amiga non-starter.
|
# ¿ Jun 18, 2021 03:26 |
|
Thanks! I chipped in a quick blurb about Power to get the ball rolling.
|
# ¿ Jun 18, 2021 12:38 |
|
I’d love to hear someone compare the M1 to Power9, both in terms of relative performance and their respective embraces of instruction-level versus thread-level parallelism. They are built on wildly different processes and for different markets, but it would still be illuminating.
Hasturtium fucked around with this message at 03:56 on Aug 15, 2021 |
# ¿ Aug 14, 2021 16:32 |
|
Fantastic Foreskin posted:without accounting for the business reasons driving the design process it's an apples to oranges kind of deal.
|
# ¿ Aug 14, 2021 17:49 |
|
Arivia posted:Speaking of x86/x64 being hemmed in by ancient poo poo, I saw an offhand reference the other day that you could still run code from like 8088s on today's processors. I know there was something like the x86 chips these days basically popping themselves through the various modes really quickly at startup so you're out of real mode and protected mode and into running actual x64 code. But I figured that at some point Intel or AMD must have gone "seriously we can just get rid of the poo poo for running programs from like 1990" to clean things up - I know you can't run 16bit executables any more, but still having a bunch of legacy mode support in the CPU feels like a big waste of time. Am I mixing things up/totally wrong? I don’t think there’s anything nominally preventing you from running 16-bit apps in real mode, though Intel cut gate A20 support with Haswell so most DOS memory extenders don’t work any more in protected mode. You can also (sorta) run Win9x, barring the lack of drivers for just about anything made since 2006. Windows imposing limits on 16-bit protected mode code in newer versions (like for legacy program installers) doesn’t necessarily speak to what the CPUs themselves can do. By and large they really can run a TON of old code.
|
# ¿ Aug 16, 2021 00:54 |
|
BurritoJustice posted:Multicore enhancement on Rocketlake is absurd, my friends 11900k is reporting 283w package power out of the box in P95. Its a Z590 ROG HERO SUPER GAMER (etc), but it's completely unmodified bios settings other than XMP.. I know it's a power virus but that is absurd it's letting it run that high. It was only managing 4.5GHz too (though I believe with AVX-512). I haven't seen much in the way of apples-to-apples comparisons, but do the high-end Rocket Lake chips actually draw comparable power to the Skylake HEDT platform on average? Or more? I'm having trouble imagining an eight-core chip pumping out more heat than my 7940x running at alleged stock clocks. Hasturtium fucked around with this message at 18:25 on Aug 18, 2021 |
# ¿ Aug 18, 2021 18:15 |
|
Twerk from Home posted:Rocket Lake has AVX512, right? So Intel technically wins that. Rocket Lake does have AVX-512 - it's one of the only areas where its performance cheerfully zoomed away from Comet Lake. Prior to that, an earlier, more limited subset of the instructions was available in Skylake-X for the Xeon and HEDT markets. If AMD successfully entrenches AVX-512 by bringing it to the mass market, it'll be a surprise.
|
# ¿ Aug 19, 2021 22:15 |
|
BlankSystemDaemon posted:Too bad the SMT implementation was a loving disaster to the point that even Intel admitted and removed it, and only reintroduced SMT in Nehalem when they'd finally manged to do it properly. Do you have info on the Netburst SMT implementations? I remember it being a mixed bag under the best of circumstances but would like a refresher with knowledge of better practices this far out.
|
# ¿ Jan 22, 2022 18:04 |
|
Dr. Video Games 0031 posted:It really depends on the game and the review setup. I've seen reviews that put the 5600X slightly ahead of the 12400 in both gaming performance and power efficiency, like these: https://www.techspot.com/review/2392-intel-core-i5-12400/, https://www.techpowerup.com/review/intel-core-i5-12400f/ I have a perfectly dumb question: for the purpose of gaming, would a 12400 be a better general choice than a 7940x? Just wondering if the per-core grunt and lower latency between cores would elevate it above fourteen cores of Skylake-X justice.
|
# ¿ Feb 23, 2022 02:19 |
|
Intel still comes across as scattered and reactive when it pulls poo poo like this and I’m glad they’re getting some flak for it. Is there any kind of decent write-up on the differences between that earlier subset of the spec and what’s made it to Rocket Lake before being snuffed out piecemeal with Alder?
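Not a full write-up, but for anyone who wants to see which subsets their own chip exposes, the kernel's flag dump is the quickest look. A rough, Linux-only sketch (from what I understand, Skylake-X typically shows avx512f/cd/bw/dq/vl, while Rocket Lake piles on vnni, vbmi2, bitalg, vpopcntdq, and friends - don't hold me to the exact lists):

```python
import re

def avx512_subsets(cpuinfo_path="/proc/cpuinfo"):
    """Return the set of avx512* feature flags the kernel reports.

    Linux-only; comes back empty on other OSes or pre-AVX-512 parts.
    """
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return set()
    # /proc/cpuinfo repeats the flags line per logical CPU; first hit is enough
    m = re.search(r"^flags\s*:\s*(.+)$", text, re.MULTILINE)
    if not m:
        return set()
    return {flag for flag in m.group(1).split() if flag.startswith("avx512")}

if __name__ == "__main__":
    subs = avx512_subsets()
    print(sorted(subs) or "no AVX-512 flags reported")
```

Diffing the output between a Skylake-X box and a Rocket Lake one would show exactly which extensions got added.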
|
# ¿ Mar 15, 2022 13:13 |
|
Boat Stuck posted:Is it normal for Alder Lake to run, like, really hot? I think the 180W is probably a bigger contributing factor to heat than the 280mm rad, but I’m also managing temps in the mid-60s on a stock 7940x with a 240mm AIO, so something is amiss. Definitely start with a repaste, then see if you need to go UEFI spelunking to lower the power target.
|
# ¿ Mar 15, 2022 20:46 |
|
RME posted:Mostly just a technical curiosity but: how does the scheduler(? Or whatever is responsible) figure out what to throw at the e cores anyways That falls under the purview of the Alder Lake Thread Director, an integrated microcontroller whose entire responsibility is ensuring the right workloads go to the right cores. It requires integration with the OS scheduler - to my knowledge it’s only supported by Windows 11 and Linux kernel 5.18 onward. AnandTech has a pretty good write-up on Alder Lake that goes into more detail.
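As a quick aside, on Linux you can actually see the split the scheduler is working with - recent kernels expose the hybrid topology under sysfs. A small sketch; the cpu_core/cpu_atom paths are what I've seen documented for Intel hybrid parts, and on anything else it just comes back empty:

```python
from pathlib import Path

def hybrid_core_map(sysfs="/sys/devices"):
    """Report which logical CPUs the kernel assigns to P and E cores.

    On Intel hybrid parts, recent kernels expose the core types under
    /sys/devices/cpu_core and /sys/devices/cpu_atom. Returns {} on
    non-hybrid machines or older kernels.
    """
    mapping = {}
    for name, label in (("cpu_core", "P-cores"), ("cpu_atom", "E-cores")):
        p = Path(sysfs) / name / "cpus"
        if p.exists():
            mapping[label] = p.read_text().strip()  # e.g. "0-15" / "16-23"
    return mapping

if __name__ == "__main__":
    print(hybrid_core_map() or "not a hybrid CPU (or sysfs unavailable)")
```

On a 12600K you'd expect something like P-cores 0-11 and E-cores 12-15, though the exact numbering is up to the firmware and kernel.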
|
# ¿ Jun 20, 2022 13:49 |
|
mdxi posted:5.18 might be when Alder Lake support went in, but heterogenous multiprocessing has been a well-solved problem in the mainline Linux kernel -- and iOS -- for years. As usual, it's just Windows that needs to play catch-up. Oh, sure. Intel is late to this party, and so far AMD's been a no-show. Agner Fog went into some detail regarding why Intel's implementation needs special consideration - between the lack of SMT on the little cores and the overall scope of architectural difference beyond "bigger chip make go faster than little," Intel's made itself a strange bed to lie in this round. The big-core-only Alder models feel like a solid generational improvement, albeit juiced with enough power to hurt effective efficiency despite a nice process improvement. The big.LITTLE ones feel like a weirdly reactionary response to emerging trends - disabling AVX-512 with no option to re-enable it is going to bite their efforts to spur adoption hard. Even beyond the lack of support in the E cores, maybe they're afraid throwing power at the architecture and then factoring in AVX-512 will blow way past the bigger power envelopes they're already struggling with?
|
# ¿ Jun 20, 2022 17:15 |
|
VorpalFish posted:The power envelopes seem almost entirely based on getting bigger number in cinebench. For the 240w cpus you lose something like 8% performance dropping to 125w. Yeah, it's nuts. AMD's doing this too - the 5800x eats half again as much power as the 5700x, and wins out by somewhere between 5 and 8% in benchmarks. Unlocking that last ten percent of performance from modern silicon is very expensive. Your second point is true - the chip won't push beyond the power envelope, but clocks will sag to accommodate it. I recently sold a 7940x where I got to test that quite a bit, and for what I was getting out of the chip the heat was positively brutal. After living with it for four years in north Texas I cried uncle, sold it, and replaced it with a 5700x.
|
# ¿ Jun 20, 2022 17:31 |
|
JawnV6 posted:gosh, i knew "hide it behind ACPI" was kinda silly and while it's understandable that certain problems aren't amenable to a pure HW solution? that strikes me as overengineered It’s a problem of the differences between the P cores and E cores - the former support SMT, the latter don’t, and they have substantially different performance characteristics besides; the result is that an OS scheduler alone is unlikely to make good decisions in distributing workloads across them. It’s over-engineered because the chip’s been lashed together from two disparate CPU families instead of a more traditional big.LITTLE arrangement. And the worst problem in my eyes is that the E cores have been nudged to excessive clock speeds to goose performance, which defeats the ostensible power-saving purpose of their inclusion outside of providing extra threads for benchmarks. I’d genuinely like to see how an underclocked, undervolted 12600K would do compared to most configurations in the wild. Hasturtium fucked around with this message at 00:14 on Jun 22, 2022 |
# ¿ Jun 21, 2022 23:16 |
|
Winifred Madgers posted:Step 1: Give it less volts. To the OP wondering about CPU undervolting: it's... kinda tricky. In days of yore, when I was undervolting an AMD FX-8320, it involved stepping down the core voltage in small increments and testing until I found the point where it became unstable, then nudging the voltage up slightly, testing again to confirm stability, and hopefully calling it a day. This approach may still work on a modern CPU - hop into the BIOS, see where your core voltages lie, and start gently adjusting them to see what happens. redeyes posted:Went from 3900x to 12700k. Intel is FAR better for various reasons. I am using a DAW studio type computer and Intel is raping AMD in latency out of the box. There are many variables but still, Intel rules the roost for this. Jesus, goon, I get you're excited, but dial it back. Hasturtium fucked around with this message at 03:49 on Jun 29, 2022 |
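Back on the undervolting bit: the loop I was doing by hand boils down to something like this. Purely illustrative - the stress test is stubbed out as a callback and all the numbers are made up, since the real tuning happens in the BIOS, not from a script:

```python
def find_undervolt(stock_mv, is_stable, step_mv=10, guard_mv=20, floor_mv=900):
    """Step core voltage down until instability, then back off a guard band.

    is_stable(mv) stands in for a real stress test (Prime95 or similar).
    All values in millivolts; step/guard/floor are illustrative defaults.
    """
    mv = stock_mv
    while mv - step_mv >= floor_mv and is_stable(mv - step_mv):
        mv -= step_mv                 # passed the stress test: go lower
    candidate = mv + guard_mv         # nudge back up for a safety margin
    return min(candidate, stock_mv)   # never exceed stock

# toy model: this imaginary chip is stable down to 1150 mV
print(find_undervolt(1300, lambda mv: mv >= 1150))  # 1170 with the defaults
```

The guard band is the "nudging the voltage up slightly" step - a setting that survives one stress run isn't guaranteed to survive a hot summer afternoon.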
# ¿ Jun 29, 2022 03:46 |
|
mobby_6kl posted:I'm curious about the power considering how much Alder Lake needs to hit the last hundreds of mhz. Maybe they'd at least run the E-cores at appropriate voltage as someone mentioned before. I think someone here (Paul?) confirmed that the little cores are running on the same 12V line as the big ones. Since the little cores aren’t operating at an efficient voltage, Intel decided to juice the clocks to get what performance they could out of them, and with Raptor Lake they’re literally doubling down on the strategy. In the future - and especially for mobile-targeted chips - I’d expect Intel to adopt BIG.little with appropriate voltages to suit the strengths of each, but that will likely require a socket change to accomplish.
|
# ¿ Jun 29, 2022 19:56 |
|
What’s the per-clock difference between Rocket and Alder Lake outside of AVX-512, where Intel apparently decided that because little Alder cores couldn’t run it, no Alder cores should run it? Just kills me, is all. I’m wondering if it’d be worthwhile for a machine chiefly concerned with crunching video to skip Alder this time and save a little money with Rocket Lake.
|
# ¿ Aug 20, 2022 05:52 |
|
PBCrunch posted:MMX instructions were introduced on the fourth (!) process node version of the Pentium. You said it all better than I could have. One Weird Trick I remember reading, years after I could have used that information, was that with beefier cooling P55C still supported 3.3V signaling and could work on older motherboards, especially since they used a remapped multiplier. I wish I'd known that when I stumbled on a dual socket Digital Pentium 90 on a curb back around 2002 - a pair of Pentium MMXes would have made it a decently spry Shoutcast server in a corner of my room. It is weird how the Pentium name lost its luster with the Pentium 4 and was then kept around as the rung above their Celeron-named chips for a decade and a half. I think deprecating those two product lines is going to bite them in the rear end - "drat it, why did I buy this low end desktop with an INTEL PROCESSOR?"
|
# ¿ Sep 17, 2022 00:09 |
|
Rinkles posted:Are the Quicksync (QSV) presets not available because I'm using my dGPU? This is Handbrake. That appears to be the case, if your setup prevents the IGP from being used while a dGPU is in use. Some motherboards and configurations are more accommodating than others.
|
# ¿ Sep 24, 2022 18:55 |
|
Twerk from Home posted:A little fiddling with PBO curves and you can reduce power usage by 50W without giving up a lick of performance. If you're willing to sacrifice 3-5% you can probably cut power usage in half. Amen. I’m pretty well computationally set for a while on my desktops, but if I decide to get ambitious again I would never run one of these chips at fireball stock settings. Cutting the stock power limit in half, still getting 90% of the performance, and not outrageously heating up my Texas house is a no-brainer for me.
|
# ¿ Sep 28, 2022 16:07 |
|
Palladium posted:its funny when intel NICs used to be regarded as 100% bulletproof and realtek was the crappy alternative, when I never ever had any problems with realtek based stuff from audio to USB wifi Seems like a reflection of Intel stagnating for too long before working to turn the tide, and of Realtek becoming so ubiquitous that they were pressured into a level of baseline competence. Their kit 20-ish years ago was genuinely awful, as evidenced by a legendary bitching session in the comments of the FreeBSD driver for their NICs.
|
# ¿ Oct 23, 2022 16:56 |
|
HalloKitty posted:Maybe I can conjure up some bad feelings: On the bad: I remember the ECS K7S5A, a motherboard so awful it went through at least five revisions. A friend of mine got a special on one at Fry’s which literally wouldn’t work with one class of CPUs or another because the wrong kind of resistors were soldered onto part of the board. Even a “good” one was flaky, and then most of them died of capacitor plague. VIA was just half a rung above SiS, and we only put up with them because there wasn’t another high profile supporter of AMD chipsets for Slot/Socket A for years. Those were fun times with poo poo hardware. On the good: I managed to snag a DFI SB600-C motherboard on eBay for like $20 a while back. It was a Sandy/Ivy Bridge board with five vanilla PCI slots and PCI Express x16, and led to me falling into a wormhole of trying to make various jacked up MS-DOS versions work with PCI audio and a preposterously overpowered CPU for the purpose. Ended up giving it to a friend, but the build quality was basically perfect for what it was, outside of never getting mini-PCIe storage working. DFI is still out there doing good work.
|
# ¿ Oct 26, 2022 16:52 |
|
Cygni posted:I've definitely built more systems with K7S5A's than any other single motherboard. Maybe a few hundred, all told? I would take small contracts in college and go in and out of Frys repeatedly for the "1 per household" insanely cheap sub $80 Duron+K7S5A combos to fill them. The worst part is that while the K7S5A was flaky, like you pointed out, it was actually BETTER than most other Socket A boards out at the time. It feels like there was always some major issue or other until the later Nforce 2 boards became mature. It really seems like the Slot A boards with Irongate chipsets (which, IIRC, were essentially AMD-blessed VIA ones) were more predictable than the slew of socket A boards that came later. They weren’t necessarily better - AGP was so awful and conditional on the platform I stuck with PCI graphics on my Athlon 500, forever ago - but there was a consistency to the experience. The KT series by VIA in all its permutations and inconsistencies was like an old war wound that only faded after I’d spent a blessed decade plus not worrying over them. Remember how badly they got along with Creative sound cards? I haven’t forgotten. Nforce2 had quirks, but for running a regular Windows XP box with an Athlon XP 2400+ and a 6600GT, my Gigabyte board was stable and pretty hassle-free. Compared to what came before it felt like I was sitting pretty. But hey! In terms of brain-melting boards that were everywhere for a while as an indictment of capitalism, at least we aren’t talking about the FIC VA-503+ back on Super7.
|
# ¿ Oct 26, 2022 20:15 |
|
Yeah, the very first PC my family bought was a Compaq Pentium 90 and sported a whopping 150W power supply. Considering the demands of that 1995 configuration it was pretty generously over spec.
|
# ¿ Oct 30, 2022 17:26 |
|
Cygni posted:https://twitter.com/BIOSTAR_Global/status/1589596025349554177 If you’ve got a factory floor that needs legacy PCI controller cards and serial connectors, this very much would be peak performance. I’d be interested to know how they implemented the legacy PCI bus here; from what I understood playing around with an industrial DFI SB600-C (itself a Sandy/Ivy board replete with PCI slots), everybody more or less stopped bothering with legacy PCI and simply used bridge chips for consumer market PCI slot backwards compatibility starting around Haswell on the Intel side and Bulldozer on AMD.
|
# ¿ Nov 7, 2022 17:55 |
|
WhyteRyce posted:it has to be a bridge chip because Intel stopped putting native PCI into the chipset some time ago I figured it had to be a bridge chip, yeah. It would be nice to think some enterprising firm built a reliable bridge that could feed five vanilla PCI slots at full tilt - 133MB/second for a shared 32-bit/33MHz bus, or 666MB/second if every slot got its own bridged segment. That’d amount to, what, PCIe 3.0 x1 with some generous headroom? Bigger question is whether it would accurately handle some PCI weirdnesses like port addressing or DMA behavior; there was some lamenting over on VOGONS that more recent PCIe bridge solutions weren’t allowing access to Yamaha OPL3 chips on elderly sound cards and the like. It makes me wonder about the reality of the economics of this sector and how much any of that comes into play for industrial purposes. Lord knows something like this would have been welcome at a lab where I used to work - virtualization and device passthrough for electron microscope controller boards that never got driver support past Windows 2000 was a concern all the way back in 2009… Hasturtium fucked around with this message at 19:30 on Nov 7, 2022 |
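For reference, the back-of-envelope math I'm doing there (numbers approximate - 133MB/s is plain 32-bit/33MHz PCI, and the aggregate figure only matters in the worst case where each slot sits on its own bridged segment):

```python
# Bandwidth sanity check for a PCIe-to-PCI bridge feeding five slots.
# Conventional PCI at 32 bits / 33 MHz peaks around 133 MB/s, shared by
# everything on one bus segment.
PCI_33_MBS = 32 / 8 * 33.33          # ~133 MB/s per bus segment
PCIE3_X1_MBS = 985                   # PCIe 3.0 x1, after 128b/130b encoding

slots = 5
worst_case = slots * PCI_33_MBS      # five independent segments, all busy

print(f"aggregate PCI demand: {worst_case:.0f} MB/s")
print(f"PCIe 3.0 x1 headroom: {PCIE3_X1_MBS - worst_case:.0f} MB/s")
```

So even the pathological all-segments-saturated case fits inside a single PCIe 3.0 x1 uplink, which is presumably why nobody bothers with anything wider.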
# ¿ Nov 7, 2022 19:24 |
|
hobbesmaster posted:Skylake to Coffee Lake are still very common for embedded and extended availability applications. Jokes aside, Intel has a lot of experience with and fab space for 14nm. This is 100% correct. That’s not a board intended for high CPU performance in absolute terms anyway - looks like a 4+2 VRM setup. Cutting edge is a lot less important for these roles than “set up and configure once, then run without incident indefinitely.”
|
# ¿ Nov 8, 2022 18:02 |
|
SourKraut posted:Apple should have just updated the MP to use these instead of the burning dumpster fire that is the ASi Mac Pro. They were eager to move on to in-house solutions for PR, but the bet on an M2 Extreme - four M2 Max chips all working in tandem - fell apart due to manufacturing problems. Thus, the Mac Pro 2023 is the fallback: literally an Ultra Studio with PCIe 4 support enabled by switching, no external GPU support, and 64GB of non-upgradable RAM starting at seven thousand dollars. I know you know how insane this is, but I’m still aghast.
|
# ¿ Jul 2, 2023 20:44 |
|
Twerk from Home posted:When was this? Both Athlon XPs and 64s looked really efficient against Pentium 4s. Hell, K6es weren't that hot unless you were trying to overclock the poo poo out of them. K6 also had the problem of a pipeline all of six stages deep. It just didn’t scale well to higher clocks, and I’m a little impressed they got up to 550MHz with the K6-III+ and its sizable cache. But the efficiency advantage has bounced between Intel and AMD repeatedly - Netburst was power-drinking trash versus K7, and K8 positively pantsed it; Conroe/Core 2 put Intel back on top until things leveled out again with Phenom II; and then Intel made a huge jump on efficiency with Sandy Bridge that they maintained over AMD’s construction cores until Ryzen came back around and started competing on efficiency while Intel was stuck on 14nm for half a decade plus. My growing suspicion is that the don’t-call-it-Atom-descended E cores will eventually be grown to supersede the P cores in future Intel designs, as the latter are grossly less efficient in real terms for everything but SIMD-heavy work. Zen4c was an interesting recent development - I’ll be interested in seeing where things shake out in the next five years.
|
# ¿ Jul 4, 2023 18:12 |
|
DoombatINC posted:That's our Intel babyyy, right as brands like Minisforum and Beelink are popularizing the ultra small form factor for consumers they pull up stakes and leave the market It’s amazing - they’ve been the flagbearer for the sector, to the point that a recent knockoff low-end brand is called ATOPNUC, and they’re kiboshing the line. The flop sweat is alarming.
|
# ¿ Jul 11, 2023 21:31 |
|
BobHoward posted:The 'construction machine' Family 15h chips - Bulldozer and its successors Piledriver, Steamroller, and Excavator - were all pretty bad. Yes - clustered multithreading (CMT) didn’t pan out for AMD. The siloed integer/cache complexes only shared a prefetch unit, a decode unit (which was duplicated for Steamroller to improve performance, then rolled back to one for Excavator for power savings), and the FPU. The FPU was dual-issue and capable of handling two 128-bit values at once, and at least for Excavator those could be ganged together for 256-bit work, as those last chips finally added support for AVX2. With lower IPC, and clocks failing to scale to the levels needed for competitive performance (partly due to AMD's process disadvantage with GlobalFoundries), the chips were only capable in well-threaded and integer-heavy niches. I knew a few people who kept them around for munching DVD rips and cheap build servers, and I swore by my FX-8320 as a quirky but decent workhorse, but Intel's chips outclassed them for general workloads. Simple as that. As a weird side note/epilogue, there’s a dirt cheap mini PC made by ATOPNUC available now, the MA90, featuring an AMD A9 9400 - a single-module/dual-thread Excavator from 2016 with a Radeon R5. I obtained it for the princely sum of $86 before taxes, as it is cheaper than a Pi-alike for running Pi-hole, and in my initial testing it will not surprise you to learn, dear goons, that it is slow. Hasturtium fucked around with this message at 02:53 on Jul 12, 2023 |
# ¿ Jul 12, 2023 01:26 |
|
Klyith posted:I still use an A9-something craptop. And get this, most of the time I limit the CPU speed to 70%. I absolutely believe it. Last night I got YouTube running on Ubuntu, and it’s amazing how hard those two cores work just to sit on a webpage and play 720p video. If you haven’t installed h264ify, that will probably help on the IGP, but I haven’t exhaustively verified it. It skins my nose a little that this thing is running single-channel memory, but I happen to have two 4GB SO-DIMMs left over from an old project. Later today I’ll swap RAM and see what difference it makes. I can verify that it took at least 20 minutes to compile GZDoom, stock.
|
# ¿ Jul 12, 2023 13:57 |
|
Kazinsal posted:I would love to see how long it takes to compile binutils and gcc on this thing It’s pokey on vanilla Ubuntu, though it picked up a smidge switching to Xubuntu + lightdm. I fear how pokey this would be trying to lug Windows 10 around - just having a few Firefox tabs open makes it chug. I am… tempted by your proposal. Let me kick it around a bit first. Also, genuinely impressed that dual channel memory doesn’t seem to have made much difference.
|
# ¿ Jul 13, 2023 01:50 |
|
Kazinsal posted:If you want something fairly reproducible for comparison then a buddy of mine has a script set that can produce gcc cross toolchains easily: https://github.com/travisg/toolchains Y’know, this would be riotously funny to run and compare on my eight core Power9 versus this thing. 32 threads of screaming ppc64le versus… this.
|
# ¿ Jul 13, 2023 02:07 |
|
Kazinsal posted:That’d be brilliant. The script can only guess at 2-way SMT so for 4-way SMT you’ll have to pass -j32 as well. Just FYI: poking through the script, the command used to determine -j’s default value correctly returned 32 on the Power9. I’ll let you know numbers after the grinding has concluded, but the A9 basically feels like a Core 2 Duo, and the Power9 is closer to a 3950x.
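For anyone wondering how these scripts usually land on a default, it's typically just one job per logical CPU - counting SMT threads, which is why SMT4 gets picked up automatically. A minimal sketch (the actual script may do it differently):

```python
import os

def default_jobs():
    """Mimic the usual `make -j$(nproc)` default: one job per logical CPU.

    os.cpu_count() counts SMT threads, so an 8-core/SMT4 Power9 reports 32
    while a 2-thread A9 reports 2 - no manual -j override needed when the
    detection works this way.
    """
    return os.cpu_count() or 1  # fall back to 1 if the count is unknown

if __name__ == "__main__":
    print(default_jobs())
```

The shell equivalent is `nproc` (or `getconf _NPROCESSORS_ONLN` on systems without coreutils).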
|
# ¿ Jul 13, 2023 12:53 |
|
Kazinsal posted:That’d be brilliant. The script can only guess at 2-way SMT so for 4-way SMT you’ll have to pass -j32 as well. All right, numbers have been run. The time needed to compile the x86_64 GCC 13.1.0 + binutils toolchain with the helpful link you posted:

IBM Power9, 8 cores / 32 threads (-j 32), 32GB DDR4-2666 dual-channel, Samsung 870 256GB SATA SSD
  real 9m45.233s
  user 124m55.872s
  sys 3m58.117s

AMD A9 9400, 2 cores(ish) / 2 threads (-j 4 - don't ask), 8GB DDR4-1600 dual-channel, 128GB generic SATA M.2
  real 80m30.175s
  user 127m45.480s
  sys 17m51.778s

If anything, I'm a little surprised the A9 didn't take even longer.
|
# ¿ Jul 13, 2023 21:21 |
|
Paul MaudDib posted:well, getting away from that is what keller did with the royal core series. "rentable units" are supposedly the replacement for hyperthreading in the royal core series, starting with arrow lake. it's not quite clear what "rentable unit" means, but presumably some kind of execution resource that can be allocated to a thread, or a shared unit between multiple threads that is shared in a module (like FPUs in CMT)? But either way they are thinking about managing that balance of registers/visibility complexity/etc bloating area per core etc. This is all really interesting, thank you. Can you link to some resources so I can read more about this?
|
# ¿ Aug 2, 2023 18:48 |