|
SwissArmyDruid posted:Paul, you're more familiar with Intel than I am, are you aware of any functional benefits to die-stacking a la Foveros with regards to power consumption? Because I think there's still an AMD patent from a handful of years back when we were still speculating that the IO die was going to be an interposer that Zen chiplets and an HBM bump were stacked onto. Foveros is a big mystery to me as well, Intel hasn't said a lot in public to extrapolate from, and the problems with stacking multiple compute dies are pretty obvious in terms of thermals / etc. obviously a lot of the power consumption from infinity fabric isn't inherent to the protocol itself, AMD uses monolithic dies with infinity fabric attaching various parts and that is fine. the power consumption comes from having to run a beefier PHY to overpower the parasitic inductance/capacitance of the bigger+longer wires that go off the die, through the interposer, and back on. I'm unclear what exactly Foveros has to offer here vs AMD's interposer technology but I think that's going to be the relevant metric - how Foveros decreases those parasitics, because that is directly related to how much power you need to drive them. it may be that the innovation here is that because Foveros is an "active interposer" technology that you need to drive it a lot less hard - because it's not driving a big giant wire with lots of parasitics, it's jumping the microbump (which, granted, will still be a lot more parasitics than just a trace inside a monolithic die) and then right into another transistor inside the active interposer, so the only "trace" involved is crossing the microbump. I would raise a speculative guess that the active-interposer stuff that AMD has been talking about is functionally equivalent to Foveros from a design perspective here. Cygni posted:That's a good point, I hadn't considered the little cores being off on their own ~Shame chiplet~. 
There is gonna be a lot of complexity in this next wave of chiplet/tile, big-little, fighting ARM, fully SoC tomfoolery we're gonna be entering. "shame chiplet" is just a guess and I suppose that's maybe not as definite as I think it is. I just think servers will definitely want "all-big" configurations and if that chiplet exists then it becomes trivial to offer an enthusiast package with all-big as well. that would be consistent with how AMD has utilized the enthusiast lineup as a binning pressure relief for the server lineup so far. but at the same time, there will be a power penalty to big chiplet+little chiplet. I know the ideal is that a lot of the time the "big" chiplet would be gated off but who knows how possible that will really be. I guess it really depends on the performance, supposedly the new Tremont cores are pushing Skylake-esque performance already and maybe if you have something like that you only power up the big chiplets on really big sustained tasks and just let the little cores handle the day-to-day. mixed chiplets let you avoid that penalty, you could power down one chiplet entirely unless it's a really big workload, while also still being able to gate the big cores on the mixed chiplet if there's not a whole lot to work on. But I doubt servers are going to bite heavily on that since they don't care about idle power, servers are specced at some reasonable approximation of full load. I guess talking it through, perhaps one all-big + one mixed chiplet would be an ideal configuration here. Maybe they don't do an all-little chiplet at all and just do an all-big chiplet and a mixed chiplet (and then APUs). That gives you better increments as far as powering the thing up, you have "one chiplet up, little only", "one chiplet up, all cores up", and "mixed chiplet up + big chiplet up". 
edit: I think "all-big" and "all-small" (or mixed chiplets with disabled big cores) is also going to be important going forward for segregating caches to prevent timing side-channels, because it seems obvious at this point that shared cache in a speculative architecture is a bottomless pit of vulnerabilities. the fix is going to be segregating "secure tasks" where you would be concerned with data leakage onto a slower, secure chiplet that does much less speculation and ideally shares as little cache as possible between threads, likely with no SMT/hyperthreading (since that is also a bottomless pit of vulnerabilities), while letting CPU-intensive tasks that don't need security run on faster cores that do speculation/etc. this maps precisely onto the "big.LITTLE" model that both companies are now embracing - perhaps minus the OoO/speculation that Intel has adopted lately in Silvermont/Goldmont/Tremont. You just need to do the work of whitelisting which tasks are probably fine to run on a faster, insecure core. AMD's model where you have multiple basically independent chiplets, with their own caches, that just happen to share a memory controller (but no cache on the memory controller itself) fits this concept quite well. And they can also mix architectures as well, since the only thing the IO die or CCDs care about is talking Infinity Fabric to the other side, all the caching is completely self-contained on the chiplet. Paul MaudDib fucked around with this message at 03:57 on May 8, 2021 |
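The parasitics point earlier in the post can be put into rough numbers with the standard dynamic-power relation P = a·C·V²·f. Everything below is an illustrative sketch: the per-wire capacitances, voltage, and transfer rate are assumptions I picked to show the scaling, not measured figures for any real interconnect.

```python
# Illustrative dynamic-power comparison for driving a single data wire.
# All capacitance/frequency/voltage numbers are assumptions for
# illustration only, not figures for any real product.

def switching_power_watts(c_farads, v_volts, f_hertz, activity=0.5):
    """Dynamic power of one wire: P = a * C * V^2 * f."""
    return activity * c_farads * v_volts**2 * f_hertz

F = 2e9  # assumed 2 GT/s per wire
V = 0.8  # assumed signaling voltage

# Assumed per-wire load capacitance:
on_die_trace = 50e-15       # ~50 fF: short trace inside a monolithic die
interposer_route = 900e-15  # ~900 fF: bump + interposer trace + bump
microbump_only = 150e-15    # ~150 fF: crossing one microbump into stacked logic

for name, c in [("on-die", on_die_trace),
                ("interposer", interposer_route),
                ("microbump", microbump_only)]:
    mw = switching_power_watts(c, V, F) * 1e3
    print(f"{name:>10}: {mw:.2f} mW/wire")
```

Since power is linear in C, the ranking just follows the capacitance: the hypothetical active-interposer/stacked case lands between a monolithic trace and a full off-die interposer route, which is the whole pitch.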
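As a sketch of what whitelisting tasks onto insecure-but-fast cores could look like from userspace on Linux (the core IDs here are made up for illustration; a real system would want the OS scheduler making this call per-task rather than manual affinity):

```python
# Sketch only: pinning the current process to a designated core set on
# Linux via sched_setaffinity. SECURE_CORES / FAST_CORES are hypothetical
# core IDs, not a real topology.
import os

SECURE_CORES = {0, 1}      # assumed: slow, low-speculation, no-SMT cores
FAST_CORES = {2, 3, 4, 5}  # assumed: big out-of-order speculative cores

def run_on(cores):
    """Restrict the calling process to the given CPU set."""
    os.sched_setaffinity(0, cores)  # pid 0 = the calling process

# A task handling secrets would call run_on(SECURE_CORES); a trusted,
# compute-heavy task would call run_on(FAST_CORES).
```

The interesting (hard) part is exactly what the post says: deciding which tasks are "probably fine" to run on the fast cores.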
# ? May 8, 2021 03:36 |
|
I want to go to the
|
# ? May 8, 2021 03:44 |
|
please don't bring back jaguar for the little cores
|
# ? May 8, 2021 07:10 |
|
What's the theoretical use case for the shame chiplet? Running all the Windows / OS / services / background stuff on it and devoting the big boy package to the foreground application?
|
# ? May 8, 2021 09:12 |
|
Icept posted:What's the theoretical use case for the shame chiplet? Running all the Windows / OS / services / background stuff on it and devoting the big boy package to the foreground application? theoretically that's possible, sure. but: 1) that requires placing trust in an OS scheduler to do its job properly. windows sometimes already doesn't do a terribly good job on a relatively homogeneous chiplet design like ryzen and requires manual intervention with threads/core affinity 2) you get 12-16 threads on a mid-range ryzen and they're all good threads lol
|
# ? May 8, 2021 09:49 |
|
Agreed... but why are AMD/Intel pursuing it for desktop CPUs? It makes sense for mobile because 80% of the time you're just texting or whatever so there's no reason to burn battery on the big cores. Or is it just because the desktop CPUs are derived from a common stack with an APU/laptop focus so the smol cores got to happen just by association?
|
# ? May 8, 2021 09:56 |
|
Anime Schoolgirl posted:please don't bring back jaguar for the little cores your vile wishes can't blot out my pure love for the $50 Kabini AM1 combo
|
# ? May 8, 2021 10:51 |
|
zen4 is also supposed to introduce AVX-512 support
|
# ? May 8, 2021 11:02 |
|
Paul MaudDib posted:zen4 is also supposed to introduce AVX-512 support
|
# ? May 8, 2021 11:07 |
|
Icept posted:What's the theoretical use case for the shame chiplet? Running all the Windows / OS / services / background stuff on it and devoting the big boy package to the foreground application? Yes. Consider these approximations for Apple's M1 small cores relative to M1 big cores: Area: ~0.25x; Power @ max freq: ~0.1x; Perf @ max freq: ~0.33x. The small cores have about 3.3x perf/W and 1.3x perf/area. You wouldn't want a chip with nothing but the small cores since high ST performance is quite important for general purpose computing, but having some small cores is awesome. Using less energy to run all those lightweight system threads frees up power to run the threads you want to go fast on the big cores. That said, will AMD and Intel have small cores as good as Apple's? Seems very doubtful! Small cores are where you expect the advantages of a clean RISC architecture to be greatest, and Apple's been putting a lot of effort into their small core designs for a long time, while AMD and Intel have not. And will Microsoft have a scheduler as good at using small cores as Apple's? Also doubtful.
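For what it's worth, the quoted ratios are self-consistent; a quick back-of-envelope from the three approximations above:

```python
# Back-of-envelope check of the small-core ratios quoted above
# (approximations for Apple M1 small cores relative to big cores).
area, power, perf = 0.25, 0.10, 0.33  # relative to one big core

perf_per_watt = perf / power   # ~3.3x
perf_per_area = perf / area    # ~1.3x

print(f"perf/W:    {perf_per_watt:.1f}x")
print(f"perf/area: {perf_per_area:.1f}x")

# Four small cores fit in one big core's area:
print(f"4 small: {4*perf:.2f}x perf in {4*area:.0f}x area at {4*power:.1f}x power")
```

The last line is the "awesome" part: in one big core's footprint you get ~1.3x the multithreaded throughput for ~0.4x the power, at the cost of single-thread speed.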
|
# ? May 8, 2021 12:08 |
|
Shame chiplet has to become the accepted nomenclature. BobHoward posted:The small cores have about 3.3x perf/W quote:And will Microsoft have a scheduler as good at using small cores as Apple's? Also doubtful. ConanTheLibrarian fucked around with this message at 12:36 on May 8, 2021 |
# ? May 8, 2021 12:34 |
|
Anime Schoolgirl posted:please don't bring back jaguar for the little cores I rather like my old E-350 bobcat based server that is still running in my living room, but I agree
|
# ? May 8, 2021 14:07 |
|
BobHoward posted:Yes. Consider these approximations for Apple's M1 small cores relative to M1 big cores: A smol core EPYC with AVX512 at half the area would mean 128 cores on 7nm, with 2048 fp32 lanes. 6 Teraflops would need up to 2 reads and 1 write per fp32 op, demanding peak bandwidths of 48 Terabytes/s for reading and 24 Terabytes/s for writing. Would such a machine be a good ML training workhorse?
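A sketch of the arithmetic behind those figures (treating 6 Tflops as an assumed sustained fp32 rate, and the worst case of every op streaming its operands from memory with no cache reuse):

```python
# Bandwidth arithmetic for a hypothetical 128-core AVX-512 part.
cores = 128
lanes = cores * 16            # 16 fp32 lanes per core with AVX-512
flops = 6e12                  # assumed sustained fp32 op rate
bytes_per_fp32 = 4

read_bw = flops * 2 * bytes_per_fp32   # up to 2 reads per op
write_bw = flops * 1 * bytes_per_fp32  # up to 1 write per op

print(f"{lanes} fp32 lanes")
print(f"read:  {read_bw / 1e12:.0f} TB/s")
print(f"write: {write_bw / 1e12:.0f} TB/s")
```

That 48/24 TB/s is the no-cache worst case, which is exactly why real designs spend so much area on cache: almost all of those operands are expected to come from registers and cache, not DRAM.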
|
# ? May 8, 2021 14:13 |
|
I feel that until AMD can get Fabric/IO power down, small core and mixed chiplets just don't sound very appealing. Maybe if you add a small core cluster to the IO die and power down the IF links and chiplets, but then you probably have to fab it on an advanced node.
|
# ? May 8, 2021 15:02 |
|
Speculation: They could put the small cores on the IO chiplet and for low power operation just turn off the big cores/chiplet + IF bus and save power that way. Would give the small cores slightly better main system RAM latency too for a percent or 2 more performance. Since the small cores are supposed to be actually small + low power, and the IO die is already a fairly decent size and required for their chiplet approach, squeezing them on shouldn't be too onerous. Yeah the process for the IO die is different ("12"nm GF process) and not nearly as good as TSMC's 7nm or 6nm processes but it'll be good enough for some low power optimized lower priority cores at ~2-3GHz which is likely all that is necessary.
|
# ? May 9, 2021 05:18 |
|
Wait, if the IMC is inside the IO die, will moving to DDR5 and its higher bandwidth increase power usage? That might necessitate dropping it down to 7nm.
|
# ? May 9, 2021 06:46 |
|
Maybe? edit: The IOD uses ~15W. I believe the IF bus power use is a bigger issue than the memory controller, especially on Epyc. We don't have any numbers for a DDR5 memory controller so all we could do is guess. From what I recall they're using GF's 12nm for the IOD right now because memory controller scaling is so abysmal with smaller nodes they get nearly no benefit while paying much higher costs and dealing with more supply constraint issues. It's always possible they could use TSMC's 10nm process instead if they really do have to use a more power efficient + smaller feature process. PC LOAD LETTER fucked around with this message at 08:27 on May 9, 2021 |
# ? May 9, 2021 08:18 |
|
Supposedly AMD is moving to a 6nm IO die at some point here.
|
# ? May 9, 2021 08:25 |
|
PC LOAD LETTER posted:Speculation: They could put the small cores on the IO chiplet and for low power operation just turn off the big cores/chiplet + IF bus and save power that way. Would give the small cores slightly better main system RAM latency too for a percent or 2 more performance. I don't think this will happen. Cores (especially small ones) occupy a surprisingly low proportion of the CPU's area. With small cores, cache would be the dominant feature. The IO die would have to be substantially larger to fit the compute elements. For reference, here's a Zen 2 die. Purple is L3, orange is L2, green is the core.
|
# ? May 9, 2021 10:43 |
|
ConanTheLibrarian posted:I don't think this will happen. Why would the small low power cores have to have the same or more amount of cache though? Would an L3 even make much sense if they don't have to hop over the IF bus to the system RAM? If it's mostly doing background or light duty stuff anyways won't the cache requirements be reduced a fair amount too? If the cache requirements are as high as the main CPU AND you need like 4 or 8 of them then yeah it starts to make less sense to put it on the IOD and it becomes more sensible to put them on the main die with the higher power CPU's. edit: Or do a dedicated low power CPU die too of course. Either could work. PC LOAD LETTER fucked around with this message at 13:51 on May 9, 2021 |
# ? May 9, 2021 13:39 |
|
PC LOAD LETTER posted:Why would the small low power cores have to have the same or more amount of cache though? Would a L3 even make much sense Cache and cache architectures are overwhelmingly important to the performance of modern CPUs. You don't dedicate 60%+ of your silicon to something if it ain't worth having. It might be true that a small/light duty CPU can get away with less $L3 (compared to, say, a core which is tasked with HPC workloads) and still feel performant, but I genuinely don't think you'd enjoy using a CPU with none. And that's before we get to the part where (I think) you'd need to re-architect the core fetching/scheduling logic. But I feel that there's a common issue with all suggestions that core types should be blended (in any combination). AMD's stated reason for doing things the way they have with their current chiplet designs is that decoupling the compute cores from the ancillary functions of the CPU reduces the unit size for lithography purposes -- you're now just fabbing repeating tiles of core/cache which can be sliced up for maximum yield. As soon as you start blending non-compute functionality back into compute dies, or blending core types on a die, or blending compute into the IO die, you've undone that advantage. If you're gonna go big.LITTLE and use chiplets, it makes the most sense to fab the little cores on their own wafers, for even higher yield. But IANAACPCE (I Am Not An AMD Capacity Planner/Computer Engineer) and plans do change, so this is all just guesswork based on AMD's previous statements.
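The "slice it up for maximum yield" argument is easy to illustrate with the classic Poisson defect model, Y = exp(-A * D0). The defect density and die areas below are assumptions picked for illustration, not foundry data:

```python
# Poisson yield model sketch: small chiplets vs one hypothetical
# monolithic die. D0 and areas are illustrative assumptions.
import math

def die_yield(area_mm2, d0_per_mm2):
    """Fraction of dies with zero defects: Y = exp(-A * D0)."""
    return math.exp(-area_mm2 * d0_per_mm2)

D0 = 0.001         # assumed defect density (~0.1 defects/cm^2)
chiplet = 74       # mm^2, roughly a Zen 2 CCD
monolithic = 300   # mm^2, hypothetical everything-on-one-die part

print(f"chiplet yield:    {die_yield(chiplet, D0):.1%}")
print(f"monolithic yield: {die_yield(monolithic, D0):.1%}")
```

Yield falls off exponentially with area, and a defective small die also wastes far less silicon than a defective big one, which is the advantage that blending IO or extra core types back into the compute die would erode.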
|
# ? May 9, 2021 17:37 |
|
mdxi posted:Cache and cache architectures are overwhelmingly important to the performance of modern CPUs. You don't dedicate 60%+ of your silicon to something if it ain't worth having. IIRC Intel's Broadwell desktop parts from 2015 were still competitive with 10th gen Comet Lake not just because they had 6 MB of L3 cache, but because they also still had 128 MB of L4 cache
|
# ? May 9, 2021 17:56 |
|
My understanding was that for Zen the large L3's were there to help make up for deficiencies with their memory controller + the small added latency from the IOD + mitigate latency from moving things over the IF bus. All of that is important for performance of course but as a low power/low performance CPU would any of that be a priority? Particularly if 2 of those 3 issues could be eliminated by moving the little CPU cores to the IOD itself? Yeah more cache is going to be better but I don't see what makes the L3 so much more worth it vs say more L1 or L2 which I would assume would be more valuable performance-wise even if you were much more limited in how much you could cram in vs L3. I know Intel has an L3 with Tremont which is their 10nm low power chip... but it's a much smaller one (4MB) and it's shared across all 4 cores and it seems more relevant for its use in an SoC (to help coordinate things with the iGPU and chipset) rather than for straight CPU performance alone. Anyways, I'm not a chip designer either, but going by that example at worst if an L3 is really necessary to get reasonable performance with the little CPU cores it appears to be to a significantly lesser degree than with Zen so they wouldn't necessarily be stuck with blowing over half the die space on cache for low power/performance use. PC LOAD LETTER fucked around with this message at 18:06 on May 9, 2021 |
# ? May 9, 2021 18:03 |
|
Since we're just making poo poo up, why not replace one of the 8 BIG cores on a CCX with 4 small cores, and give them the exact same amount of cache to share?
|
# ? May 9, 2021 18:14 |
|
gradenko_2000 posted:IIRC Intel's Broadwell desktop parts from 2015 were still competitive with 10th gen Comet Lake not just because they had 6 MB of L3 cache, they even still had 128 MB of L4 cache I feel like this meme has been perpetuated by Anandtech running their benches on JEDEC memory. LRADIKAL posted:Since we're just making poo poo up, why not replace one of the 8 BIG cores on a CCX with 4 small cores, and give them the exact same amount of cache to share? The IO die pulls ~15W to drive the IF links which is probably more than you'd reasonably save using the small cores.
|
# ? May 9, 2021 20:34 |
|
For comparison, dual channel DDR4-3200 averages a ~51 GB/s transfer rate and ~60 ns latency. Zen 3's L3 cache, the largest and slowest of them, has a 600 GB/s transfer rate. I can't remember what L2 speeds are like but I know they're north of 2 TB/s, and L1 read is something like 4 TB/s with sub-nanosecond latency. Cache is *really* goddamn important.
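For reference, the ~51 GB/s figure falls straight out of the DDR4 numbers (this is the peak theoretical transfer rate; real workloads won't sustain it):

```python
# Where the ~51 GB/s figure comes from: dual-channel DDR4-3200.
mt_per_s = 3200e6   # transfers per second per channel
bus_bytes = 8       # 64-bit channel
channels = 2

peak = mt_per_s * bus_bytes * channels  # bytes/s
print(f"peak DRAM bandwidth: {peak / 1e9:.1f} GB/s")

# vs the cache figures quoted above: L3 ~600 GB/s, L2 >2 TB/s, L1 ~4 TB/s
print(f"L3 is ~{600e9 / peak:.0f}x the bandwidth of main memory")
```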
|
# ? May 9, 2021 20:44 |
|
Arzachel posted:The IO die pulls ~15W to drive the IF links which is probably more than you'd reasonably save using the small cores. yeah, ryzen master tells me my zen2 idles at sub-1W or runs a bunch of firefox tabs or a game at sub-10W, but then the SOC drain itself just sits there at 15+ baseline constantly. there's a ton of savings to be made there, and next to no savings in the CPUs themselves. it's the price for having 1600+mhz ram, a dozen pcie4 devices, etc, though.
|
# ? May 9, 2021 21:01 |
|
Arzachel posted:I feel like this meme has been perpetuated by Anandtech running their benches on JEDEC memory. As a former Broadwell and current coffee lake owner I agree it wasn't a particularly revolutionary turn, especially with it being a victim cache
|
# ? May 9, 2021 22:29 |
|
SourKraut posted:200mm fans are still not ideal though. I've had a silverstone 90° rotated case with 3 180mm fans on bottom and one 120mm on top for 8 years and my temps are better than anyone I've ever talked to. Almost nothing makes it through the dust shields, I clean inside like every 3 years.
|
# ? May 10, 2021 00:27 |
|
Quaint Quail Quilt posted:How so? What's all this about negative pressure? 200mm fans are great for quietly moving a good amount of air at a low static pressure. So if your case setup supports the hardware configuration you want and you're happy with temp and noise, then great! In a lot of situations though, their low static pressure will end up hurting someone's use case, such as if they have a radiator on the fan mount, the type of dust filters being used, etc. The 180/200 mm opening size is also less favorable for noise attenuation, so depending on the GPU and other components in the case, and where you place the case, you may end up hearing quite a bit more than if it were a 120/140 mm fan. Also, a lot of the cases I've seen that support 180 or 200mm fans, usually supported two 120 or 140mm in the same spot. Two 120mm fans won't give you the same level of airflow vs noise performance, but two 140mm fans will probably exceed a single 180/200mm fan as long as you get a PWM fan and don't mind spending some time to adjust the fan curve profile. It sounds like your case is just about the perfect setup for using 180/200mm fans though, if it truly supports three of them, because that's also the issue with most cases that do support them: It's usually just one spot within the case that can fit the fan size, so then you're stuck with either trying to use it as the sole discharge fan that is quiet and will move a lot of air and doesn't need a filter, but may not be in the best location in terms of airflow/thermodynamics, or otherwise using it as intake but losing airflow performance once you put a filter in front of it, and probably still needing at least one more fan somewhere to help maintain positive pressure.
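A rough way to see why two 140mm fans can keep up with one 200mm: their combined swept area is nearly the same, and they make up the rest with RPM/static-pressure headroom. The 40mm hub diameter below is an assumption for illustration; real fans vary.

```python
# Rough swept-area comparison of common case fan configurations.
# Hub diameter is an assumed 40mm across all sizes (real fans differ).
import math

def swept_area_mm2(frame_mm, hub_mm=40):
    """Annulus between the blade tips (~frame size) and the hub."""
    r_out = frame_mm / 2
    r_in = hub_mm / 2
    return math.pi * (r_out**2 - r_in**2)

for setup, total in [("1x200mm", swept_area_mm2(200)),
                     ("2x140mm", 2 * swept_area_mm2(140)),
                     ("2x120mm", 2 * swept_area_mm2(120))]:
    print(f"{setup}: {total / 100:.0f} cm^2")
```

With these assumptions the pi cancels in the ratio, so 2x140mm comes out to 9000/9600, about 94% of the single 200mm's swept area, while 2x120mm trails well behind; that matches the claim that two 140s can exceed one 200 once you let them spin up.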
|
# ? May 10, 2021 15:07 |
|
SourKraut posted:200mm fans are great for quietly moving a good amount of air at a low static pressure. So if your case setup supports the hardware configuration you want and you're happy with temp and noise, then great! I'm guessing he's talking about the.. RV05? FT05 is also 180s with rotated layout but it's 2 instead of 3. Either way I believe both cases make extremely effective use of the 180mm fans. They do bottom -> top airflow, probably with low impedance nylon filters (afaik silverstone filters are very good), and not much in the airflow path between the fans and the CPU/gpu. I believe both cases test very well for both acoustic efficiency and absolute cooling, even by more modern standards.
|
# ? May 10, 2021 15:33 |
|
Yeah, the RV05 and FT02 use that layout. They are not as good for GPU cooling as some modern cases but are still the best, or among the best, for CPU cooling, iirc. I'm using an FT02 right now but am wanting to switch to an O11-Mini once I can actually get GPUs again.
|
# ? May 10, 2021 16:30 |
|
The FT-02 is still probably my favorite case of all time, I REALLY wish silverstone had kept refreshing that style.
|
# ? May 10, 2021 17:32 |
|
man, this is a bummer of a comparison. https://www.asus.com/us/Displays-Desktops/Mini-PCs/All-series/Mini-PC-PN50/ https://www.asrock.com/nettop/AMD/DeskMini%20X300%20Series/index.asp#Overview the PN50 is a much nicer device overall (DP 1.4 support, USB 3.2 10gbps with DP 1.4 alt-mode, etc) but it's fundamentally a NUC-style device with a laptop processor and the limited boost behavior that entails. Also, Asus doesn't seem to be pushing to put Zen3 in it, the PN51 refresh only uses Lucienne (5700U/5500U/5300U) which is Zen2 again. The Deskmini X300 is better as a mini-PC and you could put a 5700G in it (assuming they update BIOS), with an actual noctua cooler, but it's only DP 1.2 and the IO kinda sucks in comparison. I have a DP 1.4 monitor and while I know I'm not gonna get super great fps in modern titles there are probably lightweight titles/older titles where I could go higher than DP 1.2 supports. And Zen3 would be preferable to Zen2, as would the quieter noctua cooling and the better boost behavior. One has the performance, the other has the IO to actually get it to the monitor. Not really sure I want to go all the way up to a full mITX board in a Mini Box M350 or something, but I guess that's the other option.
|
# ? May 15, 2021 03:53 |
|
https://www.anandtech.com/show/16677/amd-and-globalfoundries-update-wafer-share-agreement-through-2024 quote:In what AMD/GloFo are calling the “A&R Seventh Amendment”, the updated amendment sets wafer purchase targets for 2022, 2023, and 2024. The full details on these targets are not yet available, however according to the 8-K filing, AMD expects to buy approximately $1.6 billion in wafers from GlobalFoundries in the 2022 to 2024 period. I can see continued production of Zen+ parts going well into 2022 what with all the increased demand and GPU shortage, since a Ryzen 2400G or Ryzen 1600AF is still plenty of horsepower for basic computing or even mid-range gaming, but what would AMD even do with 12nm production well into 2023 and 2024?
|
# ? May 15, 2021 05:27 |
|
gradenko_2000 posted:https://www.anandtech.com/show/16677/amd-and-globalfoundries-update-wafer-share-agreement-through-2024 they likely have long-term support agreements on Zen2/Zen3 (especially Epyc) and those agreements may bind them to specific part revisions without ANY changes to sub-assemblies. 2024 is 5 years from 2019 (Zen2) and 3 years from 2021 (official Milan launch). AMD may be moving the majority of their production to IO dies based on TSMC 6nm soon, but some vendors will not want to re-qualify the parts even with a "shouldn't affect anything" change like swapping the IO die. You may think it behaves exactly the same but it'll be a slightly different microcode with a new quirk, slightly different behavior at thermal extremes, etc. I was reading something written by an automotive engineer who was complaining that they had to change some microcontroller for their vehicle thanks to the recent shortages and they'd done all this work to requalify on the new part and everything looked good, then the actual production samples they got were a slightly different revision and this one had a thermal protection that cut in a little sooner and so when they started making vehicles with them they started having all kinds of problems... so I'd imagine that some vendors write their supply agreements so that NOTHING can change, even if you think it's inconsequential, it may not be to some user. also there is likely a long tail of production for the developing world. Brazil or something is not going to pay $850 for the latest whiz-bang 5950X, but an $80 1600AF is right up their alley. tbh it's a little mystifying that they ever stopped production at all, especially after the shortages hit they should really have cranked it back up because right now they have garbage in that price range, it's like 200GE and 3000G and crap like that. the PS3 and XB360 had incredibly long production tails for exactly that reason. PS3 production only stopped in 2017.
Paul MaudDib fucked around with this message at 05:44 on May 15, 2021 |
# ? May 15, 2021 05:35 |
|
Would they use 12nm for chipsets?
|
# ? May 15, 2021 10:24 |
|
Paul MaudDib posted:man, this is a bummer of a comparison. What resolution and refresh rate is your monitor, anyway? As far as I can see on Wikipedia, Displayport 1.2 already supports up to 17.28 Gbit/s, which should allow for 1080p @ 240 Hz, 1440p @ 165 Hz or 4K @ 75 Hz. https://en.wikipedia.org/wiki/DisplayPort#Resolution_and_refresh_frequency_limits
|
# ? May 15, 2021 14:33 |
|
Bofast posted:What resolution and refresh rate is your monitor, anyway? As far as I can see on Wikipedia, Displayport 1.2 already supports up to 17.28 Gbit/s, which should allow for 1080p @ 240 Hz, 1440p @ 165 Hz or 4K @ 75 Hz. Acer X34GS, 3440x1440 @ 180 hz. This is effectively full utilization of DP 1.4. DP 1.2 limits you to 100 Hz without an overclock. Not sure if you can combine DSC above that (I'd hope?) For the record, there are very lightweight titles where this would be perfectly fine even up to 180 Hz, Team Fortress 2 at 3440x1440 never ate more than about 30% of a 1060 3GB for me for example, it was always CPU-bottlenecked, and Zen3's CPU prowess would do well at that. Paul MaudDib fucked around with this message at 14:43 on May 15, 2021 |
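A simplified bandwidth check on that: 24 bpp and a flat ~8% blanking overhead are assumptions here (real reduced-blanking timings differ, which is part of why the actual panel caps DP 1.2 lower than this naive math suggests), but it shows 3440x1440@180 blowing past DP 1.2 while fitting in DP 1.4:

```python
# Naive video-stream bandwidth vs DisplayPort effective payload rates.
# 24 bpp and an assumed ~8% blanking overhead; real timings vary.
def stream_gbps(w, h, hz, bpp=24, blanking=1.08):
    return w * h * hz * bpp * blanking / 1e9

need = stream_gbps(3440, 1440, 180)
dp12 = 17.28   # DP 1.2 (HBR2) effective payload, Gbit/s
dp14 = 25.92   # DP 1.4 (HBR3) effective payload, Gbit/s

print(f"needed: {need:.1f} Gbit/s  (DP 1.2: {dp12}, DP 1.4: {dp14})")
```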
# ? May 15, 2021 14:38 |
|
Paul MaudDib posted:Acer X34GS, 3440x1440 @ 180 hz. This is effectively full utilization of DP 1.4. Ah, yeah, then it makes sense. Those high refresh rate ultrawide monitors do use a lot of bandwidth.
|
# ? May 15, 2021 14:54 |