|
What about all this cache, if you're not using the on-die GPU? I've an i7-2600, which always comes with a GPU. In the case of a Haswell version, would the additional cache go to waste?
|
# ? Sep 14, 2012 20:02 |
|
Combat Pretzel posted:What about all this cache, if you're not using the on-die GPU? I've an i7-2600, which always comes with a GPU. In the case of a Haswell version, would the additional cache go to waste?
|
# ? Sep 14, 2012 20:21 |
|
I dunno. On Sandy and Ivy, the GPU shares L3 cache, noted on block diagrams as LLC (last-level cache), with the CPU cores via a ring network. But this looks to be an L4 cache (and an optional one, at that), so we can't really tell how it's connected yet.
|
# ? Sep 14, 2012 20:24 |
|
I'd rather use the built-in GPU for physics. In a gaming PC, that'd make more sense than looping your high-performance graphics card through it for the minor savings you might see in infrequent idle periods.
|
# ? Sep 14, 2012 21:08 |
|
I'd rather use all the on-die cache as cache for all other tasks and buy the most ridiculous discrete GPU possible, really. (I don't now; I have a 6950. I'm just voicing a viewpoint.)
|
# ? Sep 14, 2012 22:19 |
|
The thing is, the current L1/L2/L3 hierarchy on Sandy/Ivy is already at the point of diminishing returns for most tasks. There are vanishingly few client workloads helped by more LLC than 8MB or more RAM bandwidth than dual-channel DDR3-1600. I'm not sure using that cache for everything would make a meaningful difference.
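(For reference, and this is my arithmetic rather than the poster's: dual-channel DDR3-1600 works out to 2 channels x 8 bytes x 1600 MT/s = 25.6 GB/s of theoretical bandwidth, and the mainstream Sandy/Ivy quad-core i7s top out at 8MB of L3.)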
|
# ? Sep 14, 2012 22:21 |
|
Combat Pretzel posted:I'd rather use the built-in GPU for physics. In a gaming PC, that'd make more sense than looping your high-performance graphics card through it for the minor savings you might see in infrequent idle periods. Great GPGPU performance a few generations from now would really make the whole idea of "GPGPU" as a separate category weird; at that point we'd need a different name for it, and maybe just go back to referring to them as coprocessors or adopt a more generalized term like "APU" in the legitimate sense of the word. Intel, enough foreplay, hop in bed with nVidia in the desktop space and get some GPU-accelerated PhysX going on. Win/win! ignore competing interests elsewhere you're both gigantic
|
# ? Sep 14, 2012 22:22 |
|
Agreed posted:Great GPGPU performance a few generations from now would really make the whole idea of "GPGPU" as a separate category weird; at that point we'd need a different name for it, and maybe just go back to referring to them as coprocessors or adopt a more generalized term like "APU" in the legitimate sense of the word. I know you have a PhysX drum to beat, and I have an vv Sorry, I meant "Why would Intel decide to accelerate PhysX?" HalloKitty fucked around with this message at 22:59 on Sep 14, 2012 |
# ? Sep 14, 2012 22:27 |
|
Since Nvidia bought out Ageia, and PhysX is one of the very few differentiating features between GeForces and Radeons, I'd have to call it a fat chance that Nvidia will transition GPU PhysX off CUDA or make it more cross-vendor-friendly. Now, if Havok built an OpenCL-accelerated version of their engine...
|
# ? Sep 14, 2012 22:51 |
|
Havok is Intel, IIRC, so fat chance here, too. God, this is so idiotic. It's time for Microsoft to introduce DirectPhysics.
|
# ? Sep 14, 2012 23:09 |
|
Surely DirectCompute and OpenCL can provide? I guess you mean a physics-specific framework built on an open one.
|
# ? Sep 14, 2012 23:15 |
|
HalloKitty posted:Surely DirectCompute and OpenCL can provide? I guess you mean a physics-specific framework built on an open one. If there's anything the world needs, it's more GPU frameworks.
|
# ? Sep 14, 2012 23:25 |
|
Combat Pretzel posted:Havok is Intel, IIRC, so fat chance here, too. Why? Starting with Ivy Bridge, HD Graphics has OpenCL support.
|
# ? Sep 14, 2012 23:52 |
|
Yeah, but the last time Havok was mentioned in the same breath as OpenCL was 2009. Color me jaded, but I'd expect a proprietary implementation on HD Graphics, because we can't have them fools from AMD getting any concessions. Seriously, if Microsoft gave everyone a big gently caress you by supplying something like DirectPhysics with a pretty performant software implementation, one that'll also be available on the Xbox 4pi, you might get a large share of game developers jumping on it, making the case for all manufacturers to supply their own accelerated/optimized drivers for it (AVX/AVX2 from Intel, whatever AMD has on their CPUs, CUDA from NVidia, SPP from AMD Graphics).
|
# ? Sep 15, 2012 00:03 |
|
I had hoped the tag about them being competitors in other markets would be sufficient to demonstrate that I wasn't serious about nVidia and Intel combining so that there would be CUDA cores on Intel processors, something neither company has the slightest interest in developing nor would mutually profit from, but I probably should have gone with a good old . Mea culpa. I am quite serious, though, that I am excited for future GPGPU given how tightly integrated Intel has already made things and how impressive it should be in a few generations. The best of both worlds: a fantastic generalist CPU and a GPU that can be reified as needed for the sorts of calculations that GPU hardware is really, really good at, with absolutely minimal bandwidth limitations helping to reduce processing overheads dramatically.
|
# ? Sep 15, 2012 00:07 |
|
One thing that strikes me about the resource streamer and cache is that it's establishing a pattern of Intel engineering around problems AMD is trying to outright solve. In this case, it's reducing the performance hit from switching between CPU compute and GPU compute. Right now, AMD is all-out working on its Heterogeneous System Architecture, unifying CPU and GPU compute until you've got Windows running on an APU with single memory space addressing, C++ GPGPU integration, and the ability to context switch on GPUs for GPGPU multitasking. Intel is just adding a shitload of highly engineered cache to minimize the impact of not having an HSA.

Back when multicore was a maturing tech, AMD worked hard to increase core counts per chip. Intel got to a certain number of cores, said, "okay, that's fine," and then ramped up the per-core performance instead. (e.g. 12-core Magny-Cours Opteron vs. 8-core Beckton/Nehalem-EX Xeon, or 8-core Bulldozer vs. 4-core Sandy Bridge)

AMD painstakingly builds out a highly parallel GPU microarchitecture to serve the enterprise and HPC markets. Intel sticks fifty Pentiums on a chip. (GCN vs. Xeon Phi)

I mean, I'm sure Intel is working on the same problems, but they keep getting these great results before they're solved, too.
|
# ? Sep 15, 2012 02:37 |
|
I'm still rocking my Nehalem i7 860, with Crossfire 5850s. I am super pumped for Haswell. I've been using this current machine, completely unchanged, since August 2009. I don't think technology has ever lasted that long for me. So I've had great luck with Intel's "tocks" and can only hope to continue this pattern. Ticks are for suckers!! (Or is it Tocks/Ticks? I always think it's the opposite until I look it up.)
|
# ? Sep 16, 2012 09:18 |
|
Factory Factory posted:AMD painstakingly builds out a highly parallel GPU microarchitecture to serve the enterprise and HPC markets. Intel sticks fifty Pentiums on a chip. (GCN vs. Xeon Phi) This, by the way, is probably my favorite "trick" Intel's pulled since reverting to the underlying Pentium 3/Pentium M architecture to develop what would eventually be the Core series. It's just so clever. When they're not being evil with lots of money, it's fun as heck to just watch their engineers seemingly take a really neat, almost playful approach to solving really complex problems. "Guys, guys, guys, listen - you know what's fast? A shitload of 22nm Pentiums, that's what! Let's just have a shitload of 22nm Pentiums! If we stick the Larrabee stuff on 'em, they'll run like bastards! We can do that, right? This is going to be awesome."
|
# ? Sep 16, 2012 10:27 |
|
Factory Factory posted:Bit more from AnandTech on Haswell's cache: That's a shitload of SRAM.
|
# ? Sep 16, 2012 19:53 |
|
printf posted:That's a shitload of SRAM. DRAM, not SRAM, it's a 128MB DDR3 die connected to the CPU over a 512-bit bus via a silicon interposer (a slice of silicon that both the CPU die and DRAM die are bonded to, that is then bonded to the substrate). It would be way too expensive to fit 128MB of SRAM to a CPU.
|
# ? Sep 16, 2012 20:12 |
|
Alereon posted:DRAM, not SRAM, it's a 128MB DDR3 die connected to the CPU over a 512-bit bus via a silicon interposer (a slice of silicon that both the CPU die and DRAM die are bonded to, that is then bonded to the substrate). It would be way too expensive to fit 128MB of SRAM to a CPU.
|
# ? Sep 16, 2012 23:07 |
|
Alereon posted:DRAM, not SRAM, it's a 128MB DDR3 die connected to the CPU over a 512-bit bus via a silicon interposer (a slice of silicon that both the CPU die and DRAM die are bonded to, that is then bonded to the substrate). It would be way too expensive to fit 128MB of SRAM to a CPU. Aww, I was hoping Intel found a way to get high-density SRAM working. Maybe they'll use phase-change memory in the near future? Would be more useful than that digital radio stuff they've been pushing, but not nearly as cool. Always wondered why SRAM is so stupidly overpriced compared to DRAM (10-100x the price at the least) when it only needs six times the transistors.
|
# ? Sep 16, 2012 23:15 |
|
I'm not an expert, but I think it's because you have to build it on the CPU die for it to provide a performance benefit, and area on a CPU die is expensive. If you're going to put the SRAM off-die then you might as well just do something clever with DRAM like Intel did and save money. Part of me does wonder why they didn't just jump straight to 1GB of dedicated RAM, though. SemiAccurate seems to think that if this works out, Intel will just integrate the entire system's DRAM onto the processor package, and in light of the move towards soldered-on RAM in notebooks this doesn't seem so far-fetched. It's almost necessary in their quest to save every possible milliwatt to get a reasonably performant Haswell system in 10W. Josh Lyman posted:So the CPU packaging will have a DRAM chip sitting next to the actual CPU core? Alereon fucked around with this message at 23:32 on Sep 16, 2012 |
# ? Sep 16, 2012 23:27 |
|
Alereon posted:I'm not an expert, but I think it's because you have to build it on the CPU die for it to provide a performance benefit, and area on a CPU die is expensive. If you're going to put the SRAM off-die then you might as well just do something clever with DRAM like Intel did and save money. Part of me does wonder why they didn't just jump straight to 1GB of dedicated RAM, though. SemiAccurate seems to think that if this works out, Intel will just integrate the entire system's DRAM onto the processor package, and in light of the move towards soldered-on RAM in notebooks this doesn't seem so far-fetched. It's almost necessary in their quest to save every possible milliwatt to get a reasonably performant Haswell system in 10W. I fully expect the next-gen (+1 maybe) MacBook Airs/ultrabooks to have a SoC-type thing where the RAM (and possibly the radio) is integrated on the CPU die. They aren't even removable, so there's no need to keep them off-die.
|
# ? Sep 17, 2012 00:13 |
|
The radio is definitely plausible. Intel demo'd an IP-block digital radio at IDF. The number of analog parts is small enough to stick it on an SoC, and they'd built up a demo chip with a dual-core Atom. The radio is even entirely configurable: the same block can be used for LTE, WiMax, 802.11, etc. as long as the correct software is loaded onto it.
|
# ? Sep 17, 2012 00:42 |
|
Nostrum posted:(Or is it Tocks/Ticks? I always think it's the opposite until I look it up.) Tick is the die shrink. It helps me to remember that the 'i' in 'tick' matches the two 'i's in 'die shrink'.
|
# ? Sep 18, 2012 18:55 |
|
AnandTech has a whole bunch of nerdwords about Intel's TSX extensions in Haswell.

The basic idea is that the inefficiencies of threads that work on common data drop per-core performance by up to about 60% once you account for the overhead and latencies of making sure that race conditions are avoided, i.e. situations where two threads want to work on the same data and should properly do so in sequence, so the order of operations must be enforced. Current multithreading on shared data involves a process called "locking," which prevents other threads from touching data once any one thread has started on it. Locking is usually done very coarsely, so data that doesn't need to be locked often is anyway. It's possible to do fine locking, but this is complex and labor-intensive for developers.

TSX extends two new interfaces for dealing with threaded operations on shared data: Restricted Transactional Memory (RTM) and Hardware Lock Elision (HLE).

RTM-aware software is not backwards compatible with current stuff, but it basically allows developers to mark thread segments as atomic, i.e. it all executes or it all fails. The data is cached and the operations are executed in cache. If a change in the original data is detected at any point through the transaction, the execution is halted and thrown out, and then a fallback codepath is taken (usually current-style memory locking).

HLE is more immediately useful. It works like RTM in that code executes until and unless a change in the base data is detected, but rather than letting the developer specify and prioritize things, the CPU handles it and runs everything atomically. Unfortunately, the fail condition is a little worse: a data collision causes the entire sequence to be re-run with traditional locking. But it will be immediately available to all software as long as the base multithreading/locking libraries the software relies on are updated to be HLE-aware. Currently, that includes gcc v4.8, Visual Studio 2012, Intel's own C compiler, and a glibc branch.

Essentially, TSX's best case is to allow performance similar to fine locking with only the effort of coarse locking by developers, or to allow a small performance boost on fine-locked software. This is mostly an HPC innovation, since per-core performance drops most on lots-of-core systems, but even desktop quad-core workloads should see healthy gains in per-core performance in highly threaded tasks.
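To make the RTM half a bit more concrete, here's a minimal sketch (my own illustration, not code from the article) of the transaction-with-fallback pattern using the _xbegin/_xend/_xabort intrinsics that gcc 4.8+ and Intel's compiler expose; the counter and the spinlock are placeholder names, and you'd build it with something like gcc -std=c11 -mrtm:

```c
#include <immintrin.h>   /* _xbegin, _xend, _xabort, _XBEGIN_STARTED */
#include <stdatomic.h>

static long shared_counter = 0;
static atomic_int fallback_lock = 0;   /* 0 = free, 1 = held */

void increment_shared(void)
{
    unsigned status = _xbegin();              /* try to start a hardware transaction */
    if (status == _XBEGIN_STARTED) {
        /* If someone holds the fallback lock, abort: the lock word is now in our
           read set, so a later lock acquisition aborts us automatically too. */
        if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
            _xabort(0xff);
        shared_counter++;                     /* runs speculatively, tracked in cache */
        _xend();                              /* commits only if no conflicting access
                                                 hit our cache lines mid-transaction */
        return;
    }

    /* Transaction aborted (conflict, capacity, lock was held, ...):
       fall back to a plain old coarse spinlock. */
    while (atomic_exchange(&fallback_lock, 1))
        ;                                     /* spin until we get the lock */
    shared_counter++;
    atomic_store(&fallback_lock, 0);
}
```

The point is that the happy path never writes the lock at all; threads only serialize when they actually collide on the same cache lines, which is roughly what HLE does for you automatically behind existing lock calls.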
|
# ? Sep 22, 2012 05:07 |
|
Anandtech has posted their Haswell architecture analysis. I haven't read it yet, but I think this means it's time for me to start working on a new thread.
|
# ? Oct 5, 2012 19:59 |
|
Alereon posted:Anandtech has posted their Haswell architecture analysis. I haven't read it yet, but I think this means it's time for me to start working on a new thread. More or less, Haswell is about taking advantage of die shrinks and wringing out as much power efficiency as possible without compromising performance.
|
# ? Oct 5, 2012 20:41 |
|
Goon Matchmaker posted:More or less, Haswell is about taking advantage of die shrinks and wringing out as much power efficiency as possible without compromising performance. And further integration. It worked really freaking well with Ivy Bridge, so I'll almost certainly be putting together a Haswell-based computer as my next build. I'll definitely wait for the chipset to get put through its paces, and possibly for the second stepping if that ends up happening. I can wait it out; Sandy Bridge, 16GB of fast-enough RAM, and SSDs do not feel slow at all right now, and I suspect they still won't when Haswell first hits the shelves. The only thing I wish I had compared to Ivy Bridge is PCIe 3.0.
|
# ? Oct 5, 2012 21:42 |
|
Ivy Bridge and Haswell are mobile home runs though, which owns.
|
# ? Oct 5, 2012 22:27 |
|
Alereon posted:Anandtech has posted their Haswell architecture analysis. I haven't read it yet, but I think this means it's time for me to start working on a new thread. Maybe spell "Platform" correctly in the next one?
|
# ? Oct 5, 2012 23:29 |
|
Huh... I'm finally getting a chance to peruse the AnandTech article, and one thing just jumped out at me. On page 7, the table detailing the architecture's execution pipeline buffering at the scheduler stage distinguishes the allocation queue architecture between Nehalem, SNB, and Haswell. Conroe isn't included, because I guess they don't have the information. Nehalem and SNB list the Allocation Queue as having 28 whatever-units per thread. Haswell omits "per thread" and just lists 56. Hyperthreading is the only feature that makes sense as causing per-thread resource differences on Nehalem and SNB, since chips shipped with Hyperthreading on or off as a differentiating feature. But that seems to be gone with Haswell. Is Hyperthreading going to become default? A standard Intel architecture feature?
|
# ? Oct 6, 2012 01:52 |
|
Factory Factory posted:Maybe spell "Platform" correctly in the next one? Factory Factory posted:Hyperthreading is the only feature that makes sense as causing per-thread resource differences on Nehalem and SNB, since chips shipped with Hyperthreading on or off as a differentiating feature. But that seems to be gone with Haswell. Is Hyperthreading going to become default? A standard Intel architecture feature?
|
# ? Oct 6, 2012 03:08 |
|
They also said that this happened in Ivy Bridge already.
|
# ? Oct 8, 2012 13:41 |
|
That was a good article, even if I didn't catch all the more technical stuff. I got an i7-920 early on and have been looking for an excuse to upgrade. Since the software I use keeps updating to make more use of the GPU, I haven't felt the need to upgrade, but Haswell looks to be worth it if I can hold off until next summer.
|
# ? Oct 8, 2012 22:05 |
|
Searching on Google for the roadmap on the high-end side of the CPUs, I stumbled over this article: http://www.brightsideofnews.com/news/2012/9/11/intel-desktop-roadmap-i7-3970k-coming-in-q42c-i7-4900-in-q3-2013.aspx

Summary:
i7-3970K in Q4 2012 (will prices go down for the existing ones?)
Ivy Bridge-E in Q3 2013
Still on the LGA 2011 socket and the X79 chipset

So, in theory, if one buys an LGA 2011 motherboard today, they'll be able to upgrade around this time next year to the new CPU, if it proves to be a major power boost. Meanwhile, Haswell doesn't look like it'll be able to hold a candle to the extreme versions.
|
# ? Oct 22, 2012 08:03 |
|
Still rolling on my old Yorkfield and other hardware just as old, but I just got my complimentary IVB i7-3770K from work. Pairing it up with a DZ77RE-75K and a 240GB 520 SSD.
|
# ? Oct 22, 2012 22:30 |
|
rhag posted:So, in theory, if one buys a 2011 MB today, they'll be able to upgrade next year around this time to the new CPU if it proves to be a major power boost. While Haswell doesn't look to be able to hold a candle to the extreme versions . I'm pretty sure that all current X79 motherboards will be IVB-E capable. I don't expect prices to budge hugely on current LGA2011 i7s, nor would I expect IVB-E to really be that much faster, no more than IVB is faster than SNB. What are you doing that you're considering an LGA2011 setup, anyway? But regarding Haswell not being able to hold a candle... Remember how when the i7-2600K came out, it matched or beat the i7-980X in a majority of benchmarks? That hasn't stopped being a thing that can happen.
|
# ? Oct 22, 2012 22:57 |
|
Factory Factory posted:But regarding Haswell not being able to hold a candle... Remember how when the i7-2600K came out, it matched or beat the i7-980X in a majority of benchmarks? That hasn't stopped being a thing that can happen. Not to mention, comparing one year's $1,000 chip to next year's $300 chip is a pretty harsh baseline.
|
# ? Oct 23, 2012 02:13 |