Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
What about all this cache, if you're not using the on-die GPU? I've an i7-2600, which always comes with a GPU. In the case of a Haswell version, would the additional cache go to waste?

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice

Combat Pretzel posted:

What about all this cache, if you're not using the on-die GPU? I've an i7-2600, which always comes with a GPU. In the case of a Haswell version, would the additional cache go to waste?
I believe it does if you're not using the IGP, but chances are you'll be using the IGP a lot more on your next build. I would be really surprised if, by the time Haswell comes out, we don't have a seamless implementation of Lucid Virtu or something like it, allowing the IGP to drive your monitors while your videocard sits in long idle mode, only powering up for strenuous graphics tasks. Haswell seems like it will have enough performance to drive web browsing, web gaming, and video playback/encode/transcode without the need to power up the card at all.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
I dunno. On Sandy and Ivy, the GPU shares L3 cache, noted on block diagrams as LLC (last-level cache), with the CPU cores via a ring network. But this looks to be an L4 cache (and an optional one, at that), so we can't really tell how it's connected yet.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
I'd rather use the built-in GPU for physics. In a gamer PC, this'd make more sense than looping your high performance graphics card through it for minor savings that may happen in infrequent idle periods.

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast
I'd rather use all the on-die cache as cache for all other tasks and buy the most ridiculous discrete GPU possible, really. (I don't do that now; I have a 6950. I'm just voicing a viewpoint.)

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
The thing is, the current L1/L2/L3 cache on Sandy/Ivy is already near peak efficiency for most tasks. There are vanishingly few client workloads helped by more LLC than 8MB or more RAM bandwidth than dual-channel DDR3-1600. I'm not sure using that cache for everything would make a meaningful difference.

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

Combat Pretzel posted:

I'd rather use the built-in GPU for physics. In a gamer PC, this'd make more sense than looping your high performance graphics card through it for minor savings that may happen in infrequent idle periods.

Great GPGPU performance a few generations from now would really make the whole idea of "GPGPU" as a separate category weird; at that point we'd need a different name for it - maybe just go back to referring to them as coprocessors, or adopt more generalized language like "APU" in the legitimate sense of the word.

Intel, enough foreplay, hop in bed with nVidia in the desktop space and get some GPU-accelerated PhysX going on. Win/win! Ignore competing interests elsewhere, you're both gigantic.

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast

Agreed posted:

Great GPGPU performance a few generations from now would really make the whole idea of "GPGPU" as a separate category weird; at that point we'd need a different name for it - maybe just go back to referring to them as coprocessors, or adopt more generalized language like "APU" in the legitimate sense of the word.

Intel, enough foreplay, hop in bed with nVidia in the desktop space and get some GPU-accelerated PhysX going on. Win/win! Ignore competing interests elsewhere, you're both gigantic.

I know you have a PhysX drum to beat, and I have an ATI/AMD card (hey, 6950 unlocked to 6970, optimum value), but really, surely it would be better to have a common API accelerated, and not a vendor-specific one? Especially since the list of games with PhysX support is tiny.

vv Sorry, I meant "Why would Intel decide to accelerate PhysX?"

HalloKitty fucked around with this message at 22:59 on Sep 14, 2012

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
Since Nvidia bought out Ageia, and PhysX is one of the very few differentiating features between GeForces and Radeons, I'd have to call it a fat chance that Nvidia will transition GPU PhysX off CUDA or make it more cross-vendor-friendly.

Now, if Havok built an OpenCL-accelerated version of their engine...

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Havok is Intel, IIRC, so fat chance here, too.

God, this is so idiotic. It's time for Microsoft to introduce DirectPhysics.

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast
Surely DirectCompute and OpenCL can provide? I guess you mean a physics-specific framework built on an open one.

hobbesmaster
Jan 28, 2008

HalloKitty posted:

Surely DirectCompute and OpenCL can provide? I guess you mean a physics-specific framework built on an open one.

If there's anything the world needs, it's more GPU frameworks.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.

Combat Pretzel posted:

Havok is Intel, IIRC, so fat chance here, too.

Why? Starting with Ivy Bridge, HD Graphics has OpenCL support.
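
For what it's worth, here's a minimal sketch of checking that from code - just generic OpenCL host boilerplate, not anything from the article, and it assumes an OpenCL runtime and SDK are installed (the file name is made up):

code:
// list_gpus.cpp - hypothetical example: enumerate OpenCL platforms and their GPU devices.
// Build assumption: g++ list_gpus.cpp -lOpenCL
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, NULL, &num_platforms);            // how many platforms are installed?
    if (num_platforms > 8) num_platforms = 8;
    cl_platform_id platforms[8];
    clGetPlatformIDs(num_platforms, platforms, NULL);     // fetch them

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char pname[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(pname), pname, NULL);

        cl_uint num_gpus = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, NULL, &num_gpus) != CL_SUCCESS)
            continue;                                      // no GPU devices on this platform
        if (num_gpus > 8) num_gpus = 8;
        cl_device_id gpus[8];
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, num_gpus, gpus, NULL);

        for (cl_uint d = 0; d < num_gpus; ++d) {
            char dname[256];
            clGetDeviceInfo(gpus[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
            std::printf("%s: %s\n", pname, dname);         // e.g. the HD 4000 shows up here on IVB
        }
    }
    return 0;
}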

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Yeah, but the last time Havok was mentioned in the same breath as OpenCL was 2009. Color me jaded, but I'd expect a proprietary implementation on HD Graphics. Because can't have them fools from AMD getting any concessions.

Seriously, if Microsoft gave everyone a big gently caress you by supplying something like DirectPhysics with a pretty performant software implementation, one that'll also be available on Xbox 4pi, you might get a large share of game developers jumping on it, making the case for all manufacturers to supply their own accelerated/optimized drivers for it (AVX/AVX2 from Intel, whatever AMD has on their CPUs, CUDA from NVidia, SPP from AMD Graphics).

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

I had hoped the tag about them being competitors in other markets would be sufficient to demonstrate that I wasn't serious about nVidia and Intel combining so that there would be CUDA cores on Intel processors (something neither company has the slightest interest in developing, nor would mutually profit from), but I probably should have gone with a good old :haw:. Mea culpa.

I am quite serious, though, that I am excited for future GPGPU given how tightly integrated Intel has already made things and how impressive it should be in a few generations. The best of both worlds: a fantastic generalist CPU and a GPU that can be reified as needed for the sorts of calculations that GPU hardware is really, really good at, with absolutely minimal bandwidth limitations helping to reduce processing overheads dramatically.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
One thing that strikes me about the resource streamer and cache is that it's establishing a pattern of Intel engineering around problems AMD is trying to outright solve. In this case, it's reducing the performance hit from switching between CPU compute and GPU compute.

Right now, AMD is all-out working on a Heterogeneous Systems Architecture, unifying CPU and GPU compute until you've got Windows running on an APU with single memory space addressing and C++ GPGPU integration and the ability to context switch on GPUs for GPGPU multitasking. Intel is just adding a shitload of highly engineered cache to minimize the impact of not having an HSA.
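
For a concrete sense of what "C++ GPGPU integration" means from the programmer's side, here's a tiny sketch using Microsoft's C++ AMP from VS2012 as a stand-in (this is not AMD's HSA stack, just an illustration; the function and names are made up). Note the explicit wrapping and synchronize step - the runtime still shuffles data between the CPU and GPU memory spaces, which is exactly the overhead HSA's single address space is supposed to remove:

code:
// saxpy_amp.cpp - hypothetical example of "C++-level GPGPU" using C++ AMP (VS2012).
#include <amp.h>
#include <vector>
using namespace concurrency;

void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    array_view<const float, 1> xv((int)x.size(), x);   // wraps host data; copied to the GPU lazily
    array_view<float, 1>       yv((int)y.size(), y);   // written on the GPU, copied back when read
    parallel_for_each(yv.extent, [=](index<1> i) restrict(amp) {
        yv[i] += a * xv[i];                             // runs on the GPU (DirectCompute underneath)
    });
    yv.synchronize();                                   // force the copy back to host memory
}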

Back when multicore was a maturing tech, AMD worked hard to increase core counts per chip. Intel got to a certain number of cores, said, "okay, that's fine," and then ramped up the per-core performance instead. (e.g. 12-core Magny-Cours Opteron vs. 8-core Beckton/Nehalem-EX Xeon or 8-core Bulldozer vs. 4-core Sandy Bridge)

AMD painstakingly builds out a highly parallel GPU microarchitecture to serve the enterprise and HPC markets. Intel sticks fifty Pentiums on a chip. (GCN vs. Xeon Phi)

I mean, I'm sure Intel is working on the same problems, but they keep getting these great results before they're solved, too.

forbidden dialectics
Jul 26, 2005





I'm still rocking my Nehalem i7 860, with Crossfire 5850s. I am super pumped for Haswell. I've been using this current machine, completely unchanged, since August 2009. I don't think technology has ever lasted that long for me. So I've had great luck with Intel's "tocks" and can only hope to continue this pattern. Ticks are for suckers!!

(Or is it Tocks/Ticks? I always think it's the opposite until I look it up.)

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

Factory Factory posted:

AMD painstakingly builds out a highly parallel GPU microarchitecture to serve the enterprise and HPC markets. Intel sticks fifty Pentiums on a chip. (GCN vs. Xeon Phi)

This, by the way, is probably my favorite "trick" Intel's pulled since reverting to the underlying Pentium 3/Pentium M architecture to develop what would eventually be the Core series. It's just so clever. When they're not being evil with lots of money, it's fun as heck to just watch their engineers seemingly take a really neat, almost playful approach to solving really complex problems.

"Guys, guys, guys, listen - you know what's fast? A shitload of 22nm Pentiums, that's what! Let's just have a shitload of 22nm Pentiums! If we stick the Larrabee stuff on 'em, they'll run like bastards! We can do that, right? This is going to be awesome." :allears:

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Factory Factory posted:

Bit more from AnandTech on Haswell's cache:

The GPU will get up to 128 MB of dedicated cache. That's a lotta cache on a tiny chip.

E: Are you guys not excited? Because I'm loving PUMPED!!! :newlol:

That's a shitload of SRAM.

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice

printf posted:

That's a shitload of SRAM.
DRAM, not SRAM, it's a 128MB DDR3 die connected to the CPU over a 512-bit bus via a silicon interposer (a slice of silicon that both the CPU die and DRAM die are bonded to, that is then bonded to the substrate). It would be way too expensive to fit 128MB of SRAM to a CPU.

Josh Lyman
May 24, 2009


Alereon posted:

DRAM, not SRAM, it's a 128MB DDR3 die connected to the CPU over a 512-bit bus via a silicon interposer (a slice of silicon that both the CPU die and DRAM die are bonded to, that is then bonded to the substrate). It would be way too expensive to fit 128MB of SRAM to a CPU.
So the CPU packaging will have a DRAM chip sitting next to the actual CPU core?

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Alereon posted:

DRAM, not SRAM, it's a 128MB DDR3 die connected to the CPU over a 512-bit bus via a silicon interposer (a slice of silicon that both the CPU die and DRAM die are bonded to, that is then bonded to the substrate). It would be way too expensive to fit 128MB of SRAM to a CPU.

Aww, I was hoping Intel had found a way to get high-density SRAM working.

Maybe they'll use phase-change memory in the near future? Would be more useful than that digital radio stuff they've been pushing, but not nearly as cool.

I've always wondered why SRAM is so stupidly overpriced compared to DRAM (10-100x the price at the least) when it only needs about 6x the transistors.

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice
I'm not an expert, but I think it's because you have to build it on the CPU die for it to provide a performance benefit, and area on a CPU die is expensive. If you're going to put the SRAM off-die then you might as well just do something clever with DRAM like Intel did and save money. Part of me does wonder why they didn't just jump straight to 1GB of dedicated RAM, though. SemiAccurate seems to think that if this works out Intel will just integrate the entire system's DRAM onto the processor package, and in light of the move towards soldered-on RAM in notebooks this doesn't seem so far-fetched. It's almost necessary in their quest to save every possible milliwatt to get a reasonably performant Haswell system in 10W.
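
(Rough back-of-the-envelope, assuming the standard 6-transistor SRAM cell: 128 MB × 8 bits × 6 transistors per bit is about 6.4 billion transistors just for the cell array, before any tag or control logic - versus roughly 1.4 billion transistors for an entire quad-core Ivy Bridge die, GPU included. Nobody is building that much SRAM, on-die or off.)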

Josh Lyman posted:

So the CPU packaging will have a DRAM chip sitting next to the actual CPU core?
I don't know what it will physically look like (and Google Image Search isn't helping), but it's not going to be a DRAM chip like on your memory module. It will be a naked die (under the heat spreader and sealant), probably right next to the CPU core, and both of them will be sitting on top of a thin silicon sheet (the interposer).

Alereon fucked around with this message at 23:32 on Sep 16, 2012

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Alereon posted:

I'm not an expert, but I think it's because you have to build it on the CPU die for it to provide a performance benefit, and area on a CPU die is expensive. If you're going to put the SRAM off-die then you might as well just do something clever with DRAM like Intel did and save money. Part of me does wonder why they didn't just jump straight to 1GB of dedicated RAM, though. SemiAccurate seems to think that if this works out Intel will just integrate the entire system's DRAM onto the processor package, and in light of the move towards soldered-on RAM in notebooks this doesn't seem so far-fetched. It's almost necessary in their quest to save every possible milliwatt to get a reasonably performant Haswell system in 10W.
I don't know what it will physically look like (and Google Image Search isn't helping), but it's not going to be a DRAM chip like on your memory module. It will be a naked die (under the heat spreader and sealant), probably right next to the CPU core, and both of them will be sitting on top of a thin silicon sheet (the interposer).

I fully expect the next-gen (+1 maybe) MacBook Airs/ultrabooks to have an SoC-type thing where the RAM (and possibly the radio) are integrated on the CPU die. They aren't even removable, so there's no need to keep them off-die.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
The radio is definitely plausible. Intel demo'd an IP-block digital radio at IDF. The number of analog parts is small enough to stick it on an SoC, and they'd built a demo chip with a dual-core Atom. The radio is even entirely configurable - the same block can be used for LTE, WiMax, 802.11, etc. as long as the correct software is loaded onto it.

canyoneer
Sep 13, 2005


I only have canyoneyes for you

Nostrum posted:

(Or is it Tocks/Ticks? I always think it's the opposite until I look it up.)

Tick is the die shrink. It helps me remember that the 'i' in 'tick' matches the two 'i's in 'die shrink'.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
AnandTech has a whole bunch of nerdwords about Intel's TSX extensions in Haswell.

The basic idea is that the inefficiencies of threads working on common data can drop per-core performance by up to about 60% once you account for the overhead and latency of making sure race conditions are avoided - i.e. cases where two threads want to work on the same data and should properly do so in sequence, so an order of operations must be enforced.

Current multithreading on shared data involves a process called "locking," which prevents other threads from touching data once any one thread has started on it. Locking is usually done very coarsely, so data that doesn't need to be locked often is locked anyway. It's possible to do fine-grained locking, but this is complex and labor-intensive for developers.
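
A toy illustration of that trade-off (a made-up C++ sketch, not anything from the article): the coarse version serializes every thread behind one mutex even when they touch different buckets, while the fine-grained version only serializes threads that actually collide, at the cost of the developer managing a lock per bucket.

code:
// locking.cpp - made-up example: coarse vs. fine-grained locking on a shared table.
#include <array>
#include <mutex>

const int kBuckets = 64;

struct CoarseTable {
    std::mutex big_lock;                              // one lock guards everything
    std::array<long, kBuckets> counts{};
    void add(int bucket, long v) {
        std::lock_guard<std::mutex> g(big_lock);      // every thread serializes here
        counts[bucket] += v;
    }
};

struct FineTable {
    std::array<std::mutex, kBuckets> locks;           // one lock per bucket
    std::array<long, kBuckets> counts{};
    void add(int bucket, long v) {
        std::lock_guard<std::mutex> g(locks[bucket]); // only threads hitting the same bucket serialize
        counts[bucket] += v;
    }
};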

TSX adds two new interfaces for dealing with threaded operations on shared data: Restricted Transactional Memory (RTM) and Hardware Lock Elision (HLE).

RTM-aware software is not backwards compatible with current hardware, but it basically allows developers to mark code sections as atomic, i.e. it all executes or it all fails. The data is cached and the operations are executed in cache. If a change in the original data is detected at any point during the transaction, the execution is halted and thrown out, and a fallback codepath is taken (usually current-style memory locking).
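
Roughly what that looks like in code - a minimal sketch using gcc 4.8's RTM intrinsics (the variable names here are made up, and building it assumes -mrtm):

code:
// rtm_sketch.cpp - made-up example of an RTM transaction with a locking fallback.
// Build assumption: g++ -std=c++11 -mrtm rtm_sketch.cpp
#include <immintrin.h>
#include <mutex>

std::mutex fallback_lock;      // the "current-style" lock used when the transaction aborts
long shared_counter = 0;

void increment() {
    unsigned status = _xbegin();               // start the transaction
    if (status == _XBEGIN_STARTED) {
        shared_counter += 1;                   // speculative: buffered in cache, not yet visible
        _xend();                               // commit: the whole update becomes visible atomically
    } else {
        // Aborted (conflict, capacity, interrupt, ...) - take the fallback path instead.
        // Real code would also read the fallback lock inside the transaction and abort if
        // it's held, so the two paths can't interleave on bigger critical sections.
        std::lock_guard<std::mutex> g(fallback_lock);
        shared_counter += 1;
    }
}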

HLE is more immediately useful. It works like RTM in that code executes until and unless a change in the base data is detected, but rather than letting the developer specify and prioritize things, the CPU handles it and runs everything atomically. Unfortunately, the fail condition is a little worse: a data collision causes the entire sequence to be re-run with traditional locking. But because HLE is implemented as instruction prefixes that older CPUs simply ignore, it will be immediately available to all software as long as the base multithreading/locking libraries the software relies on are updated to be HLE-aware. Currently, that includes gcc v4.8, Visual Studio 2012, Intel's own C compiler, and a glibc branch.
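
And the HLE flavor, in the gcc 4.8 style - again a made-up spinlock sketch (assumes -mhle and a TSX-capable target; on anything older the prefixes are ignored and it behaves as a normal spinlock):

code:
// hle_sketch.cpp - made-up example of an HLE-elided spinlock (gcc 4.8, -mhle).
static int lock_var = 0;       // 0 = free, 1 = held
long shared_counter = 0;

void hle_acquire() {
    // XACQUIRE-prefixed exchange: the CPU elides the lock write and runs the critical
    // section transactionally; on a data collision it re-runs with the real lock.
    while (__atomic_exchange_n(&lock_var, 1, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
        __builtin_ia32_pause();                // spin, then retry the acquire
}

void hle_release() {
    // XRELEASE-prefixed store ends the elided region.
    __atomic_store_n(&lock_var, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}

void increment() {
    hle_acquire();
    shared_counter += 1;
    hle_release();
}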

Essentially, TSX's best case is to allow performance similar to fine locking with only the effort of coarse locking by developers, or to allow a small performance boost on fine-locked software. This is mostly an HPC innovation, since per-core performance drops most on lots-of-core systems, but even desktop quad-core workloads should see healthy gains in per-core performance in highly threaded tasks.

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice
Anandtech has posted their Haswell architecture analysis. I haven't read it yet, but I think this means it's time for me to start working on a new thread.

Goon Matchmaker
Oct 23, 2003

I play too much EVE-Online

Alereon posted:

Anandtech has posted their Haswell architecture analysis. I haven't read it yet, but I think this means it's time for me to start working on a new thread.

More or less Haswell is about taking advantage of die shrinks and wringing as much power efficiency as possible without compromising performance.

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

Goon Matchmaker posted:

More or less Haswell is about taking advantage of die shrinks and wringing as much power efficiency as possible without compromising performance.

And further integration. It worked really freaking well with Ivy Bridge, so I'll almost certainly be putting together a Haswell-based computer as my next build. Will definitely wait for the chipset to get put through its paces, and possibly the second stepping if that ends up happening. I can wait it out; Sandy Bridge, 16GB of fast-enough RAM, and SSDs do not feel slow at all right now, and I suspect they still won't when Haswell first hits the shelves. The only thing I wish I had compared to Ivy Bridge is PCI-e 3.0 :effort:

Proud Christian Mom
Dec 20, 2006
READING COMPREHENSION IS HARD
Ivy Bridge and Haswell are mobile home runs though, which owns.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.

Alereon posted:

Anandtech has posted their Haswell architecture analysis. I haven't read it yet, but I think this means it's time for me to start working on a new thread.

Maybe spell "Platform" correctly in the next one? :v:

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
Huh... I'm finally getting a chance to peruse the AnandTech article, and one thing just jumped out at me. On page 7, the table detailing the architecture's execution pipeline buffering at the scheduler stage compares the allocation queue across Nehalem, SNB, and Haswell. Conroe isn't included, I guess because they don't have the information. Nehalem and SNB list the Allocation Queue as having 28 whatever-units per thread. Haswell omits "per thread" and just lists 56.

Hyperthreading is the only feature that makes sense as causing per-thread resource differences on Nehalem and SNB, since chips shipped with Hyperthreading on or off as a differentiating feature. But that seems to be gone with Haswell. Is Hyperthreading going to become default? A standard Intel architecture feature?

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice

Factory Factory posted:

Maybe spell "Platform" correctly in the next one? :v:
Fixed, thanks.

Factory Factory posted:

Hyperthreading is the only feature that makes sense as causing per-thread resource differences on Nehalem and SNB, since chips shipped with Hyperthreading on or off as a differentiating feature. But that seems to be gone with Haswell. Is Hyperthreading going to become default? A standard Intel architecture feature?
Hyperthreading already is a "standard" feature, present on all cores, just disabled on some parts for segmentation reasons. I don't think they'll change that. Moving to a 56-entry queue shared between both threads improves performance in cases where one thread is using resources more heavily than the other.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
They also said that this happened in Ivy Bridge already.

davebo
Nov 15, 2006

Parallel lines do meet, but they do it incognito
College Slice
That was a good article, even if I didn't catch all the more technical stuff. I got an i7 920 early on and have been looking for an excuse to upgrade. Since the software I use keeps updating to make more use of the GPU, I haven't felt the need to upgrade yet, but Haswell looks to be worth it if I can hold off until next summer.

Volguus
Mar 3, 2009
Searching on Google for the roadmap on the high-end side of the CPU lineup, I stumbled on this article:
http://www.brightsideofnews.com/news/2012/9/11/intel-desktop-roadmap-i7-3970k-coming-in-q42c-i7-4900-in-q3-2013.aspx

Summary: i7-3970K in Q4 2012 (will prices go down for the existing ones?)
Ivy Bridge-E in Q3 2013
Still on LGA 2011 socket and on X79 chipset.

So, in theory, if one buys an LGA 2011 motherboard today, they'll be able to upgrade to the new CPU around this time next year if it proves to be a major performance boost. Meanwhile, Haswell doesn't look like it'll be able to hold a candle to the extreme versions :(.

Henrik Zetterberg
Dec 7, 2007

Still rolling on my old Yorkfield and other hardware just as old, but I just got my complimentary IVB i7-3770K from work. Pairing this up with a DZ77RE-75K and a 240GB 520 SSD.

:getin:

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.

rhag posted:

So, in theory, if one buys an LGA 2011 motherboard today, they'll be able to upgrade to the new CPU around this time next year if it proves to be a major performance boost. Meanwhile, Haswell doesn't look like it'll be able to hold a candle to the extreme versions :(.

I'm pretty sure that all current X79 motherboards will be IVB-E capable. I don't expect prices to budge hugely on current LGA2011 i7s, nor would I expect IVB-E to really be that much faster, no more than IVB is faster than SNB. What are you doing that you're considering an LGA2011 setup, anyway?

But regarding Haswell not being able to hold a candle... Remember how when the i7-2600K came out, it matched or beat the i7-980X in a majority of benchmarks? That hasn't stopped being a thing that can happen.

canyoneer
Sep 13, 2005


I only have canyoneyes for you

Factory Factory posted:

But regarding Haswell not being able to hold a candle... Remember how when the i7-2600K came out, it matched or beat the i7-980X in a majority of benchmarks? That hasn't stopped being a thing that can happen.

Not to mention, comparing one year's $1,000 chip to next year's $300 chip is a pretty harsh baseline
