Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Should I be impressed by these numbers? I can't honestly tell. Actually, it seems more like they're bottlenecking things at the graphics card.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Combat Pretzel posted:

Should I be impressed by these numbers? I can't honestly tell. Actually, it seems more like they're bottlenecking things at the graphics card.

I don't really trust anything from WCCFTech, especially this far out. I'd expect 5-10% better IPC, lower power usage, nice new chipset features, and better integrated graphics.

10% better IPC is better than we've seen from anything since Sandy Bridge over Nehalem, so that would be pretty big.

Ninja Rope
Oct 22, 2005

Wee.

Twerk from Home posted:

10% better IPC is better than we've seen from anything since Sandy Bridge over Nehalem, so that would be pretty big.

What changes in chip technology would lead to this?

EoRaptor
Sep 13, 2003

by Fluffdaddy

Ninja Rope posted:

What changes in chip technology would lead to this?

It's a "14nm" design with FinFET. It should have a huge amount of die space available for IPC improvements, though I doubt any individual change contributes more than a percent or two.

Add it all up though:
Cache design
Memory controller
Integer unit design / count
Pipeline design / depth
Vector unit design / count

You only need a tiny bit everywhere to make a big overall difference. This has been Intel's overall strategy for a number of years: even when they claim a brand-new design, it's often a reshuffling of already-extant compute units, with maybe one or two sections sporting something new.


I'm just hoping the K variants aren't crippled by some horrible marketing decision and get the full feature set (virtualization, transactional memory, etc.).

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Skylake's finally getting TSX and hardware lock elision, right? I suppose the latter should make some sort of positive difference in multithreaded apps.

PC LOAD LETTER
May 23, 2005
WTF?!
It has to be coded for to work, though, right? If so, it probably won't make a difference for a couple of years.

That is why I was irritated by the TSX/HLE bugs in current CPUs that forced them to disable it. It takes years for these new CPU features to get widespread support and use, and they set things back quite a bit by screwing the pooch there.

If those leaked Skylake benches are correct then LOL, still sticking with my 2600K for a while longer I guess.

edit: vv It's supposed to be, but that ranks right up there with 'all they need to do is recompile and this new feature should work!', which almost never seems to happen.

PC LOAD LETTER fucked around with this message at 23:41 on Apr 26, 2015

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
I thought you could retrofit lock elision into existing apps via the system's locking primitives (--edit: or threading libraries, depending on your platform)? Obviously not as effective as specifically making direct use of the relevant instructions, but it should result in some difference?

PC LOAD LETTER posted:

If those leaked Skylake benches are correct then LOL, still sticking with my 2600K for a while longer I guess.
Same for the non-K version.

Combat Pretzel fucked around with this message at 23:15 on Apr 26, 2015

mayodreams
Jul 4, 2003


Hello darkness,
my old friend
I am still on a 2500K and quite happy with it, but I think I might build a new system with a Skylake just to get a much more updated platform than my Z68. Things like native USB 3.0 and M.2 support would be really nice for a new build, and most boards have much better integrated audio now too, which would also remove my aging and currently unsupported Asus PCI sound card from the equation.

Winks
Feb 16, 2009

Alright, who let Rube Goldberg in here?

mayodreams posted:

I am still on a 2500K and quite happy with it, but I think I might build a new system with a Skylake just to get a much more updated platform than my Z68. Things like native USB 3.0 and M.2 support would be really nice for a new build, and most boards have much better integrated audio now too, which would also remove my aging and currently unsupported Asus PCI sound card from the equation.

I'm in the exact same boat. A USB3 controller that doesn't suck would be pretty awesome.

vv Not gonna mind the increased performance either, or replacing the C2D in another PC with my Sandy. Disappointing that USB 3.1 isn't going to be in the first chipsets.

Winks fucked around with this message at 02:01 on Apr 27, 2015

PC LOAD LETTER
May 23, 2005
WTF?!
That's understandable, but you could just swap to a Z77 chipset mobo, get good USB 3.0 for cheaper, and keep your current CPU and RAM. It'd take something like integrated USB 3.1 support to make me think about spending the extra cash on the CPU + DDR4.

Might just get an add-in card instead, really.

Sidesaddle Cavalry
Mar 15, 2013

Oh Boy Desert Map

PC LOAD LETTER posted:

That's understandable, but you could just swap to a Z77 chipset mobo, get good USB 3.0 for cheaper, and keep your current CPU and RAM. It'd take something like integrated USB 3.1 support to make me think about spending the extra cash on the CPU + DDR4.

Might just get an add-in card instead, really.
I'm glad Skylake's going to support DDR3L memory. That's almost a cool couple hundred bucks off of the upgrade for me considering how crazy DDR4 prices are, still.

Lowen SoDium
Jun 5, 2003

Highen Fiber
Clapping Larry

Sidesaddle Cavalry posted:

I'm glad Skylake's going to support DDR3L memory. That's almost a cool couple hundred bucks off of the upgrade for me considering how crazy DDR4 prices are, still.

Hopefully, DDR4 will have a price drop within a couple of months after Skylake chips are shipping in volume.

LiquidRain
May 21, 2007

Watch the madness!

Sidesaddle Cavalry posted:

I'm glad Skylake's going to support DDR3L memory. That's almost a cool couple hundred bucks off of the upgrade for me considering how crazy DDR4 prices are, still.
I doubt you'll see desktop boards with DDR3L support; I imagine you'll only see DDR4. DDR3L is likely there for lower-cost convertible tablets or some such until DDR4 reaches price parity.

EoRaptor
Sep 13, 2003

by Fluffdaddy

Combat Pretzel posted:

I thought you could retrofit lock elision into existing apps via the system's locking primitives (--edit: or threading libraries, depending on your platform)? Obviously not as effective as specifically making direct use of the relevant instructions, but it should result in some difference?

Same for the non-K version.
There are two TSX modes. One re-uses an existing x86 instruction prefix pair that is not normally associated with memory locking. Older CPUs will ignore them; newer CPUs will use HLE to try a basic transaction on the memory area. You need to recompile with a TSX-aware compiler and library, but don't need to change your code at all.

The other is a new set of instructions that lets you finely control a transaction attempt and catch the fallout of its success or failure. You need to change your code to handle this new method, and you also need compiler and library support.
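
Roughly, the second mode looks like this in use - just a sketch, not Intel's sample code, and it assumes GCC or Clang with -mrtm on a CPU that actually reports RTM (the names counter/fallback_lock/increment are made up):

code:
/* RTM critical section with a spinlock fallback (sketch). */
#include <immintrin.h>

static volatile int fallback_lock = 0;   /* 0 = free, 1 = held */
static long counter;

static void lock_fallback(void)
{
    while (__atomic_exchange_n(&fallback_lock, 1, __ATOMIC_ACQUIRE))
        while (fallback_lock)
            _mm_pause();
}

static void unlock_fallback(void)
{
    __atomic_store_n(&fallback_lock, 0, __ATOMIC_RELEASE);
}

void increment(void)
{
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Reading the lock inside the transaction adds it to our read
         * set, so anyone taking the fallback path will abort us. */
        if (fallback_lock)
            _xabort(0xff);
        counter++;
        _xend();                          /* commit */
        return;
    }
    /* Aborted (conflict, capacity, no TSX, ...): take the real lock. */
    lock_fallback();
    counter++;
    unlock_fallback();
}

The interesting part is that last branch: you always need a non-transactional fallback path, which is exactly the "change your code" cost I mean.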

LiquidRain posted:

I doubt you'll see desktop boards with DDR3L support; I imagine you'll only see DDR4. DDR3L is likely there for lower-cost convertible tablets or some such until DDR4 reaches price parity.

Skylake supports DDR3L and DDR4, and a motherboard can offer support for both via the UniDIMM standard, which is a modified SO-DIMM spec (same pin count, new notch location) that allows for either DDR3L or DDR4. You cannot mix and match DDR3L/DDR4 in the same system, but you can switch at any time. It's not clear how well supported UniDIMM will be.

mayodreams
Jul 4, 2003


Hello darkness,
my old friend

PC LOAD LETTER posted:

That's understandable, but you could just swap to a Z77 chipset mobo, get good USB 3.0 for cheaper, and keep your current CPU and RAM. It'd take something like integrated USB 3.1 support to make me think about spending the extra cash on the CPU + DDR4.

Might just get an add-in card instead, really.

I see your point, but a quick trip to Newegg yields only one new Z77 motherboard (a handful were refurbished). Personally, the point of upgrading from a four-year-old chipset is to get something that will last a few more years, and only moving forward a year to save a few hundred bucks is not worth it for me. I will probably re-purpose this board/CPU for something, but I am not saving pennies to upgrade my hardware at this point in life either, so I'd prefer a Skylake and Z100-or-whatever combo, for all the reasons that a Sandy Bridge chipset is dated and the joy of getting new hardware.

In fact, this desktop is the oldest build I have in the house between the 2 HTPCs and ESXi host I have. It's been a good run, but I've been getting the itch to update it.

Ika
Dec 30, 2004
Pure insanity

Combat Pretzel posted:

I thought you could retrofit lock elision into existing apps via the system's locking primitives (--edit: or threading libraries, depending on your platform)? Obviously not as effective as specifically making direct use of the relevant instructions, but it should result in some difference?

That assumes the authors used the system primitives, and didn't just go "Oh, we need a reader-writer lock? Well, that's only supported starting with Vista and we have customers on XP, so let's roll our own", etc.
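
For reference, the primitive in question is only a few lines to use - a minimal sketch, with the table and accessors invented for the example:

code:
/* Vista+ slim reader/writer lock (SRWLOCK). */
#include <windows.h>

static SRWLOCK table_lock = SRWLOCK_INIT;
static int shared_table[64];

int read_entry(int i)
{
    AcquireSRWLockShared(&table_lock);     /* many readers can hold this */
    int v = shared_table[i];
    ReleaseSRWLockShared(&table_lock);
    return v;
}

void write_entry(int i, int v)
{
    AcquireSRWLockExclusive(&table_lock);  /* writers get it alone */
    shared_table[i] = v;
    ReleaseSRWLockExclusive(&table_lock);
}

The point being: code that rolled its own lock out of Interlocked* calls won't pick up whatever elision the OS primitives eventually get for free.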

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost

Combat Pretzel posted:

I thought you could retrofit lock elision into existing apps via the system's locking primitives (--edit: or threading libraries, depending on your platform)? Obviously not as effective as specifically making direct use of the relevant instructions, but it should result in some difference?
Porting pthreads, for example, to support TSX and lock elision in userspace is technically viable, but anyone interested in stability could be negatively impacted; it would cause some friction and needs a bit more battle testing before it can actually be considered mainline support. I don't want my production app to be a guinea pig for hardware transactional memory just because I upgraded my Postgres version for a new query type, for example.

EoRaptor posted:

There are two TSX modes. One re-uses an existing x86 instruction prefix pair that is not normally associated with memory locking. Older CPUs will ignore them; newer CPUs will use HLE to try a basic transaction on the memory area. You need to recompile with a TSX-aware compiler and library, but don't need to change your code at all.
Last I remember looking at those prefixes, I don't remember any library actually using them for spin locks or semaphores, so they wouldn't be used by those libraries for what you'd hope. REPNE/REPE is used for repeating an instruction over a string sequence (which is kind of a clever pair to re-purpose, honestly); they let you write a loop as a single instruction, and on TSX processors the same encoding is now a safe transactional sequence, which is the only difference. Atomic memory swap operations and straight-up LOCK-prefixed instructions on x86 are probably what pthreads would compile to (I haven't touched pthreads for 12 years, I make zero claims on wtf is actually current). So this means not a whole lot of performance or anything without modifying the LOCK-prefixed instructions, unfortunately.

However, implicit XACQUIRE/XRELEASE does make it possible on a TSX-supporting processor to treat the memory region as free to other threads, with write-ordering resolution plus cache-coherency conflict resolution that should be working for Skylake at least. This means that if your compiler spat out a null-terminated strlen function, it will be a transaction that's protected from other threads - that's kinda cool, but it's one of those things people have done with lockless programming on x86 for a long time now (instruction sequences that force ordering by some fluke of internals, creating hardware-based memory fences). Sure, this should help with many of the performance and sanity headaches of multithreaded programming, but it won't help much beyond trivial locking failures. Then again, if they do cool stuff like creating lock digraphs to do deadlock detection and resolution with nested transactions or something, that would be super rad. But I see no such documentation so far.
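
To make the implicit prefixes concrete, an elided spinlock is roughly this with GCC's HLE-flavored atomics - just a sketch, assuming -mhle; on non-TSX parts the prefixes are ignored and it degrades to an ordinary spinlock:

code:
/* HLE-elided spinlock (sketch). */
#include <immintrin.h>

static volatile int lockvar = 0;

static void hle_lock(void)
{
    /* Emits XACQUIRE LOCK XCHG: an elided transaction on TSX parts,
     * a plain locked exchange everywhere else. */
    while (__atomic_exchange_n(&lockvar, 1,
                               __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
        _mm_pause();                 /* spin politely while it's held */
}

static void hle_unlock(void)
{
    /* Emits XRELEASE MOV: commits the elision (or just clears the lock). */
    __atomic_store_n(&lockvar, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}

But note that's the lock implementation changing, not your application code - which is the whole point about pthreads and LOCK-prefixed instructions above.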


Pretty sure the few places that will get use out of this are database vendors and HFT shops that actually do multithreaded transactions but somehow haven't managed to go lockless by now despite their talent availability. For everyone else that's not rewriting their locking primitives to support the instructions, TSX can give you better protection against certain segfaults in userspace multithreaded application code. Instead of outright crashing, you'll get a free, safe transaction that's a little faster, depending on whether (and by how much) context-switching overhead outweighs your locking overhead.

Edit:
TL;DR: TSX unfortunately does not actually get you this for all situations even with the backwards-compatibility help http://devopsreactions.tumblr.com/post/110529123748/lockess-algorithm

necrobobsledder fucked around with this message at 15:43 on Apr 27, 2015

Wistful of Dollars
Aug 25, 2009

DX12 and Vulkan suggest 6-core or bust for my next upgrade. :v:

repiv
Aug 13, 2009

Eh, the Star Swarm test shows that even in the worst case, where each draw call causes very little GPU load, a dual-core can nearly saturate a GTX 980. Real games will bottleneck on the GPU long before they hit that much CPU load.

Of course it depends how much CPU time the game spends on other stuff but don't get too excited about multi-core scaling :v:

repiv fucked around with this message at 18:44 on Apr 27, 2015

EoRaptor
Sep 13, 2003

by Fluffdaddy

necrobobsledder posted:

Porting pthreads, for example, to support TSX and lock elision in userspace is technically viable, but anyone interested in stability could be negatively impacted; it would cause some friction and needs a bit more battle testing before it can actually be considered mainline support. I don't want my production app to be a guinea pig for hardware transactional memory just because I upgraded my Postgres version for a new query type, for example.
Last I remember looking at those prefixes, I don't remember any library actually using them for spin locks or semaphores, so they wouldn't be used by those libraries for what you'd hope. REPNE/REPE is used for repeating an instruction over a string sequence (which is kind of a clever pair to re-purpose, honestly); they let you write a loop as a single instruction, and on TSX processors the same encoding is now a safe transactional sequence, which is the only difference. Atomic memory swap operations and straight-up LOCK-prefixed instructions on x86 are probably what pthreads would compile to (I haven't touched pthreads for 12 years, I make zero claims on wtf is actually current). So this means not a whole lot of performance or anything without modifying the LOCK-prefixed instructions, unfortunately.

However, implicit XACQUIRE/XRELEASE does make it possible on a TSX-supporting processor to treat the memory region as free to other threads, with write-ordering resolution plus cache-coherency conflict resolution that should be working for Skylake at least. This means that if your compiler spat out a null-terminated strlen function, it will be a transaction that's protected from other threads - that's kinda cool, but it's one of those things people have done with lockless programming on x86 for a long time now (instruction sequences that force ordering by some fluke of internals, creating hardware-based memory fences). Sure, this should help with many of the performance and sanity headaches of multithreaded programming, but it won't help much beyond trivial locking failures. Then again, if they do cool stuff like creating lock digraphs to do deadlock detection and resolution with nested transactions or something, that would be super rad. But I see no such documentation so far.


Pretty sure the few places that will get use out of this are database vendors and HFT shops that actually do multithreaded transactions but somehow haven't managed to go lockless by now despite their talent availability. For everyone else that's not rewriting their locking primitives to support the instructions, TSX can give you better protection against certain segfaults in userspace multithreaded application code. Instead of outright crashing, you'll get a free, safe transaction that's a little faster, depending on whether (and by how much) context-switching overhead outweighs your locking overhead.

Edit:
TL;DR: TSX unfortunately does not actually get you this for all situations even with the backwards-compatibility help http://devopsreactions.tumblr.com/post/110529123748/lockess-algorithm

Absolutely true that performance will only come with dedicated code. I think the main benefit of the hardware support will be that you cannot create faulty lockless code that potentially corrupts data. It will also make validating results much more straightforward.

For actual use cases, I'm thinking Apple has the leg up here. If I squint a bit, the threading wrapper code for Swift looks like it could be tweaked to take advantage of TSX with very little work to change already written code. This could give much better thread performance throughout the entire O/S and application stack, and though I doubt it would be visible to end users as performance, it will probably pop out as less heat and longer battery life.

For big-data databases, the number of in-flight transactions possible would need to be greatly increased, and we will probably see that happen on the Xeon lineup. Getting it into Skylake is probably about developer usage, not data-center usage (yet).

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!

El Scotch posted:

DX12 and Vulkan suggest 6-core or bust for my next upgrade. :v:
I am still considering going with Haswell-E eventually. Now more so because of this.

Lord Windy
Mar 26, 2010
Is Hyper-Threading decent? If you had two equally powered devices, would the one with Hyper-Threading perform better in multithreaded workloads?

Yaoi Gagarin
Feb 20, 2014

Lord Windy posted:

Is Hyper-Threading decent? If you had two equally powered devices, would the one with Hyper-Threading perform better in multithreaded workloads?

Yes, but the performance improvement isn't as good as adding more physical cores. So if you need lots of threads running in parallel, eight cores > four cores with HT > just four cores.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Lord Windy posted:

Is Hyper-Threading decent? If you had two equally powered devices, would the one with Hyper-Threading perform better in multithreaded workloads?

Between 5% and 55% better performance, depending on what exact mix of threads you have and what parts of the CPU they're using. In cases where you're doing the exact same poo poo with every thread, it'll be much closer to the former than the latter.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Methylethylaldehyde posted:

Between 5% and 55% better performance, depending on what exact mix of threads you have and what parts of the CPU they're using. In cases where you're doing the exact same poo poo with every thread, it'll be much closer to the former than the latter.

What's going on with GTA V then? All my friends playing it report that disabling HT in the BIOS gains them 3-5 fps in the in-engine benchmark, and most of the FAQs I've seen about it say "turn off hyperthreading if you have an i7".

Kazinsal
Dec 13, 2011



Disabling hyperthreading entirely for a single game may be one of the stupidest things I've heard suggested for an extra couple frames.

You basically turn your i7 into a (possibly greatly) more expensive i5 by doing that.

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
On top of that, the scheduler in Windows is hyperthreading-aware, so for as long as possible it'll schedule things in such a way that hyperthreading won't interfere with performance.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Twerk from Home posted:

What's going on with GTA V then? All my friends playing it report that disabling HT in the BIOS gains them 3-5 fps in the in-engine benchmark, and most of the FAQs I've seen about it say "turn off hyperthreading if you have an i7".

Really lovely programming would be my guess. There is an API call you can use to determine which cores are real and use those preferentially, but it sounds like it's just piling the threads onto CPUs 0-3 and then hitting contention on those cores or something.
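
For the curious, GetLogicalProcessorInformation is one such call; a rough sketch (minimal error handling) of counting real cores with it:

code:
/* Count physical cores vs. logical processors on Windows (sketch). */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    DWORD len = 0;
    GetLogicalProcessorInformation(NULL, &len);          /* query size */
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION *info = malloc(len);
    if (!info || !GetLogicalProcessorInformation(info, &len))
        return 1;

    int physical = 0, logical = 0;
    for (DWORD i = 0; i < len / sizeof(*info); i++) {
        if (info[i].Relationship == RelationProcessorCore) {
            physical++;
            /* each set bit in ProcessorMask is a logical CPU on this core */
            for (ULONG_PTR m = info[i].ProcessorMask; m; m >>= 1)
                logical += (int)(m & 1);
        }
    }
    printf("%d physical cores, %d logical processors\n", physical, logical);
    free(info);
    return 0;
}

An engine doing it properly would use that core mapping to put its heavy threads one per physical core instead of wherever.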

future ghost
Dec 5, 2005

:byetankie:
Gun Saliva
Worse programming than Ubisoft. Nice. Well worth that extra delay getting things as polished as possible.

Generic Monk
Oct 31, 2011

cisco privilege posted:

Worse programming than Ubisoft. Nice. Well worth that extra delay getting things as polished as possible.

I'd say 'losing a few potential FPS on enthusiast CPUs that a minority of customers own' is better programming than 'doesn't run at all on dual cores'

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast

Generic Monk posted:

I'd say 'losing a few potential FPS on enthusiast CPUs that a minority of customers own' is better programming than 'doesn't run at all on dual cores'

Yeah, truth. I actually find GTA V runs really well on my machine, but then I don't have an FPS counter up at all times, thinking about how much I need a new graphics card when it hits the lower 40s. Because hey, I'm just.. enjoying the game.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Because hyperthreading improvements are typically reported as percentages rather than absolute clock or wall-clock improvements, what are those FPS as a percentage of the baseline? If it's something like 10% (which would imply 20s-30s FPS), then that's pretty suboptimal scheduling of those threads onto logical CPUs. I'd try playing around a little with CPU core affinity and pin the program to a few CPUs exclusively (I think you could do this from Task Manager before even, otherwise it's from something like Process Explorer or a PowerToy some MS dev wrote up) and see what happens.
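
If you'd rather do it programmatically than click around in Task Manager, something like this rough sketch (an invented little 'pin' utility, minimal error handling) restricts a PID to every other logical CPU:

code:
/* Pin a running process to CPUs 0, 2, 4, ... (sketch). */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: pin <pid>\n"); return 1; }
    DWORD pid = (DWORD)strtoul(argv[1], NULL, 10);

    HANDLE h = OpenProcess(PROCESS_SET_INFORMATION | PROCESS_QUERY_INFORMATION,
                           FALSE, pid);
    if (!h) { fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError()); return 1; }

    DWORD_PTR proc_mask = 0, sys_mask = 0;
    GetProcessAffinityMask(h, &proc_mask, &sys_mask);

    /* keep every other bit of the system mask, i.e. roughly one logical
     * CPU per physical core when hyperthreading is on */
    DWORD_PTR every_other = 0, keep = 1;
    for (DWORD_PTR bit = 1; bit && bit <= sys_mask; bit <<= 1, keep ^= 1)
        if (keep && (sys_mask & bit))
            every_other |= bit;

    if (!SetProcessAffinityMask(h, every_other))
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());

    CloseHandle(h);
    return 0;
}

(Whether logical CPU numbering actually alternates real core / HT sibling is up to the platform, so treat the every-other-bit trick as a guess you'd want to check against GetLogicalProcessorInformation.)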

This is the sort of case where I'd say the TSX extensions mentioned above will probably bring no perceptible improvement. If they're issuing lock instructions galore and causing massive numbers of context switches, for example, the backwards-compatibility mode will be of almost no help, because you're already dwarfed by order-of-magnitude performance problems elsewhere. It's when you're already trying to do lockless programming that the backwards-compatible TSX instruction prefixes help substantially, by making your code easier to write and avoiding a solid majority of crashes within platform code.

This is weird to me but wasn't GTA 5 already on current-gen consoles and so already x86-based? wtf could it even be then?

Nintendo Kid
Aug 4, 2011

by Smythe

necrobobsledder posted:

This is weird to me but wasn't GTA 5 already on current-gen consoles and so already x86-based? wtf could it even be then?

It was originally on the 360 and PS3, where the 360 is a simple tri-core PowerPC and the PS3's Cell is a mix of different core designs. It was then ported to the 8-core AMD x86-64 CPUs used in both current-gen consoles, which is obviously a different programming environment than, say, a 4-core/8-thread Intel CPU.

eggyolk
Nov 8, 2007


Has anyone benchmarked GTAV on one of those 10+ physical core Xeons?

Generic Monk
Oct 31, 2011

HalloKitty posted:

Yeah, truth. I actually find GTA V runs really well on my machine, but then I don't have an FPS counter up at all times, thinking about how much I need a new graphics card when it hits the lower 40s. Because hey, I'm just.. enjoying the game.

I couldn't be happier with the performance - it runs great on my 3570K and 290X at 1440p with more or less everything ratcheted up. Not a solid 60fps or anything like that, but it's never less than a pleasure to play, and the game looks amazing 100% of the time. Makes a nice change from the last-gen version, where it looked like the 360 was going to tear itself apart at any given moment.

mobby_6kl
Aug 9, 2009

by Fluffdaddy
These are supposedly the Skylake models:

[leaked table of Skylake SKUs and clock speeds - image missing]

Doesn't really tell us much besides the fact that there isn't a massive frequency jump.

Beautiful Ninja
Mar 26, 2009

Five time FCW Champion...of my heart.
Here's to hoping Intel decides to solder the heat spreaders like in the good old days and we can get some nice overclocks on these CPUs.

The_Franz
Aug 8, 2003

Combat Pretzel posted:

On top of that, the scheduler in Windows is hyperthreading-aware, so for as long as possible it'll schedule things in such a way that hyperthreading won't interfere with performance.

Depends. If an application has threads busy-waiting instead of sleeping, yielding, or at least issuing a pause instruction, it can prevent CPU cores from switching hardware contexts as efficiently as they could.
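
A toy illustration of the difference (spin_flag is made up):

code:
/* Busy-wait vs. a polite spin (sketch). */
#include <immintrin.h>
#include <windows.h>

volatile LONG spin_flag;            /* set to 1 by some other thread */

void wait_naive(void)
{
    while (!spin_flag)
        ;                           /* hogs the core and starves the HT sibling */
}

void wait_polite(void)
{
    while (!spin_flag) {
        _mm_pause();                /* PAUSE: tell the core we're just spinning */
        /* after enough iterations you'd give the timeslice away entirely: */
        /* SwitchToThread(); */
    }
}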

Looking at this benchmark the game definitely benefits from more cores, but even the current lower-end i3 is enough to average well over 60fps with a sufficient GPU.

KillHour
Oct 28, 2007


Beautiful Ninja posted:

Here's to hoping Intel decides to solder the heat spreaders like in the good old days and we can get some nice overclocks on these CPUs.

I thought the current Broadwell chips already have much better overclocking headroom than the Haswells did?

I have a Haswell, so I didn't bother looking at Broadwell too closely, but even mine overclocks fine for what I need (4.5GHz is no problem).

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!

mobby_6kl posted:

These are supposedly the Skylake models:

[leaked table of Skylake SKUs and clock speeds - image missing]

Doesn't really tell us much besides the fact that there isn't a massive frequency jump.
Ah for gently caress's sake. The i7-6700 non-K with the extended virtualization features is clocked pretty much the same as my i7-2600, whereas the i7-6700K, which I'd rather like, doesn't come with VT-d and poo poo. (--edit: Then again, I used VT-d only while experimenting with Xen and Linux' KVM. Keep hoping that PEG passthrough gets more reliable... any day now.)

Also, DDR3L doesn't mean regular DDR3, right? So I'd have to get new DIMMs regardless?
