No Gravitas
Jun 12, 2013

by FactsAreUseless

BurritoJustice posted:

That was "No Gravitas" if I remember correctly. It was a cool series of posts.

Yup, it was me. Give me something insane to do, and I will.

Durinia posted:

Yeah, you can "compile and go", but you'll get complete rear end-level performance.

KNF and KNC were mostly experiments. KNL is being pushed by Intel as the first real focused HPC implementation as a product.

Of course, they also said that about KNC, so...

The performance sucked with GCC, since only the Intel compiler uses the wide execution units. About a 20? 30? 50? times slowdown versus a Haswell Xeon processor, if I recall (comparing single core to single core). If I had loaded up the whole Phi (and remember you need twice as many threads as cores!), I would have ended up on par with my server processor. Not a bad deal for $100, or however much it cost me, just not what I dreamed of.

Still an amazingly cool device, but you need the Intel compiler for doing anything serious on it, period.

Maybe use it as a sort of swap space; those 8GB of RAM could come in handy for something...


Durinia
Sep 26, 2014

The Mad Computer Scientist

No Gravitas posted:

Yup, it was me. Give me something insane to do, and I will.


The performance sucked with GCC, since only the Intel compiler uses the wide execution units. About a 20? 30? 50? times slowdown versus a Haswell Xeon processor, if I recall (comparing single core to single core). If I had loaded up the whole Phi (and remember you need twice as many threads as cores!), I would have ended up on par with my server processor. Not a bad deal for $100, or however much it cost me, just not what I dreamed of.

Still an amazingly cool device, but you need the Intel compiler for doing anything serious on it, period.

Maybe use it as a sort of swap space; those 8GB of RAM could come in handy for something...

Never saw your original posts, but the fact that you got one for $100 should tell you how useful they were versus the original list price...

icc does do better, but it still needs a lot of attention. Going to the higher thread count would also have likely hurt performance as the scalability would be nowhere near perfect.
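One standard way to see why piling on threads gives diminishing returns is Amdahl's law (a textbook model swapped in here for illustration, not something from the posts). The parallel fraction `p` below is a hypothetical parameter. Note that Amdahl's law only shows the speedup flattening out; it does not model the contention and overhead that can make extra threads actively slower.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup on n threads when a fraction p of the work parallelizes perfectly."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    # Serial part stays the same; parallel part is divided across n threads.
    return 1.0 / ((1.0 - p) + p / n)

# Hypothetical workload that is 95% parallel, at a few thread counts.
for threads in (1, 60, 120, 240):
    print(threads, round(amdahl_speedup(0.95, threads), 1))
```

With p = 0.95 the ideal speedup is capped at 20x no matter how many threads you add, so going from 60 to 240 threads only moves it from roughly 15x to roughly 18.5x.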

mobby_6kl
Aug 9, 2009

by Fluffdaddy

No Gravitas posted:

Yup, it was me. Give me something insane to do, and I will.


The performance sucked with GCC, since only the Intel compiler uses the wide execution units. About a 20? 30? 50? times slowdown versus a Haswell Xeon processor, if I recall (comparing single core to single core). If I had loaded up the whole Phi (and remember you need twice as many threads as cores!), I would have ended up on par with my server processor. Not a bad deal for $100, or however much it cost me, just not what I dreamed of.

Still an amazingly cool device, but you need the Intel compiler for doing anything serious on it, period.

Maybe use it as a sort of swap space; those 8GB of RAM could come in handy for something...

Is that the performance with ICC or were you not able to test that?

No Gravitas
Jun 12, 2013

by FactsAreUseless

mobby_6kl posted:

Is that the performance with ICC or were you not able to test that?

GCC; I never bothered with ICC in the end. I had other projects to do besides going looking for ICC. I imagine ICC would be better, yeah. My workload wouldn't be able to use the wide execution units anyway, being basically 224 instances of GNU Octave running highly branchy code with no matrix processing at all...

Yay for academic legacy issues!

Grundulum
Feb 28, 2006
Where did you get a Phi for $100? That's crazy-go-nuts pricing -- like 95% off sticker price.

No Gravitas
Jun 12, 2013

by FactsAreUseless

Grundulum posted:

Where did you get a Phi for $100? That's crazy-go-nuts pricing -- like 95% off sticker price.

It went really cheap on Amazon during some Intel fire sale when they were dumping them for almost nothing. Everyone was selling them for $150 a unit and so on... and I found the best deal ever.

Item Subtotal: $79.40
Shipping & Handling: $12.60
Total Before Tax: $92.00
Shipment Total: $92.00
Paid by Mastercard: $92.00

I'm really good at hunting for deals. I kept searching and finally...

The seller was LoTN LLC, but the deal is very long dead now.

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
yeah, there was a fire sale on KNF when KNC started appearing (around the time Stampede went online, I guess). but yeah, Phi as a drop-in parallel replacement for standard x86 code never really made sense because of the in-order cores, the small caches, and the memory latency. you have to make use of the vector units for it to be interesting at all.

Lord Windy
Mar 26, 2010
Are the 60-odd cores independent of each other, so the chip could act like a server if it weren't a PCIe card? Or is it closer to using CUDA?

EDIT: And are the in-order CPU cores meant to help with writing code for it?

Lord Windy fucked around with this message at 11:13 on Oct 18, 2015

Luna Was Here
Mar 21, 2013

Lipstick Apathy
Hi, I'm not sure if this is the right thread for this, but does anyone have good recommendations for temperature monitoring software? I have GPU Tweak for my GPU, but I'm not sure where to get a reading for my CPU. I'm also not sure whether to go with the really common recommendations Google keeps bringing up (like SpeedFan), because I vaguely remember reading on here that some of these programs only give estimates rather than accurate readings, but I might be misremembering.

B-Mac
Apr 21, 2003
I'll never catch "the gay"!

Luna Was Here posted:

Hi, I'm not sure if this is the right thread for this, but does anyone have good recommendations for temperature monitoring software? I have GPU Tweak for my GPU, but I'm not sure where to get a reading for my CPU. I'm also not sure whether to go with the really common recommendations Google keeps bringing up (like SpeedFan), because I vaguely remember reading on here that some of these programs only give estimates rather than accurate readings, but I might be misremembering.

I use hwinfo64 to monitor temps and it works well enough for me.

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

Lord Windy posted:

Are the 60-odd cores independent of each other, so the chip could act like a server if it weren't a PCIe card? Or is it closer to using CUDA?

EDIT: And are the in-order CPU cores meant to help with writing code for it?

It's 60-odd independent x86 cores with a wider, fancier version of AVX. It's a lot closer to writing code for normal x86 than CUDA.

Knights Landing (which I think is out now? or close to it) is out-of-order. Knights Ferry (Larrabee) and Knights Corner (shrunk version, and the actual commercial product) were in-order for a couple reasons. One was to reduce the amount of silicon and power spent on control flow logic, because it was supposed to try to be a GPU. The other is that it was a somewhat experimental project without the budget (or time) to design a new core from scratch, so they rummaged through the junk drawer, pulled out the original Pentium in-order core, and modified it.

Besides the vector unit, another modification was to add hardware threading. Cache access latency kills in-order CPU performance even on hits, so they gave it 4-way SMT to hide the problem. The baseline for getting good performance out of a Knights Corner chip is 240 threads, each of which must use the 512-bit 16-wide SIMD unit.
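The 240-thread and 16-wide figures above are simple arithmetic: about 60 cores times 4-way SMT, and a 512-bit register divided into 32-bit single-precision lanes. A small sketch of that arithmetic, where the function names are hypothetical and the only inputs are the numbers from the post:

```python
def hardware_threads(cores: int, smt_ways: int) -> int:
    """Total hardware threads exposed by the chip."""
    return cores * smt_ways

def simd_lanes(vector_bits: int, element_bits: int) -> int:
    """How many elements of a given width fit in one vector register."""
    if vector_bits % element_bits:
        raise ValueError("element width must divide the vector width")
    return vector_bits // element_bits

# Knights Corner-style figures from the post: ~60 cores, 4-way SMT, 512-bit vectors.
threads = hardware_threads(60, 4)   # hardware threads to keep busy
fp32_lanes = simd_lanes(512, 32)    # single-precision floats per vector
fp64_lanes = simd_lanes(512, 64)    # double-precision values per vector

# fp32 elements needed to fill every lane of every hardware thread at once.
print(threads, fp32_lanes, fp64_lanes, threads * fp32_lanes)
```

This is why "just running" scalar, lightly threaded code leaves most of the chip idle: you need thousands of independent data elements in flight to fill it.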

JawnV6
Jul 4, 2004

So hot ...

BobHoward posted:

they rummaged through the junk drawer
If only. More like a dog showing back up after a couple decades and a cross-country move.

Durinia
Sep 26, 2014

The Mad Computer Scientist

BobHoward posted:

It's 60-odd independent x86 cores with a wider, fancier version of AVX. It's a lot closer to writing code for normal x86 than CUDA.

Knights Landing (which I think is out now? or close to it) is out-of-order. Knights Ferry (Larrabee) and Knights Corner (shrunk version, and the actual commercial product) were in-order for a couple reasons. One was to reduce the amount of silicon and power spent on control flow logic, because it was supposed to try to be a GPU. The other is that it was a somewhat experimental project without the budget (or time) to design a new core from scratch, so they rummaged through the junk drawer, pulled out the original Pentium in-order core, and modified it.

Besides the vector unit, another modification was to add hardware threading. Cache access latency kills in-order CPU performance even on hits, so they gave it 4-way SMT to hide the problem. The baseline for getting good performance out of a Knights Corner chip is 240 threads, each of which must use the 512-bit 16-wide SIMD unit.

KNL isn't out yet. They keep pushing the dates back. The other difference with KNL vs. KNC/F is that it's capable of running the OS by itself now. With KNC/F you had to attach it to another processor over PCIe like a GPU. That "Xeon tax" won't be there with KNL.

The out-of-orderness of KNL is still very mild. It's a modified version of Silvermont in terms of the out-of-order structures, but they added the massive AVX units and the threading. It's definitely not going to beat a Haswell Xeon at anything unless the application makes great use of the AVX units and the threads.

VulgarandStupid
Aug 5, 2003
I AM, AND ALWAYS WILL BE, UNFUCKABLE AND A TOTAL DISAPPOINTMENT TO EVERYONE. DAE WANNA CUM PLAY WITH ME!?




So apparently RAM speeds still don't matter on Skylake, unless you are using the onboard graphics. This is probably surprising to no one.

http://www.silentpcreview.com/Skylake_Memory_Scaling

Panty Saluter
Jan 17, 2004

Making learning fun!

VulgarandStupid posted:

So apparently RAM speeds still don't matter on Skylake, unless you are using the onboard graphics. This is probably surprising to no one.

http://www.silentpcreview.com/Skylake_Memory_Scaling

Isn't that the same with AMD's APUs? Also doesn't the higher latency offset some of the higher clock speed benefit?

Richard M Nixon
Apr 26, 2009

"The greatest honor history can bestow is the title of peacemaker."
Phi talk. I did my master's thesis on database acceleration through hardware parallelization. My advisor wanted me to use a whole bunch of setups, from shared-memory systems using OpenMP and distributed-memory systems with MPI to beefy servers using GPGPUs and MIC hardware.

I had a few 5110P Phi cards in a server and they turned out to be extremely lovely. I can't go into all the tech details on the phone, but their off-card memory access killed any kind of performance gain they could possibly give me, and they only had something like 8GB of memory to share, which was absurd. I can't imagine what market they were targeting where you'd need huge parallelism but your dataset was insignificant in size. Maybe some kind of financial application or cryptography, I'm not sure. Their actual compute times were pretty impressive, but when you can't pass data to your host, you're hosed. My Tesla cards weren't that much better in terms of computation (about a 15% gain vs. the Phi), but the fact that I could use pinned, mapped memory to do dynamic reads and writes made data transfer almost invisible. I could do background transfers with no hassle, whereas with the Phi I had to do my whole memcopy before ever starting execution.

Of course, with so much hardware I didn't design optimized code for every platform, so I'm not saying the Phi couldn't do well as a targeted environment with heavy work on the code to utilize its unique architecture, but it didn't look good after spending a week tweaking things. I did like that it was just C++ run through ICC to get it up and running, instead of all the hoops to jump through with CUDA.
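To make the "whole memcopy up front" versus "background transfers" difference concrete, here is a toy timing model. The chunk times are hypothetical, not measurements from the thesis. With a blocking copy, every transfer finishes before any compute starts; with asynchronous transfers, chunk i+1 can be copied while chunk i is processed, so total time approaches that of the slower stage.

```python
def serial_time(copy_times, compute_times):
    """Copy everything first, then compute everything."""
    return sum(copy_times) + sum(compute_times)

def overlapped_time(copy_times, compute_times):
    """Two-stage pipeline: the copy of chunk i+1 overlaps the compute of chunk i.

    Assumes one copy engine and one compute engine, each handling chunks in order,
    and enough buffer space that copies never have to wait for compute.
    """
    if len(copy_times) != len(compute_times):
        raise ValueError("need one copy time and one compute time per chunk")
    copy_done = 0.0
    compute_done = 0.0
    for c, k in zip(copy_times, compute_times):
        copy_done += c                                   # copies run back to back
        compute_done = max(compute_done, copy_done) + k  # compute waits for its chunk
    return compute_done

# Hypothetical workload: 8 chunks, 2 ms to copy and 3 ms to process each.
copies = [2.0] * 8
computes = [3.0] * 8
print(serial_time(copies, computes), overlapped_time(copies, computes))
```

Here the overlapped schedule only pays for the first copy up front; after that the transfers hide behind the compute.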

Yudo
May 15, 2003

Richard M Nixon posted:

I did like that it was just C++ run through ICC to get it up and running, instead of all the hoops to jump through with CUDA.

This was the big sell of Phi from the get go, it just hasn't quite delivered yet. As others have mentioned, it is not a fully realized product.

Durinia
Sep 26, 2014

The Mad Computer Scientist

Yudo posted:

This was the big sell of Phi from the get go, it just hasn't quite delivered yet. As others have mentioned, it is not a fully realized product.

That was mostly marketing. I mean, it would run, but to get any performance, you have to do a similar amount of tuning as on a GPU. And by virtue of their market, "just running" isn't really of much value.

Proud Christian Mom
Dec 20, 2006
READING COMPREHENSION IS HARD
it seemed like the Phi was Intel just going, 'well, we did this thing that's kinda cool, have at it?'

Khorne
May 1, 2002

Richard M Nixon posted:

I can't imagine what market they were targeting where you'd need huge parallelism but your dataset was insignificant in size.
Phi, and GPU computing in general, is pretty decent for molecular dynamics simulations. It's likely pretty decent for lots of other physics simulations too. The datasets are small because you have a small number of floats, let's say three, for each particle. And then you have some number of particles, and it fits pretty easily in far less than 8GB of memory. It's also trivially parallelized with no relevant branching.
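A back-of-the-envelope check of how many particles fit in 8GB. The per-particle layout is an assumption for illustration: either three 32-bit floats (position only, as in the post) or nine (position, velocity and force), ignoring neighbor lists and other bookkeeping that real MD codes also keep.

```python
BYTES_PER_FLOAT32 = 4

def particles_that_fit(memory_bytes: int, floats_per_particle: int) -> int:
    """Upper bound on particle count when each particle stores floats_per_particle float32 values."""
    return memory_bytes // (floats_per_particle * BYTES_PER_FLOAT32)

CARD_MEMORY = 8 * 1024**3  # 8 GiB

positions_only = particles_that_fit(CARD_MEMORY, 3)  # x, y, z
pos_vel_force = particles_that_fit(CARD_MEMORY, 9)   # position + velocity + force
print(positions_only, pos_vel_force)
```

Even the fuller layout leaves room for roughly 238 million particles before any overhead.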

Khorne fucked around with this message at 19:19 on Oct 20, 2015

pmchem
Jan 22, 2010


Khorne posted:

Phi, and GPU computing in general, is pretty decent for molecular dynamics simulations. It's likely pretty decent for lots of other physics simulations too. The datasets are small because you have a small number of floats, let's say three, for each particle. And then you have some number of particles, and it fits pretty easily in far less than 8GB of memory. It's also trivially parallelized with no relevant branching.

Performance difference between GPU and first/second-gen Phi products on MD is enormous, due in part to memory bandwidth issues in Phi. NREL got a petascale computer that had about half its flops in Phi, and the Phi side went extremely under-utilized for at least the first year because the porting/performance was so ugly for GROMACS and LAMMPS.

sincx
Jul 13, 2012

furiously masturbating to anime titties
.

sincx fucked around with this message at 05:55 on Mar 23, 2021

Anime Schoolgirl
Nov 28, 2002

sincx posted:

When can I justify upgrading my 2600k at 4.3 GHz (1.25V)? With Skylake's lackluster IPC improvements and overclocking, am I going to have to wait until Cannonlake?
One of these things must happen:

1) You really, really want something from the Skylake-era chipsets. If you're just gaming this will never be the case, unless for some reason you record 60fps 4K video and actually need the speed of an Intel 750 NVMe SSD, in which case you probably already have something better than Sandy Bridge.
2) Zen comes out and is good enough to actually make Intel put L4 cache on more things (the only visible upgrade at the non-server top end), and that isn't coming until Kaby Lake at the earliest.

Anime Schoolgirl fucked around with this message at 04:31 on Oct 21, 2015

dud root
Mar 30, 2008

sincx posted:

When can I justify upgrading my 2600k at 4.3 GHz (1.25V)? With Skylake's lackluster IPC improvements and overclocking, am I going to have to wait until Cannonlake?

I'm in the same situation. Native USB3.0 and booting from NVMe SSDs is probably going to get me to upgrade

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!

Anime Schoolgirl posted:

2) Zen comes out and is good enough to actually make Intel put L4 cache on more things (the only visible upgrade at the non-server top end), and that isn't coming until Kaby Lake at the earliest.
I looked up what a Zen is; I don't think we'll see the devil wear a parka anytime soon.

champagne posting
Apr 5, 2006

YOU ARE A BRAIN
IN A BUNKER

The ghost of processors future which will save AMD.

Anime Schoolgirl
Nov 28, 2002

Which is funny because the only parts people will actually pay attention to are:

1) Zen's G server socket again being a solid undercutter over Intel at the same performance and power, which had been AMD's bread and butter until Bulldozer (Vishera was okay at least but nothing came after that)
2) Zen as a mobile platform (Carrizo is them being competent and actually succeeding at making something for the ultralight segment, but they'll need to put graphics on those low-watt Zen chips before OEMs consider them, which won't happen until halfway into 2017 at the earliest :negative:)

Gwaihir
Dec 8, 2009
Hair Elf

sincx posted:

When can I justify upgrading my 2600k at 4.3 GHz (1.25V)? With Skylake's lackluster IPC improvements and overclocking, am I going to have to wait until Cannonlake?


dud root posted:

I'm in the same situation. Native USB3.0 and booting from NVMe SSDs is probably going to get me to upgrade


4.3GHz isn't very high for a 2600K, and a Skylake chip running at 4.4-4.6GHz is certainly going to be faster... But it won't be faster enough for you to really care, since you probably don't do anything really CPU-limited unless you're making videos or doing 3D rendering at home.

Chipset/motherboard features like native USB 3/3.1 and NVMe boot SSD support are a way better reason for most people to upgrade at this point.

Ika
Dec 30, 2004
Pure insanity

Richard M Nixon posted:

...I can't imagine what market they were targeting where you'd need huge parallelism but your dataset was insignificant in size...

I tried to get ahold of one for work. We often have sparse matrices which are either under 8GB or can be trivially split into segments of less than 8GB, and most sparse matrix routines are trivial to parallelize.
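A minimal sketch of why sparse matrix-vector products split so easily: in CSR (compressed sparse row) format each output row depends only on its own slice of the nonzeros, so the rows can be cut into independent blocks, each small enough to fit on a card, and processed in parallel. Plain Python lists keep it self-contained; the helper names and block size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def csr_spmv_rows(indptr, indices, data, x, row_start, row_end):
    """Compute y[row_start:row_end] = A[row_start:row_end] @ x for a CSR matrix A."""
    out = []
    for row in range(row_start, row_end):
        acc = 0.0
        # Nonzeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        out.append(acc)
    return out

def csr_spmv_blocked(indptr, indices, data, x, rows_per_block):
    """Split the rows into independent blocks and evaluate them concurrently."""
    n_rows = len(indptr) - 1
    bounds = [(s, min(s + rows_per_block, n_rows)) for s in range(0, n_rows, rows_per_block)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda b: csr_spmv_rows(indptr, indices, data, x, *b), bounds))
    y = []
    for part in parts:  # map preserves block order, so concatenation is correct
        y.extend(part)
    return y

# 3x3 example:
# [[1 0 2]
#  [0 3 0]
#  [4 0 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(csr_spmv_blocked(indptr, indices, data, [1.0, 1.0, 1.0], rows_per_block=2))
```

The thread pool only demonstrates the decomposition; CPython's GIL means it won't actually speed up pure-Python arithmetic. On real hardware each block would go to a separate core or device.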

Grundulum
Feb 28, 2006
My work machine has an i7-3930k in it; 6 physical cores with two-way hyperthreading, so 12 logical cores. When I try to run jobs that use more than one core, I get the distinct impression that I'm being throttled due to thermal considerations. That leads me to two questions:

(1) Where can I find the core-by-core breakdown of maximum speeds for this processor, assuming just the stock cooling setup?
(2) If I buy one of the coolers recommended in the PC parts thread, will it fit both this processor and Skylake/beyond? I intend to replace this machine eventually, and a cooler seems like it ought to be reusable.

Don Lapre
Mar 28, 2001

If you're having problems you're either holding the phone wrong or you have tiny girl hands.
Use RealTemp. It will show you every core's temperature.

pmchem
Jan 22, 2010


Grundulum posted:

My work machine has an i7-3930k in it; 6 physical cores with two-way hyperthreading, so 12 logical cores. When I try to run jobs that use more than one core, I get the distinct impression that I'm being throttled due to thermal considerations. That leads me to two questions:

(1) Where can I find the core-by-core breakdown of maximum speeds for this processor, assuming just the stock cooling setup?
(2) If I buy one of the coolers recommended in the PC parts thread, will it fit both this processor and Skylake/beyond? I intend to replace this machine eventually, and a cooler seems like it ought to be reusable.

re: (1)
https://en.wikipedia.org/wiki/Intel_Turbo_Boost
http://www.intel.com/support/processors/corei7/sb/CS-032279.htm

The cooler shouldn't help.

Aquila
Jan 24, 2003

Grundulum posted:

My work machine has an i7-3930k in it; 6 physical cores with two-way hyperthreading, so 12 logical cores. When I try to run jobs that use more than one core, I get the distinct impression that I'm being throttled due to thermal considerations. That leads me to two questions:

(1) Where can I find the core-by-core breakdown of maximum speeds for this processor, assuming just the stock cooling setup?
(2) If I buy one of the coolers recommended in the PC parts thread, will it fit both this processor and Skylake/beyond? I intend to replace this machine eventually, and a cooler seems like it ought to be reusable.

Check the BIOS (especially if it's an actual workstation-type system from a big vendor like Dell or HP) for performance and thermal profiles.

Just in case: if it's running Linux, there are some GRUB parameters you can set as well, including one which can steal 30% of your performance if it's used with hyperthreading.

Also consider RealTemp or another monitoring tool. RealTemp shows the current clock speed/multiplier along with temps for each core.
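On Linux, a rough stand-in for RealTemp's per-core clock readout is the `cpu MHz` field that `/proc/cpuinfo` reports for each logical CPU on x86. A minimal sketch that parses that text, assuming an x86 machine; depending on kernel version the value may be a sampled current frequency or closer to nominal, and a low clock under load hints at throttling without saying whether the limit is thermal, current, or power.

```python
from pathlib import Path

def per_cpu_mhz(cpuinfo_text: str) -> dict:
    """Map logical CPU number -> reported 'cpu MHz' from /proc/cpuinfo-style text."""
    result = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPUs
        key = key.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            result[current] = float(value)
    return result

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for cpu, mhz in sorted(per_cpu_mhz(path.read_text()).items()):
            print(f"cpu{cpu}: {mhz:.0f} MHz")
```

Running it a few times while a multi-core job is going gives a crude per-core view comparable to what the Windows tools show.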

Richard M Nixon
Apr 26, 2009

"The greatest honor history can bestow is the title of peacemaker."

Grundulum posted:

My work machine has an i7-3930k in it; 6 physical cores with two-way hyperthreading, so 12 logical cores. When I try to run jobs that use more than one core, I get the distinct impression that I'm being throttled due to thermal considerations. That leads me to two questions:

(1) Where can I find the core-by-core breakdown of maximum speeds for this processor, assuming just the stock cooling setup?
(2) If I buy one of the coolers recommended in the PC parts thread, will it fit both this processor and Skylake/beyond? I intend to replace this machine eventually, and a cooler seems like it ought to be reusable.

Get your thermal data and cut a ticket to IT asking them to delid your cpu and fix the thermal paste.

sincx
Jul 13, 2012

furiously masturbating to anime titties
.

sincx fucked around with this message at 05:55 on Mar 23, 2021

Moey
Oct 22, 2010

I LIKE TO MOVE IT

Richard M Nixon posted:

Get your thermal data and cut a ticket to IT asking them to delid your cpu and fix the thermal paste.

This. Let your IT guys know, and let them do their job.

Anime Schoolgirl
Nov 28, 2002

sincx posted:

The -E processors are still all soldered to the heatspreader.

Yeah, delidding isn't going to do anything but rip the chip apart, I'm afraid.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost

Richard M Nixon posted:

Get your thermal data and cut a ticket to IT asking them to delid your cpu and fix the thermal paste.
If someone sent me a ticket like that I'd pretty much tell them to go gently caress themselves in the most enterprisey, politically correct language I could muster while I reassigned the ticket to L1 helpless desk.

Grundulum
Feb 28, 2006

Richard M Nixon posted:

Get your thermal data and cut a ticket to IT asking them to delid your cpu and fix the thermal paste.

Given that we have no IT staff to do things like this, I think I would be better off buying a CPU cooler and putting it on myself.

I would also have to ask my hypothetical IT staff to interpret said thermal data and determine if I really am heat-throttled. I didn't realize that current and voltage also acted to limit the turbo boost, and assumed that it was based solely off generated heat.


Watermelon Daiquiri
Jul 10, 2010
I TRIED TO BAIT THE TXPOL THREAD WITH THE WORLD'S WORST POSSIBLE TAKE AND ALL I GOT WAS THIS STUPID AVATAR.

Grundulum posted:

Given that we have no IT staff to do things like this, I think I would be better off buying a CPU cooler and putting it on myself.

I would also have to ask my hypothetical IT staff to interpret said thermal data and determine if I really am heat-throttled. I didn't realize that current and voltage also acted to limit the turbo boost, and assumed that it was based solely off generated heat.

Well, it is. ;) Heat is just excess energy being given off in thermal form. For any electrical system, the power used is given by Power = Voltage * Current, and while that power is what runs the processor, practically all of it ends up dissipated as thermal energy.

Power is energy per time, voltage is energy per charge, and current is charge per time, so voltage * current is energy per time!
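Written out with units, the dimensional argument above is:

$$P = V I, \qquad \frac{\mathrm{J}}{\mathrm{s}} = \frac{\mathrm{J}}{\mathrm{C}} \cdot \frac{\mathrm{C}}{\mathrm{s}}, \qquad \mathrm{W} = \mathrm{V} \cdot \mathrm{A}$$

For instance, a hypothetical 1.25 V core voltage at 100 A works out to 125 W, all of which the cooler has to carry away.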

Watermelon Daiquiri fucked around with this message at 17:50 on Oct 23, 2015
