Intel: lol

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > Intel: lol

No Gravitas: Jun 12, 2013; by FactsAreUseless

BurritoJustice posted:

That was "No Gravitas" if I remember correctly. It was a cool series of posts.

Yup, it was me. Give me something insane to do, and I will.

Durinia posted:

Yeah, you can "compile and go", but you'll get complete rear end-level performance.

KNF and KNC were mostly experiments. KNL is being pushed by Intel as the first real focused HPC implementation as a product.

Of course, they also said that about KNC, so...

The performance sucked with gcc, as only the Intel compiler uses the wide execution units. About a 20? 30? 50? times degradation compared to a Haswell Xeon processor, if I recall (comparing single core to single core). If I booked up the whole Phi (and remember you need twice as many threads as cores!) I would have ended up on par with my server processor. Not a bad deal for $100, or however much it cost me to get it, just not what I dreamed of.

Still an amazingly cool device, but you need the Intel compiler for doing anything serious on it, period.

Maybe use it as sorta kind of a swap, those 8GB of ram could come in handy for something...

# ? Oct 16, 2015 18:58

Durinia: Sep 26, 2014; The Mad Computer Scientist

No Gravitas posted:

Yup, it was me. Give me something insane to do, and I will.

The performance sucked with gcc, as only the Intel compiler uses the wide execution units. About a 20? 30? 50? times degradation compared to a Haswell Xeon processor, if I recall (comparing single core to single core). If I booked up the whole Phi (and remember you need twice as many threads as cores!) I would have ended up on par with my server processor. Not a bad deal for $100, or however much it cost me to get it, just not what I dreamed of.

Still an amazingly cool device, but you need the Intel compiler for doing anything serious on it, period.

Maybe use it as sorta kind of a swap, those 8GB of ram could come in handy for something...

Never saw your original posts, but the fact that you got one for $100 should tell you how useful they were versus the original list price...

icc does do better, but it still needs a lot of attention. Going to the higher thread count would also have likely hurt performance as the scalability would be nowhere near perfect.
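To put a rough number on "nowhere near perfect", here is a small sketch using Amdahl's law. That model is an assumption brought in for illustration, not something measured on the card, and the 95% parallel fraction is a made-up figure.

```python
# Amdahl's law: speedup is capped by the part of the work that stays serial.
# Assumption: parallel_fraction = 0.95 is a hypothetical value, not a measurement.

def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup over one thread when only part of the work parallelizes."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)

if __name__ == "__main__":
    # Going from 60 to 240 threads buys far less than 4x under this model.
    for threads in (60, 120, 240):
        print(threads, round(amdahl_speedup(0.95, threads), 2))
```

Real scaling is usually worse than this, since the model ignores contention for caches and memory bandwidth.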

# ? Oct 16, 2015 19:55

mobby_6kl: Aug 9, 2009; by Fluffdaddy

No Gravitas posted:

Yup, it was me. Give me something insane to do, and I will.

The performance sucked with gcc, as only the Intel compiler uses the wide execution units. About a 20? 30? 50? times degradation compared to a Haswell Xeon processor, if I recall (comparing single core to single core). If I booked up the whole Phi (and remember you need twice as many threads as cores!) I would have ended up on par with my server processor. Not a bad deal for $100, or however much it cost me to get it, just not what I dreamed of.

Still an amazingly cool device, but you need the Intel compiler for doing anything serious on it, period.

Maybe use it as sorta kind of a swap, those 8GB of ram could come in handy for something...

Is that the performance with ICC or were you not able to test that?

# ? Oct 16, 2015 20:54

No Gravitas: Jun 12, 2013; by FactsAreUseless

mobby_6kl posted:

Is that the performance with ICC or were you not able to test that?

GCC, never bothered with ICC in the end. Had other projects to do besides going looking for ICC. I imagine ICC would be better, yeah. My workload wouldn't be able to use the wide execution units anyway, being basically 224 instances of GNU Octave running highly branchy code with no matrix processing at all...

Yay for academic legacy issues!

# ? Oct 16, 2015 20:59

Grundulum: Feb 28, 2006

Where did you get a Phi for $100? That's crazy-go-nuts pricing -- like 95% off sticker price.

# ? Oct 17, 2015 12:19

No Gravitas: Jun 12, 2013; by FactsAreUseless

Grundulum posted:

Where did you get a Phi for $100? That's crazy-go-nuts pricing -- like 95% off sticker price.

It went really cheap on Amazon during some Intel fire sale when they were dumping them for almost free. Everyone was selling it for $150 a unit, and so on... I found the best deal ever.

Item Subtotal: $79.40
Shipping & Handling: $12.60
Total Before Tax: $92.00
Shipment Total: $92.00
Paid by Mastercard: $92.00

I'm really good at hunting for deals. I kept searching and finally...

The seller was LoTN LLC, but the deal is very long dead now.

# ? Oct 17, 2015 17:23

Professor Science: Mar 8, 2006; diplodocus + mortarboard = party

yeah, there was a fire sale on KNF when KNC started appearing (around the time Stampede went online, I guess). but yeah, Phi as a drop-in parallel replacement for standard x86 code never really made sense because of the in-order cores, the small caches, and the memory latency. you have to make use of the vector units for it to be interesting at all.

# ? Oct 18, 2015 04:06

Lord Windy: Mar 26, 2010

Are the 60-odd cores independent of each other, and could they act like a server if it wasn't a PCI chip? Or is it closer to using CUDA?

EDIT: And are the in-order cpu cores meant to help with writing code for it?

Lord Windy fucked around with this message at 11:13 on Oct 18, 2015

# ? Oct 18, 2015 11:11

Luna Was Here: Mar 21, 2013; Lipstick Apathy

Hi, I'm not sure if this is the thread for this, but does anyone have good recommendations for temp monitoring software? I have GPU Tweak for my GPU, but I'm not sure where to get a reading for my CPU. I'm also not sure whether to go with the really common recommendations that Google keeps bringing up (like SpeedFan), because I somewhat remember reading on here that some of these programs only give estimates rather than accurate readings. I might just be misremembering, though.

# ? Oct 18, 2015 16:26

B-Mac: Apr 21, 2003; I'll never catch "the gay"!

Luna Was Here posted:

Hi, I'm not sure if this is the thread for this, but does anyone have good recommendations for temp monitoring software? I have GPU Tweak for my GPU, but I'm not sure where to get a reading for my CPU. I'm also not sure whether to go with the really common recommendations that Google keeps bringing up (like SpeedFan), because I somewhat remember reading on here that some of these programs only give estimates rather than accurate readings. I might just be misremembering, though.

I use hwinfo64 to monitor temps and it works well enough for me.

# ? Oct 18, 2015 16:44

BobHoward: Feb 13, 2012; The only thing white people deserve is a bullet to their empty skull

Lord Windy posted:

Are the 60-odd cores independent of each other, and could they act like a server if it wasn't a PCI chip? Or is it closer to using CUDA?

EDIT: And are the in-order cpu cores meant to help with writing code for it?

It's 60-odd independent x86 cores, each with a wider, fancier version of AVX. It's a lot closer to writing code for normal x86 than CUDA.

Knights Landing (which I think is out now? or close to it) is out-of-order. Knights Ferry (Larrabee) and Knights Corner (shrunk version, and the actual commercial product) were in-order for a couple reasons. One was to reduce the amount of silicon and power spent on control flow logic, because it was supposed to try to be a GPU. The other is that it was a somewhat experimental project without the budget (or time) to design a new core from scratch, so they rummaged through the junk drawer, pulled out the original Pentium in-order core, and modified it.

Besides the vector unit, another modification was to add hardware threading. Cache access latency kills in-order CPU performance even on hits, so they gave it 4-way SMT to hide the problem. The baseline for getting good performance out of a Knights Corner chip is 240 threads, each of which must use the 512-bit 16-wide SIMD unit.
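A quick back-of-envelope sketch of where those numbers come from. It assumes exactly 60 cores (the thread only says "60 odd") and 32-bit single-precision lanes; the function names are made up for illustration.

```python
# Back-of-envelope numbers for a Knights Corner card.
# Assumptions (not from the post): exactly 60 cores, 32-bit float lanes.

def hardware_threads(cores: int, smt_ways: int) -> int:
    """Total hardware threads = cores times SMT ways per core."""
    return cores * smt_ways

def simd_lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits

def lane_utilization(lanes_used: int, vector_bits: int = 512,
                     element_bits: int = 32) -> float:
    """Fraction of the vector unit doing useful work per instruction."""
    return lanes_used / simd_lanes(vector_bits, element_bits)

if __name__ == "__main__":
    print(hardware_threads(60, 4))  # 240
    print(simd_lanes(512, 32))      # 16
    print(lane_utilization(1))      # 0.0625: scalar code uses 1 lane of 16
```

Under these assumptions, scalar code such as a plain gcc build leaves 15 of the 16 lanes idle on every instruction, which goes some way toward explaining the numbers reported earlier in the thread.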

# ? Oct 18, 2015 18:53

JawnV6: Jul 4, 2004; So hot ...

BobHoward posted:

they rummaged through the junk drawer

If only. More like a dog showing back up after a couple decades and a cross-country move.

# ? Oct 18, 2015 19:08

Durinia: Sep 26, 2014; The Mad Computer Scientist

BobHoward posted:

It's 60-odd independent x86 cores, each with a wider, fancier version of AVX. It's a lot closer to writing code for normal x86 than CUDA.

Knights Landing (which I think is out now? or close to it) is out-of-order. Knights Ferry (Larrabee) and Knights Corner (shrunk version, and the actual commercial product) were in-order for a couple reasons. One was to reduce the amount of silicon and power spent on control flow logic, because it was supposed to try to be a GPU. The other is that it was a somewhat experimental project without the budget (or time) to design a new core from scratch, so they rummaged through the junk drawer, pulled out the original Pentium in-order core, and modified it.

Besides the vector unit, another modification was to add hardware threading. Cache access latency kills in-order CPU performance even on hits, so they gave it 4-way SMT to hide the problem. The baseline for getting good performance out of a Knights Corner chip is 240 threads, each of which must use the 512-bit 16-wide SIMD unit.

KNL isn't out yet. They keep pushing the dates back. The other difference with KNL vs. KNC/F is that it's capable of running the OS by itself now. With KNC/F you had to attach it to another processor over PCIe like a GPU. That "Xeon tax" won't be there with KNL.

The out-of-orderness of KNL is still very mild. It's a modified version of Silvermont in terms of the out-of-order structures, but they added the massive AVX and the threading. It's definitely not going to beat a Haswell Xeon at anything unless the application makes great use of the AVX units and the threads.

# ? Oct 19, 2015 15:36

VulgarandStupid: Aug 5, 2003; I AM, AND ALWAYS WILL BE, UNFUCKABLE AND A TOTAL DISAPPOINTMENT TO EVERYONE. DAE WANNA CUM PLAY WITH ME!?

So apparently RAM speeds still don't matter on Skylake, unless you are using the onboard graphics. This is probably surprising to no one.

http://www.silentpcreview.com/Skylake_Memory_Scaling

# ? Oct 19, 2015 16:07

Panty Saluter: Jan 17, 2004; Making learning fun!

VulgarandStupid posted:

So apparently RAM speeds still don't matter on Skylake, unless you are using the onboard graphics. This is probably surprising to no one.

http://www.silentpcreview.com/Skylake_Memory_Scaling

Isn't that the same with AMD's APUs? Also doesn't the higher latency offset some of the higher clock speed benefit?

# ? Oct 19, 2015 18:56

Richard M Nixon: Apr 26, 2009; "The greatest honor history can bestow is the title of peacemaker."

Phi talk. I did my master's thesis on database acceleration through hardware parallelization. My advisor wanted me to use a whole bunch of setups, from shared memory systems using openmp and distributed memory systems with mpi to beefy servers using gpgpus and mic hardware.

I had a few 5110p phi models in a server and they turned out to be extremely lovely. I can't go into all the tech details on the phone, but their off-card memory access killed any kind of performance gain they could possibly give me, and they only had something like 8gb memory to share, which was absurd. I can't imagine what market they were targeting where you'd need huge parallelism but your dataset was insignificant in size. Maybe some kind of financial application or cryptography, I'm not sure. Their actual compute times were pretty impressive, but when you can't pass data to your host, you're hosed. My Tesla cards weren't that much better in terms of computation (about a 15% gain vs the phi) but the fact that I could use pinned mapped memory to do dynamic read-writes made data transfer almost invisible. I could do background transfers with no hassle, whereas with the phi I had to do my whole memcopy before ever starting execution.
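A toy model of why background transfers matter so much. This is only a sketch: it treats the job as a two-stage pipeline of equal-sized chunks, and the 4 s and 8 s figures are hypothetical, not measurements from the thesis work.

```python
# Toy timing model: copy-everything-first versus chunked, overlapped transfers.
# Assumptions: equal-sized chunks, transfer and compute can run concurrently,
# and the example durations below are invented for illustration.

def serial_time(transfer_s: float, compute_s: float) -> float:
    """Whole dataset is copied before any computation starts."""
    return transfer_s + compute_s

def overlapped_time(transfer_s: float, compute_s: float, chunks: int) -> float:
    """Chunk i+1 is transferred while chunk i is being computed."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    per_transfer = transfer_s / chunks
    per_compute = compute_s / chunks
    # First transfer and last compute cannot be hidden; the slower stage
    # sets the pace for everything in between.
    return per_transfer + (chunks - 1) * max(per_transfer, per_compute) + per_compute

if __name__ == "__main__":
    print(serial_time(4.0, 8.0))         # 12.0
    print(overlapped_time(4.0, 8.0, 4))  # 9.0
```

With more chunks the total approaches the slower of the two stages, which is the sense in which the transfer becomes "almost invisible".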

Of course, with so much hardware I didn't design optimized code for every platform, so I'm not saying the phi couldn't do well as a targeted environment with heavy work on the code to utilize its unique architecture, but it didn't look good after spending a week tweaking things. I did like that it was just c++ run through ICC to get it up and running instead of all the hoops to jump through with CUDA.

# ? Oct 19, 2015 20:44

Yudo: May 15, 2003

Richard M Nixon posted:

I did like that it was just c++ run through ICC to get it up and running instead of all the hoops to jump through with CUDA.

This was the big sell of Phi from the get-go; it just hasn't quite delivered yet. As others have mentioned, it is not a fully realized product.

# ? Oct 20, 2015 08:20

Durinia: Sep 26, 2014; The Mad Computer Scientist

Yudo posted:

This was the big sell of Phi from the get-go; it just hasn't quite delivered yet. As others have mentioned, it is not a fully realized product.

That was mostly marketing. I mean, it would run, but to get any performance, you have to do a similar amount of tuning as on a GPU. And by virtue of their market, "just running" isn't really of much value.

# ? Oct 20, 2015 16:37

Proud Christian Mom: Dec 20, 2006; READING COMPREHENSION IS HARD

it seemed like the Phi was Intel just going, 'well, we did this thing that's kinda cool, have at it?'

# ? Oct 20, 2015 18:46

Khorne: May 1, 2002

Richard M Nixon posted:

I can't imagine what market they were targeting where you'd need huge parallelism but your dataset was insignificant in size.

Phi, and gpu computing in general, is pretty decent for molecular dynamics simulations. It's likely pretty decent for lots of other physics simulations too. The datasets are small in size because you have a small number of floats, let's say three, for each particle. And then you have some number of particles, and it fits pretty easily in far less than 8gb of memory. It's also trivially parallelized with no relevant branching.
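A rough sketch of the arithmetic, assuming three single-precision floats per particle (positions only) and 8 GiB of card memory. Real MD codes also store velocities, forces, and neighbor lists, so treat this as a lower bound on the footprint.

```python
# Rough memory footprint of a particle system.
# Assumptions: 3 float32 values per particle, 8 GiB of card memory.

BYTES_PER_FLOAT32 = 4

def footprint_bytes(particles: int, floats_per_particle: int = 3) -> int:
    """Bytes needed to hold the per-particle floats."""
    return particles * floats_per_particle * BYTES_PER_FLOAT32

def max_particles(memory_bytes: int, floats_per_particle: int = 3) -> int:
    """How many particles fit in the given amount of memory."""
    return memory_bytes // (floats_per_particle * BYTES_PER_FLOAT32)

if __name__ == "__main__":
    print(footprint_bytes(1_000_000))    # 12000000 bytes, about 12 MB
    print(max_particles(8 * 1024 ** 3))  # hundreds of millions of particles
```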

Khorne fucked around with this message at 19:19 on Oct 20, 2015

# ? Oct 20, 2015 18:59

pmchem: Jan 22, 2010

Khorne posted:

Phi, and gpu computing in general, is pretty decent for molecular dynamics simulations. It's likely pretty decent for lots of other physics simulations too. The datasets are small in size because you have a small number of floats, let's say three, for each particle. And then you have some number of particles, and it fits pretty easily in far less than 8gb of memory. It's also trivially parallelized with no relevant branching.

Performance difference between GPU and first/second-gen Phi products on MD is enormous due in part to memory bandwidth issues in Phi. NREL got a petascale computer that had about half its flops in phi, and it went extremely under-utilized on the phi side for at least the first year because the porting/performance was so ugly for gromacs and LAMMPS.

# ? Oct 20, 2015 23:20

sincx: Jul 13, 2012; furiously masturbating to anime titties

sincx fucked around with this message at 05:55 on Mar 23, 2021

# ? Oct 21, 2015 04:26

Anime Schoolgirl: Nov 28, 2002

sincx posted:

When can I justify upgrading my 2600k at 4.3 GHz (1.25V)? With Skylake's lackluster IPC improvements and overclocking, am I going to have to wait until Cannonlake?

One of these things must happen:

1) You really, really want something in the Skylake-era chipsets. If you're just gaming this will never be the case, unless for some reason you record 60fps 4k video and actually need the Intel 750's speed, in which case you probably have something better than a Sandy Bridge already.
2) Zen comes out and is good enough to actually make Intel put L4 cache on more things (the only visible upgrade at the non-server top end), and that isn't coming until Kaby Lake at the earliest.

Anime Schoolgirl fucked around with this message at 04:31 on Oct 21, 2015

# ? Oct 21, 2015 04:29

dud root: Mar 30, 2008

sincx posted:

When can I justify upgrading my 2600k at 4.3 GHz (1.25V)? With Skylake's lackluster IPC improvements and overclocking, am I going to have to wait until Cannonlake?

I'm in the same situation. Native USB3.0 and booting from NVMe SSDs is probably going to get me to upgrade

# ? Oct 21, 2015 05:08

Combat Pretzel: Jun 23, 2004; No, seriously... what kurds?!

Anime Schoolgirl posted:

2) Zen comes out and is good enough to actually make Intel put L4 cache on more things (the only visible upgrade at the non-server top end), and that isn't coming until Kaby Lake at the earliest.

I looked up what a Zen is, I don't think we'll see the devil wear a Parka anytime soon.

# ? Oct 21, 2015 08:45

champagne posting: Apr 5, 2006; YOU ARE A BRAIN IN A BUNKER

The ghost of processors future which will save AMD.

# ? Oct 21, 2015 09:15

Anime Schoolgirl: Nov 28, 2002

Which is funny because the only parts people will actually pay attention to are:

1) Zen's G server socket again being a solid undercutter over Intel at the same performance and power, which had been AMD's bread and butter until Bulldozer (Vishera was okay at least but nothing came after that)
2) Zen as a mobile platform (Carrizo is them being competent and actually successfully making something for the ultralight segment, but they'll need to put graphics on said low-watt Zen chips before OEMs consider them, which isn't happening until halfway into 2017 at the earliest :negative:)

# ? Oct 21, 2015 11:32

Gwaihir: Dec 8, 2009; Hair Elf

sincx posted:

When can I justify upgrading my 2600k at 4.3 GHz (1.25V)? With Skylake's lackluster IPC improvements and overclocking, am I going to have to wait until Cannonlake?

dud root posted:

I'm in the same situation. Native USB3.0 and booting from NVMe SSDs is probably going to get me to upgrade

4.3GHz isn't very high for a 2600k, and a Skylake chip running at 4.4-4.6GHz is certainly going to be faster... but it won't be enough faster for you to really care, since you probably don't do anything really CPU-limited unless you're making videos or 3D rendering at home.

Chipset motherboard stuff like Native USB3/3.1 etc and NVMe boot SSD support is a way better reason for most people to upgrade at this point.

# ? Oct 21, 2015 16:01

Ika: Dec 30, 2004; Pure insanity

Richard M Nixon posted:

...I can't imagine what market they were targeting where you'd need huge parallelism but your dataset was insignificant in size...

I tried to get ahold of one for work - we often have sparse matrices which are either <8gb size or can be trivially split into segments of less than 8gb, and most sparse matrix routines are trivial to parallelize.
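For illustration, here is one way such a split could look: a greedy cut of a CSR-format matrix into contiguous row blocks that each stay under a byte budget. The function name, the CSR layout, and the 12 bytes per nonzero (8-byte value plus 4-byte column index) are all assumptions, and the sketch ignores the storage for the row pointers themselves.

```python
# Greedy split of a CSR sparse matrix into row blocks under a memory budget.
# Assumptions: CSR layout, 12 bytes per stored nonzero, hypothetical helper name.
from typing import List, Tuple

def split_rows_by_budget(row_ptr: List[int], bytes_per_nonzero: int,
                         budget_bytes: int) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) half-open ranges that each fit the budget."""
    n_rows = len(row_ptr) - 1
    segments = []
    start = 0
    for row in range(n_rows):
        row_bytes = (row_ptr[row + 1] - row_ptr[row]) * bytes_per_nonzero
        if row_bytes > budget_bytes:
            raise ValueError(f"row {row} alone exceeds the budget")
        seg_bytes = (row_ptr[row + 1] - row_ptr[start]) * bytes_per_nonzero
        if seg_bytes > budget_bytes:
            segments.append((start, row))  # close the block before this row
            start = row
    if start < n_rows:
        segments.append((start, n_rows))
    return segments

if __name__ == "__main__":
    # Four rows holding 2, 3, 1 and 4 nonzeros; budget fits 5 nonzeros.
    print(split_rows_by_budget([0, 2, 5, 6, 10], 12, 60))  # [(0, 2), (2, 4)]
```

Each block could then be shipped to the card and processed independently, which is what makes row-wise sparse routines easy to spread across many threads.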

# ? Oct 21, 2015 19:57

Grundulum: Feb 28, 2006

My work machine has an i7-3930k in it; 6 physical cores with two-way hyperthreading, so 12 logical cores. When I try to run jobs that use more than one core, I get the distinct impression that I'm being throttled due to thermal considerations. That leads me to two questions:

(1) Where can I find the core-by-core breakdown of maximum speeds for this processor, assuming just the stock cooling setup?
(2) If I buy one of the coolers recommended in the PC parts thread, will it fit both this processor and Skylake/beyond? I intend to replace this machine eventually, and a cooler seems like it ought to be reusable.

# ? Oct 22, 2015 01:52

Don Lapre: Mar 28, 2001; If you're having problems you're either holding the phone wrong or you have tiny girl hands.

use realtemp. It will show you every core temp

# ? Oct 22, 2015 02:05

pmchem: Jan 22, 2010

Grundulum posted:

My work machine has an i7-3930k in it; 6 physical cores with two-way hyperthreading, so 12 logical cores. When I try to run jobs that use more than one core, I get the distinct impression that I'm being throttled due to thermal considerations. That leads me to two questions:

(1) Where can I find the core-by-core breakdown of maximum speeds for this processor, assuming just the stock cooling setup?
(2) If I buy one of the coolers recommended in the PC parts thread, will it fit both this processor and Skylake/beyond? I intend to replace this machine eventually, and a cooler seems like it ought to be reusable.

re: (1)
https://en.wikipedia.org/wiki/Intel_Turbo_Boost
http://www.intel.com/support/processors/corei7/sb/CS-032279.htm

The cooler shouldn't help.

# ? Oct 22, 2015 02:19

Aquila: Jan 24, 2003

Grundulum posted:

My work machine has an i7-3930k in it; 6 physical cores with two-way hyperthreading, so 12 logical cores. When I try to run jobs that use more than one core, I get the distinct impression that I'm being throttled due to thermal considerations. That leads me to two questions:

(1) Where can I find the core-by-core breakdown of maximum speeds for this processor, assuming just the stock cooling setup?
(2) If I buy one of the coolers recommended in the PC parts thread, will it fit both this processor and Skylake/beyond? I intend to replace this machine eventually, and a cooler seems like it ought to be reusable.

Check the bios (especially if it's an actual workstation type system from a big vendor like Dell or HP) for performance and thermal profiles.

Just in case, if it's running Linux there are some grub parameters you can set as well, including one which can steal 30% of your performance if it's used with hyperthreading.

Also consider something like realtemp or another monitoring tool. Realtemp shows current clockspeed/multiplier along with temps for each core.
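If the box does run Linux, a minimal sketch for watching per-core clocks is to parse the "cpu MHz" lines from /proc/cpuinfo, which x86 kernels normally expose. The helper names are made up, and this shows clock speed only, not temperatures.

```python
# Read the current clock of each logical CPU from /proc/cpuinfo on Linux.
# Assumption: the kernel exposes "cpu MHz" lines, as x86 kernels normally do.
import re
from pathlib import Path
from typing import List

def parse_core_mhz(cpuinfo_text: str) -> List[float]:
    """Extract every 'cpu MHz' value, in the order the CPUs are listed."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)"
    return [float(m) for m in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]

def read_core_mhz(path: str = "/proc/cpuinfo") -> List[float]:
    """Return per-CPU clocks, or an empty list if the file is not available."""
    cpuinfo = Path(path)
    if not cpuinfo.exists():
        return []
    return parse_core_mhz(cpuinfo.read_text())

if __name__ == "__main__":
    for index, mhz in enumerate(read_core_mhz()):
        print(f"logical cpu {index}: {mhz:.0f} MHz")
```

Running it in a loop while a multi-core job is active should show whether the clocks drop under load.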

# ? Oct 22, 2015 05:46

Richard M Nixon: Apr 26, 2009; "The greatest honor history can bestow is the title of peacemaker."

Grundulum posted:

My work machine has an i7-3930k in it; 6 physical cores with two-way hyperthreading, so 12 logical cores. When I try to run jobs that use more than one core, I get the distinct impression that I'm being throttled due to thermal considerations. That leads me to two questions:

(1) Where can I find the core-by-core breakdown of maximum speeds for this processor, assuming just the stock cooling setup?
(2) If I buy one of the coolers recommended in the PC parts thread, will it fit both this processor and Skylake/beyond? I intend to replace this machine eventually, and a cooler seems like it ought to be reusable.

Get your thermal data and cut a ticket to IT asking them to delid your cpu and fix the thermal paste.

# ? Oct 22, 2015 19:37

sincx: Jul 13, 2012; furiously masturbating to anime titties

sincx fucked around with this message at 05:55 on Mar 23, 2021

# ? Oct 22, 2015 19:41

Moey: Oct 22, 2010; I LIKE TO MOVE IT

Richard M Nixon posted:

Get your thermal data and cut a ticket to IT asking them to delid your cpu and fix the thermal paste.

This. Let your IT guys know, and let them do their job.

# ? Oct 22, 2015 19:51

Anime Schoolgirl: Nov 28, 2002

sincx posted:

The -E processors are still all soldered to the heatspreader.

Yeah, delidding isn't going to do anything but rip the chip apart i'm afraid.

# ? Oct 22, 2015 19:52

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

Richard M Nixon posted:

Get your thermal data and cut a ticket to IT asking them to delid your cpu and fix the thermal paste.

If someone sent me a ticket like that I'd pretty much tell them to go gently caress themselves in the most enterprisey, politically correct language I could muster while I reassigned the ticket to L1 helpless desk.

# ? Oct 23, 2015 00:12

Grundulum: Feb 28, 2006

Richard M Nixon posted:

Get your thermal data and cut a ticket to IT asking them to delid your cpu and fix the thermal paste.

Given that we have no IT staff to do things like this, I think I would be better off buying a CPU cooler and putting it on myself.

I would also have to ask my hypothetical IT staff to interpret said thermal data and determine if I really am heat-throttled. I didn't realize that current and voltage also acted to limit the turbo boost, and assumed that it was based solely off generated heat.

# ? Oct 23, 2015 10:39

Watermelon Daiquiri: Jul 10, 2010; I TRIED TO BAIT THE TXPOL THREAD WITH THE WORLD'S WORST POSSIBLE TAKE AND ALL I GOT WAS THIS STUPID AVATAR.

Grundulum posted:

Given that we have no IT staff to do things like this, I think I would be better off buying a CPU cooler and putting it on myself.

I would also have to ask my hypothetical IT staff to interpret said thermal data and determine if I really am heat-throttled. I didn't realize that current and voltage also acted to limit the turbo boost, and assumed that it was based solely off generated heat.

Well, it is.

Heat is just excess energy being given off in thermal form. For all electrical systems, the power used is given by the formula Power = Voltage * Current, and while some of that power is used to run the processor, a lot of it is wasted in the form of thermal energy.

Power is energy per time, voltage is energy per charge, and current is charge per time, so voltage * current is energy per time!
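A tiny worked example of that formula. The 1.25 V and 100 A figures are illustrative assumptions, not measurements of any particular chip.

```python
# Power = Voltage * Current, and energy = power * time.
# Assumption: the example voltage and current are made up for illustration.

def power_watts(voltage_volts: float, current_amps: float) -> float:
    """(joules per coulomb) * (coulombs per second) = joules per second."""
    return voltage_volts * current_amps

def energy_joules(power_w: float, seconds: float) -> float:
    """Energy dissipated while drawing constant power for the given time."""
    return power_w * seconds

if __name__ == "__main__":
    watts = power_watts(1.25, 100.0)
    print(watts)                       # 125.0
    print(energy_joules(watts, 60.0))  # 7500.0 joules in one minute
```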

Watermelon Daiquiri fucked around with this message at 17:50 on Oct 23, 2015

# ? Oct 23, 2015 17:27
