Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Boiled Water posted:

This is not a metric you even bother looking at when designing servers that run 24/7. Performance per watt is still king and this is unlikely to change.

depends if your work is done in such a way that having 24 chickens is better than 2 oxen

certain servers would benefit greatly.


JawnV6
Jul 4, 2004

So hot ...

BobHoward posted:

Depending on delivering the equivalent of QPI in the gen 1 product would not have been smart.

:allears: You just have the best way of phrasing things.

Krailor
Nov 2, 2001
I'm only pretending to care
Taco Defender
Oh, poo poo! B&H Photo actually shipped my 5775c. I didn't think they'd ever actually get it in. Guess I'm riding that 128MB cache train after all.

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

Skandranon posted:

They also don't need to soundly defeat Intel in any specific performance metric, as they can also seriously compete on cost/core.

Counter to intuition, cost per core is not a metric that sells many servers. Also, given equivalent throughput, almost all workloads are better off on fewer/stronger cores than flocks of chickens, even when there's gazillions of threads to run. These are among the many reasons why ARM server chips have yet to make a serious run at any part of Intel's server share.

(Price insensitivity is a serious problem for Qualcomm even if they are trying for an IO monster. When you look at the costs of building out and operating a data center, a couple thou for the CPU in each box isn't very significant. Especially if the servers run proprietary software with per core/system licensing fees, which frequently dwarf the hardware costs and have to be paid annually.)
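
To make the cost argument above concrete, here is a rough Python sketch. Every dollar figure in it is a made-up placeholder, not a number from this thread or from any vendor, and the function names are hypothetical. The only point is to show how a CPU price difference shrinks once per-core licensing and operating costs are added to the total.

```python
# All figures are hypothetical placeholders chosen for illustration only.

def total_cost(cpu_price, other_hw, cores, license_per_core_per_year,
               opex_per_year, years):
    """Total cost of one server over its service life."""
    licensing = cores * license_per_core_per_year * years
    return cpu_price + other_hw + licensing + opex_per_year * years


def cpu_share(cpu_price, **rest):
    """Fraction of the total cost that is the CPU itself."""
    return cpu_price / total_cost(cpu_price, **rest)


# Hypothetical box with fewer, stronger cores and a pricier CPU.
strong = dict(cpu_price=2000, other_hw=6000, cores=8,
              license_per_core_per_year=1000, opex_per_year=1500, years=4)

# Hypothetical box with a cheap CPU and three times as many weaker cores.
many = dict(cpu_price=500, other_hw=6000, cores=24,
            license_per_core_per_year=1000, opex_per_year=1500, years=4)

strong_total = total_cost(**strong)
many_total = total_cost(**many)

print(f"strong cores: total {strong_total}, CPU share {cpu_share(**strong):.1%}")
print(f"many cores:   total {many_total}, CPU share {cpu_share(**many):.1%}")
```

With these invented inputs the cheaper CPU ends up in the more expensive box, because the per-core license term scales with core count. Different placeholder values would of course give different answers.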

pmchem
Jan 22, 2010


BobHoward posted:

Counter to intuition, cost per core is not a metric that sells many servers. Also, given equivalent throughput, almost all workloads are better off on fewer/stronger cores than flocks of chickens, even when there's gazillions of threads to run. These are among the many reasons why ARM server chips have yet to make a serious run at any part of Intel's server share.

(Price insensitivity is a serious problem for Qualcomm even if they are trying for an IO monster. When you look at the costs of building out and operating a data center, a couple thou for the CPU in each box isn't very significant. Especially if the servers run proprietary software with per core/system licensing fees, which frequently dwarf the hardware costs and have to be paid annually.)

Nicely said, and a lot of those points were touched on in discussion at realworldtech that I posted in the scientific computing thread a while ago:
http://forums.somethingawful.com/showthread.php?threadid=3359430&userid=0&perpage=40&pagenumber=58#post448943993
Particularly this post by Torvalds touching on the price issue:
http://www.realworldtech.com/forum/?threadid=151731&curpostid=152022

Durinia
Sep 26, 2014

The Mad Computer Scientist

pmchem posted:

Nicely said, and a lot of those points were touched on in discussion at realworldtech that I posted in the scientific computing thread a while ago:
http://forums.somethingawful.com/showthread.php?threadid=3359430&userid=0&perpage=40&pagenumber=58#post448943993
Particularly this post by Torvalds touching on the price issue:
http://www.realworldtech.com/forum/?threadid=151731&curpostid=152022

Jesus, I'm pretty bullish on ARM (in HPC even), but that guy is on some kind of :catdrugs:.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost

Potato Salad posted:

More competition in the compute market would be awesome, but God help me if I have to start supporting multiple architectures or start staving off devs who want to build their cloud stack on ARM.
I can't wait for the ARM v. x86 ISA bugs in my datacenter and having to install ARM tools alongside x86 everywhere.

Rastor
Jun 2, 2001

I think the ARM players like Qualcomm think they have a big performance-per-watt advantage. They had better, because trying to crack that market on the basis of upfront cost only is a non-starter.

WhyteRyce
Dec 30, 2001

Intel also has some Atom based microservers if that is something you really, really needed

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Rastor posted:

I think the ARM players like Qualcomm think they have a big performance-per-watt advantage. They had better, because trying to crack that market on the basis of upfront cost only is a non-starter.

Xeon-D shows that Intel just hadn't made performance per watt at the cost of single threaded speed a priority yet, but they are capable of it.

sadus
Apr 5, 2004

Any ETA on the desktop Skylake Xeons, maybe November or probably not until 2016?

Durinia
Sep 26, 2014

The Mad Computer Scientist

WhyteRyce posted:

Intel also has some Atom based microservers if that is something you really, really needed

Microservers (defined by me here as "power-efficient chips that have low TDP") are largely a joke. You get completely destroyed by infrastructure costs of scaling to the larger number of sockets you need to reach the same performance. Some OEMs have tried to address it by denser packaging, but they're mostly research projects/demos.

"Performance per Watt" is certainly the goal for everyone here. I think the thing to remember is that "performance" is not a single metric. Whenever you see performance (raw or per watt) numbers, the follow up questions should always be "performance on what application?"

Qualcomm definitely wants to win on performance per Watt - the question is more about which application they're targeting to do that in. Because they won't be able to do it for all of them.
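
A small Python sketch of why "performance per watt" needs the follow-up question about the application. The chip names, scores, and wattages are invented placeholders, not measurements of any real part; the sketch only shows that the winner can flip depending on the workload you pick.

```python
# Scores and wattages below are invented placeholders, not measurements.

def perf_per_watt(score, watts):
    """Performance per watt for a single workload score."""
    return score / watts


def winners(chips):
    """Map each workload to the chip with the best performance per watt."""
    workloads = next(iter(chips.values()))["scores"].keys()
    return {
        w: max(
            chips,
            key=lambda name: perf_per_watt(chips[name]["scores"][w],
                                           chips[name]["watts"]),
        )
        for w in workloads
    }


chips = {
    "chip_a": {"watts": 30, "scores": {"io_heavy": 600, "single_thread": 300}},
    "chip_b": {"watts": 60, "scores": {"io_heavy": 600, "single_thread": 1200}},
}

result = winners(chips)
print(result)
```

Here the hypothetical chip_a has double the performance per watt on one workload and half on the other, so neither chip "wins on performance per watt" without naming the application.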

Don Lapre
Mar 28, 2001

If you're having problems you're either holding the phone wrong or you have tiny girl hands.
Tigerdirect has i7-6700k w/ msi z170 pcmate board for $429. Great deal

http://www.tigerdirect.com/applications/searchtools/item-Details.asp?EdpNo=9836488&sku=M69-10308

Don Lapre
Mar 28, 2001

If you're having problems you're either holding the phone wrong or you have tiny girl hands.

sadus posted:

Any ETA on the desktop Skylake Xeons, maybe November or probably not until 2016?

Broadwell 2011v3 isn't even out yet.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Is there a good virtualization story for ARM yet? VMWare or similar? Because lol imo at real adoption in the data center without that stuff being mature and robust.

Rastor
Jun 2, 2001

Subjunctive posted:

Is there a good virtualization story for ARM yet? VMWare or similar? Because lol imo at real adoption in the data center without that stuff being mature and robust.
According to the story they have linux KVM virtualization up and running, but obviously that's not the same thing as VMWare. I think a solution like this should be aiming at the really huge cloud providers, the ones who buy machines by the thousands and power by the tens of megawatts, stuff that's running "an ocean of user containers" rather than dozens of database VMs for an enterprise.

Fuzzy Mammal
Aug 15, 2001

Lipstick Apathy

necrobobsledder posted:

I can't wait for the ARM v. x86 ISA bugs in my datacenter and having to install ARM tools alongside x86 everywhere.

Haha welcome to my life :smithicide:

No Gravitas
Jun 12, 2013

by FactsAreUseless
May I ask, what bugs exactly?

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Subjunctive posted:

Is there a good virtualization story for ARM yet? VMWare or similar? Because lol imo at real adoption in the data center without that stuff being mature and robust.

Granny-virtualizing; not double-lxc containerizing like you should be.

Rastor
Jun 2, 2001

ARM vs. Intel:
an early look at AMD's Hierofalcon chip
http://dresdenboy.blogspot.com/2015/10/amds-arm-based-hierofalcon-soc-sighted.html

This is much smaller than the 24-core (and more) chip Qualcomm demonstrated, it's an 8-core low-watt (30 watts and less) chip aimed at the embedded market. It also comes out much sooner, possibly before 2016.

Performance/watt benchmarks show double the performance/watt of Intel on some benchmarks -- but half on others.

mobby_6kl
Aug 9, 2009

by Fluffdaddy
Is that the i5-2400S in those benchmarks?



Yeah ok, I'm sure it'll be pretty successful competing with a 5 year old CPU :thumbsup:

Gwaihir
Dec 8, 2009
Hair Elf

Rastor posted:

ARM vs. Intel:
an early look at AMD's Hierofalcon chip
http://dresdenboy.blogspot.com/2015/10/amds-arm-based-hierofalcon-soc-sighted.html

This is much smaller than the 24-core (and more) chip Qualcomm demonstrated, it's an 8-core low-watt (30 watts and less) chip aimed at the embedded market. It also comes out much sooner, possibly before 2016.

Performance/watt benchmarks show double the performance/watt of Intel on some benchmarks -- but half on others.

Hahahaha, benching against a chip from 2011 and still losing in most of the metrics sure is good!

Nintendo Kid
Aug 4, 2011

by Smythe

Gwaihir posted:

Hahahaha, benching against a chip from 2011 and still losing in most of the metrics sure is good!

The desperation is palpable.

A Bad King
Jul 17, 2009


Suppose the oil man,
He comes to town.
And you don't lay money down.

Yet Mr. King,
He killed the thread
The other day.
Well I wonder.
Who's gonna go to Hell?

Nintendo Kid posted:

The desperation is palpable.

It's the embedded market.

champagne posting
Apr 5, 2006

YOU ARE A BRAIN
IN A BUNKER

A Bad King posted:

It's the embedded market.

Why not bench it against an embedded Intel chip?

Durinia
Sep 26, 2014

The Mad Computer Scientist

Boiled Water posted:

Why not bench it against an embedded Intel chip?

They did!

The Ghost of Boiled Water posted:

Why not bench it against a recent embedded Intel chip?

Better question.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

unixbench? im holding out for the dhrystone numbers

Gwaihir
Dec 8, 2009
Hair Elf

Durinia posted:

They did!


Better question.

Yea, uh lol the more I look at those benches. The i5-2400S isn't even an embedded chip really, it's just the better binned low TDP version of the i5-2500k we all know and love.
Especially since the whole low power/perf per watt optimized segment has exploded since the Sandy Bridge chip they chose. Xeon-D at 45w TDP, multiple fast i5/i7 quads at 25 or 35w or the Atom C2758 at 20w.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

PCjr sidecar posted:

unixbench? im holding out for the dhrystone numbers

lmbench or gtfo

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Subjunctive posted:

lmbench or gtfo

Did Tridge reverse engineer that too?

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

PCjr sidecar posted:

Did Tridge reverse engineer that too?

Doubt it, it was always open afair. Larry was in his peak Linux-crushes-proprietary-Unix benchmark cheerleading phase then, I believe.

Grundulum
Feb 28, 2006
I have a dumb question about the MIC architecture (Xeon Phi).

Are these devices like a single 60-odd core CPU, in that each core can operate independently, or are they closer to a GPU in that all cores execute the same instruction, just on different data? I see Phis called SIMD, which suggests the latter, but in that case I can't understand why they're different from GPGPUs.

When I try to search for this on Google, all I get are benchmark tests.

No Gravitas
Jun 12, 2013

by FactsAreUseless

Grundulum posted:

I have a dumb question about the MIC architecture (Xeon Phi).

Are these devices like a single 60-odd core CPU, in that each core can operate independently, or are they closer to a GPU in that all cores execute the same instruction, just on different data? I see Phis called SIMD, which suggests the latter, but in that case I can't understand why they're different from GPGPUs.

When I try to search for this on Google, all I get are benchmark tests.

All the cores are independent.

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

Grundulum posted:

I have a dumb question about the MIC architecture (Xeon Phi).

Are these devices like a single 60-odd core CPU, in that each core can operate independently, or are they closer to a GPU in that all cores execute the same instruction, just on different data? I see Phis called SIMD, which suggests the latter, but in that case I can't understand why they're different from GPGPUs.

When I try to search for this on Google, all I get are benchmark tests.

What Mr. Gravitas Shortfall said, but furthermore each of those 60 cores has a very wide SIMD execution unit. (512 bits, capable of doing sixteen 32-bit FP calculations in parallel.) Here's a paper describing the first generation attempt at what became Xeon Phi:

http://www.cs.cmu.edu/afs/cs/academic/class/15869-f11/www/readings/abrash09_lrbni.pdf

Back then it was intended to be a special type of x86 that could be used as a software GPU. The line of thought was something like this:

- Even the most programmable of GPUs punt on running code with any kind of branching, or single threaded code. They can do it, but it's not a good idea. They're like freight trains: they run on rails, they're not maneuverable at all, you don't want to stop and start a lot, you'd better have a lot of cargo to haul, but holy poo poo the cargo capacity and efficiency when fully loaded at speed. (For GPUs and CPUs, the efficiency metrics are FLOPS/watt and FLOPS/mm^2 of silicon.)

- High end x86 CPUs are way inefficient by comparison. They're more like F1 cars: awesome acceleration and cornering, terrible at hauling cargo. It is a car analogy.

- Brilliant idea! Why not take an older in-order x86 CPU core (relatively low overhead for handling branching), amortize its overhead even more by bolting on a super wide SIMD unit, and put a giant array of 60+ of these things on a chip? Intel hoped this would result in something that could get a lot closer to GPU computational efficiency (close enough to become a consumer GPU product) without totally sacrificing the ability to run general purpose code.

Long story short, it didn't work out that way. Intel never launched the GPU version of Larrabee. But it got a second life as Xeon Phi in the HPC market since lots of HPC customers like writing code for an array of x86 CPUs better than targeting GPUs. (Optimizing for GPU is still quite hard, even with NVidia's CUDA.)
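
The arithmetic behind "independent cores, each with a wide SIMD unit" can be sketched in a few lines of Python. This is only a toy model built from the numbers in the post above (60 cores, 512-bit vectors, 32-bit floats); the function names are hypothetical, and it ignores clock speed, fused multiply-add, and memory bandwidth entirely.

```python
def simd_lanes(vector_bits, element_bits):
    """Number of elements one SIMD register holds."""
    if vector_bits % element_bits:
        raise ValueError("vector width must be a multiple of element width")
    return vector_bits // element_bits


def chip_lanes(cores, vector_bits, element_bits):
    """Lanes across the whole chip if every core issues one SIMD op."""
    return cores * simd_lanes(vector_bits, element_bits)


def simd_add(a, b):
    """Model one SIMD instruction: the same add applied to every lane."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


lanes = simd_lanes(512, 32)        # 512-bit vector of 32-bit floats
total = chip_lanes(60, 512, 32)    # 60 cores, as in the post above

print(lanes, total)
print(simd_add([1.0] * lanes, [2.0] * lanes))
```

The difference from a GPU-style model is that each of the 60 cores in this picture runs its own instruction stream; only the lanes inside a single core's vector unit are locked to the same instruction.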

Grundulum
Feb 28, 2006
This is awesome. You're awesome. Thank you both for the explanation.

Mr Chips
Jun 27, 2007
Whose arse do I have to blow smoke up to get rid of this baby?

BobHoward posted:

But it got a second life as Xeon Phi in the HPC market since lots of HPC customers like writing code for an array of x86 CPUs better than targeting GPUs. (Optimizing for GPU is still quite hard, even with NVidia's CUDA.)
One of the things they were touting is that you could easily recompile your existing x86 MPI code to run on the Phi. Did that end up being worth doing?

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Wasn't there a guy either in this thread or another one in SH/SC that wound up buying a Xeon Phi and running some benchmarks for his workloads at home?

BurritoJustice
Oct 9, 2012

necrobobsledder posted:

Wasn't there a guy either in this thread or another one in SH/SC that wound up buying a Xeon Phi and running some benchmarks for his workloads at home?

That was "No Gravitas" if I remember correctly. It was a cool series of posts.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Mr Chips posted:

One of the things they were touting is that you could easily recompile your existing x86 MPI code to run on the Phi. Did that end up being worth doing?

Not really. The memory is weird and small, non-optimized performance is embarrassing, and off-card communication is bad. If you rewrite or restructure your MPI app to address these issues you get mediocre floating-point performance. Mostly useful as a development platform, since optimizing to run well on Phi usually improves performance on regular Xeons, and for understanding what node-level KNL will look like.

They're more interesting since the $200 fire sale.


Durinia
Sep 26, 2014

The Mad Computer Scientist

PCjr sidecar posted:

Not really. The memory is weird and small, non-optimized performance is embarrassing, and off-card communication is bad. If you rewrite or restructure your MPI app to address these issues you get mediocre floating-point performance. Mostly useful as a development platform, since optimizing to run well on Phi usually improves performance on regular Xeons, and for understanding what node-level KNL will look like.

They're more interesting since the $200 fire sale.

Yeah, you can "compile and go", but you'll get complete rear end-level performance.

KNF and KNC were mostly experiments. KNL is being pushed by Intel as the first real focused HPC implementation as a product.

Of course, they also said that about KNC, so...
