|
Boiled Water posted:This is not a metric you even bother looking at when designing servers that run 24/7. Performance per watt is still king and this is unlikely to change. Depends if your work is done in such a way that having 24 chickens is better than 2 oxen; certain servers would benefit greatly.
|
# ? Oct 13, 2015 18:34 |
|
BobHoward posted:Depending on delivering the equivalent of QPI in the gen 1 product would not have been smart. You just have the best way of phrasing things.
|
# ? Oct 13, 2015 18:41 |
|
Oh, poo poo! B&H Photo actually shipped my 5775C. I didn't think they'd ever actually get it in. Guess I'm riding that 128MB cache train after all.
|
# ? Oct 13, 2015 23:16 |
|
Skandranon posted:They also don't need to soundly defeat Intel in any specific performance metric, as they can also seriously compete on cost/core. Counter to intuition, cost per core is not a metric that sells many servers. Also, given equivalent throughput, almost all workloads are better off on fewer/stronger cores than flocks of chickens, even when there's gazillions of threads to run. These are among the many reasons why ARM server chips have yet to make a serious run at any part of Intel's server share. (Price insensitivity is a serious problem for Qualcomm even if they are trying for an IO monster. When you look at the costs of building out and operating a data center, a couple thou for the CPU in each box isn't very significant. Especially if the servers run proprietary software with per core/system licensing fees, which frequently dwarf the hardware costs and have to be paid annually.)
|
# ? Oct 13, 2015 23:46 |
|
BobHoward posted:Counter to intuition, cost per core is not a metric that sells many servers. Also, given equivalent throughput, almost all workloads are better off on fewer/stronger cores than flocks of chickens, even when there's gazillions of threads to run. These are among the many reasons why ARM server chips have yet to make a serious run at any part of Intel's server share. Nicely said, and a lot of those points were touched on in discussion at realworldtech that I posted in the scientific computing thread a while ago: http://forums.somethingawful.com/showthread.php?threadid=3359430&userid=0&perpage=40&pagenumber=58#post448943993 Particularly this post by Torvalds touching on the price issue: http://www.realworldtech.com/forum/?threadid=151731&curpostid=152022
|
# ? Oct 14, 2015 00:20 |
|
pmchem posted:Nicely said, and a lot of those points were touched on in discussion at realworldtech that I posted in the scientific computing thread a while ago: Jesus, I'm pretty bullish on ARM (in HPC even), but that guy is on some kind of .
|
# ? Oct 14, 2015 14:45 |
|
Potato Salad posted:More competition in the compute market would be awesome, but God help me if I have to start supporting multiple architectures or start staving off devs who want to build their cloud stack on ARM.
|
# ? Oct 14, 2015 15:24 |
|
I think the ARM players like Qualcomm think they have a big performance-per-watt advantage. They had better, because trying to crack that market on the basis of upfront cost only is a non-starter.
|
# ? Oct 14, 2015 15:33 |
|
Intel also has some Atom based microservers if that is something you really, really needed
|
# ? Oct 14, 2015 15:55 |
|
Rastor posted:I think the ARM players like Qualcomm think they have a big performance-per-watt advantage. They had better, because trying to crack that market on the basis of upfront cost only is a non-starter. Xeon-D shows that Intel just hadn't made performance per watt at the cost of single threaded speed a priority yet, but they are capable of it.
|
# ? Oct 14, 2015 16:14 |
|
Any ETA on the desktop Skylake Xeons, maybe November or probably not until 2016?
|
# ? Oct 14, 2015 16:26 |
|
WhyteRyce posted:Intel also has some Atom based microservers if that is something you really, really needed Microservers (defined by me here as "power-efficient chips that have low TDP") are largely a joke. You get completely destroyed by infrastructure costs of scaling to the larger number of sockets you need to reach the same performance. Some OEMs have tried to address it by denser packaging, but they're mostly research projects/demos. "Performance per Watt" is certainly the goal for everyone here. I think the thing to remember is that "performance" is not a single metric. Whenever you see performance (raw or per watt) numbers, the follow up questions should always be "performance on what application?" Qualcomm definitely wants to win on performance per Watt - the question is more about which application they're targeting to do that in. Because they won't be able to do it for all of them.
|
# ? Oct 14, 2015 16:52 |
|
Tigerdirect has i7-6700K w/ MSI Z170 PC Mate board for $429. Great deal http://www.tigerdirect.com/applications/searchtools/item-Details.asp?EdpNo=9836488&sku=M69-10308
|
# ? Oct 14, 2015 18:04 |
|
sadus posted:Any ETA on the desktop Skylake Xeons, maybe November or probably not until 2016? Broadwell 2011v3 isn't even out yet.
|
# ? Oct 14, 2015 18:04 |
|
Is there a good virtualization story for ARM yet? VMWare or similar? Because lol imo at real adoption in the data center without that stuff being mature and robust.
|
# ? Oct 14, 2015 18:12 |
|
Subjunctive posted:Is there a good virtualization story for ARM yet? VMWare or similar? Because lol imo at real adoption in the data center without that stuff being mature and robust. I can't wait for the ARM v. x86 ISA bugs in my datacenter and having to install ARM tools alongside x86 everywhere.
|
# ? Oct 14, 2015 18:22 |
|
necrobobsledder posted:I can't wait for the ARM v. x86 ISA bugs in my datacenter and having to install ARM tools alongside x86 everywhere. Haha welcome to my life
|
# ? Oct 14, 2015 19:16 |
|
May I ask, what bugs exactly?
|
# ? Oct 14, 2015 20:48 |
|
Subjunctive posted:Is there a good virtualization story for ARM yet? VMWare or similar? Because lol imo at real adoption in the data center without that stuff being mature and robust. Granny-virtualizing; not double-lxc containerizing like you should be.
|
# ? Oct 14, 2015 20:51 |
|
ARM vs. Intel: an early look at AMD's Hierofalcon chip http://dresdenboy.blogspot.com/2015/10/amds-arm-based-hierofalcon-soc-sighted.html This is much smaller than the 24-core (and more) chip Qualcomm demonstrated, it's an 8-core low-watt (30 watts and less) chip aimed at the embedded market. It also comes out much sooner, possibly before 2016. Performance/watt benchmarks show double the performance/watt of Intel on some benchmarks -- but half on others.
|
# ? Oct 15, 2015 13:22 |
|
Is that the i5-2400S in those benchmarks? Yeah ok, I'm sure it'll be pretty successful competing with a 5 year old CPU
|
# ? Oct 15, 2015 19:45 |
|
Rastor posted:ARM vs. Intel: Hahahaha, benching against a chip from 2011 and still losing in most of the metrics sure is good!
|
# ? Oct 15, 2015 19:55 |
|
Gwaihir posted:Hahahaha, benching against a chip from 2011 and still losing in most of the metrics sure is good! The desperation is palpable.
|
# ? Oct 15, 2015 20:14 |
|
Nintendo Kid posted:The desperation is palpable. It's the embedded market.
|
# ? Oct 15, 2015 20:22 |
|
A Bad King posted:It's the embedded market. Why not bench it against an embedded Intel chip?
|
# ? Oct 15, 2015 20:33 |
|
Boiled Water posted:Why not bench it against an embedded Intel chip? They did! The Ghost of Boiled Water posted:Why not bench it against a recent embedded Intel chip? Better question.
|
# ? Oct 15, 2015 20:38 |
|
unixbench? im holding out for the dhrystone numbers
|
# ? Oct 15, 2015 20:41 |
|
Durinia posted:They did! Yea, uh lol the more I look at those benches. The i5-2400S isn't even an embedded chip really, it's just the better binned low TDP version of the i5-2500k we all know and love. Especially since the whole low power/perf per watt optimized segment has exploded since the Sandy Bridge chip they chose. Xeon-D at 45w TDP, multiple fast i5/i7 quads at 25 or 35w or the Atom C2758 at 20w.
|
# ? Oct 15, 2015 21:33 |
|
PCjr sidecar posted:unixbench? im holding out for the dhrystone numbers lmbench or gtfo
|
# ? Oct 16, 2015 00:29 |
|
Subjunctive posted:lmbench or gtfo Did Tridge reverse engineer that too?
|
# ? Oct 16, 2015 00:39 |
|
PCjr sidecar posted:Did Tridge reverse engineer that too? Doubt it, it was always open afair. Larry was in his peak Linux-crushes-proprietary-Unix benchmark cheerleading phase then, I believe.
|
# ? Oct 16, 2015 00:41 |
|
I have a dumb question about the MIC architecture (Xeon Phi). Are these devices like a single 60-odd core CPU, in that each core can operate independently, or are they closer to a GPU in that all cores execute the same instruction, just on different data? I see Phis called SIMD, which suggests the latter, but in that case I can't understand why they're different from GPGPUs. When I try to search for this on Google, all I get are benchmark tests.
|
# ? Oct 16, 2015 05:17 |
|
Grundulum posted:I have a dumb question about the MIC architecture (Xeon Phi). All the cores are independent.
|
# ? Oct 16, 2015 05:40 |
|
Grundulum posted:I have a dumb question about the MIC architecture (Xeon Phi). What Mr. Gravitas Shortfall said, but furthermore each of those 60 cores has a very wide SIMD execution unit. (512 bits, capable of doing sixteen 32-bit FP calculations in parallel.) Here's a paper describing the first generation attempt at what became Xeon Phi: http://www.cs.cmu.edu/afs/cs/academic/class/15869-f11/www/readings/abrash09_lrbni.pdf Back then it was intended to be a special type of x86 that could be used as a software GPU. The line of thought was something like this:

- Even the most programmable of GPUs punt on running code with any kind of branching, or single-threaded code. They can do it, but it's not a good idea. They're like freight trains: they run on rails, they're not maneuverable at all, you don't want to stop and start a lot, you'd better have a lot of cargo to haul, but holy poo poo the cargo capacity and efficiency when fully loaded at speed. (For GPUs and CPUs, the efficiency metrics are FLOPS/watt and FLOPS/mm^2 of silicon.)
- High end x86 CPUs are way inefficient by comparison. They're more like F1 cars: awesome acceleration and cornering, terrible at hauling cargo. It is a car analogy.
- Brilliant idea! Why not take an older in-order x86 CPU core (relatively low overhead for handling branching), amortize its overhead even more by bolting on a super wide SIMD unit, and put a giant array of 60+ of these things on a chip?

Intel hoped this would result in something that could get a lot closer to GPU computational efficiency (close enough to become a consumer GPU product) without totally sacrificing the ability to run general purpose code. Long story short, it didn't work out that way. Intel never launched the GPU version of Larrabee. But it got a second life as Xeon Phi in the HPC market, since lots of HPC customers would rather write code for an array of x86 CPUs than target GPUs. (Optimizing for GPU is still quite hard, even with NVidia's CUDA.)
|
# ? Oct 16, 2015 07:27 |
|
This is awesome. You're awesome. Thank you both for the explanation.
|
# ? Oct 16, 2015 11:00 |
|
BobHoward posted:But it got a second life as Xeon Phi in the HPC market since lots of HPC customers like writing code for an array of x86 CPUs better than targeting GPUs. (Optimizing for GPU is still quite hard, even with NVidia's CUDA.) One of the things they were touting is that you could easily recompile your existing x86 MPI code to run on the Phi. Did that end up being worth doing?
|
# ? Oct 16, 2015 13:02 |
|
Wasn't there a guy either in this thread or another one in SH/SC that wound up buying a Xeon Phi and running some benchmarks for his workloads at home?
|
# ? Oct 16, 2015 13:13 |
|
necrobobsledder posted:Wasn't there a guy either in this thread or another one in SH/SC that wound up buying a Xeon Phi and running some benchmarks for his workloads at home? That was "No Gravitas" if I remember correctly. It was a cool series of posts.
|
# ? Oct 16, 2015 14:37 |
|
Mr Chips posted:One of the things they were touting is that you could easily recompile your existing x86 MPI code to run on the Phi. Did that end up being worth doing? Not really. The memory is weird and small, non-optimized performance is embarrassing, and off-card communication is bad. If you rewrite or restructure your MPI app to address these issues you get mediocre floating-point performance. Mostly useful as a development platform, since optimizing to run well on Phi usually improves performance on regular Xeons, and for understanding what node-level KNL will look like. They're more interesting since the $200 fire sale.
|
# ? Oct 16, 2015 15:17 |
|
PCjr sidecar posted:Not really. The memory is weird and small, non-optimized performance is embarrassing, and off-card communication is bad. If you rewrite or restructure your MPI app to address these issues you get mediocre floating-point performance. Mostly useful as a development platform, as optimizing to run well on Phi usually improves performance on regular Xeons, and understanding what node-level KNL will look like. Yeah, you can "compile and go", but you'll get complete rear end-level performance. KNF and KNC were mostly experiments. KNL is being pushed by Intel as the first real focused HPC implementation as a product. Of course, they also said that about KNC, so...
|
# ? Oct 16, 2015 17:32 |