No Gravitas
Jun 12, 2013

by FactsAreUseless

JawnV6 posted:

Yeah, true. Still, you're essentially counting off 100ms chunks, so even with a poll interval of 16ms you're not too far off? 100ms seems like eons, how much jitter can you tolerate there?

C# has spoiled me, I have nice DataReceived events that act enough like interrupts that I wasn't thinking about the USB device not having that capability.

100ms is eons.

USB does have an interrupt mode for quick responses, seems pretty speedy. Again, a microcontroller can do this for you, although not as neatly as going via USB-UART. You can measure the latencies you get and do a poll after 100ms - latency. This should get you there, repeat the poll if you are too early. Maybe poll such that you should be 2ms late.
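A minimal sketch of that scheduling idea, in Python purely for illustration. The `poll` callable is a hypothetical stand-in for whatever actually queries the device, and the 16 ms latency in the usage note is just the poll interval mentioned in the quote, not a measurement.

```python
import time

PERIOD_S = 0.100       # the 100 ms chunk being counted off
LATE_MARGIN_S = 0.002  # aim to land about 2 ms late rather than early


def sleep_before_poll(period_s, latency_s, late_margin_s=LATE_MARGIN_S):
    """Delay before issuing the poll: period minus measured latency, plus a small late margin."""
    return max(0.0, period_s - latency_s + late_margin_s)


def wait_for_event(poll, latency_s, period_s=PERIOD_S, sleep=time.sleep):
    """Sleep for the computed delay, then poll; repeat the poll if it came back too early.

    `poll` is a hypothetical callable that returns True once the event has been seen.
    Returns the number of polls that were issued.
    """
    sleep(sleep_before_poll(period_s, latency_s))
    polls = 1
    while not poll():
        polls += 1
    return polls
```

With a measured latency of 16 ms, `sleep_before_poll(0.100, 0.016)` comes out at roughly 86 ms, so the reply should arrive just after the 100 ms mark instead of just before it.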

Could also plug your SYNC into the microphone jack and try to do stuff with that, I guess. Then there is PS2, if you have the luxury...

No Gravitas
Jun 12, 2013

by FactsAreUseless
My Xeon Phi finally arrived.

Gotta love how they packaged it. The packaging is for 4 Xeons, but only one slot was filled. I guess they really expect you to buy in bulk. No instructions of any kind included, just the bare unit in an antistatic bag.

I'm busy today. Tomorrow I will try to run it, my mighty Noctua NF-R8 PWM fan providing the cooling.

Chuu
Sep 11, 2004

Grimey Drawer

No Gravitas posted:

My Xeon Phi finally arrived.

Gotta love how they packaged it. The packaging is for 4 Xeons, but only one slot was filled. I guess they really expect you to buy in bulk. No instructions of any kind included, just the bare unit in an antistatic bag.

I'm busy today. Tomorrow I will try to run it, my mighty Noctua NF-R8 PWM fan providing the cooling.

Is this the 31S1P? I've been meaning to get on that promo, but I couldn't figure out how to keep it cool. How exactly do you have your cooling setup?

No Gravitas
Jun 12, 2013

by FactsAreUseless

Chuu posted:

Is this the 31S1P? I've been meaning to get on that promo, but I couldn't figure out how to keep it cool. How exactly do you have your cooling setup?

Yup, 31S1P.

I will have the fan strapped directly to the card.

I looked at the datasheets. You don't need that much airflow if your air is room temperature, as opposed to having datacenter quality air to work with. This requirement also applies for cooling a device going at full-clip, which I won't be doing. Integer code only most of the time.

I have just a tiny bit of headroom with the fan I have, even though the fan is twice as tall as the card and thus I'm only counting on getting half of the airflow that I'd be getting otherwise. I went with a radiator/CPU cooler fan. Those have pretty decent air pressures, something that should help push the air through the Phi.

I will keep you guys posted on how that works out, keeping in mind that I'm not touching the vector units.

Chuu
Sep 11, 2004

Grimey Drawer

No Gravitas posted:

Yup, 31S1P.

I will have the fan strapped directly to the card.

I looked at the datasheets. You don't need that much airflow if your air is room temperature, as opposed to having datacenter quality air to work with. This requirement also applies for cooling a device going at full-clip, which I won't be doing. Integer code only most of the time.

I have just a tiny bit of headroom with the fan I have, even though the fan is twice as tall as the card and thus I'm only counting on getting half of the airflow that I'd be getting otherwise. I went with a radiator/CPU cooler fan. Those have pretty decent air pressures, something that should help push the air through the Phi.

I will keep you guys posted on how that works out, keeping in mind that I'm not touching the vector units.

I'd love a picture if possible. I thought you'd need a blower setup considering max TDP is 220W.

That being said, the first thing I was going to do was build the Intel MKL BLAS library and see what performance gains I could get in R. That would probably get it near max TDP.

No Gravitas
Jun 12, 2013

by FactsAreUseless

Chuu posted:

I'd love a picture if possible. I thought you'd need a blower setup considering max TDP is 220W.

That being said, the first thing I was going to do was build the Intel MKL BLAS library and see what performance gains I could get in R. That would probably get it near max TDP.

270W, actually. 300W has been cited in some places too.

Considering I only have GCC to work with here, I won't be building any BLAS stuff. Coremark, dhrystone, etc... If you have any binaries to send me, I'm happy to run them and watch the temperatures. I'm not the usual Xeon Phi customer, you see.

For me it was a choice: Buy a second computer for 600-700$ or the Phi, a fan and a new power supply for 300$ total, with my work paying for part of it. (I also get to play with the Phi and get experience rigging up crazy cooling systems.) The performance of both of those options is about equal on my integer-only load without any vector instructions.

There will be pictures.

JawnV6
Jul 4, 2004

So hot ...

No Gravitas posted:

There will be pictures.

With an IR camera I hope

Chuu
Sep 11, 2004

Grimey Drawer

No Gravitas posted:

270W, actually. 300W has been cited in some places too.

Considering I only have GCC to work with here, I won't be building any BLAS stuff. Coremark, dhrystone, etc... If you have any binaries to send me, I'm happy to run them and watch the temperatures. I'm not the usual Xeon Phi customer, you see.

For me it was a choice: Buy a second computer for 600-700$ or the Phi, a fan and a new power supply for 300$ total, with my work paying for part of it. (I also get to play with the Phi and get experience rigging up crazy cooling systems.) The performance of both of those options is about equal on my integer-only load without any vector instructions.

There will be pictures.

Since you mentioned GCC, can I assume linux?

I don't have any binaries handy, for me step 1 was to get the card, step 2 was to figure out how to build the BLAS libraries. That being said, linking R vs. a custom BLAS library is really trivial in Linux -- there are a lot of tutorials out there on how to do it, and most linux distros have several BLAS libraries in the default repo to download. Given my experience with Intel libs I was hoping it would be easy to build MKL targeting a Xeon Phi, and then just use the generated object files.

No Gravitas
Jun 12, 2013

by FactsAreUseless

Chuu posted:

Since you mentioned GCC, can I assume linux?

I don't have any binaries handy, for me step 1 was to get the card, step 2 was to figure out how to build the BLAS libraries. That being said, linking R vs. a custom BLAS library is really trivial in Linux -- there are a lot of tutorials out there on how to do it, and most linux distros have several BLAS libraries in the default repo to download. Given my experience with Intel libs I was hoping it would be easy to build MKL targeting a Xeon Phi, and then just use the generated object files.

I'm running Arch Linux, but the host computer is irrelevant. The co-processor runs Linux on an architecture that is somewhat like the original Pentium, except 64-bit and with a seriously insanely great vector unit bolted to it. Oh, and it is a barrel processor too, so you need 2 threads per hardware core for full utilization. Binaries have to be custom compiled for this, you cannot just throw x86 code onto the hardware. Both GCC and the Intel compiler can do this. Buuuut...

You need the Intel compiler for the vector instructions in the Phi. I don't have one, and I don't intend to get one as my use case does not call for vector instructions. With GCC you can only compile x87 floating point code and integer code. I really doubt this will light up the Phi.

I'm doing this because it is a way-outside-of-the-box solution to my problem which just happened to be cheaper and more interesting. It also is insane, which fits my usual way of life.

If you get me ready-made binaries, I will gladly run them and let you know how my cooling solution holds up.

No Gravitas fucked around with this message at 04:34 on Dec 10, 2014

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

No Gravitas posted:

Oh, and it is a barrel processor too, so you need 2 threads per hardware core for full utilization.

It's 4 threads per core to minimize cache miss stalls, if you're using 2 threads/core you might want to try bumping up your thread count.

No Gravitas
Jun 12, 2013

by FactsAreUseless
Well, it did not work! Even mad science has failures somehow.

I figured out why too: I fell into the trap of marketing.

My chosen fan does 33+ CFM. I need 18. My fan gives me 1.5 mmH2O of pressure, more than enough according to the datasheet. I'm fine then, yes?

No.

This fan can do 33CFM and 1.5 mmH2O in ideal conditions. However those are two very different circumstances which will never happen at the same time.

See, pushing air through a 30cm radiator block isn't ideal. CFM drops waaaay, waaaay down when it meets opposing pressure. Fans are advertised with CFM which they can get at no pressure loss and at pressure which prevents all air flow. You aren't getting both the CFM and the pressure. The actual performance will have less than ideal CFM at less than ideal pressure. What is advertised is just two points along a curve specific to each fan. You need that curve to know what the hell is going on at your pressure drop.
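To make the "two points along a curve" idea concrete, here is a rough sketch that finds where a fan curve meets the resistance of whatever it is blowing through. It assumes a straight-line fan curve between the two advertised points and a quadratic system curve (pressure drop proportional to flow squared); real fan curves are not straight lines, so treat this as an illustration only. The card resistance used below (1.0 mmH2O at 18 CFM) is a made-up placeholder, not a figure from the datasheet.

```python
import math


def operating_point(free_air_cfm, max_static_mmh2o, system_k):
    """Intersect a straight-line fan curve with a quadratic system curve.

    Fan (assumed linear):  P = max_static * (1 - Q / free_air)
    System (assumed):      P = system_k * Q**2, with system_k > 0
    Returns (Q in CFM, P in mmH2O) at the intersection.
    """
    slope = max_static_mmh2o / free_air_cfm
    # Solve system_k*Q^2 + slope*Q - max_static = 0 and keep the positive root.
    disc = slope * slope + 4.0 * system_k * max_static_mmh2o
    q = (-slope + math.sqrt(disc)) / (2.0 * system_k)
    return q, system_k * q * q


# Fan figures from the post: 33 CFM in free air, 1.5 mmH2O at zero flow.
# HYPOTHETICAL card resistance: 1.0 mmH2O at 18 CFM (placeholder value).
k = 1.0 / 18.0 ** 2
q, p = operating_point(33.0, 1.5, k)
print(f"delivered: {q:.1f} CFM at {p:.2f} mmH2O")
```

With these placeholder numbers the delivered flow lands below the 18 CFM target even though both advertised figures look sufficient on their own, which is the trap described above.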

Xeon pressure and airflow requirements are in the datasheet. Different airflows have different pressure drops too. It really is easier to keep the beast cool with cold air as compared to 45C, but a single Noctua won't do it. For my case I use only half of the fan too, so that does not help.

It does make a difference to have a fan or not, even when the Phi is on idle. 3 minutes to a thermal shutdown vs 12 is a big step forward.

Tomorrow I duct my 75CFM, 2000RPM, 120mm case fan straight into the Phi as a push fan, with the Noctua doing pull. Considering even the Noctua at half-fan is an improvement, 2000RPM ducted right through this should do better. And I'm keeping the Noctua around to help, it won't hurt. I don't need the case fan anyway, it is very airy.

And when it comes to motherboards: Mine supported the Phi without any issues. Supermicro X10SLM-F, I believe.

I'm enjoying this adventure so far. Wonder where it will take me...

Chuu
Sep 11, 2004

Grimey Drawer
I don't know much about how fans work, but if you're breaking out the duct tape, would just duct taping one of those PCI-Slot blower fans onto the end work better than trying to shoehorn a normal case fan into moving air through a shroud?

redstormpopcorn
Jun 10, 2007
Aurora Master
Find someone to 3D-print a 120mm fan duct set and I will send you two slightly-used Ultra Kazes to push-pull on that fucker.

atomicthumbs
Dec 26, 2010


We're in the business of extending man's senses.

No Gravitas posted:

Well, it did not work! Even mad science has failures somehow.

I figured out why too: I fell into the trap of marketing.

My chosen fan does 33+ CFM. I need 18. My fan gives me 1.5 mmH2O of pressure, more than enough according to the datasheet. I'm fine then, yes?

No.

This fan can do 33CFM and 1.5 mmH2O in ideal conditions. However those are two very different circumstances which will never happen at the same time.

See, pushing air through a 30cm radiator block isn't ideal. CFM drops waaaay, waaaay down when it meets opposing pressure. Fans are advertised with CFM which they can get at no pressure loss and at pressure which prevents all air flow. You aren't getting both the CFM and the pressure. The actual performance will have less than ideal CFM at less than ideal pressure. What is advertised is just two points along a curve specific to each fan. You need that curve to know what the hell is going on at your pressure drop.

Xeon pressure and airflow requirements are in the datasheet. Different airflows have different pressure drops too. It really is easier to keep the beast cool with cold air as compared to 45C, but a single Noctua won't do it. For my case I use only half of the fan too, so that does not help.

It does make a difference to have a fan or not, even when the Phi is on idle. 3 minutes to a thermal shutdown vs 12 is a big step forward.

Tomorrow I duct my 75CFM, 2000RPM, 120mm case fan straight into the Phi as a push fan, with the Noctua doing pull. Considering even the Noctua at half-fan is an improvement, 2000RPM ducted right through this should do better. And I'm keeping the Noctua around to help, it won't hurt. I don't need the case fan anyway, it is very airy.

And when it comes to motherboards: Mine supported the Phi without any issues. Supermicro X10SLM-F, I believe.

I'm enjoying this adventure so far. Wonder where it will take me...

Christ, just buy a lovely loud Delta on ebay or something

Krailor
Nov 2, 2001
I'm only pretending to care
Taco Defender
I'm with Chuu, instead of trying to duct a case fan in there you should look at trying to attach a blower to the front. Something like this: http://www.xoxide.com/evercool-fox2.html

Just take the PCI bracket off and figure out some way to attach it.

No Gravitas
Jun 12, 2013

by FactsAreUseless

redstormpopcorn posted:

Find someone to 3D-print a 120mm fan duct set and I will send you two slightly-used Ultra Kazes to push-pull on that fucker.

Two words: Card stock.

Krailor posted:

I'm with Chuu, instead of trying to duct a case fan in there you should look at trying to attach a blower to the front. Something like this: http://www.xoxide.com/evercool-fox2.html

Just take the PCI bracket off and figure out some way to attach it.

Good backup position. I'm trying with the fan because I already have it and I'm not in the mood to go to the store until I go through the less sane options first.

SlayVus
Jul 10, 2009
Grimey Drawer
No Gravitas

Go with Noctua NF-A14 iPPC or Noctua NF-F12 iPPC fans. They are industrial fans, some made with 4-pin PWM, which have a high static pressure. Pressure is what you want to push air through radiators and the like.

http://www.newegg.com/Product/Product.aspx?Item=N82E16835608052 - 7,63 mm H2O pressure max @ 3k

http://www.newegg.com/Product/Product.aspx?Item=N82E16835608051 - 3,94 mm H2O pressure max @ 2k

SlayVus fucked around with this message at 08:23 on Dec 11, 2014

future ghost
Dec 5, 2005

:byetankie:
Gun Saliva

SlayVus posted:

No Gravitas

Go with Noctua NF-A14 iPPC or Noctua NF-F12 iPPC fans. They are industrial fans, some made with 4-pin PWM, which have a high static pressure. Pressure is what you want to push air through radiators and the like.
This but 2x Sanyo Denki H1011's on a controller.

Or just say gently caress everything (your ears mainly) and get a Delta GFB1212VHW.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

cisco privilege posted:

This but 2x Sanyo Denki H1011's on a controller.

Or just say gently caress everything (your ears mainly) and get a Delta GFB1212VHW.

The TFB1212GHE looks like a better fan, higher airflow at almost twice the static pressure. And at 6 DB louder, a real steal for persistent tinnitus.

KillHour
Oct 28, 2007


Methylethylaldehyde posted:

The TFB1212GHE looks like a better fan, higher airflow at almost twice the static pressure. And at 6 DB louder, a real steal for persistent tinnitus.

Double the pressure, 4 times the noise!

Mofabio
May 15, 2003
(y - mx)*(1/(inf))*(PV/RT)*(2.718)*(V/I)
Yeah fans have flow curves. You can add fans in series to theoretically double your pressure. You can get closer to the 2x ideal the further spread out they are, since that helps damp radial pressure fluctuations, so the fans fight less. Obviously don't spread it out too much - more distance also equals more pressure loss from the ducting.
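A rough sketch of the ideal series case, under loud assumptions: each fan has a straight-line curve, pressures add perfectly at equal flow, and the duct or card behaves as a quadratic resistance. The resistance constant is a hypothetical placeholder, and the fan figures are simply the 33 CFM and 1.5 mmH2O quoted earlier in the thread.

```python
import math


def delivered_cfm(free_air_cfm, max_static, system_k, fans_in_series=1):
    """Flow where n ideal series fans meet a system curve P = system_k * Q**2.

    Ideal series stacking: static pressure adds, free-air flow stays the same.
    """
    p_max = max_static * fans_in_series
    slope = p_max / free_air_cfm
    disc = slope * slope + 4.0 * system_k * p_max
    return (-slope + math.sqrt(disc)) / (2.0 * system_k)


K = 1.0 / 18.0 ** 2  # HYPOTHETICAL resistance: 1.0 mmH2O at 18 CFM
one = delivered_cfm(33.0, 1.5, K, fans_in_series=1)
two = delivered_cfm(33.0, 1.5, K, fans_in_series=2)
print(f"one fan: {one:.1f} CFM, two in series: {two:.1f} CFM")
```

Even in this idealized model, doubling the pressure does not double the delivered flow, because the pressure drop grows with the square of the flow.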

Good luck!

SlayVus
Jul 10, 2009
Grimey Drawer

KillHour posted:

Double the pressure, 4 times the noise!

Just go all the way at this point. 5' fan with step down duct work to push all the air through a 80mm hole.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

SlayVus posted:

Just go all the way at this point. 5' fan with step down duct work to push all the air through a 80mm hole.

All the way is a 3M novec fluid tank.

No Gravitas
Jun 12, 2013

by FactsAreUseless
Given a great exhaust fan and a lovely intake fan with a duct, I have the drat beast stable at 88C on idle, at least for now. I might flip the fans around a bit, see if that makes a difference.

Oh, yeah. It idles at 120W. Yeah... There are low power modes, but not on by default. Next goal for me, I think.

Clearly, I need a more powerful fan on the intake side, but things are looking mighty good considering where I was a few days back.

Now to start having fun with it...

EDIT: Low power mode on. Yup, now it idles at 68C and 60W. Yeah, don't want to use it full-throttle like this without a great intake fan. Only the CPU is hot, the rest of the circuit board idles at 50C at most, usually around 40C.

pre:
mic0 (info):
   Device Series: ........... Intel(R) Xeon Phi(TM) coprocessor x100 family
   Device ID: ............... 0x225e
   Number of Cores: ......... 57
   OS Version: .............. 2.6.38.8+mpss3.4.1
   Flash Version: ........... 2.1.02.0381
   Driver Version: .......... 3.4.1-1
   Stepping: ................ 0x3
   Substepping: ............. 0x0

mic0 (temp):
   Cpu Temp: ................ 69.00 C
   Memory Temp: ............. 44.00 C
   Fan-In Temp: ............. 31.00 C
   Fan-Out Temp: ............ 45.00 C
   Core Rail Temp: .......... 42.00 C
   Uncore Rail Temp: ........ 39.00 C
   Memory Rail Temp: ........ 39.00 C

mic0 (freq):
   Core Frequency: .......... 1.10 GHz
   Total Power: ............. 58.00 Watts
   Low Power Limit: ......... 283.00 Watts
   High Power Limit: ........ 337.00 Watts
   Physical Power Limit: .... 357.00 Watts
EDIT2: The very first "benchmark" is in!
bogomips : 2206.63
(Yes, I know this means nothing.)

Next up dhrystone and coremark. But that is tomorrow, I guess.

EDIT3:
Dhrystone is here! Done with gcc, latest on Arch Linux for the host and the one that Intel gives you for the Phi.

Xeon Phi 31S1P:
Dhrystones per Second: 714285.7

Xeon E3 1226 v3:
Dhrystones per Second: 14925373.0

Take whatever integer-based program result you have, divide by 21, get the Phi speed. (Yes, I made sure it isn't throttling.)

Now to turn on compiler optimizations.

Xeon Phi 31S1P:
Dhrystones per Second: 746268.7

Xeon E3 1226 v3:
Dhrystones per Second: 41666668.0

Ummm... WHAT?

I'm so lucky that I have 57 CPUs, because things now run 55 times slower. Looks like my E3 1226 and the Phi are about the same power on dhrystone when you take optimizations into account. Then the host has 3 more cores and all the Phi has to offer at that point is multi-threading. I expect the Phi to maybe give me 3/4 of the power that the host CPU gives me. Maybe a bit more.
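For anyone checking the arithmetic, a small sketch using only the Dhrystone figures posted above. The last line assumes one thread per Phi core and perfect scaling across all 57 cores, which is an assumption and not something measured here.

```python
# Dhrystones per second, copied from the results above.
UNOPT = {"phi": 714285.7, "host": 14925373.0}
OPT = {"phi": 746268.7, "host": 41666668.0}
PHI_CORES = 57


def slowdown(scores):
    """How many times slower a single Phi thread is than a single host thread."""
    return scores["host"] / scores["phi"]


def optimization_gain(target):
    """Speedup from turning compiler optimizations on, for 'phi' or 'host'."""
    return OPT[target] / UNOPT[target]


print(f"unoptimized slowdown: {slowdown(UNOPT):.1f}x")
print(f"optimized slowdown:   {slowdown(OPT):.1f}x")
print(f"gain from -O on Phi:  {optimization_gain('phi'):.2f}x")
print(f"gain from -O on host: {optimization_gain('host'):.2f}x")
# Assumes perfect scaling, one thread per core:
print(f"whole Phi in host cores: {PHI_CORES / slowdown(OPT):.2f}")
```

The gap widens because the host gains far more from optimization than the Phi does, not because the Phi got slower.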

This is poo poo of the highest caliber. I'm not sure if the version of gcc Intel gives you cannot optimize for poo poo (this isn't about just the extra instructions, but also about instruction scheduling, evidence of which I don't see in the disassembly [but I'm pretty tired and it is 2am here]) or if the Phi is just hard to optimize for. Either way: Meh.

One fan, some card stock... About 200$ cost to me, total, one future extra fan purchase included. Considering the Phi will likely perform about as well as a second Xeon E3 1226 v3... Eh, fair enough of a deal. I saved on a motherboard, RAM, case, my job paid for the new power supply... Not bad. I'll take it.

You cannot beat the fun factor of setting it up and playing with it though.


EDIT4: Memory bound benchmark: Stream. Optimizations on.

pre:
Phi:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1352.9     0.118420     0.118266     0.118489
Scale:            990.9     0.161569     0.161463     0.161766
Add:             1264.2     0.189998     0.189849     0.190197
Triad:           1107.1     0.216889     0.216785     0.217044

Host:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           13263.9     0.012128     0.012063     0.012372
Scale:          13400.3     0.011981     0.011940     0.012097
Add:            14874.5     0.016169     0.016135     0.016326
Triad:          14729.3     0.016395     0.016294     0.016647
Not bad. Wonder why Scale ran so much slower, but eh... Time to sleep.

No Gravitas fucked around with this message at 11:29 on Dec 15, 2014

r0ck0
Sep 12, 2004
r0ck0s p0zt m0d3rn lyf
Sounds like you need to buy 2 or 3 more phizes.

No Gravitas
Jun 12, 2013

by FactsAreUseless
Switched the fans around, the good fan being the push fan now with the wimpy fan on exhaust. Much better. A better fan will arrive in a week, but for now I guess I'm set for some trial calculations.

Also ran a C50 classification, just to get a feel for real-world performance. 20 times slower on the Phi. Perfectly acceptable if this is the case on my software.

Yay!

...

Well, I'm sure no one cares, so I will shut up about my Phi now. Yay!

r0ck0
Sep 12, 2004
r0ck0s p0zt m0d3rn lyf
Why are you happy with 20 times slower? What is the advantage?

FormatAmerica
Jun 3, 2005
Grimey Drawer

r0ck0 posted:

Why are you happy with 20 times slower? What is the advantage?

It might burn your house/apartment/workplace down when the index cards fan ducts catch fire :laugh:

cycleback
Dec 3, 2004
The secret to creativity is knowing how to hide your sources
Have you read of anyone using the Phi with an i7-5820K on an X99 motherboard? I have a problem that is trivially parallelizable that might work well with the Phi, though I am wary of getting bogged down with it.

No Gravitas
Jun 12, 2013

by FactsAreUseless

r0ck0 posted:

Why are you happy with 20 times slower? What is the advantage?

The scaling.

My host has 4 cores, one thread each. If the Phi was going on a single core it would be 80x slower than the host, assuming the host scales perfectly.

But the Phi has 57 cores, each with 4 hardware threads. My problem scales wonderfully and with 200 instances in parallel I get about 180x speedup over a single instance. Suddenly I'm 2.25 times as fast as the host, and I can still use the host. And I still have a few cores left on the Phi in case I want to do something else...
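The numbers above work out as follows. This is only a sketch of the arithmetic, taking the 20x single-instance slowdown from the C50 run and assuming the host scales perfectly across its 4 cores, as stated.

```python
def phi_vs_host(single_instance_slowdown, host_cores, parallel_speedup):
    """Throughput of the whole Phi relative to the whole host.

    Assumes the host scales perfectly across its cores.
    """
    whole_host_slowdown = single_instance_slowdown * host_cores
    return parallel_speedup / whole_host_slowdown


def parallel_efficiency(parallel_speedup, instances):
    """Fraction of ideal speedup achieved by running many instances at once."""
    return parallel_speedup / instances


print(phi_vs_host(20, 4, 180))        # Phi relative to the 4-core host
print(parallel_efficiency(180, 200))  # how close 200 instances get to ideal
```

So one Phi instance is 20 x 4 = 80 times slower than the whole host, and a 180x speedup from 200 instances turns that into 180 / 80 = 2.25 times the host's throughput, at about 90% parallel efficiency.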

All this for the cost of two fans and the Phi, which I got on a severe discount. Truly a steal compared to buying two computers and a bit.

Oh, and the fun of running the beast. And the joy of being able to put it on my resume (and I honestly do desperately want to work for Intel! I love them!). And the DIY factor.

FormatAmerica posted:

It might burn your house/apartment/workplace down when the index cards fan ducts catch fire :laugh:

It won't. The CPU runs hot, sure. Nothing else is even remotely hot in there though. The paper bit touches the intake end, which is kinda a long rail of heatsinks without any components that generate heat. The temperatures there don't exceed 40C. They might reach more once I get better airflow, but I'm not worried about paper combusting at those temperatures. (EDIT: I'm running the air in reverse, so the Phi's intake is my exhaust and vice versa. It works better that way.)

cycleback posted:

Have you read of anyone using the Phi with an i7-5820K on an X99 motherboard? I have a problem that is trivially parallelizable that might work well with the Phi, though I am wary of getting bogged down with it.

The Phi is finicky. You need great airflow, or at least good and cold airflow; a motherboard that can support it; a good power supply; and some means of physically supporting the weight if you have a tower computer. Some people say you need a proper CPU too, although I think it should run fine with any. You also probably want to run Linux on the host and will be running Linux on the Phi. I hope your host Linux is one of the two supported ones, or you are in for a treat trying to get your Phi to work. I had to do some trivial kernel-module hacking to get the mic module to work.

Your computer will likely require some modding to provide cooling. I had to cut some holes for the fan to blow through nicely, and a hammer was needed to dent some things to make a fan fit into a place which is not supposed to have a fan.

Then you need to recompile everything you want to run, and if you aren't running vectorized code then you aren't getting the most from your Phi. For vectorized code you need to get the Intel compiler. Then your instances must all fit under 8GB both in RAM and on "disk" because the Phi has only a ramdisk and no swap.

Not for the faint of heart. I was worried even having a Xeon CPU and a server motherboard. I'm still not out of the woods on the cooling, although I'm pretty close. I can run full-scale experiments now. A single batch jumps me from 60C to 97C, just barely before the beast throttles. My load is minimally impacted by throttling, but it is a mental barrier I don't want to cross.

The sale also is not as good as it used to be. No more 80$ Phis around that I can see.

Also an educational note.
Seeing the Phi in the "ready" state might make you think it has entered a low power mode. Nope. It spins at pretty high power. For low power you want to boot it up into an OS and make sure you have enabled sleep modes. And then leave it alone. Any action will push it up to 110+W immediately (this includes micsmc commands!). Leave the beast be and it will only take 60W.

No Gravitas fucked around with this message at 08:07 on Dec 16, 2014

sincx
Jul 13, 2012

furiously masturbating to anime titties
.

sincx fucked around with this message at 05:55 on Mar 23, 2021

r0ck0
Sep 12, 2004
r0ck0s p0zt m0d3rn lyf

sincx posted:

How does the Phi compare to using graphics card processing solution? I.e. with a GTX 970 or Radeon HD 290?

It gets double the frame rates in Crysis and Far Cry 4, so it's got that going for it at least.

Chuu
Sep 11, 2004

Grimey Drawer

sincx posted:

How does the Phi compare to using graphics card processing solution? I.e. with a GTX 970 or Radeon HD 290?

I was trying to get an Intel rep to answer how MKL compares to cuBLAS (Phi vs. CUDA BLAS implementation) if you just want to use it as a 3rd party library to plug into something like R, and I just could not get an answer out of them, which leads me to suspect not favorably. This was before Knights Landing was shipping though, so it might be a different story now.

That being said, you can get a 31S1P for $200 which is 10% of MSRP. Good CUDA cards aren't anywhere near that ballpark.

No Gravitas: I've been super busy and will see if I can get you some sort of binary that natively targets the Phi, but it's not looking very likely. I'm actually really surprised Intel doesn't have a LINPACK binary that you can just download off their site.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Chuu posted:

No Gravitas: I've been super busy and will see if I can get you some sort of binary that natively targets the Phi, but it's not looking very likely. I'm actually really surprised Intel doesn't have a LINPACK binary that you can just download off their site.

It's in recent versions of the Intel MKL, under the bench directory.

No Gravitas posted:

Then your instances must all fit under 8GB both in RAM and on "disk" because the Phi has only a ramdisk and no swap.

You can get the data off of the host via NFS, but it's not particularly fast. You can also NFS-root boot the phi to get the ramdisk size down.

Re: your stream numbers, http://www.cs.virginia.edu/stream/stream_mail/2013/0015.html

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

sincx posted:

How does the Phi compare to using graphics card processing solution? I.e. with a GTX 970 or Radeon HD 290?

It's great for $200, mediocre for $2000. Better at double precision than most high end consumer cards, assuming you can use it effectively.

Chuu posted:

I was trying to get an Intel rep to answer how MKL compares to cuBLAS (Phi vs. CUDA BLAS implementation) if you just want to use it as a 3rd party library to plug into something like R, and I just could not get an answer out of them, which leads me to suspect not favorably. This was before Knights Landing was shipping though, so it might be a different story now.

KC (Knights Corner) is out, KL (Knights Landing) is early 2016. On KC, offload MKL BLAS is generally slower than equivalent Tesla CUDA BLAS, but for some ops and problem sizes it works well.

Chuu
Sep 11, 2004

Grimey Drawer
Thanks for the info sidecar.

I'd love to see how Gravitas' cooling setup handles LINPACK. And also please post pictures!

No Gravitas
Jun 12, 2013

by FactsAreUseless

Chuu posted:

No Gravitas: I've been super busy and will see if I can get you some sort of binary that natively targets the Phi, but it's not looking very likely. I'm actually really surprised Intel doesn't have a LINPACK binary that you can just download off their site.

Just tell me what to point curl/wget at, and I gladly will. I'm lazy.


Yes, NFS does work too, if you can get away with the speed hit.

Stream: I was only comparing with the host and only using the compiler that I have, not trying to get a great result. Sure, nice to see what the Phi can do with a good setup and with a problem that fits. For me the only thing that could make a difference is maybe better instruction scheduling to use both the U and V pipes whenever possible. I keep meaning to look at disassemblies to see if GCC does this, but... :effort:

For 80$ the Phi is worth it for pretty much anything that runs a whole bunch of threads, if you don't mind getting it running as a fun project/adventure. Cheaper than getting a second computer. For 200$? If you have a good case for it and can use the vector units, sure. For more $ than that it is indeed mediocre. I do count the electricity in, but I'm in :canada:, so at least that isn't much of a worry.

I'm kinda low energy lately. I need that daylight lamp to light up my eyes... So lazy without it. So :effort:.

Let me get those pictures taken, prepare to :barf:...

Mr Chips
Jun 27, 2007
Whose arse do I have to blow smoke up to get rid of this baby?
Any chance you could run MrBayes in MPI mode on the Xeon and the Phi for comparison?
I can help out with setting up a test workload.

No Gravitas
Jun 12, 2013

by FactsAreUseless
Pictures taken. Gotta import. Sheesh. I'm really having a poo poo effort day though and probably won't get it done before tomorrow.

Mr Chips posted:

Any chance you could run MrBayes in MPI mode on the Xeon and the Phi for comparison?
I can help out with setting up a test workload.

With pleasure. EDIT: I'm on Arch Linux, 64-bit, Xeon E3-1226 v3, 16GB ECC RAM at 1600 CL11. Also 1TB of normal disk, no SSD.

Keep in mind, as is right now we are surely going to hit throttling in about three minutes unless you are using only half the chip or something. I need more fan. If this does not work, I will get still more fan. There can never be enough and the Phi is just too... eh... cool... not to use.

Mr Chips
Jun 27, 2007
Whose arse do I have to blow smoke up to get rid of this baby?

No Gravitas posted:

With pleasure. EDIT: I'm on Arch Linux, 64-bit, Xeon E3-1226 v3, 16GB ECC RAM at 1600 CL11. Also 1TB of normal disk, no SSD.
Keep in mind, as is right now we are surely going to hit throttling in about three minutes unless you are using only half the chip or something. I need more fan. If this does not work, I will get still more fan. There can never be enough and the Phi is just too... eh... cool... not to use.

Cool, I'll try and come up with a recipe for you. Presumably the Intel compiler kit for the Phi includes mpicc and mpirun? I've only ever used the Open MPI toolkit for this, to build x86_64 binaries.

edit: going by this: https://software.intel.com/en-us/articles/how-to-run-intel-mpi-on-xeon-phi, it doesn't look like too much of a deviation from what I've done in the past.


edit2: Going by this it may not work well with libHMSBeagle. Might have to try an older version instead.

Mr Chips fucked around with this message at 09:34 on Dec 18, 2014
