Shaocaholica
Oct 29, 2002

Fig. 5E
You mean like scientific audio? Or prototyping for audio hardware? I can't imagine studio audio requiring an FPGA and the associated programming costs.

coffeetable
Feb 5, 2006

TELL ME AGAIN HOW GREAT BRITAIN WOULD BE IF IT WAS RULED BY THE MERCILESS JACKBOOT OF PRINCE CHARLES

YES I DO TALK TO PLANTS ACTUALLY

Shaocaholica posted:

So what are some real world latency sensitive applications?
Network routing, but those applications have had custom silicon for a long time now. Other than that, the big money maker is algorithmic finance.

movax
Aug 30, 2008

Shaocaholica posted:

So what are some real world latency sensitive applications?

Hardware-in-the-loop simulators; you can throw in our hardware to accurately model/simulate/collect data for high-performance control applications (MRI, jet engines, hybrid power, vehicle dynamics, etc). Users can write custom HDL or export from MATLAB to XSG blocks or similar and prototype control solutions at multi-MHz rates. You can have up to 32 CPUs ready to crunch data / service real-time tasks with QPI interconnects between themselves and the FPGAs (though QPI came after I left; it was all PCIe before that, which is still really fast).

GokieKS
Dec 15, 2012

Mostly Harmless.

Shaocaholica posted:

You mean like scientific audio? Or prototyping for audio hardware? I can't imagine studio audio requiring an FPGA and the associated programming costs.

Oh, your question was referring to the Intel FPGA thing. That's really designed more for data center applications, and in those scenarios it's more about overall performance than one specific application being latency sensitive. For example, there are studies showing that users are only willing to wait X seconds for a website to load before moving on to a different one, and because not all the factors that affect page load time are under the control of the website host/owner, they have to optimize the ones they do control as much as possible. So in the case of Bing, this means taking the user input, running it through their search algorithm, running database queries, and returning the results in the best order, all as fast as possible.

Panty Saluter
Jan 17, 2004

Making learning fun!

GokieKS posted:

It depends on how much latency you're talking about, but almost anything to do with real-time audio, for starters.

Pro Tools native with 32 24/48 tracks and RTAS reverb, EQ, compression, and delay :mrgw:

Ignoarints
Nov 26, 2010

Don Lapre posted:

Tigerdirect just cancelled my order stating lack of availability.

?? They didn't even delay, just cancel?




KillHour posted:

Nah, that build cost him like $400. His case didn't even come with a Diablotek power supply!

http://cnj.craigslist.org/sys/4513401320.html

This bastard wants $300 for a computer with a loving floppy disk drive.






samsung digital hard drive im sold

Assepoester
Jul 18, 2004
Probation
Can't post for 10 years!
Melman v2
There should be a thread for craigslist computer listings

Don Lapre
Mar 28, 2001

If you're having problems you're either holding the phone wrong or you have tiny girl hands.
I wanna play

http://clarksville.craigslist.org/sys/4468857004.html

Gaming Rig (INCOMPLETE) Onboard Graphics - $500
This rig is a hardcore gaming PC made to withstand the fullest of capacities. My GTX 780 became defective while warrenty expired, should've bought extended. I do not have the $200 more dollars to replace the graphics. The Specs are listed below. The PC in AS IS condition is worth roughly $700 to $800. Will send pictures if asked via text.

I am willing to trade! Looking for a new electric guitar and amp of equal value.

CPU: AMD Phenom x6 1090T 3.2 GHz
GRAPHICS: Onboard HD Radeon 3000
RAM: G.Skill Ripjaws 4x2 8GB 1333hz
HDD: 1TB Toshiba 7200 RPM HDD
MOBO: Gigabyte LMT87-USB Micro-ATX
HDD: 1TB Toshiba 7200 RPM HDD
PSU: ThermalTake TR2 "Black Widow" 850 Watt Semi-Modular
SHELL: NZXT Vulcan Micro-ATX mid tower
OS: Windows 7 Ultimate

Proof of pricing:
http://pcpartpicker.com/p/3IS6y

Sidesaddle Cavalry
Mar 15, 2013

Oh Boy Desert Map

Don Lapre posted:

My GTX 780 became defective while warrenty expired, should've bought extended.
I shouldn't be the one to say this, but we have a winner for Most Stupid and Needlessly Dishonest Person of the Day.

Endymion FRS MK1
Oct 29, 2011

I don't know what this thing is, and I don't care. I'm just tired of seeing your stupid newbie av from 2011.
I feel like I'm doing something wrong, I put my one year old rig on SA Mart a few months ago for like a 10% markdown :confused:

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.

Endymion FRS MK1 posted:

I feel like I'm doing something wrong, I put my one year old rig on SA Mart a few months ago for like a 10% markdown :confused:

You made the mistake of selling into a market containing knowledgeable people and treating your fellow man with respect. That's no way to hustle.

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

Rastor posted:

Here's Charlie Demerjian's ramblings about why Intel would add that feature. Charlie's often a bit loopy but sometimes he comes up with some interesting thoughts.

Sorry, but Charlie Demerjian should not be taken seriously and this article is a great example why. He's clumsily trying to connect dots he doesn't understand so that they spell out doom for Intel, which is one of his objects of irrational hatred.

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

movax posted:

Think FPGAs will still have a higher development cost/curve than paying guys to write OpenCL/CUDA kernels to run on GPUs, though I believe some FPGA vendors have voodoo OpenCL toolkits that can turn your kernels into RTL to implement on the chip.

The Xilinx one is called Vivado HLS (high level synthesis) and IIRC it attempts to support legal ANSI C, not just OpenCL. I can't say I've ever used it, so I don't have a good feel for the actual limitations (obviously they don't support truly arbitrary code, the whole of the standard C library, etc), but I got a chance to listen to one of the Xilinx HLS engineers talk about it and chatted with him after the talk. It's fascinating stuff.

Ignoarints
Nov 26, 2010

Endymion FRS MK1 posted:

I feel like I'm doing something wrong, I put my one year old rig on SA Mart a few months ago for like a 10% markdown :confused:

Yeah if you want a 10% markdown on something a year old head straight to craigslist

Shaocaholica
Oct 29, 2002

Fig. 5E
I've always gotten pretty good deals on CL with some haggling and not going for the douchey listings. Still funny though.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

BobHoward posted:

Sorry, but Charlie Demerjian should not be taken seriously and this article is a great example why. He's clumsily trying to connect dots he doesn't understand so that they spell out doom for Intel, which is one of his objects of irrational hatred.

Yeah, Charlie is entertaining but generally off the mark, and anyone who takes him seriously is dumb

Like, he has sources, sure, but being a conduit for anonymous sources is not exactly hard

His analyses, at least the free ones, are not particularly illuminating

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice
When SemiAccurate was open-access you could at least read all of his reasons why he had a crazy opinion so he could occasionally come up with something thought-provoking. His pieces about Apple's future with ARM vs Intel and nVidia's plans to be an x86/ARM CPU designer were roundly mocked but Apple Swift and nVidia Denver were real. Nowadays though you just see a clickbait headline, word-salad intro paragraph, and a prompt to pay $1000 to read what Charlie thinks. It's stupid.

Ninja Rope
Oct 22, 2005

Wee.
How much faster the FPGA is for RSA and DHE is what's going to determine how large the market is for these. If it's a significant improvement over current CPUs then everyone with a website will be buying them.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Ninja Rope posted:

How much faster the FPGA is for RSA and DHE is what's going to determine how large the market is for these. If it's a significant improvement over current CPUs then everyone with a website will be buying them.

You are gonna see people buying crypto accelerators before using fpgas because tailor made asics are better if the thing you are doing is common

atomicthumbs
Dec 26, 2010


We're in the business of extending man's senses.
The only thing I don't like about the Maximus VII Gene is that apparently the same PCI Express lanes are shared by the mPCIe slot, the M.2 slot, and one of the x1 slots. Why are they all there if we can only use one?!

Edit:

Factory Factory posted:

Intel announced a new server SoC yesterday. It's the Xeon D, which marries Xeon E5 x86 cores with a cache-coherent FPGA on the same package.

Microsoft played with add-in card FPGA acceleration for its Bing servers and found that they got a 10x performance increase for practically no extra power draw on accelerated algorithms. Intel estimates that QPI-connected access to system RAM and the x86 cores' cache will double that speedup again. Probably little coincidence that Microsoft is rolling out FPGA-accelerated Bing en masse next year, and there's speculation that it'll be ported to this chip rather than stay with add-in cards.

Intel's blog post on the subject.

I want one just so I can load an M68000 core onto the FPGA and natively run the classic Mac OS. :getin:

Shaocaholica
Oct 29, 2002

Fig. 5E
So have FPGAs gotten much faster, bigger, etc over the last decade?

karoshi
Nov 4, 2008

"Can somebody mspaint eyes on the steaming packages? TIA" yeah well fuck you too buddy, this is the best you're gonna get. Is this even "work-safe"? Let's find out!

Shaocaholica posted:

So have FPGAs gotten much faster, bigger, etc over the last decade?

Yes, of course. It's ridiculously easy to go wide with FPGAs. Altera recently released a new one with FP32 ALUs, instead of the typical integer ALUs, for a nominal 10 TFlops.

Ninja Rope
Oct 22, 2005

Wee.

Malcolm XML posted:

You are gonna see people buying crypto accelerators before using fpgas because tailor made asics are better if the thing you are doing is common

Sure, but the current ASIC solutions are pretty terrible. If this is something already built into your CPU by Intel I'd imagine it would see better support and growth than current solutions. Even if it only acts as a kick in the butt to current hardware crypto vendors it will be welcome.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost

BobHoward posted:

The Xilinx one is called Vivado HLS (high level synthesis) and IIRC it attempts to support legal ANSI C, not just OpenCL. I can't say I've ever used it, so I don't have a good feel for the actual limitations (obviously they don't support truly arbitrary code, the whole of the standard C library, etc), but I got a chance to listen to one of the Xilinx HLS engineers talk about it and chatted with him after the talk. It's fascinating stuff.
I've worked on compilers and cottage-industry DSLs for converting imperative code to digital logic, and my general summation is: you will end up with some really suboptimal logic if you think imperatively instead of simultaneously and in parallel, like a CLP program. Converting imperative code with loops and such into something that could be optimized by a parallel-style language like Verilog or VHDL (or even Matlab, in a sense) is the same problem the people working on auto-vectorization in icc, llvm, and gcc have been chipping away at, with (at best) OK performance and exploitation of only the most obvious codepaths. In most compilers, serial processing is the default and the compiler has to figure out the parallelism, while with parallel languages you assume parallel first and the compiler tries to figure out how to order things correctly to get the result you want.
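A rough way to see what a vectorizer has to prove before it can run loop iterations in parallel: the result must not depend on the order the iterations run in. The sketch below is a toy check on one concrete input, with made-up helper names; it is not how icc, llvm, or gcc actually do dependence analysis.

```python
def elementwise_add(a, b, order):
    """out[i] depends only on a[i] and b[i], so any visiting order works."""
    out = [0] * len(a)
    for i in order:
        out[i] = a[i] + b[i]
    return out


def running_sum(a, order):
    """out[i] reads out[i - 1]: a loop-carried dependency."""
    out = [0] * len(a)
    for i in order:
        prev = out[i - 1] if i > 0 else 0
        out[i] = prev + a[i]
    return out


def order_independent(kernel, *args):
    """Compare forward and reverse iteration on one concrete input."""
    n = len(args[0])
    forward = list(range(n))
    backward = forward[::-1]
    return kernel(*args, forward) == kernel(*args, backward)


print(order_independent(elementwise_add, [1, 2, 3], [10, 20, 30]))
print(order_independent(running_sum, [1, 2, 3]))
```

The first loop is the "obvious codepath" case; the second is the kind of serial chain that has to be restructured (as a scan, say) before it maps well onto parallel hardware.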

Beyond the parallel v. serial issue, there are digital logic idiosyncrasies that general-purpose programming never encounters. One of the simpler examples of why you shouldn't treat HDLs like you would a regular imperative C (even SystemC) program is a case statement:

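code:
// Combinational block with an incomplete case: nothing is assigned
// to outbuf when selector matches neither label.
always @(*) begin
  case (selector)
    CASE1: outbuf = inputA;
    CASE2: outbuf = inputB;
  endcase
end
vs.

code:
// Same block with a default, so outbuf is assigned on every path.
always @(*) begin
  case (selector)
    CASE1: outbuf = inputA;
    CASE2: outbuf = inputB;
    default: outbuf = inputA;
  endcase
end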
The only difference is the default case. In a combinational block, the former creates an (inferred) latch, because outbuf has to hold its old value whenever neither case matches; the latter is fully specified and synthesizes to plain mux logic, which is what you want. Unless you've been doing some tricky stuff, most programmers at least operate on discrete cycles and have no control over what happens in the middle as part of the language convention.
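For anyone who thinks better in software terms, here is a small Python model of why the missing default matters. It is only a sketch: the function names and the `held` argument are made up for illustration, and real synthesis tools reason about the HDL, not about anything like this.

```python
def case_without_default(selector, input_a, input_b, held):
    """Model of a case with no default branch.

    `held` stands for whatever outbuf carried before. When no branch
    matches, nothing is assigned, so the old value has to be stored
    somewhere. That implied storage is the inferred latch.
    """
    if selector == "CASE1":
        return input_a
    if selector == "CASE2":
        return input_b
    return held


def case_with_default(selector, input_a, input_b):
    """Model of the fully specified case: output depends on inputs only."""
    if selector == "CASE2":
        return input_b
    return input_a  # CASE1 and the default branch both pick input_a


outbuf = 0
for sel in ["CASE1", "OTHER", "CASE2", "OTHER"]:
    outbuf = case_without_default(sel, 1, 2, outbuf)
    print(sel, "no default:", outbuf, "| with default:", case_with_default(sel, 1, 2))
```

The first function needs an extra piece of state to answer at all, while the second is a pure function of its inputs. That extra state is what the hardware has to build when the default is missing.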

Ninja Rope posted:

How much faster the FPGA is for RSA and DHE is what's going to determine how large the market is for these. If it's a significant improvement over current CPUs then everyone with a website will be buying them.
People already buy SSL accelerators for servers, for a couple hundred to a few thousand USD, that handle the https connections to lower CPU load and save power by moving these functions out to a hardware solution, with some proxying in the architecture to isolate the crypto from the general I/O-intensive OLTP work in the middleware. Those have been around since like '98 or so, I believe. It's all about cost effectiveness though, and if Intel ships some workable crypto solutions with the heterogeneous compute units, they might have a chance in the market. I'm curious about the I/O bandwidth and latency from the FPGA to the CPU and how they're related in the cache hierarchy before I write this off though.

Shaocaholica
Oct 29, 2002

Fig. 5E
Basic question here, but in order to use an FPGA attached to a CPU like the Intel solution, you first have to write an x86 program that feeds the FPGA its program and data, right? Does the CPU need to be fast in this regard?

How do you debug FPGA programs? In an emulator?

movax
Aug 30, 2008

Shaocaholica posted:

Basic question here, but in order to use an FPGA attached to a CPU like the Intel solution, you first have to write an x86 program that feeds the FPGA its program and data, right? Does the CPU need to be fast in this regard?

How do you debug FPGA programs? In an emulator?

You use a simulator such as Cadence Incisive, Mentor ModelSim, Aldec Active-HDL, etc to simulate your FPGA logic with stimuli you drive from a testbench. Simulators like what I listed start at the low five figures and can get up to six figures depending on the feature sets you need (mixed language, etc). With simulators though, the developer needs to know what they are doing to write HDL that is both synthesizable and simulatable for accurate results. You can write logic that is only simulatable and not actually synthesizable for real hardware, and you can also write logic that behaves differently in hardware vs. simulation if you don't take into account the vendor implementation of the event queue, which can make life difficult.

And yes, the downside of a lot of CPU off-load applications like GPGPU, FPGA acceleration, etc is that you pay a price in latency to control the other device and get it its data. Any HPC application worth its salt will have DMA so that the accelerator can talk to memory itself without a CPU managing it, though that can still add latency as now you must go from accelerator -> memory controller -> memory over whatever interconnect (PCI Express, QPI, etc). QPI lowers latency compared to PCI Express.

Ideally, your CPU spends a few cycles setting up the accelerator with some DMA descriptors and performing configuration, and then it is hands off from there.

I'd imagine that in the new Xeon solution, the FPGA appears similar to QPI devices to the host OS, namely sitting (from a software POV) on PCI Bus 0xFF, with a few BARs available for applications.
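The "set up some descriptors, then hands off" pattern can be sketched in plain Python. Everything here is hypothetical: `Descriptor`, `ToyAccelerator`, and the byte-inverting "computation" are stand-ins for illustration, not any real driver or Intel interface.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    """One unit of work: where to read, where to write, how much."""
    src: int
    dst: int
    length: int


class ToyAccelerator:
    """Stand-in for the device; it 'DMAs' by slicing a shared bytearray."""

    def __init__(self, memory):
        self.memory = memory
        self.ring = deque()

    def submit(self, desc):
        # Host side: a cheap queue write, no payload data is touched here.
        self.ring.append(desc)

    def run(self):
        # Device side: drain the ring without further host involvement.
        done = 0
        while self.ring:
            d = self.ring.popleft()
            chunk = self.memory[d.src:d.src + d.length]
            # Placeholder "computation": invert each byte.
            self.memory[d.dst:d.dst + d.length] = bytes(b ^ 0xFF for b in chunk)
            done += 1
        return done


memory = bytearray(b"\x00\x01\x02\x03" + bytes(4))
accel = ToyAccelerator(memory)
accel.submit(Descriptor(src=0, dst=4, length=4))
print(accel.run(), memory.hex())
```

The point of the shape is that the host only ever writes small descriptors; the payload moves between memory and the device without the CPU copying it.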

movax fucked around with this message at 02:18 on Jun 22, 2014

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice

atomicthumbs posted:

The only thing I don't like about the Maximus VII Gene is that apparently the same PCI Express lanes are shared by the mPCIe slot, the M.2 slot, and one of the x1 slots. Why are they all there if we can only use one?!
Check out this page from the Haswell Refresh review about FlexIO, it has more information about the ports available from the chipset and what you can do with them. Motherboard manufacturers have a limited number of ports to work with without resorting to crappy third-party chipsets, Intel doesn't want to provide more because that's why LGA-2011 exists, and there's a 2GB/sec bus between the CPU and chipset. Personally I really wish Intel would have used the 9-series chipsets to add PCI-Express 3.0 to the chipset, bumping the DMI bus to the CPU up to 4GB/sec, but that didn't happen.
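The 2GB/sec and 4GB/sec figures fall out of the link arithmetic. A quick sketch, assuming the DMI link behaves like a four-lane PCI Express 2.0 link (5 GT/s per lane, 8b/10b encoding) and that a hypothetical upgrade would use PCI Express 3.0 signalling (8 GT/s per lane, 128b/130b encoding); the function name is made up:

```python
def link_gbytes_per_sec(gigatransfers, payload_bits, coded_bits, lanes):
    """Usable one-direction bandwidth of a serial link in GB/s."""
    bits_per_sec = gigatransfers * 1e9 * payload_bits / coded_bits * lanes
    return bits_per_sec / 8 / 1e9


gen2_x4 = link_gbytes_per_sec(5.0, 8, 10, lanes=4)
gen3_x4 = link_gbytes_per_sec(8.0, 128, 130, lanes=4)
print(f"gen2 x4: {gen2_x4:.2f} GB/s, gen3 x4: {gen3_x4:.2f} GB/s")
```

That works out to 2.0 GB/s for the gen2-style link and roughly 3.94 GB/s for a gen3-style one, which is where the "bump to 4GB/sec" comes from. Protocol overhead above the line encoding is ignored here.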

Alereon fucked around with this message at 05:14 on Jun 22, 2014

One Eye Open
Sep 19, 2006
Am I awake?

movax posted:

You use a simulator such as Cadence Incisive, Mentor ModelSim, Aldec Active-HDL, etc to simulate your FPGA logic with stimuli you drive from a testbench. Simulators like what I listed start at the low five figures and can get up to six figures depending on the feature sets you need (mixed language, etc).

Although if you're learning, the free version of Altera Quartus II comes with ModelSim Starter Edition which can do small designs.

Ardlen
Sep 30, 2005
WoT



After simulation, you can use something like Xilinx Chipscope to probe the actual signals on the device.

mobby_6kl
Aug 9, 2009

by Fluffdaddy
Pretty interesting stuff. Any suggestions for books or online classes to get started messing with this?

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

movax posted:

You use a simulator such as Cadence Incisive, Mentor ModelSim, Aldec Active-HDL, etc to simulate your FPGA logic with stimuli you drive from a testbench. Simulators like what I listed start at the low five figures and can get up to six figures depending on the feature sets you need (mixed language, etc). With simulators though, the developer needs to know what they are doing to write HDL that is both synthesizable and simulatable for accurate results. You can write logic that is only simulatable and not actually synthesizable for real hardware, and you can also write logic that behaves differently in hardware vs. simulation if you don't take into account the vendor implementation of the event queue, which can make life difficult.

And yes, the downside of a lot of CPU off-load applications like GPGPU, FPGA acceleration, etc is that you pay a price in latency to control the other device and get it its data. Any HPC application worth its salt will have DMA so that the accelerator can talk to memory itself without a CPU managing it, though that can still add latency as now you must go from accelerator -> memory controller -> memory over whatever interconnect (PCI Express, QPI, etc). QPI lowers latency compared to PCI Express.

Ideally, your CPU spends a few cycles setting up the accelerator with some DMA descriptors and performing configuration, and then it is hands off from there.

I'd imagine that in the new Xeon solution, the FPGA appears similar to QPI devices to the host OS, namely sitting (from a software POV) on PCI Bus 0xFF, with a few BARs available for applications.

Yes, actually writing VHDL/Verilog is a giant pain in the rear end and the tools are so loving terrible compared to software. I really wish someone would come in w/ an FPGA that had an open bitstream format and allowed open tools, but that ain't happening.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
That someone would be Intel in theory. If they get enough developers working on heterogeneous computing in the traditional software community, well gosh, I think I might have some use for what I did ten years ago that's become relevant again. Sure beats trying to basically recreate mainframes with all this cloud BS that works so terribly in practice unless you're at one of the top 5 Internet giants, and even they screw up rather often.

karoshi
Nov 4, 2008

"Can somebody mspaint eyes on the steaming packages? TIA" yeah well fuck you too buddy, this is the best you're gonna get. Is this even "work-safe"? Let's find out!

movax posted:

You use a simulator such as Cadence Incisive, Mentor ModelSim, Aldec Active-HDL, etc to simulate your FPGA logic with stimuli you drive from a testbench. Simulators like what I listed start at the low five figures and can get up to six figures depending on the feature sets you need (mixed language, etc). With simulators though, the developer needs to know what they are doing to write HDL that is both synthesizable and simulatable for accurate results. You can write logic that is only simulatable and not actually synthesizable for real hardware, and you can also write logic that behaves differently in hardware vs. simulation if you don't take into account the vendor implementation of the event queue, which can make life difficult.

And yes, the downside of a lot of CPU off-load applications like GPGPU, FPGA acceleration, etc is that you pay a price in latency to control the other device and get it its data. Any HPC application worth its salt will have DMA so that the accelerator can talk to memory itself without a CPU managing it, though that can still add latency as now you must go from accelerator -> memory controller -> memory over whatever interconnect (PCI Express, QPI, etc). QPI lowers latency compared to PCI Express.

Ideally, your CPU spends a few cycles setting up the accelerator with some DMA descriptors and performing configuration, and then it is hands off from there.

I'd imagine that in the new Xeon solution, the FPGA appears similar to QPI devices to the host OS, namely sitting (from a software POV) on PCI Bus 0xFF, with a few BARs available for applications.

Being cache coherent with unified memory would open the door to even lower latency, with zero copy and no need for DMA. Much like an APU and all the flim-flam from the HSA AMD circus.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

karoshi posted:

Being cache coherent with unified memory would open the door to even lower latency, with zero copy and no need for DMA. Much like an APU and all the flim-flam from the HSA AMD circus.

Let me tell you about zynq!

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
Intel detailed the next-gen Xeon Phi, Knights Landing. Instead of wee custom Pentiums, the new version runs a mob of Silvermont Atom cores. As promised, it can be socketed as a peer to a buff Xeon. By using Silvermont instead of old Pentiums, it's ISA compatible with those buff Xeons, too.

Most interestingly, it comes with 16 GB of on-die memory based on Micron's stacked cube whatever tech. It's supposed to be 5x more bandwidth than DDR4 in 1/3 the die space.

Rime
Nov 2, 2011

by Games Forum

Factory Factory posted:

Most interestingly, it comes with 16 GB of on-die memory based on Micron's stacked cube whatever tech. It's supposed to be 5x more bandwidth than DDR4 in 1/3 the die space.

Why even bother with DDR4. :psypop:

Shaocaholica
Oct 29, 2002

Fig. 5E
gently caress, bring on socketed ram and tower coolers for them. Sticks always sucked for bandwidth/cooling.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Factory Factory posted:

As promised, it can be socketed as a peer to a buff Xeon (http://vr-zone.com/articles/intel-unveils-knights-landing/79686.html).

Nope. Take a look at:


It has no QPI. It's socketed, but none of the Intel docs say anything about multiple KL or KL-Xeon cohosting.

quote:

ISA compatible

Well, you can run your existing binaries but you probably don't have any code compiled for AVX-512 (which you'll need to use to get anywhere near 3 TF.)
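To put a rough number on why the wide vectors matter: peak throughput is just cores x vector units x lanes x flops per lane x clock. The inputs below are illustrative assumptions for a back-of-the-envelope sketch (72 cores, two vector units per core, 1.3 GHz), not figures from Intel's announcement; the only hard fact is that a 512-bit vector holds eight 64-bit doubles.

```python
def peak_gflops(cores, vector_units, lanes, clock_ghz, fma=True):
    """Theoretical peak in GFLOPS; a fused multiply-add counts as 2 flops."""
    flops_per_lane = 2 if fma else 1
    return cores * vector_units * lanes * flops_per_lane * clock_ghz


# Hypothetical part: 72 cores, 2 vector units each, 1.3 GHz.
wide = peak_gflops(72, 2, lanes=8, clock_ghz=1.3)    # 512-bit, 8 doubles
scalar = peak_gflops(72, 2, lanes=1, clock_ghz=1.3)  # one double at a time
print(f"vector: {wide:.0f} GFLOPS, scalar: {scalar:.0f} GFLOPS, ratio {wide / scalar:.0f}x")
```

Whatever the real core count and clock turn out to be, the ratio is the point: code that never touches the 512-bit units leaves most of the headline number on the table.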

quote:

on-die


Nope, on package.

Rime posted:

Why even bother with DDR4. :psypop:

If your data can fit in 16 GB, there's not a real reason to.

Shaocaholica posted:

gently caress, bring on socketed ram and tower coolers for them. Sticks always sucked for bandwidth/cooling.

off-package is too far away for stacked RAM

in a well actually fucked around with this message at 03:41 on Jun 26, 2014

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
I always gently caress up on-die/on package, especially when I think "I better use on-die/on-package correctly."

But no cohosting? I distinctly remember that being talked up about six months ago as an expected feature of the first Phi's successor.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Factory Factory posted:

I always gently caress up on-die/on package, especially when I think "I better use on-die/on-package correctly."

But no cohosting? I distinctly remember that being talked up about six months ago as an expected feature of the first Phi's successor.

Yeah, the normally excellent realworldtech put out an article where they summarized the available leaks, and then made one assumption (KL and Skylake-EX must share infrastructure) that led them afield (therefore it must have QPI and cohosting, therefore a big LLC).

It's possible that cohosting is a possibility, but other than realworldtech (or others citing them) I've not seen anything to indicate it.
