Intel: lol

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > Intel: lol

Shaocaholica: Oct 29, 2002; Fig. 5E

You mean like scientific audio? Or prototyping for audio hardware? I can't imagine studio audio requiring an FPGA and the associated programming costs.

# ? Jun 20, 2014 23:53

coffeetable: Feb 5, 2006; TELL ME AGAIN HOW GREAT BRITAIN WOULD BE IF IT WAS RULED BY THE MERCILESS JACKBOOT OF PRINCE CHARLES

YES I DO TALK TO PLANTS ACTUALLY

Shaocaholica posted:

So what are some real world latency sensitive applications?

Network routing, but those applications have had custom silicon for a long time now. Other than that, the big money maker is algorithmic finance.

# ? Jun 20, 2014 23:54

movax: Aug 30, 2008

Shaocaholica posted:

So what are some real world latency sensitive applications?

Hardware-in-the-loop simulators; you can throw in our hardware to accurately model/simulate/collect data for high-performance control applications (MRI, jet engines, hybrid power, vehicle dynamics, etc.). Users can write custom HDL or export from MATLAB to XSG blocks or similar and prototype control solutions at multi-MHz rates. You can have up to 32 CPUs ready to crunch data / service real-time tasks, with QPI interconnects between themselves and the FPGAs (though QPI came after I left; it was all PCIe before that, which is still really fast).

# ? Jun 21, 2014 00:01

GokieKS: Dec 15, 2012; Mostly Harmless.

Shaocaholica posted:

You mean like scientific audio? Or prototyping for audio hardware? I can't imagine studio audio requiring an FPGA and the associated programming costs.

Oh, your question was referring to the Intel FPGA thing. That's really designed more for data center applications, and in those scenarios it's more of an overall performance situation than one specific application being latency sensitive. For example, there are studies showing that users are only willing to wait X seconds for a website to load before moving on to a different one, and because not all the factors that determine how fast a page loads are under the control of the website host/owner, they have to optimize the ones they do control as much as possible. So in the case of Bing, this means completing the process of taking the user input, running it through their search algorithm, running database queries, and returning the results in the best order, as fast as possible.

# ? Jun 21, 2014 00:11

Panty Saluter: Jan 17, 2004; Making learning fun!

GokieKS posted:

It depends on how much latency you're talking about, but almost anything to do with real-time audio, for starters.

Pro Tools native with 32 24/48 tracks and RTAS reverb, EQ, compression, and delay :mrgw:

# ? Jun 21, 2014 01:28

Ignoarints: Nov 26, 2010

Don Lapre posted:

Tigerdirect just cancelled my order stating lack of availability.

?? They didn't even delay, just cancel?

KillHour posted:

Nah, that build cost him like $400. His case didn't even come with a Diablotek power supply!

http://cnj.craigslist.org/sys/4513401320.html

This bastard wants $300 for a computer with a loving floppy disk drive.

samsung digital hard drive im sold

# ? Jun 21, 2014 01:32

Assepoester: Jul 18, 2004; Can't post for 10 years!; Melman v2

There should be a thread for craigslist computer listings

# ? Jun 21, 2014 03:12

Don Lapre: Mar 28, 2001; If you're having problems you're either holding the phone wrong or you have tiny girl hands.

I wanna play

http://clarksville.craigslist.org/sys/4468857004.html

Gaming Rig (INCOMPLETE) Onboard Graphics - $500
This rig is a hardcore gaming PC made to withstand the fullest of capacities. My GTX 780 became defective while warrenty expired, should've bought extended. I do not have the $200 more dollars to replace the graphics. The Specs are listed below. The PC in AS IS condition is worth roughly $700 to $800. Will send pictures if asked via text.

I am willing to trade! Looking for a new electric guitar and amp of equal value.

CPU: AMD Phenom x6 1090T 3.2 GHz
GRAPHICS: Onboard HD Radeon 3000
RAM: G.Skill Ripjaws 4x2 8GB 1333hz
HDD: 1TB Toshiba 7200 RPM HDD
MOBO: Gigabyte LMT87-USB Micro-ATX
HDD: 1TB Toshiba 7200 RPM HDD
PSU: ThermalTake TR2 "Black Widow" 850 Watt Semi-Modular
SHELL: NZXT Vulcan Micro-ATX mid tower
OS: Windows 7 Ultimate

Proof of pricing:
http://pcpartpicker.com/p/3IS6y

# ? Jun 21, 2014 03:16

Sidesaddle Cavalry: Mar 15, 2013; Oh Boy Desert Map

Don Lapre posted:

My GTX 780 became defective while warrenty expired, should've bought extended.

I shouldn't be the one to say this, but we have a winner for Most Stupid and Needlessly Dishonest Person of the Day.

# ? Jun 21, 2014 03:52

Endymion FRS MK1: Oct 29, 2011; I don't know what this thing is, and I don't care. I'm just tired of seeing your stupid newbie av from 2011.

I feel like I'm doing something wrong, I put my one year old rig on SA Mart a few months ago for like a 10% markdown :confused:

# ? Jun 21, 2014 05:14

Factory Factory: Mar 19, 2010; This is what Arcane Velocity was like.

Endymion FRS MK1 posted:

I feel like I'm doing something wrong, I put my one year old rig on SA Mart a few months ago for like a 10% markdown

You made the mistake of selling into a market containing knowledgeable people and treating your fellow man with respect. That's no way to hustle.

# ? Jun 21, 2014 05:26

BobHoward: Feb 13, 2012; The only thing white people deserve is a bullet to their empty skull

Rastor posted:

Here's Charlie Demerjian's ramblings about why Intel would add that feature. Charlie's often a bit loopy but sometimes he comes up with some interesting thoughts.

Sorry, but Charlie Demerjian should not be taken seriously and this article is a great example why. He's clumsily trying to connect dots he doesn't understand so that they spell out doom for Intel, which is one of his objects of irrational hatred.

# ? Jun 21, 2014 05:35

BobHoward: Feb 13, 2012; The only thing white people deserve is a bullet to their empty skull

movax posted:

Think FPGAs will still have a higher development cost/curve than paying guys to write OpenCL/CUDA kernels to run on GPUs, though I believe some FPGA vendors have voodoo OpenCL toolkits that can turn your kernels into RTL to implement on the chip.

The Xilinx one is called Vivado HLS (high level synthesis) and IIRC it attempts to support legal ANSI C, not just OpenCL. I can't say I've ever used it, so I don't have a good feel for the actual limitations (obviously they don't support truly arbitrary code, the whole of the standard C library, etc), but I got a chance to listen to one of the Xilinx HLS engineers talk about it and chatted with him after the talk. It's fascinating stuff.

# ? Jun 21, 2014 05:39

Ignoarints: Nov 26, 2010

Endymion FRS MK1 posted:

I feel like I'm doing something wrong, I put my one year old rig on SA Mart a few months ago for like a 10% markdown

Yeah if you want a 10% markdown on something a year old head straight to craigslist

# ? Jun 21, 2014 05:47

Shaocaholica: Oct 29, 2002; Fig. 5E

I've always gotten pretty good deals on CL with some haggling and not going for the douchey listings. Still funny though.

# ? Jun 21, 2014 16:23

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

BobHoward posted:

Sorry, but Charlie Demerjian should not be taken seriously and this article is a great example why. He's clumsily trying to connect dots he doesn't understand so that they spell out doom for Intel, which is one of his objects of irrational hatred.

Yeah, Charlie is entertaining but ultimately generally off the mark, and anyone who takes him seriously is dumb

Like, he has sources, sure, but being a conduit for anonymous sources is not exactly hard

His analyses, at least the free ones, are not particularly illuminating

# ? Jun 21, 2014 17:18

Alereon: Feb 6, 2004; Dehumanize yourself and face to Trumpshed; College Slice

When SemiAccurate was open-access you could at least read all of his reasons why he had a crazy opinion so he could occasionally come up with something thought-provoking. His pieces about Apple's future with ARM vs Intel and nVidia's plans to be an x86/ARM CPU designer were roundly mocked but Apple Swift and nVidia Denver were real. Nowadays though you just see a clickbait headline, word-salad intro paragraph, and a prompt to pay $1000 to read what Charlie thinks. It's stupid.

# ? Jun 21, 2014 19:02

Ninja Rope: Oct 22, 2005; Wee.

How much faster the FPGA is for RSA and DHE is what's going to determine how large the market is for these. If it's a significant improvement over current CPUs then everyone with a website will be buying them.

# ? Jun 21, 2014 21:39

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

Ninja Rope posted:

How much faster the FPGA is for RSA and DHE is what's going to determine how large the market is for these. If it's a significant improvement over current CPUs then everyone with a website will be buying them.

You are gonna see people buying crypto accelerators before using FPGAs, because tailor-made ASICs are better if the thing you are doing is common

# ? Jun 21, 2014 22:16

atomicthumbs: Dec 26, 2010; We're in the business of extending man's senses.

The only thing I don't like about the Maximus VII Gene is that apparently the same PCI Express lanes are shared by the mPCIe slot, the M.2 slot, and one of the x1 slots. Why are they all there if we can only use one?!

Edit:

Factory Factory posted:

Intel announced a new server SoC yesterday. It marries Xeon E5 x86 cores with a cache-coherent FPGA on the same package.

Microsoft played with add-in card FPGA acceleration for its Bing servers and found that they got a 10x performance increase for practically no extra power draw on accelerated algorithms. Intel estimates that QPI-connected access to system RAM and the x86 cores' cache will double again that speedup. Probably little coincidence that Microsoft is rolling out FPGA-accelerated Bing en masse next year, and speculation it'll be ported to this chip rather than stay with add-in cards.

Intel's blog post on the subject.

I want one just so I can load an M68000 core onto the FPGA and natively run the classic Mac OS. :getin:

# ? Jun 21, 2014 22:30

Shaocaholica: Oct 29, 2002; Fig. 5E

So have FPGAs gotten much faster, bigger, etc over the last decade?

# ? Jun 21, 2014 22:50

karoshi: Nov 4, 2008; "Can somebody mspaint eyes on the steaming packages? TIA" yeah well fuck you too buddy, this is the best you're gonna get. Is this even "work-safe"? Let's find out!

Shaocaholica posted:

So have FPGAs gotten much faster, bigger, etc over the last decade?

Yes, of course. It's ridiculously easy to go wide with FPGAs. Altera recently released a new one with FP32 ALUs, instead of the typical integer ALUs, for a nominal 10 TFlops.

# ? Jun 21, 2014 23:19

Ninja Rope: Oct 22, 2005; Wee.

Malcolm XML posted:

You are gonna see people buying crypto accelerators before using FPGAs, because tailor-made ASICs are better if the thing you are doing is common

Sure, but the current ASIC solutions are pretty terrible. If this is something already built into your CPU by Intel I'd imagine it would see better support and growth than current solutions. Even if it only acts as a kick in the butt to current hardware crypto vendors it will be welcome.

# ? Jun 21, 2014 23:45

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

BobHoward posted:

The Xilinx one is called Vivado HLS (high level synthesis) and IIRC it attempts to support legal ANSI C, not just OpenCL. I can't say I've ever used it, so I don't have a good feel for the actual limitations (obviously they don't support truly arbitrary code, the whole of the standard C library, etc), but I got a chance to listen to one of the Xilinx HLS engineers talk about it and chatted with him after the talk. It's fascinating stuff.

I've worked on compilers and cottage-industry DSLs for converting imperative code to digital logic, and my general summation is this: you will end up with some really suboptimal logic if you think imperatively instead of simultaneously and in parallel, like a CLP program. Converting imperative code with loops and such into something that could be optimized by a parallel-style language like Verilog or VHDL (or even Matlab, in a sense) is the same problem the people working on auto-vectorization in icc, LLVM, and gcc have been attacking, with (at best) OK performance and exploitation of only the most obvious code paths. In most compilers, serial processing is the default and the compiler has to figure out the parallelism; with parallel languages you assume parallel first, and the compiler tries to figure out how to order operations correctly to get the result you want.

Beyond the parallel vs. serial issue, there are digital logic idiosyncrasies that general-purpose programming never encounters. One of the simpler examples of why you shouldn't treat HDLs like you would a regular imperative C (even SystemC) program is a case statement:

code:

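// C-style pseudocode; <= is the HDL non-blocking assignment
switch (selector) {
    case CASE1:
        outbuf <= inputA;
        break;
    case CASE2:
        outbuf <= inputB;
    // no default: for any other selector, outbuf must hold its old value
}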

vs.

code:

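// C-style pseudocode; <= is the HDL non-blocking assignment
switch (selector) {
    case CASE1:
        outbuf <= inputA;
        break;
    case CASE2:
        outbuf <= inputB;
        break;
    default:
        // every other selector is covered, so outbuf never has to hold a value
        outbuf <= inputA;
}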

The only difference is a default case. The former creates an (inferred) latch; the latter creates plain combinational logic with no unintended storage, which is what you want. Unless you've been doing some tricky stuff, most programmers at least operate on discrete cycles and have no control over what happens in the middle as part of the language convention.
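
A rough way to see the difference in software terms, as a Python toy rather than real HDL: treat the output as something re-evaluated whenever the inputs change. With no default branch, an unmatched selector leaves the output at its old value, so the hardware has to remember it, and that memory is the inferred latch. The names `CASE1`, `CASE2`, `eval_without_default`, and `eval_with_default` are made up for illustration.

```python
# Toy model of the two case statements; not real HDL semantics.
CASE1, CASE2 = 0, 1  # hypothetical selector encodings


def eval_without_default(selector, input_a, input_b, prev_out):
    """Incomplete case: unmatched selectors keep the previous output."""
    if selector == CASE1:
        return input_a
    if selector == CASE2:
        return input_b
    return prev_out  # needs storage, which is where the latch comes from


def eval_with_default(selector, input_a, input_b):
    """Complete case: the output depends only on the current inputs."""
    if selector == CASE2:
        return input_b
    return input_a  # CASE1 and the default branch both pick input_a


out = 0
history = []
for sel, a, b in [(CASE1, 5, 9), (2, 7, 9), (CASE2, 7, 3)]:
    out = eval_without_default(sel, a, b, out)
    history.append(out)
print(history)  # [5, 5, 3]: the 5 is held while selector == 2
```

Note that `eval_with_default` needs no `prev_out` argument at all, which is the whole point: nothing has to be stored.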

Ninja Rope posted:

How much faster the FPGA is for RSA and DHE is what's going to determine how large the market is for these. If it's a significant improvement over current CPUs then everyone with a website will be buying them.

People already buy SSL accelerators for servers, for a couple hundred to a few thousand USD, that handle the HTTPS connections. Moving those functions out to hardware lowers CPU load and saves power, with some proxying in the architecture to isolate the crypto from the general I/O-intensive OLTP work in the middleware. Those have been around since '98 or so, I believe. It's all about cost effectiveness though, and if Intel ships some workable crypto solutions with the heterogeneous compute units, they might have a chance in the market. I'm curious about the I/O bandwidth and latency from the FPGA to the CPU and how they're related in the cache hierarchy before I write this off though.

# ? Jun 22, 2014 00:30

Shaocaholica: Oct 29, 2002; Fig. 5E

Basic question here, but in order to use an FPGA attached to a CPU like the Intel solution, you first have to write an x86 program that feeds the FPGA its program and data, right? Does the CPU need to be fast in this regard?

How do you debug FPGA programs? In an emulator?

# ? Jun 22, 2014 01:31

movax: Aug 30, 2008

Shaocaholica posted:

Basic question here, but in order to use an FPGA attached to a CPU like the Intel solution, you first have to write an x86 program that feeds the FPGA its program and data, right? Does the CPU need to be fast in this regard?

How do you debug FPGA programs? In an emulator?

You use a simulator such as Cadence Incisive, Mentor ModelSim, Aldec Active-HDL, etc. to simulate your FPGA logic with stimuli you drive from a testbench. Simulators like the ones I listed start in the low five figures and can get up to six figures depending on the feature sets you need (mixed language, etc.). With simulators though, the developer needs to know what they are doing to write HDL that is both synthesizable and simulatable for accurate results. You can write logic that is only simulatable and not actually synthesizable for real hardware, and you can also write logic that behaves differently in hardware vs. simulation if you don't take into account the vendor implementation of the event queue, which can make life difficult.
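
To illustrate why evaluation order matters, here is a small Python toy (not any vendor's simulator) of two registers that swap values on a clock edge. Hardware updates both at once; a simulator only matches that if it samples every right-hand side before committing any update, which is roughly what non-blocking assignment does in Verilog. The function names are invented for this example.

```python
def clock_edge_two_phase(regs):
    """Sample every right-hand side first, then commit (non-blocking style)."""
    sampled = {"a": regs["b"], "b": regs["a"]}
    return sampled


def clock_edge_in_order(regs):
    """Update registers one after another (blocking style, order dependent)."""
    regs = dict(regs)      # work on a copy so the caller's state is untouched
    regs["a"] = regs["b"]
    regs["b"] = regs["a"]  # already sees the new value of a
    return regs


start = {"a": 1, "b": 2}
print(clock_edge_two_phase(start))  # {'a': 2, 'b': 1}
print(clock_edge_in_order(start))   # {'a': 2, 'b': 2}
```

Only the two-phase version behaves like a pair of real flip-flops; the in-order version silently loses a value.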

And yes, the downside of a lot of CPU off-load applications like GPGPU, FPGA acceleration, etc. is that you pay a price in latency to control the other device and get it its data. Any HPC application worth its salt will have DMA so that the accelerator can talk to memory itself without a CPU managing it, though that can still add latency, as you now must go from accelerator -> memory controller -> memory over whatever interconnect (PCI Express, QPI, etc.). QPI lowers latency compared to PCI Express.

Ideally, your CPU spends a few cycles setting up the accelerator with some DMA descriptors and performing configuration, and then it is hands off from there.
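
As a rough sketch of what "setting up some DMA descriptors" can look like, here is a Python model that splits one transfer into a chain of fixed-layout descriptors. The layout (`DESC_FORMAT`), the `FLAG_LAST` bit, and the addresses are all invented for illustration; real hardware defines its own descriptor format.

```python
import struct

# Hypothetical descriptor layout (little-endian): 64-bit source address,
# 64-bit destination address, 32-bit length, 32-bit flags.
DESC_FORMAT = "<QQII"
FLAG_LAST = 0x1  # made-up flag marking the end of the chain


def build_descriptors(src, dst, total_len, chunk):
    """Split one transfer into chunk-sized descriptors packed as bytes."""
    descs = []
    offset = 0
    while offset < total_len:
        length = min(chunk, total_len - offset)
        last = offset + length >= total_len
        descs.append(struct.pack(DESC_FORMAT, src + offset, dst + offset,
                                 length, FLAG_LAST if last else 0))
        offset += length
    return descs


ring = build_descriptors(src=0x1000_0000, dst=0x2000_0000,
                         total_len=10_000, chunk=4096)
print(len(ring), struct.calcsize(DESC_FORMAT))  # 3 24
```

The CPU-side work ends once the list is built and handed over; walking the chain and moving the bytes would be the accelerator's job.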

I'd imagine that in the new Xeon solution, the FPGA appears similar to QPI devices to the host OS, namely sitting (from a software POV) on PCI Bus 0xFF, with a few BARs available for applications.

movax fucked around with this message at 02:18 on Jun 22, 2014

# ? Jun 22, 2014 02:15

Alereon: Feb 6, 2004; Dehumanize yourself and face to Trumpshed; College Slice

atomicthumbs posted:

The only thing I don't like about the Maximus VII Gene is that apparently the same PCI Express lanes are shared by the mPCIe slot, the M.2 slot, and one of the x1 slots. Why are they all there if we can only use one?!

Check out this page from the Haswell Refresh review about FlexIO, it has more information about the ports available from the chipset and what you can do with them. Motherboard manufacturers have a limited number of ports to work with without resorting to crappy third-party chipsets, Intel doesn't want to provide more because that's why LGA-2011 exists, and there's a 2GB/sec bus between the CPU and chipset. Personally I really wish Intel would have used the 9-series chipsets to add PCI-Express 3.0 to the chipset, bumping the DMI bus to the CPU up to 4GB/sec, but that didn't happen.
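
For anyone wondering where the 2GB/sec and 4GB/sec figures come from, here is the arithmetic as a short Python sketch. It assumes the DMI link is electrically a x4 link using PCI Express 2.0 signalling (5 GT/s with 8b/10b encoding) and compares it with the same width at PCI Express 3.0 rates (8 GT/s with 128b/130b encoding); the x4 width and the function name are my assumptions, not something stated above.

```python
def link_gbytes_per_sec(gigatransfers, payload_bits, coded_bits, lanes):
    """Usable one-direction bandwidth in GB/s after line-code overhead."""
    return gigatransfers * payload_bits / coded_bits / 8 * lanes


dmi2 = link_gbytes_per_sec(5.0, 8, 10, lanes=4)     # PCIe 2.0 signalling
gen3 = link_gbytes_per_sec(8.0, 128, 130, lanes=4)  # PCIe 3.0 signalling
print(f"{dmi2:.2f} GB/s vs {gen3:.2f} GB/s")  # 2.00 GB/s vs 3.94 GB/s
```

So "4GB/sec" is a round-up: the leaner encoding in 3.0 is what gets you nearly double the throughput from less than double the transfer rate.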

Alereon fucked around with this message at 05:14 on Jun 22, 2014

# ? Jun 22, 2014 05:10

One Eye Open: Sep 19, 2006; Am I awake?

movax posted:

You use a simulator such as Cadence Incisive, Mentor ModelSim, Aldec Active-HDL, etc. to simulate your FPGA logic with stimuli you drive from a testbench. Simulators like the ones I listed start in the low five figures and can get up to six figures depending on the feature sets you need (mixed language, etc.).

Although if you're learning, the free version of Altera Quartus II comes with ModelSim Starter Edition which can do small designs.

# ? Jun 22, 2014 05:10

Ardlen: Sep 30, 2005; WoT

After simulation, you can use something like Xilinx ChipScope to probe the actual signals on the device.

# ? Jun 22, 2014 06:01

mobby_6kl: Aug 9, 2009; by Fluffdaddy

Pretty interesting stuff. Any suggestions for books or online classes to get started messing with this?

# ? Jun 22, 2014 10:34

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

movax posted:

You use a simulator such as Cadence Incisive, Mentor ModelSim, Aldec Active-HDL, etc. to simulate your FPGA logic with stimuli you drive from a testbench. Simulators like the ones I listed start in the low five figures and can get up to six figures depending on the feature sets you need (mixed language, etc.). With simulators though, the developer needs to know what they are doing to write HDL that is both synthesizable and simulatable for accurate results. You can write logic that is only simulatable and not actually synthesizable for real hardware, and you can also write logic that behaves differently in hardware vs. simulation if you don't take into account the vendor implementation of the event queue, which can make life difficult.

And yes, the downside of a lot of CPU off-load applications like GPGPU, FPGA acceleration, etc. is that you pay a price in latency to control the other device and get it its data. Any HPC application worth its salt will have DMA so that the accelerator can talk to memory itself without a CPU managing it, though that can still add latency, as you now must go from accelerator -> memory controller -> memory over whatever interconnect (PCI Express, QPI, etc.). QPI lowers latency compared to PCI Express.

Ideally, your CPU spends a few cycles setting up the accelerator with some DMA descriptors and performing configuration, and then it is hands off from there.

I'd imagine that in the new Xeon solution, the FPGA appears similar to QPI devices to the host OS, namely sitting (from a software POV) on PCI Bus 0xFF, with a few BARs available for applications.

Yes, actually writing VHDL/Verilog is a giant pain in the rear end and the tools are so loving terrible compared to software. I really wish someone would come in w/ an FPGA that had an open bitstream format and allowed open tools, but that ain't happening.

# ? Jun 22, 2014 14:04

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

That someone would be Intel, in theory. If they get enough developers working on heterogeneous computing in the traditional software community, well gosh, I think I might have some use for what I did ten years ago that's become relevant again. Sure beats trying to basically recreate mainframes with all this cloud BS that works so terribly in practice unless you're at one of the top 5 Internet giants, and even they screw up rather often.

# ? Jun 22, 2014 15:06

karoshi: Nov 4, 2008; "Can somebody mspaint eyes on the steaming packages? TIA" yeah well fuck you too buddy, this is the best you're gonna get. Is this even "work-safe"? Let's find out!

movax posted:

You use a simulator such as Cadence Incisive, Mentor ModelSim, Aldec Active-HDL, etc. to simulate your FPGA logic with stimuli you drive from a testbench. Simulators like the ones I listed start in the low five figures and can get up to six figures depending on the feature sets you need (mixed language, etc.). With simulators though, the developer needs to know what they are doing to write HDL that is both synthesizable and simulatable for accurate results. You can write logic that is only simulatable and not actually synthesizable for real hardware, and you can also write logic that behaves differently in hardware vs. simulation if you don't take into account the vendor implementation of the event queue, which can make life difficult.

And yes, the downside of a lot of CPU off-load applications like GPGPU, FPGA acceleration, etc. is that you pay a price in latency to control the other device and get it its data. Any HPC application worth its salt will have DMA so that the accelerator can talk to memory itself without a CPU managing it, though that can still add latency, as you now must go from accelerator -> memory controller -> memory over whatever interconnect (PCI Express, QPI, etc.). QPI lowers latency compared to PCI Express.

Ideally, your CPU spends a few cycles setting up the accelerator with some DMA descriptors and performing configuration, and then it is hands off from there.

I'd imagine that in the new Xeon solution, the FPGA appears similar to QPI devices to the host OS, namely sitting (from a software POV) on PCI Bus 0xFF, with a few BARs available for applications.

Cache coherence and unified memory would open the door to even lower latency, with zero copy and no need for DMA. Much like an APU and all the flim-flam from the HSA AMD circus.

# ? Jun 22, 2014 15:50

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

karoshi posted:

Cache coherence and unified memory would open the door to even lower latency, with zero copy and no need for DMA. Much like an APU and all the flim-flam from the HSA AMD circus.

Let me tell you about Zynq!

# ? Jun 22, 2014 16:25

Factory Factory: Mar 19, 2010; This is what Arcane Velocity was like.

Intel detailed the next-gen Xeon Phi, Knights Landing. Instead of wee custom Pentiums, the new version runs a mob of Silvermont Atom cores. As promised, it can be socketed as a peer to a buff Xeon. By using Silvermont instead of old Pentiums, it's ISA compatible with those buff Xeons, too.

Most interestingly, it comes with 16 GB of on-die memory based on Micron's stacked cube whatever tech. It's supposed to be 5x more bandwidth than DDR4 in 1/3 the die space.

# ? Jun 26, 2014 02:26

Rime: Nov 2, 2011; by Games Forum

Factory Factory posted:

Most interestingly, it comes with 16 GB of on-die memory based on Micron's stacked cube whatever tech. It's supposed to be 5x more bandwidth than DDR4 in 1/3 the die space.

Why even bother with DDR4. :psypop:

# ? Jun 26, 2014 03:28

Shaocaholica: Oct 29, 2002; Fig. 5E

gently caress, bring on socketed ram and tower coolers for them. Sticks always sucked for bandwidth/cooling.

# ? Jun 26, 2014 03:33

in a well actually: Jan 26, 2011; dude, you gotta end it on the rhyme

Factory Factory posted:

As promised, it can be socketed as a peer to a buff Xeon. (http://vr-zone.com/articles/intel-unveils-knights-landing/79686.html)

Nope. Take a look at:

It has no QPI. It's socketed, but none of the Intel docs say anything about multiple KL or KL-Xeon cohosting.

quote:

ISA compatible

Well, you can run your existing binaries but you probably don't have any code compiled for AVX-512 (which you'll need to use to get anywhere near 3 TF.)
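
For a sense of where a figure like 3 TF comes from, peak throughput is roughly cores x vector units x lanes x flops per lane x clock. A quick Python sketch follows; the core count, vector-unit count, and clock are assumed values picked for illustration, not figures from Intel's docs or this thread.

```python
def peak_gflops(cores, vector_units, vector_bits, flops_per_lane, ghz):
    """Theoretical double-precision peak in GFLOPS."""
    lanes = vector_bits // 64  # 64-bit doubles per vector register
    return cores * vector_units * lanes * flops_per_lane * ghz


# All hardware figures below are assumptions for illustration only.
avx512 = peak_gflops(cores=72, vector_units=2, vector_bits=512,
                     flops_per_lane=2, ghz=1.3)  # fused multiply-add = 2 flops
legacy = peak_gflops(cores=72, vector_units=2, vector_bits=128,
                     flops_per_lane=1, ghz=1.3)  # 128-bit code, no FMA
print(round(avx512), round(legacy))  # 2995 374
```

Under those assumptions, an old binary using 128-bit vectors without FMA tops out at about an eighth of the headline number, which is why recompiling matters.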

quote:

on-die

Nope, on package.

Rime posted:

Why even bother with DDR4.

If your data can fit in 16 GB, there's not a real reason to.

Shaocaholica posted:

gently caress, bring on socketed ram and tower coolers for them. Sticks always sucked for bandwidth/cooling.

off-package is too far away for stacked RAM

in a well actually fucked around with this message at 03:41 on Jun 26, 2014

# ? Jun 26, 2014 03:39

Factory Factory: Mar 19, 2010; This is what Arcane Velocity was like.

I always gently caress up on-die/on package, especially when I think "I better use on-die/on-package correctly."

But no cohosting? I distinctly remember that being talked up about six months ago as an expected feature of the first Phi's successor.

# ? Jun 26, 2014 03:48

in a well actually: Jan 26, 2011; dude, you gotta end it on the rhyme

Factory Factory posted:

I always gently caress up on-die/on package, especially when I think "I better use on-die/on-package correctly."

But no cohosting? I distinctly remember that being talked up about six months ago as an expected feature of the first Phi's successor.

Yeah, the normally excellent realworldtech put out an article where they summarized the available leaks, and then made one assumption (KL and Skylake-EX must share infrastructure) that led them astray: therefore it must have QPI (and cohosting), therefore a big LLC.

It's possible that cohosting is a possibility, but other than realworldtech (or others citing them) I've not seen anything to indicate it.

# ? Jun 26, 2014 04:20
