BobHoward
Feb 13, 2012


Alereon posted:

And that's L2 cache too, not L3 like on modern CPUs. I think this is the article HalloKitty was mentioning. Up through the Core 2 days, Intel used L2 caches that were shared between the cores, optimizing how efficiently data was packed into the L2 cache. The downside is that larger caches have higher latency, which kind of defeats the purpose of having a cache. Intel decided that the best approach was to give each core its own smaller, private L2 cache and add a larger shared L3 cache to handle applications where 256KB just isn't enough. It does seem like we're reaching the point today where 512KB becomes the optimal L2 cache size, so maybe that will happen on future CPUs, but what do I know.

That particular AnandTech article hinted at the full reasons why Intel made that choice, but wasn't explicit about them.

It's not just size which hurts latency, it's also associativity (number of "ways") and access ports. The benefit of increased associativity is better hit rate, and access ports have to scale with the number of CPU cores accessing a cache.

Intel designed Core 2 for only two CPU cores per chip. (Core 2 Quad processors are actually two dual-core chips mounted in a multichip package.) Because they only needed to share the last-level cache (LLC) between two cores, they were able to be very aggressive on its size, associativity, and place in the hierarchy.

Nehalem jumped up to 4 cores per chip, doubling the number of access ports required. They also needed to bump the LLC size to 8MB in order to keep it at Intel's preferred minimum of 2MB LLC capacity per core. Doing both these things together would have made a Core 2 style L2 LLC too slow. You can kind of get a feel for it in the numbers:

Core 2 Penryn (45nm):
Per core (2 copies): 32KB 4-way instruction + 32KB 8-way data L1, 3 cycle latency
Shared: 6MB 24-way 2-port L2, 15 cycle latency

Nehalem (45nm):
Per core (4 copies): 32KB 4-way instruction + 32KB 8-way data L1, 4 cycle latency
Per core (4 copies): 256KB 8-way L2, 11 cycle latency
Shared: 8MB 16-way 4-port L3, 35+ cycle latency (sources seem to vary on this number)

That said, a hypothetical Nehalem design with a L2 LLC probably could have done better than 35 cycle latency. In the real Nehalem, thanks to the fast private L2 caches, L3 performance wasn't as critical and Intel was able to optimize it to reduce power use. In a Core 2 style design, the L2 LLC has to be very fast since it's the only thing between L1 and DRAM.

Note also that the Nehalem L1 latency grew by 1 cycle. They probably needed that to target higher clock speeds and perhaps power reduction (a likely need when doubling the cores per chip). This probably put more pressure on reducing L2 latency, which would have pushed them towards the 3-level design.

CPU design involves an insane number of engineering tradeoffs, all entangled.
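
If you want to see those latency tiers on your own machine, a pointer-chasing microbenchmark is the standard trick: build a random cycle through a buffer so every load depends on the previous one and the prefetchers can't help, then time it at working-set sizes that land in L1, L2, L3, and DRAM. Here's a rough sketch in C -- my own illustration, not anything from the AnandTech article, and the buffer sizes are just examples you'd adjust for your CPU's cache sizes.

code:
/* Pointer-chasing sketch: average latency of a dependent load at several
   working-set sizes. Rough illustration only. Compile with: cc -O2 chase.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double chase_ns(size_t bytes, size_t iters)
{
    size_t n = bytes / sizeof(size_t);
    size_t *next = malloc(n * sizeof(size_t));

    /* Sattolo's algorithm: one big random cycle, so each load depends on
       the previous one and the prefetcher can't guess the next address. */
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    size_t idx = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) idx = next[idx];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t keep = idx; (void)keep;  /* keep the chase loop alive */
    free(next);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
}

int main(void)
{
    /* Sizes meant to land in L1, L2, L3, and DRAM on a typical chip. */
    size_t sizes[] = { 16u << 10, 128u << 10, 4u << 20, 64u << 20 };
    for (int i = 0; i < 4; i++)
        printf("%6zu KiB: %5.1f ns per load\n",
               sizes[i] >> 10, chase_ns(sizes[i], 20u * 1000 * 1000));
    return 0;
}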



BobHoward
Feb 13, 2012


PUBLIC TOILET posted:

So it sounds like the next series of Xeons will mean the end of a budget-priced Xeon processor?

Not sure where you're getting that from, unless you're mistaking recent discussion of new E7 series Xeons as representing all Xeons. Roughly speaking, E3 = budget 1-socket, E5 = midrange 1-2 socket, E7 = high end 2-8 socket.

BobHoward
Feb 13, 2012


Gwaihir posted:

Funny thing, I used to work for the guy that took a ton of original die shots on older chips. Our ancient rear end website has a pretty good gallery of some of the funny things that used to get slipped in to dies back then: http://micro.magnet.fsu.edu/creatures/index.html

That is a classic website! I first found it back in the 1990s. Thanks to both of you for it.

Relevant to the question of when flip chip began, I found this die shot of one of the earliest flip-chip parts I remember seeing in the wild, the 0.6 micron PowerPC 601. Do you have any recollection of whether you took that photo yourselves? Because if you did, somebody at IBM or Motorola must've donated you a die that had been balled but not yet soldered to a package.

BobHoward
Feb 13, 2012


EoRaptor posted:

I'm more interested in what market intel is going after with a K series that has Iris Pro. A general rule is that the bigger the die, the lower the overclock potential, and Iris Pro is a huge amount of silicon. Is it separately clocked? Can you turn it off and use the 'area' to help with cooling?

The performance potential of 128MB of fast, local L4 cache is nice, but few programs will really be able to take advantage of it, and the CPU cache control hardware won't be optimized for it, so it may not yield as much benefit as it could.

Overclocking potential: if you're just going to be OCing the CPU cores, you're still overclocking the same amount of chip area. The GPU clocks are wholly independent, and I believe the GPU also has its own power plane in Haswell (iow yes, you can turn it off, by not using it).

Also, a major reason for reduced OC potential on a larger die is process variation: transistors in one corner may not perform as well as those in the opposite corner. The Iris Pro die is much more square than the regular Haswell dies, which are rectangular enough that their diagonal may actually be longer.

Cache: what basis do you have for claiming the CPU's cache hierarchy isn't "optimized" for the L4 cache? And plenty of programs will benefit from the L4. Not all types of programs by any means, but it's a pretty nice thing to have.

bull3964 posted:

I think you can honestly blame Apple for that. Since they bought up all the initial supply, other OEMs designed their products around the available chips. Short of doing a mid-gen refresh once the parts became available, they are just going to skip them altogether.

I don't think you can blame Apple for it at all. The "initial supply" argument puts the cart before the horse. If Intel was being run in a halfway competent fashion at that time, I'm sure they chose how to allocate wafer start capacity based on what each OEM was interested in ordering, and in what quantity. If other OEMs had been seriously interested, I'm sure they could have had supply at launch. IMO, the only sense in which you can blame Apple is that Apple has the high end laptop market sewn up so completely that high volume PC OEMs aren't trying real hard to be there.

One cause of that problem: it's hard for PC OEMs to sell stuff like this. Iris Pro 5200 is more expensive and clocked slightly lower than similar ordinary mobile quad-core Haswells. Even though the L4 cache more than makes up for the clock speed deficit, this combo is hard for PC OEMs to sell to the public. Apple has the luxury of being able to offer something no PC OEM can (OS X), and markets largely by saying "our stuff is awesome, buy it". PC OEMs have to market mostly on specs, and it's tough to convince nontechnical consumers that they're actually better off with slightly fewer MHz for more money.

HalloKitty posted:

Also, unlocked CPU with Iris Pro? No, Intel, that's not what people wanted. People wanted "Intel TSX-NI, Intel VT-d, vPro, and TXT" extensions that were disabled for no particular reason on the K CPUs, and a heatspreader that wasn't the width of the Grand Canyon away from the die.

What relevance do those named features have to the average overclocking enthusiast? Far as I can tell, gently caress all, except maybe TSX-NI (and even so, not much and not today). VT-d is only useful if you're a heavy user of virtualization and you need the VMs to have native I/O performance. vPro / TXT are enterprise management features.

The L4 cache, on the other hand, actually has a chance of being somewhat relevant. It'd be interesting to see someone do a serious study of how it affects CPU-limited gaming, for example.

BobHoward
Feb 13, 2012

Silicon is a decent thermal conductor. Not as good as aluminum or copper by any means, but cold silicon next to a hot circuit does help conduct heat away from it and into the heatsink.

BobHoward
Feb 13, 2012


Shaocaholica posted:

^^^ Thanks ^^^

I might just spring for a cheap Pentium D 945 which only came in a 95W stepping and can be had for less than the cost of dinner. The 95W 960's seem to be non existent on the open market.

Do not pay money for a Pentium 4 in TYOOL 2014. P4s aren't worth their weight in poop.

A Core 2 Duo E6300 is faster than a 945, uses 65W, fits the same socket IIRC (check motherboard compatibility of course), and seems to cost about the same as a 945 on ebay. A C2D E6600 or better will blow the 945 away, and I easily found an E6600 listed under $10.

e: f,b

BobHoward
Feb 13, 2012


Police Automaton posted:

I bought this i7 860 (Lynnfield) back when it was the best you could get for money, about 4-5 years ago. I tried reading up on current technology but was murdered by buzzwords. It seems to me that all Intel is doing right now is die-shrinking and higher integration in favor of cutting costs and making more energy efficient CPUs? Am I correct with this assessment? I feel I'll get another few years out of this computer at this rate, I see no reason to upgrade.

You are sort of correct, but you may have gotten a mistaken impression from people who whine if Intel doesn't deliver huge performance gains every year. They have been focused more on power and better integrated video recently, since the market keeps shifting further towards portables, but that hasn't stopped them from delivering at least 5% desktop CPU performance gain every generation, and sometimes lots more (Sandy Bridge, two generations after yours, was a huge leap). Also, there have been a lot of generations between then and now, so the incremental changes add up. Intel's current ~$300 chip is the i7-4770K and it eats the i7-860 for lunch:

http://anandtech.com/bench/product/108?vs=836
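
Quick back-of-the-envelope on how "at least 5% a generation, sometimes lots more" compounds over several generations. The individual per-step percentages below are made-up illustrative numbers, not benchmark results:

code:
/* Compounding of modest per-generation gains; figures are placeholders. */
#include <stdio.h>

int main(void)
{
    /* four generational steps; per-step gains are illustrative assumptions,
       with one step standing in for a bigger Sandy Bridge-style leap */
    double gain[] = { 0.05, 0.20, 0.05, 0.10 };
    double total = 1.0;
    for (int i = 0; i < 4; i++)
        total *= 1.0 + gain[i];
    printf("cumulative speedup: %.2fx\n", total);  /* about 1.46x from these numbers */
    return 0;
}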

That said, it doesn't really matter how much better the 4770 is if you don't feel a need to upgrade.

BobHoward
Feb 13, 2012


movax posted:

They are going to need to pour a ridiculous amount of money and other support in getting an ecosystem up to support that architecture. Sponsoring compiler development (developers had over a decade to work on x86 compilers), kernel development, etc to even make it remotely worthwhile to consider moving over.

Echoing that POWER / PowerPC is over 20 years old. The software ecosystem is mature.

To me the key question is how will they do at promoting an open platform this time, given what a mess was made of it the last time. The Apple-IBM-Motorola (AIM) alliance set out to completely replace x86 with PowerPC from the desktop up back in the early 1990s, and tried to recruit a lot of support from many other players too. In hindsight that was an unrealistic goal even if they executed well, but they didn't -- when I think about AIM, phrases like "terrible strategic mistakes", "infighting", and "mediocre CPU designs" are what comes to mind, and none of the three parties was blameless.

In the meantime, Intel got ticked off by the AIM propaganda, executed well, and shocked everyone who thought that CISC architectures were destined for the dustbin of history.

It never helped the PPC cause that IBM always seemed to treat the more open PowerPC as a way of subsidizing their IBM-proprietary POWER CPU designs. They'd better be doing a great job of convincing everybody involved that this is a real long-term commitment to being much more open with their POWER designs, or they aren't likely to get too much traction.

BobHoward
Feb 13, 2012


necrobobsledder posted:

It's kind of a big deal because you get page faults constantly as you use a machine. When a program loads, you get compulsory faults as various pages of a program are loaded into the correct segments and/or pages. A page fault is a major part of how the Linux kernel gets performance via the mmap call as well. It doesn't explicitly load anything, it marks pages to fault and the kernel handles the load into the page and can use neat tricks like zerocopy and predictive fault handling to queue up more fetches on I/O.

This will make a huge drop in Linux performance aggregately for high performance compute overall.

"Huge"? No, I don't think so. Growth from 940 to 1045 cycles isn't that awful given that clock speeds have improved since Core 2, by enough that wall clock time should usually be lower anyways.

Torvalds mentions that in the real world load he was profiling, one which hammers on the page fault handler a lot without actually swapping, CPU page fault overhead is 5% of all CPU time. If you had the power to do ridiculously impossible things, you could reduce the CPU's page fault overhead to 0 and his kernel compiles would only improve 5%. If you improved that overhead by the ratio of 1045 to 940, the improvement in wall clock time would be sub-1%.

Which is why he isn't ranting and raving, the way he's wont to do when he discovers something outrageous. This is "hey Intel, you used to be able to do this in N cycles, now you're doing it in 1.11*N, so even though you're faster than you were before it looks like you could be even faster than that, what gives?"
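
Spelling out the arithmetic, since the sub-1% conclusion is the whole point (the 940 and 1045 cycle counts and the 5% figure are the ones from the discussion above; the rest is just the math written down):

code:
/* Arithmetic behind the "sub-1%" claim. */
#include <stdio.h>

int main(void)
{
    double old_cycles = 940.0, new_cycles = 1045.0;
    double fault_share = 0.05;   /* fraction of CPU time spent in the fault path */

    /* If page faults could somehow be made free, the absolute ceiling: */
    double ceiling = fault_share;

    /* If the fault path merely went back from 1045 to 940 cycles: */
    double saved = fault_share * (new_cycles - old_cycles) / new_cycles;

    printf("absolute ceiling: %.1f%%\n", ceiling * 100.0);  /* 5.0%   */
    printf("940-cycle case:   %.2f%%\n", saved * 100.0);    /* ~0.50% */
    return 0;
}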

BobHoward
Feb 13, 2012


movax posted:

DRAM is one of the most finicky things from a hardware point of view; we're lucky it works as well as it does in consumer hardware, honestly. Both Intel and AMD have patents galore on techniques to improve DDR training, performance and general stability.

And all their work can be undone if the board design isn't done correctly; the mobo vendors certainly don't have time to test every stick ever, so you can end up with the perfect corner case of a motherboard on the edge of its specification with a DRAM stick at the edge of its tolerances, which manifests every now and then as instability for the end user.

And then add in the "must make common overclocker benchmarks 0.1% faster" factor and it's a miracle some of these board vendors ever ship a BIOS with truly stable DRAM defaults.

BobHoward
Feb 13, 2012


VodeAndreas posted:

Yes! My i7 920 struggles slightly at Battlefield 3 & 4 physics.

But not enough that I've felt pressured into upgrading yet... I think part of me is waiting for another jump like my Athlon X2 to Conroe then again to Nehalem.

Another jump like that? Dude, it's already here.

http://www.anandtech.com/bench/product/47?vs=1199

BobHoward
Feb 13, 2012


MisterAlex posted:

It's quite possible previous chips did not pass validation and those things were disabled. That happens all the time. Intel's manufacturing process is always improving; they may have decided that "Okay, we can now reliably produce i7 chips that pass these feature tests."
Or it could very well be a marketing trick. But I wouldn't jump to either conclusion too readily without more information.

It was always artificial market segmentation. The likelihood of some significant fraction of the die population working ok only if TSX is disabled is basically nil. I have no inside info, but I do know enough about this kind of thing to state that with high confidence, and I'll try to describe why.

There's two categories of things to think about here, defects and process variation.

Point defects are usually what's behind selling partially defective chips as good ones. They are what they sound like: a particle of some kind of contaminant fucks up a circuit wherever it happened to land. Harvesting dies that have defects is almost always about designing in redundant copies of block structures which are already being replicated for other reasons, or are somewhat standalone and not required by all customers. Got a memory array with 32 columns? Add a 33rd, some fuses, and a little bit of extra logic, and now you can make a good chip out of one that has a defect in one column. Got an entire GPU that not everyone needs? Sell chips with defective GPUs into markets that don't mind the whole thing being disabled. (Though I think it's more likely that Intel does this mainly to reduce power for those markets.)
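
As a toy model of the spare-column trick, this is conceptually all the extra logic has to do: a fuse register remembers which column (if any) is bad, and accesses to it get steered to the spare. The 32+1 geometry and every name below are hypothetical, purely to illustrate the idea.

code:
/* Hypothetical spare-column remap: illustration only, not any real design. */
#include <stdint.h>

#define COLS      32
#define SPARE_COL 32            /* the extra 33rd column */
#define NO_REPAIR 0xFFu

static unsigned bad_col_fuse = NO_REPAIR;   /* blown once, at wafer test */

/* Steer a logical column around the defective one, if any. */
static unsigned phys_col(unsigned logical)
{
    return (logical == bad_col_fuse) ? SPARE_COL : logical;
}

uint8_t array_read(uint8_t mem[][COLS + 1], unsigned row, unsigned col)
{
    return mem[row][phys_col(col)];
}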

Note that harvesting requires extra design effort, including circuitry to completely isolate a defective block. It might be consuming excessive power even when not clocked, it might be pulling signals on a bus low or high. You have to depower and disconnect it. This is why it's not a realistic explanation here. If you have a defect in TSX-NI silicon, that's a defective CPU core. No way Intel designed that as an isolate-able function -- by its nature, it has to be part of too many critical timing paths in the CPU core to move it out into its own little island.

Process variation is shorthand for all the things which can cause variance in transistor performance. Most of these issues are regional in nature: all the chips from some region of the wafer don't perform as well, or this entire core doesn't have as high a Fmax as the others in the same die, and so on. Usually this happens thanks to reasons such as slight but non fatal misalignment between process steps, or variations in the thickness of deposited films. These variations occur over relatively large areas, not tiny little spots.

Once again the nature of the disabled functions tells us that harvesting of some sort isn't what's going on. For example, AMD once harvested 3-core CPUs from quad-core dies because sometimes one of the four cores would have a significantly lower Fmax, or use too much power at high frequency. But with tightly integrated functions like TSX-NI, there's no way for those circuits to test out slow without the rest of the core also testing out slow (and vice versa).

Alereon posted:

Most Xeons feature Intel HD Graphics P4600/4700, which has drivers certified for professional workstation applications. This can be a pretty compelling benefit since even the shittiest Quadro and FirePro cards are pretty expensive.

This too is an artificial distinction, and the same goes for Nvidia and AMD standalone GPUs. It's literally that it costs a lot to support some of these pro applications (usually because they have been around so long that they use caveman 1990s-style OpenGL, which requires a lot of fiddly work to emulate on any modern GPU core). So the vendors include fuse bits which ID the exact same GPU core as the "pro" version, sell those cards at prices where they can actually make a profit despite the support costs and smaller sales figures, and prevent customer cheating by having the pro drivers look for those IDs.

Intel's rationing of features like TSX-NI is actually motivated by much the same thing, extracting more money from smaller user bases which need difficult to validate features. It seldom makes economic sense to manufacture a wholly different product line without the whizbang feature, since in the silicon world the cost for taping out and validating fundamentally different variants is so high.

Personally I think Intel goes a bit overboard with it, but there's a pretty strong economic argument that if this kind of practice went away entirely in the chip industry (note: everyone does it) we'd see less innovation and new features.

BobHoward
Feb 13, 2012


Shaocaholica posted:

Since when has intel disabled x64 on a capable part?

Atom.

BobHoward
Feb 13, 2012


Alereon posted:

Oh it certainly is, I'm just saying that since the graphics comes free with a Xeon CPU, you can get out of paying the nVidia/AMD workstation graphics tax, which is one reason why the IGP is valuable even if you don't plan on using the compute performance.

Didn't mean that as a counterpoint; I used your post more as a jumping off point to say more :words: about differentiation.

Over time more workstation application code bases are getting modernized, so I do wonder how sustainable the workstation GPU thing is. Sooner or later workstation certified is just going to mean "not built with all the unstable bleeding edge AAA game-title specific optimizations", which IMO is why Intel GPUs have become a reasonable option here (Intel is not so good at the GPU driver thing). Both Nvidia and AMD are trying hard to pivot into GPU compute as the next big driver of high end sales, with Nvidia the biggest winner so far.

BobHoward
Feb 13, 2012


Alereon posted:

Yeah as cool as it would be I don't believe you can re-enable disabled features just via bitflips in microcode, if anything it might be an efuse that didn't quite open, or was marginal then failed closed in some manner. I wonder if you can even download microcode from the CPU, or just upload it?

Remember how Intel dabbled with software CPU upgrade unlocks? Intel sold "upgrade cards" for certain low end Sandy Bridge CPU models through retail. You'd buy one, enter the code on it into an Intel website, and it would generate a program that would permanently change your CPU to a different, faster model. (I say "generate" because I believe it was tied to your CPU serial number -- you weren't getting a program that would unlock anybody's CPU.) Most of the time the upgrade was just a clock speed boost, but one of the upgrades Intel offered unlocked hyperthreading.

It's a good guess that on-chip firmware is involved in the feature enable process. Not the microcode, but a pre-boot ROM. It's common to design in protected configuration registers where there's a tiny window of time after powerup or hard reset to write to them, after which they lock their values.

The reason for doing this is that fuses are expensive -- they take a lot of die area per bit. So you'd rather compress as much information as you can into a tiny number of fuse bits, then use pre-boot firmware stored in mask ROM to interpret them and do appropriate register configuration. I'd guess that Intel uses a small number of fuse bits as a model ID code. If that guess is correct, the ROM would contain a table with all the necessary configuration information for every CPU model that particular die design can be, and it would use the ID code stored in fuses to select just one row from the table. (The software upgrade process likely involves blowing one fuse bit to select a different row.)
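
To make that guess concrete, here's a toy version of the scheme: a few fuse bits hold a model ID, and pre-boot ROM code uses it to pick one row of configuration out of a table and write it into registers before they lock. Every name, field, and number below is hypothetical; the real mechanism isn't public.

code:
/* Hypothetical fuse-ID -> configuration table lookup, purely illustrative. */
#include <stdint.h>
#include <stdbool.h>

struct model_cfg {
    uint16_t base_mhz;
    uint16_t max_turbo_mhz;
    bool     hyperthreading;
    bool     tsx;
};

/* Mask ROM: one row per SKU this particular die design can be sold as. */
static const struct model_cfg rom_table[] = {
    { 3100, 3500, false, false },   /* fuse model ID 0 */
    { 3400, 3800, true,  false },   /* fuse model ID 1 */
    { 3500, 3900, true,  true  },   /* fuse model ID 2 */
};

/* Pre-boot firmware: read the fused model ID, then write the selected row
   into write-once configuration registers before they lock. The register
   writes are left as comments because the real interface isn't public. */
void preboot_configure(unsigned fuse_model_id)
{
    const struct model_cfg *cfg = &rom_table[fuse_model_id];
    /* write_once_reg(TURBO_LIMIT, cfg->max_turbo_mhz);  */
    /* write_once_reg(HT_ENABLE,   cfg->hyperthreading); */
    /* write_once_reg(TSX_ENABLE,  cfg->tsx);            */
    (void)cfg;
}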

Getting back to this supposed accidental unlock, I could believe a damaged hidden configuration register getting stuck in the state which permits hyperthreading.

BobHoward
Feb 13, 2012


Rime posted:

An i7 920, for example, remains a very powerful piece of hardware for modern gaming. It usually benchmarks not significantly lower than even the 4790k

Rime posted:

Yeah, I'm talking from the perspective of Joe Gamer who runs ARMA III as his most CPU-intensive task. If you're doing scientific computing or video / 3D rendering, you're more likely to benefit from the 15% performance upgrade. Just gaming though? The 920 is still making it into the top ten high score lists on most benchmarking sites.

I'm pretty sure you didn't do much legwork making sure your post was factual here, because the first benchmarking site I pulled up (Anandtech's "Bench") shows a 4790 faster than a 920 by a ratio greater than 2x in some of the single-thread CPU tests. Granted, there aren't a lot of tests from 2008 that AT still runs on modern CPUs, but still.

Yes, the 920 is still a useful CPU, but so is a Core 2. Can we stop with the mythmaking? It's over 5 years old, and real progress has actually been made since then. It's time to let go and acknowledge that your pride and joy is obsolete.

BobHoward
Feb 13, 2012


deimos posted:

Audiophiles are idiots, Asus did the Xonar sound cards waaaaaaaaaaaaaaaay before the Essence One, and all of its iterations were considered best-in-class. I am pretty sure they were one of the first consumer-grade audio cards with replaceable opamps (a few $500+ cards had it).

Replaceable opamps are stupid bullshit pandering to audiophiles who want to believe they can improve things by ~tweaking~

BobHoward
Feb 13, 2012


Rastor posted:

Here's Charlie Demerjian's ramblings about why Intel would add that feature. Charlie's often a bit loopy but sometimes he comes up with some interesting thoughts.

Sorry, but Charlie Demerjian should not be taken seriously and this article is a great example why. He's clumsily trying to connect dots he doesn't understand so that they spell out doom for Intel, which is one of his objects of irrational hatred.

BobHoward
Feb 13, 2012


movax posted:

Think FPGAs will still have a higher development cost/curve than paying guys to write OpenCL/CUDA kernels to run on GPUs, though I believe some FPGA vendors have voodoo OpenCL toolkits that can turn your kernels into RTL to implement on the chip.

The Xilinx one is called Vivado HLS (high level synthesis) and IIRC it attempts to support legal ANSI C, not just OpenCL. I can't say I've ever used it, so I don't have a good feel for the actual limitations (obviously they don't support truly arbitrary code, the whole of the standard C library, etc), but I got a chance to listen to one of the Xilinx HLS engineers talk about it and chatted with him after the talk. It's fascinating stuff.

BobHoward
Feb 13, 2012


Malcolm XML posted:

Yes actually writing VHDL/verilog is a giant pain in the rear end and the tools are so loving terrible compared to software. I really wish someone would come in w/ an fpga that had an open bitstream format and allow open tools, but that ain't happening.

As much as I agree about it being a giant PITA, I can't imagine open tools being an improvement in any way. The difficulty and expense of writing competent FPGA place-and-route software is off the charts; it's a really nasty NP-hard optimization problem.

BobHoward
Feb 13, 2012

You're seeing conflicting info because there is no one true answer. You can have both onboard and standalone active, or you can have only one of them active. If you want to use a standalone GPU, plug a monitor into it.

Also, the "some" saying that if you do so, the onboard GPU is going to take over to save power? Pretty sure they're full of poo poo. Notebooks can do that kind of thing, but I've never heard of desktop GPU drivers enabling that kind of feature.

BobHoward
Feb 13, 2012


SourKraut posted:

While I don't think performance is going to suffer, why not just use the x16 slot?

Someone can confirm this but I believe PCIe 3.0 x8 bandwidth is equal to the bandwidth of PCIe 2.0 x16 (though ultimately it depends on how the lane was electrically wired).

I'm not sure what you mean by 'how it was wired' but yes, PCIe 3.0 x8 is about the same as PCIe 2.0 x16. PCIe 2.0 uses a 5GT/s line rate with 8b/10b line coding, giving 5*(8/10) = 4Gbps per lane. 3.0 is 8GT/s with 128b/130b encoding, or 7.88Gbps per lane.
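
Spelling out that line-rate arithmetic:

code:
/* Per-lane payload rate = transfer rate x line-code efficiency. */
#include <stdio.h>

int main(void)
{
    double gen2_lane = 5.0 * (8.0 / 10.0);     /* 8b/10b    -> 4.00 Gb/s */
    double gen3_lane = 8.0 * (128.0 / 130.0);  /* 128b/130b -> 7.88 Gb/s */

    printf("PCIe 2.0 x16: %.1f Gb/s\n", gen2_lane * 16);   /* 64.0 */
    printf("PCIe 3.0 x8:  %.1f Gb/s\n", gen3_lane * 8);    /* 63.0 */
    return 0;
}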

BobHoward
Feb 13, 2012

For most game-like loads, the performance hit will be nearly nothing so long as the GPU has enough local memory to hold all the textures and static vertex data. GPU command traffic doesn't need much bandwidth.

If you're doing something GPGPU-ish, that kind of thing usually depends more on PCIe bandwidth.

BobHoward
Feb 13, 2012


Rime posted:

If you RTFA you'll notice that what Intel calls a "20nm Process" is, in actuality, using 80nm components. 14nm? 54nm.

Process names have never directly reflected component sizes. Phonepostin' so I'm not going to go into great detail, but it has traditionally been related to the minimum line width the lithography process can print, as opposed to larger structures (like transistors). Things are definitely muddier now, in part because it's become standard to print at different resolutions in different layers, but before 2x nm processes I think you could still at least count on some minimum line width at the base of the layer stack matching the nominal process node.

BobHoward
Feb 13, 2012

If it took a year to discover I'm assuming the breakage was real subtle.

My main question is whether there are security implications, as can often be the case with errata of this type.

BobHoward
Feb 13, 2012

Isn't Dwarf Fortress single thread only though? Meaning the best CPU for it would be a 4790k.

BobHoward
Feb 13, 2012


Alereon posted:

Extreme high-end boards typically do have at least one additional 8-pin power connector to handle a large number of cards. Very low-end boards on which a lot of corners have been cut have a SATA power connector or a molex connector. The $400 Asus X99 Deluxe does not have any additional power connectors and it is meant to take four PCI-E 3.0 x16 cards, plus an x4. It has the build quality to power its slots without additional power connectors, the Asrock board does not. Consider that Gigabyte boards with a SATA power connector require it to be connected for stable operation with even a single videocard, despite the manual claiming it is for multi-GPU.

75W per card * 5 = 375W
Haswell-E: 140W, potentially a lot more if overclocking
Total draw (ignoring anything else that might need to pull from the 12V going to the motherboard): 515W (no OC), 600W+ (OC)

EATX main connector: two 12V pins, 8A max each (that being the Molex connector's rated limit per pin, wire gauge matters too)
8-pin EATX 12V connector: four 12V pins, 8A max each

6*8A*12V = 576W

Safe if not overclocking, but if you are? Forget about it. Asus is relying on the fact that most high end GPUs draw very little power through the PCIe slot.
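
Same numbers as a quick calculation (the card count, CPU TDP, and 8A-per-pin rating are the figures quoted above):

code:
/* 12V demand vs. connector budget, using the figures from the post. */
#include <stdio.h>

int main(void)
{
    double slots  = 5 * 75.0;           /* five x16 slots at 75 W each        */
    double cpu    = 140.0;              /* Haswell-E at stock; more when OC'd */
    double demand = slots + cpu;                         /* 515 W             */

    double pins_12v = 2 + 4;            /* main connector + 8-pin 12V         */
    double budget   = pins_12v * 8.0 * 12.0;             /* 576 W             */

    printf("demand %.0f W vs. connector budget %.0f W\n", demand, budget);
    return 0;
}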

Also, your theory that adding (relatively expensive) Molex connectors is a viable way of reducing total board cost by saving money on traces is somewhat unlikely. A plane of generic 2oz copper can carry rather a lot of current. Perhaps there are some cheapass motherboards which actually do try to save insignificant amounts of money by using blank copper clad with thinner-than-normal plating, but that will probably cause CPU issues long before PCIe issues. CPUs are a much tougher power delivery problem: low voltage at ridiculously high amps, with vastly less tolerance for voltage droop.

BobHoward
Feb 13, 2012


roadhead posted:

That's the shape the crystal (yes, it's a single large crystal) grows in. Then they slice it really thin. It's almost like a big crayon shape.

Those interested in more should google/wiki the Czochralski process. I designed an industrial controller for one of these machines once and the wiki article seems reasonable to me.

BobHoward
Feb 13, 2012


1gnoirents posted:

lol, the people walking up, stopping, staring...

"oh god"

Keep in mind that they're not just reacting to losing the contents of the FOUP. Fabs are clean room environments. I'm sure shattering so many wafers released a bunch of particles into the air. Would not be surprised if an accident like that has serious secondary effects.

BobHoward
Feb 13, 2012


Combat Pretzel posted:

If Legitreviews isn't wrong about their analysis of the failure, the FIVR seems to be a rather suboptimal thing. ASUS' OC socket would probably have helped then; I wonder if they had it enabled.

The legitreviews article doesn't have anything I'd call analysis and I don't see how what is there necessarily points the finger at FIVR.

The only thing you know for sure from what was written is that motherboard power conversion components failed. FIVR needs an input voltage of about 1.8V, so you still have a power conversion step on the motherboard, and it's really not a lot different from the old ~1.0V core supplies on pre-FIVR boards. Except that it's considerably easier to design for, since the current delivered to the CPU drops by almost a factor of 2 and the regulation tolerances are relaxed too.
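
The "almost a factor of 2" is just power bookkeeping: at roughly the same delivered power, raising the input rail from ~1.0V to ~1.8V cuts the current the motherboard VR has to source by the same ratio. The wattage below is an arbitrary example, not a measured figure:

code:
/* Same power at a higher input voltage means proportionally less current
   for the motherboard VR to deliver. 90 W is an arbitrary example figure. */
#include <stdio.h>

int main(void)
{
    double power    = 90.0;
    double i_legacy = power / 1.0;   /* pre-FIVR ~1.0 V core rail -> 90 A */
    double i_fivr   = power / 1.8;   /* FIVR ~1.8 V input rail    -> 50 A */
    printf("%.0f A -> %.0f A (ratio %.1fx)\n", i_legacy, i_fivr, i_legacy / i_fivr);
    return 0;
}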

The linked wccftech article on the Asus OC socket is nontechnical garbage, the socket simply cannot actually work by completely "bypassing" the FIVR. Maybe they've hacked its control loop to get more control over the final core voltage output so they can boost higher for extreme OC, but no more than that.

BobHoward
Feb 13, 2012

I'd love to see somebody try to sell real military grade hardware to gamers. Not in the certification sense but just actually bothering to construct the box, board, wiring harnesses etc. to requirements for gear which gets mounted in a tank or whatever. Hello ugly green painted aluminum boxes with 1/4 inch thick walls and sealed Cannon type cylindrical connectors! Good luck dragging that poo poo to a LAN party.

(Also good luck getting gamers to pay at least 10x for everything while getting lower specs because it has to be super rugged.)

BobHoward
Feb 13, 2012


Tacier posted:

The IT guy in my office insists AMD is the best for price/performance (he says intel motherboards cost way more or something) and recommends them to everyone and recently replaced all our 3.4Ghz Phenom II X4 965 machines with 3.1Ghz Bulldozer FX-8120s (is that even an upgrade??). We do some fairly computationally intensive stuff (Geographic Information Systems) that is almost all single threaded.

For single threaded work that is almost certainly a downgrade. To beat a 4 core phenom II, Bulldozer generally needs at least one of: higher clocks, software that uses AVX instructions, or software that utilizes the hell out of 8 threads.

Maybe your GIS package uses AVX? Probably not though. And in any case, if you're trying to pinch pennies on workstation spending, and there's no need for lots of threads, high clocked Haswell i3 CPUs are where it's at. Intel list price on the 3.8GHz i3-4370 is $157, and it supports ECC. It will kick the living poo poo out of any AMD CPU on single threaded loads.

BobHoward
Feb 13, 2012


Lowen SoDium posted:

Personally, I expect this to change in the next couple years since the Xbone and PS4 are both 8 core systems. Console to PC ports will probably start having higher thread counts.

Eh. Mainstream i7 already has eight threads thanks to hyperthreading, and the console cores are AMD's Jaguar. Jaguar is more or less AMD's Atom competitor, and while it's good at that, it's not in the same league as an i7 core, or even half of an i7 core that's being shared via hyperthreading. So you're going to see a performance advantage without even needing to resort to the bigger, expensive hex-core EP series i7s.

That's assuming game developers find good ways to make use of eight Jaguar cores. It's definitely one of those easier said than done things. Also, I seem to remember that Microsoft reserves three cores exclusively for the OS; the system's resources are partitioned by virtualization so that the console's background services can always be running without impacting foreground gameplay.

BobHoward
Feb 13, 2012


Ninja Rope posted:

Any good resources on dealing with/optimizing for NUMA? Especially when moving data across PCIE devices?

I don't have any but here is my non-helpful first-order approximation of what you're going to find: Avoid moving data.

BobHoward
Feb 13, 2012


evensevenone posted:

And of course ARM, GPUs, POWER, etc are all RISC. The only non-RISC architectures (aside from x86) that are used at all any more are ancient embedded ISAs like 8051 or PIC, and ARM is pretty much replacing all of those nowadays too.

GPUs are not RISC by any stretch of the imagination. They're their own thing. VLIW DSPs are common, and they aren't RISC either.

Rime posted:

This is the software equivalent of spending all your profits maintaining and fueling a steam engine instead of investing in a diesel locomotive. Or trying to run your shipping business on a schooner in 2014.

We'd view both of those as idiotic at best, so why does software luddism get a free pass? Don't say "Because it's expensive to write new custom software", either. No poo poo, that's called progress.

Because it's expensive and risky to write new custom software, especially if forumsposter Rime has his way and it has to run on all-new hardware.

Also, your spin is kinda bullshit. This sort of thing isn't the equivalent of using schooners in 2014. Go look up the hardware IBM will sell you to run your multi-decade old enterprise software. It's not steam-engine technology.

BobHoward
Feb 13, 2012


go3 posted:

.gov projects are more likely to be ruined due to unrealistic deadlines and constantly changing requirements due to politics ala healthcare.gov

This applies to more than just software.

BobHoward
Feb 13, 2012


phongn posted:

Probably the best software group I ever read about was the Space Shuttle's software team, which was incredibly expensive per line of code, but produced amazingly high quality code. Even they occasionally screwed up - in one case, they couldn't guarantee their software would work across years (IIRC it did, but they couldn't prove it until well after the mission had completed). Their practice was documentation, testing and specification at every level (and no epic overtime burnout drives).

As I understand it, the "documentation" wasn't ordinary either -- it was detailed to the level of nearly being pseudocode.

Another interesting detail: The Shuttle flight software was naturally a safety critical system, and everything safety-critical on the Shuttle was engineered using highly detailed fault trees to estimate the probability of loss of mission, loss of vehicle, loss of vehicle and crew, etc. Relying on just one implementation of the software spec was considered too risky -- they had some target defect rate per line of code, and even though it was really low, fault tree analysis suggested the risk of loss of life was too high. So they implemented all the software twice, with independent and semi-firewalled teams, in hopes that if one version had a potentially devastating implementation bug, the other version might not share it.

In flight, both versions were always running simultaneously. The primary version ran on a cluster of three redundant computers, using majority vote to decide on the correct control outputs. The secondary backup software ran on a 2-way redundant set (so, 5 computers in total). Handoff from the 3-way to the 2-way was automatic if the 3-way self-detected severe problems with itself, and could also be forced manually.
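
For anyone who hasn't seen the voting idea before, this is the basic shape of it -- a toy sketch, not the actual Shuttle GPC logic: each redundant computer proposes an output, and the value that flies is whatever a majority agrees on.

code:
/* Toy 2-out-of-3 majority vote; illustration only. */
#include <stdio.h>

static int majority3(int a, int b, int c)
{
    if (a == b || a == c) return a;
    if (b == c) return b;
    return a;   /* no agreement at all: a real system would declare a fault */
}

int main(void)
{
    /* One computer disagrees; its output is simply outvoted. */
    printf("%d\n", majority3(42, 42, 7));   /* prints 42 */
    return 0;
}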

BobHoward
Feb 13, 2012


Rime posted:

So I went to a talk by Ray Kurzweil last week, and at one point he briefly dropped that Intel has working prototypes of self-assembling 3D molecular circuits (which he said he has seen in person) and they are planning to have them replace their current production methods before 2020.

I dug up some stuff about MIT demonstrating the tech about four years ago, and Kurzweil isn't the type to just straight up lie about something like this, but I can't find anything about Intel specifically. :shrug:

Ray Kurzweil suffers from a peculiar delusional belief system which leads him to be wildly over-optimistic (kooky, even) about predictions of rapid technological progress. It truly wouldn't surprise me if someone told him something like "We're loving around with this wild idea for making circuits, but holy poo poo the problems are enormous and 2020 is the earliest we could even begin to ship tiny quantities of engineering sample devices to outsiders, if everything goes perfect, which it never does", and Kurzweil heard what he wanted to hear.

BobHoward
Feb 13, 2012


Chuu posted:

Thanks for the info. Do you know when this changed?

LiquidRain is wrong, the turbo frequency table you're interested in is still a thing. Here it is:

http://www.intel.com/support/processors/corei7/sb/CS-032279.htm

The stuff LiquidRain mentioned about turbo control being based on lots of sensor data is true, but this isn't a new development. The table's meaning has always been "with N cores active the cores will run somewhere between BaseFreq and FreqN, depending on conditions".
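
In other words, you can read the table as a ceiling per active-core count, something like this (the frequencies are placeholders, not any particular SKU's turbo bins):

code:
/* Turbo table as a per-active-core-count ceiling; placeholder values. */
#include <stdio.h>

int main(void)
{
    double base_mhz        = 3500;
    double max_turbo_mhz[4] = { 3900, 3800, 3700, 3600 };  /* 1..4 cores active */

    for (int active = 1; active <= 4; active++)
        printf("%d active core(s): %.0f - %.0f MHz, depending on conditions\n",
               active, base_mhz, max_turbo_mhz[active - 1]);
    return 0;
}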


BobHoward
Feb 13, 2012


No Gravitas posted:

Oh, and it is a barrel processor too, so you need 2 threads per hardware core for full utilization.

It's 4 threads per core to minimize cache miss stalls; if you're using 2 threads/core, you might want to try bumping up your thread count.
