crazypenguin
Mar 9, 2005
nothing witty here, move along

MaxxBot posted:

What is going on that takes up that much time and yet isn't improved by faster storage?

Some back-of-the-envelope estimates suggest it's decompression-limited (i.e. CPU-bound).

Some numbers: JPEG seems to decompress at maybe 20 MB/s of compressed data per core. gzip at about 80 MB/s, I think. With 8 cores, that's about SATA speeds.
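The arithmetic behind that, as a quick sketch (the per-core rates are the rough estimates above, not measurements):

```python
# Rough sanity check of the back-of-envelope numbers above.
# Per-core figures are the post's own estimates, not benchmarks.
JPEG_MB_S = 20    # compressed input consumed per core, JPEG decode
GZIP_MB_S = 80    # compressed input consumed per core, gzip inflate
CORES = 8
SATA3_MB_S = 600  # practical SATA 3 ceiling

print(f"gzip, {CORES} cores: ~{GZIP_MB_S * CORES} MB/s of compressed input")
# 8 cores x 80 MB/s = 640 MB/s -- right around SATA speeds, so a
# faster (NVMe) drive just moves the bottleneck onto the CPU.
```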

It'll be interesting to see what happens when this bottleneck becomes more recognized. Maybe nobody cares? Decompression accelerators? Compression methods with faster decompression? GPU assisted decompression? FPGAs? (Hey, there's reason to want them on servers, and there may be reason to want them on cell phones, why not a reason to want them on the desktop!)

crazypenguin
I think it's just because disk space still isn't cheap. Modern games suck up tens of gigabytes compressed. Completely uncompressed would be almost 10x that, no?

crazypenguin

Harik posted:

That's pretty insane, LZO 2012 was decompressing at 1.2 GB/s on sandy bridge. Why would anyone gimp their performance by using a slow algorithm?

I suspect you're thinking of the wrong number. Compression algorithms are usually measured by the rate of decompressed output per second. This is generally a good number to optimize and compare algorithms on: it reflects both improved compression ratio and improved decompression speed. (That is, if my algorithm is 10x slower but achieves 100x better ratios, then I win.)

I was trying to give numbers for the rate these algorithms can consume input (compressed) data per second. This is generally a useless number as a point of comparison between algorithms, so it's very rarely reported or measured. But it's exactly what we care about when looking at why NVMe isn't a performance win. We can load the data faster than it can be decompressed.
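The relationship between the two rates is just the compression ratio; a minimal sketch (the 1200 MB/s / 3:1 figures are made-up illustrative numbers):

```python
def input_rate(output_mb_s: float, ratio: float) -> float:
    """Rate at which a decompressor consumes *compressed* bytes,
    given its decompressed-output rate and compression ratio."""
    return output_mb_s / ratio

# Hypothetical: an algorithm emitting 1200 MB/s of decompressed
# data at a 3:1 ratio only reads 400 MB/s off the disk.
print(input_rate(1200, 3.0))  # 400.0
```

So a headline decompression-speed number has to be divided by the ratio before you compare it against what an NVMe drive can deliver.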

I looked up LZO, and it looks like there may be licensing problems. However, it looks like Google has released "snappy," though I can't find good enough numbers to figure out at what rate it can generally decompress data. But I suspect it was designed specifically with "oh poo poo, SSDs are CPU-bottlenecked on decompression, let's optimize for that while avoiding patents."

So maybe if games ditched zlib for snappy, we might cut loading times in half. But (take this with a grain of salt: I can't find numbers) I think we'd still be CPU limited, actually.

crazypenguin
I couldn't help but think about this more. I found a nice set of benchmark numbers, which can be sorted by decompression speed: https://quixdb.github.io/squash-benchmark/#results-table

It looks like a reasonable sweet spot is the lzo/lzo1b option, which gets about a 2.4 ratio with 560 MB/s decompression output on one core. So, assuming linear scaling (hah), that's able to consume 930 MB/s of input on a 4 core cpu. NVMe drives can firehose 3500 MB/s at it, so yeah. Definitely still very CPU bottlenecked. But SATA is about 600 MB/s, so a good 50% improvement.

The same benchmarks put zlib (the compressor that e.g. Skyrim/Fallout uses) at 200 MB/s with a ratio of 3, or consuming about 270 MB/s across 4 cores. These games also store textures as dds files in S3 format, though, which doesn't need additional decompression before being uploaded to the GPU. ...Except they might also be recompressing those with zlib, I dunno.
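Redoing that arithmetic as a sketch (same naive linear core scaling as above; the small differences from the rounded 930/270 figures are just rounding):

```python
def consumed(output_mb_s: float, ratio: float, cores: int) -> float:
    # Compressed input consumed across all cores,
    # assuming naive linear scaling ("hah").
    return output_mb_s / ratio * cores

lzo1b = consumed(560, 2.4, 4)  # ~933 MB/s of compressed input
zlib  = consumed(200, 3.0, 4)  # ~267 MB/s
print(round(lzo1b), round(zlib))  # 933 267
```

Both numbers fall well short of NVMe's 3500 MB/s, which is the whole point: the drive outruns the decompressor.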

Meanwhile, Xilinx advertises 4000 MB/s consumption speeds on cheap FPGAs for zlib (roughly 100 Gbps of decompressed output at zlib's ~3 ratio): https://www.xilinx.com/products/intellectual-property/1-7aisy9.html (actually, it looks like that page is specifically talking about compression, but that should be a decent conservative proxy for what's possible.)

So an FPGA can easily handle full speed NVMe drives, and without having to choose a less efficient compression algorithm, either. So hardware can definitely handle this bottleneck problem.

But like I said, we'll see whether this is regarded as a real problem, and who knows how it might get solved, if so.

crazypenguin

eames posted:

What's the deal with U.2 by the way? My new mainboard has a connector for it but there seem to be no consumer drives that use it.

IIRC, the traditional drive people started working on a successor to SATA, took a long time, and aimed too low with respect to speed.

The NVMe group formed and released a standard a year before the SATA camp did, and it was just all-around better. (Wikipedia says Intel formed the standards group, but I thought Facebook had something to do with spearheading it, too...)

So SATAe/U.2 was basically stillborn. Motherboards have the connector because Intel's chipset has support (and hey, why not, it costs $0.02 for a couple of traces and a bit of plastic), but NVMe/m.2 won.

crazypenguin
In terms of available bandwidth, there's a little variation between RAM speeds, but most of the difference comes from the underlying technology (DDR4 vs. the upcoming DDR5, PCIe 3 vs. the upcoming PCIe 4) and what our platform overlords see fit to give us in terms of lanes/channels.

A typical DDR4 channel is around 20 GB/s (so ~40 GB/s dual channel), and PCIe 3 is about 1 GB/s per lane (slightly less in practice for NVMe over PCIe, hence why drives max out at 3.5 GB/s rather than 4 on four lanes). Consumer hardware gets 2 channels and 20-24 total CPU lanes. Server hardware varies, but can reach 12 channels and 128 lanes.
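Where those figures come from, as a sketch (DDR4 peak bandwidth is transfer rate times 8 bytes per transfer; DDR4-2666 is used as a representative speed grade):

```python
def ddr4_gb_s(mt_s: int) -> float:
    # Peak per-channel bandwidth: transfers/sec x 8 bytes per transfer.
    return mt_s * 8 / 1000  # GB/s

print(ddr4_gb_s(2666))  # 21.328 GB/s per channel, ~42.7 dual channel

# PCIe 3.0 x4 is ~4 GB/s raw; NVMe drives top out near 3.5 GB/s
# after overhead, so eight x4 drives cap around:
print(8 * 3.5)  # 28.0 GB/s
```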

Some people stuffed 8 SSDs into a Threadripper system (which has the CPU PCIe lanes to support them) and got 28 GB/s out of their SSDs: http://www.guru3d.com/news-story/eight-nvme-m2-ssds-in-raid-on-x399-threadripper-reach-28-gbs.html

crazypenguin
The benefits of even faster SSDs are often hidden because of decompression. The applications where NVMe really shines are all areas where compression isn't used: initial boot, DBs, VMs, scratch disks.

If you've got only 4 cores then, for games and such, you're not going to see load time improvements from disks faster than 200-ish MB/s, because at that point you're bottlenecked on the CPU decompressing assets. And if you think about it, that's terrible. Even a 16-core Threadripper could only handle 800-ish MB/s?! SSDs are already well over 4 times that, bottlenecked on PCIe connectivity!
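To make the scaling explicit (the ~50 MB/s per-core figure is back-solved from the 200-ish MB/s / 4-core claim above, not a measurement):

```python
def max_useful_disk_mb_s(per_core_input_mb_s: float, cores: int) -> float:
    # A disk faster than the CPU can decompress buys you nothing.
    return per_core_input_mb_s * cores

PER_CORE = 50  # ~MB/s of compressed zlib input per core (rough)
print(max_useful_disk_mb_s(PER_CORE, 4))   # 200 -> quad core
print(max_useful_disk_mb_s(PER_CORE, 16))  # 800 -> 16-core Threadripper
```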

NVMe is such a big game changer that I continue to be slightly annoyed that the industry isn't moving faster to adapt. Intel should already be offering 4 more CPU lanes for consumers, and the industry should be arguing about picking a few algorithms to put in hardware, or just shipping FPGAs on consumer CPUs to handle the "decompression crisis". A small FPGA could easily handle GB/s of decompression.

...But then again, I guess nobody in the business really cares that much about gamers' load times, and I don't know that this bottleneck seriously impacts many other applications.

crazypenguin

codo27 posted:

I'm just getting brushed up on everything here. I had thought PCIe > m2, but now I've figured it out. That'll shave off a few hundred bucks.

m.2 can be PCIe (NVMe) or SATA. Don't buy m.2 SATA.

crazypenguin
m.2 SATA is for laptops that don't have space for a drive.

For anything else, you're paying $40 more than a regular SATA SSD for no benefit whatsoever.

And I was specifically replying to someone who seemed to want NVMe, and seemed unaware that m.2 did not automatically mean NVMe.

crazypenguin

Atomizer posted:

The "don't buy m.2 SATA SSD" thing comes up frequently in this thread, and while it's wrong, you can be genuinely helpful if you explain (copy & paste if necessary) what you recommend instead and why, because that information is going to people who aren't as familiar with the technology as the rest of us are.

I don't get why this is so hard. My recommendation is to ensure everyone understands that m.2 does not automatically mean NVMe. People are routinely confused.

I said "don't buy m.2 SATA" to a poster who wanted NVMe.

crazypenguin
I finally upgraded a 2500K desktop (P67 chipset) to an SSD, and just thought I'd drop a small bit of advice here:

1. Macrium Reflect just works perfectly.
2. I had disappointing benchmark numbers. It seemed suspiciously like SATA 2 speeds, but I was plugging into the right port for SATA 3...

It turns out #2 was because my BIOS was in IDE mode; you have to switch to AHCI mode, but that breaks Windows without a little bit of jiggling.

Just in case anyone else is suffering from that problem, here are the instructions I followed to fix the issue: http://triplescomputers.com/blog/uncategorized/solution-switch-windows-10-from-raidide-to-ahci-operation/
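From memory, the gist of that kind of fix is to force a Safe Mode boot across the controller switch (this is a hedged sketch, not a transcript of the linked guide; follow the guide for the authoritative steps):

```shell
REM From an elevated command prompt, force the next boot into Safe Mode:
bcdedit /set {current} safeboot minimal

REM Reboot, switch the SATA controller from IDE to AHCI in the BIOS,
REM and let Windows boot into Safe Mode (it installs the AHCI driver there).

REM Then undo the Safe Mode flag and reboot normally:
bcdedit /deletevalue {current} safeboot
```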

Suddenly the SeqQ32 benchmark in CrystalDiskMark went from 300 MB/s to 562 MB/s. Much better.

crazypenguin
https://en.wikipedia.org/wiki/Advanced_Format

Basically everything consumer is still 512e, not 4Kn, probably because of compatibility. Windows 7 doesn't support 4Kn, for example.

crazypenguin
At least drive manufacturers standardized on powers of 1000, so their numbers are comparable.

I once saw a service that reported storage metrics in "GB": 1000*1024*1024 bytes
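For anyone keeping score at home, the three units in play (the "1 TB drive shows up as 931 GB" effect falls straight out of the first two):

```python
GB_DECIMAL = 1000**3          # what drive makers sell: 1 GB = 10^9 bytes
GIB_BINARY = 1024**3          # what OSes often display as "GB" (really GiB)
GB_CURSED  = 1000 * 1024**2   # the mixed unit that service used

one_tb_drive = 1000 * GB_DECIMAL
print(one_tb_drive / GIB_BINARY)   # ~931.3 -- why "1 TB" reads as 931 "GB"
print(GB_CURSED / GB_DECIMAL)      # 1.048576 -- the cursed unit, off by ~5%
```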
