BurritoJustice
Oct 9, 2012

Dr. Video Games 0031 posted:

Users who might want some advantage from overclocking but don't want to manually OC for funsies should just use the buildzoid timings. Despite him being wrong about a few things, his DDR5-6000 timings are highly compatible with Hynix kits and provide a tangible benefit over stock DDR5-6000 EXPO. There seems to be a bigger gap between buildzoid's timings and EXPO timings than there is between buildzoid's timings and the DDR5-7800 config.

Yeah, BZ timings are great. He did a good job of choosing conservative enough timings that still capture the majority of the performance benefit of manual tweaking.

IIRC his suggestion of using a 2033MHz FCLK is because the early AGESAs hard-enforced the 2:3 FCLK:UCLK ratio by silently upping the MCLK/UCLK to match, so his 2033 numbers are actually running at 6100MT/s, which is why he sees a performance benefit. Newer AGESAs don't do this and will absolutely let you desync your FCLK, and desyncing just to gain 33MHz is not worth it.
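
If you want to sanity-check that ratio math, here's a rough sketch (illustrative Python; it assumes the usual relationship where the data rate is UCLK x 2 and that the old AGESA re-syncs UCLK/MCLK to hold FCLK:UCLK at 2:3, i.e. the behaviour described above, not anything from AMD documentation):

code:

# Rough sketch of the ratio behaviour described above (illustrative, not official).
# Assumes data rate = UCLK * 2 and that an early AGESA enforces FCLK:UCLK = 2:3
# by silently raising UCLK (and MCLK with it) to match a manually set FCLK.
def effective_mts(fclk_mhz, fclk_to_uclk=(2, 3)):
    a, b = fclk_to_uclk
    uclk_mhz = fclk_mhz * b / a
    return uclk_mhz * 2  # two transfers per clock

print(effective_mts(2000))  # 6000.0 -> stock DDR5-6000
print(effective_mts(2033))  # 6099.0 -> effectively ~6100MT/s on those early AGESAs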


BlankSystemDaemon
Mar 13, 2009



Dr. Video Games 0031 posted:

Users who might want some advantage from overclocking but don't want to manually OC for funsies should just use the buildzoid timings. Despite him being wrong about a few things, his DDR5-6000 timings are highly compatible with Hynix kits and provide a tangible benefit over stock DDR5-6000 EXPO. There seems to be a bigger gap between buildzoid's timings and EXPO timings than there is between buildzoid's timings and the DDR5-7800 config.
As in, it'll show a noticeable performance benefit in real-world gaming, or synthetic benchmarks that just benefit from bandwidth and don't take latency into account?

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!

Dr. Video Games 0031 posted:

Users who might want some advantage from overclocking but don't want to manually OC for funsies should just use the buildzoid timings. Despite him being wrong about a few things, his DDR5-6000 timings are highly compatible with Hynix kits
Yeah, that's what I'm doing, using the BZ timings to jerry-rig me some DDR5-6000 ECC kits.

It's just that I have the FCLK set to 2033MHz because of some rambling in the same video about it always being off somehow, with the added 33MHz supposedly giving an improvement. Yet the benchmarks quoted a few posts above show a latency improvement that seems worthwhile from just dropping those 33MHz.

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib

BlankSystemDaemon posted:

As in, it'll show a noticeable performance benefit in real-world gaming, or synthetic benchmarks that just benefit from bandwidth and don't take latency into account?

The chart at the top of the pic BurritoJustice posted shows figures for a mix of games and other software.

BurritoJustice
Oct 9, 2012

BlankSystemDaemon posted:

As in, it'll show a noticeable performance benefit in real-world gaming, or synthetic benchmarks that just benefit from bandwidth and don't take latency into account?

I mean, in the real-world videogame Baldur's Gate 3, you're gaining 10FPS just from BZ timings over EXPO in the chart I posted.

It's been true for generations now that if you are running highly tuned/overclocked RAM you're basically a generation ahead on gaming performance. A 9900K with tuned-to-the-gills RAM sits roughly in between a 5800X and a 5800X3D in games.

BlankSystemDaemon
Mar 13, 2009



ConanTheLibrarian posted:

The chart at the top of the pic BurritoJustice posted shows figures for a mix of games and other software.
I somehow managed to completely miss that image.
:negative:

BurritoJustice posted:

I mean, in the real-world videogame Baldur's Gate 3, you're gaining 10FPS just from BZ timings over EXPO in the chart I posted.

It's been true for generations now that if you are running highly tuned/overclocked RAM you're basically a generation ahead on gaming performance. A 9900K with tuned-to-the-gills RAM sits roughly in between a 5800X and a 5800X3D in games.
That's quite impressive, yeah.
I'll need to look up if I have Hynix A/M-die memory, so that if I ever get a game that's not already pushing +300FPS (like Elite Dangerous is, at max settings, which is the only game I play currently), I can take advantage of it.

Also, it's this set of timings, right?

BurritoJustice
Oct 9, 2012


That's the one, yeah.

DeathSandwich
Apr 24, 2008

I fucking hate puzzles.
How generous is AMD's warranty process?

I had a two-month-old 7800X3D keel over and die basically out of nowhere. I dropped my old 7600 in and it works completely fine now. Stock clock settings, liquid-cooled system that had no thermal problems. Not seeing any dead contact pads on the bottom, nor is there any thermal muck.

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast

DeathSandwich posted:

How generous is AMD's warranty process?

I had a two-month-old 7800X3D keel over and die basically out of nowhere. I dropped my old 7600 in and it works completely fine now. Stock clock settings, liquid-cooled system that had no thermal problems. Not seeing any dead contact pads on the bottom, nor is there any thermal muck.

I recently RMA'd my 2-year-old 5950X (I've posted about its issues before) and it was fine. I explained what I'd tried, sent a PDF of my invoice and a photo of the chip, got a prepaid shipping label, and a few days after they received it I got a new one.

I'd obviously provided enough technical info in the ticket, as no questions were asked about that.

Klyith
Aug 3, 2007

GBS Pledge Week

DeathSandwich posted:

How generous is AMD's warranty process?

I had a two-month-old 7800X3D keel over and die basically out of nowhere. I dropped my old 7600 in and it works completely fine now. Stock clock settings, liquid-cooled system that had no thermal problems. Not seeing any dead contact pads on the bottom, nor is there any thermal muck.

Back when people noticed that Intel's fine print defines running XMP memory as "overclocking" and therefore out of warranty, Gamers Nexus did some anonymous trials to see if they could get a CPU warranty denied. In every case they were unable to get AMD or Intel to deny a warranty, including when they boldly admitted stuff like "I was overvolting the CPU a lot, that's not covered by warranty, right?" Nope, send it in, here's a new CPU.


I think the safe conclusion is that DIY overclocking enthusiasts are a rounding error for these companies, so they just replace stuff, no questions asked. The warranty restrictions on overclocking are more aimed at keeping OEMs from doing anything crazy.


edit: also with the previous brouhaha over exploding x3ds I bet you could get a warranty replacement for a 7800x3d even if you were overvolting it with 120V AC.

Lockback
Sep 2, 2006

All days are nights to see till I see thee; and nights bright days when dreams do show me thee.
I cannot imagine you'll have any trouble RMA'ing a 2 month old 7800x3d

forbidden dialectics
Jul 26, 2005





BurritoJustice posted:

A 9900K with tuned-to-the-gills RAM sits roughly in between a 5800X and a 5800X3D in games.

...is it possible to learn this power?

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

forbidden dialectics posted:

...is it possible to learn this power?

Look at your mobo's QVL RAM list, pick the highest validated speed you see on the list, crank up the voltage one notch past that, and run tight timings.

Every board's QVL list has some insane poo poo on there, like running a 2400 kit at DDR4-4400 or faster at 1.6V or something.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

Klyith posted:

edit: also with the previous brouhaha over exploding x3ds I bet you could get a warranty replacement for a 7800x3d even if you were overvolting it with 120V AC.

Electroboom/Gamers Nexus crossover? :haw:

Unrelated: what would be the best AMD CPU for a NAS build these days? I was looking at the 8700G but it seems like there's no ECC, and I can't decide if that's a dealbreaker. My previous CPU didn't have it either (2500K).

BlankSystemDaemon
Mar 13, 2009



BurritoJustice posted:

That's the one, yeah.
I have a pair of F5-6000J3238F16Gs and hwinfo64 says they're SK Hynix, so I think I'm good?

A Bad King
Jul 17, 2009


Suppose the oil man,
He comes to town.
And you don't lay money down.

Yet Mr. King,
He killed the thread
The other day.
Well I wonder.
Who's gonna go to Hell?

priznat posted:

Electroboom/Gamers Nexus crossover? :haw:

Unrelated: what would be the best AMD CPU for a NAS build these days? I was looking at the 8700G but it seems like there's no ECC, and I can't decide if that's a dealbreaker. My previous CPU didn't have it either (2500K).

I think DDR5 has error correction baked into each module? Just not the interconnect between the module and the CPU?

Tuna-Fish
Sep 13, 2017

A Bad King posted:

I think DDR5 has error correction baked into each module? Just not the interconnect between the module and the CPU?

The important part of ECC is the reporting, and normal DDR5 does not have that.
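
For anyone curious what that reporting looks like in practice: on a Linux box with real side-band ECC and a memory-controller EDAC driver loaded, corrected/uncorrected counts show up in sysfs. A minimal sketch for reading them, assuming the standard /sys/devices/system/edac/mc layout (which simply won't be there on hardware without proper ECC reporting):

code:

# Minimal sketch: print corrected/uncorrected error counters from Linux's EDAC
# sysfs interface. Requires a platform and driver that actually report ECC events.
from pathlib import Path

EDAC = Path("/sys/devices/system/edac/mc")

def count(p):
    try:
        return int(p.read_text().strip())
    except (OSError, ValueError):
        return 0

if EDAC.is_dir():
    for mc in sorted(EDAC.glob("mc*")):
        print(f"{mc.name}: corrected={count(mc / 'ce_count')} uncorrected={count(mc / 'ue_count')}")
else:
    print("No EDAC memory controllers found (no ECC error reporting on this system)")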

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
It's kind of a travesty that (fast) ECC UDIMMs are still kind of a pain to get hold of, whereas as soon as the Zen 4 Threadrippers were announced, companies like G.Skill et al. hopped on the high-frequency ECC RDIMM train for what's essentially a far more niche platform than people wanting to run ECC RAM in their regular desktop computers.

Cygni
Nov 12, 2005

raring to post

priznat posted:

Electroboom/Gamers Nexus crossover? :haw:

Unrelated: what would be the best AMD CPU for a NAS build these days? I was looking at the 8700G but it seems like there's no ECC, and I can't decide if that's a dealbreaker. My previous CPU didn't have it either (2500K).

if you are just running it as a NAS with some light VM/Linux ISO Downloading work, you can go way cheaper than an 8700G. something like a 2500k. :v: i personally ran a 3770S for a number of years and was satisfied until i went to gigabit internet and I could actually fully saturate the CPU while downloading things over a WireGuard VPN.

if you are running it as a Plex server, you're gonna probably want to use Intel 11th-gen or later to take advantage of QuickSync. AMD's encoder is not well liked and i believe is currently bugged with a desync issue. if you wanna go all-in on AV1 though, intel won't have that integrated in the DIY sector until Arrow Lake later this year, so you are basically stuck adding an intel Arc/Nvidia dGPU just for encoding.

i'm personally waiting to make the AV1 switch until later. HEVC (H.265) is pretty widely supported by end-user devices so transcodes are limited, and the QuickSync HEVC encoder is pretty speedy.

Cygni fucked around with this message at 22:13 on Feb 27, 2024

LRADIKAL
Jun 10, 2001

Fun Shoe
A 1000 series Nvidia GPU if you can find one for free/cheap does a good job with transcoding as well!

BlankSystemDaemon
Mar 13, 2009



A Bad King posted:

I think DDR5 has error correction baked into each module? Just not the interconnect between the module and the CPU?
DDR5 has ECC within each individual memory chip, not across the whole DIMM - the latter requires a whole extra memory chip on the DIMM.
GDDR5 did it right and mandated full ECC support.

Both were necessary because speeds got so fast that the bit error rate rose above what JEDEC considered acceptable.

Craptacular!
Jul 9, 2001

Fuck the DH

priznat posted:

Electroboom/Gamers Nexus crossover? :haw:

Unrelated: what would be the best AMD CPU for a NAS build these days? I was looking at the 8700G but it seems like there's no ECC, and I can't decide if that's a dealbreaker. My previous CPU didn't have it either (2500K).

Using a 5600G here. Never felt like I needed ECC.

BlankSystemDaemon
Mar 13, 2009



Craptacular! posted:

Using a 5600G here. Never felt like I needed ECC.
This post is an excellent argument for ECC with MCE - with it, you wouldn't have to rely on a feeling and you'd know if a program crash, system panic, or even data corruption happened because of a memory issue.

Every other form of storage in PCs already has ECC; the CPU caches all have it, as do both spinning rust and non-volatile flash storage (well, most of them use ECC because it's cheap to implement in hardware).

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!
Even all (critical) buses in the system use some form of error correction. Basically everything except the goddamn RAM.

Craptacular!
Jul 9, 2001

Fuck the DH

BlankSystemDaemon posted:

This post is an excellent argument for ECC with MCE

I'm not running production machinery here; what started 15 years ago as a Windows Media Center DVR, then added an old 32-bit ARM consumer NAS, is now running VMs for storage and occasional services.

That 'feeling' was more like an opinion formed from talking to plenty of people about it across multiple forums, but sure, just presume I'm being a clown.

Klyith
Aug 3, 2007

GBS Pledge Week
Actual studies of memory errors* in giant-scale production environments have shown that uncorrelated one-off errors are uncommon. In a Facebook study, about 2% of servers had an error per month. Half of the servers with errors are strongly predicted by other repeating errors -- i.e. a hardware fault. (And of course hardware faults generate the vast majority of errors by absolute count.)

ECC is definitely nice insurance, but as a home user it's good to think about the probabilities involved:

How often will my memory have errors? A bit more than 1% chance per month that you'll have a spurious error somewhere in your memory, the cosmic ray bit-flip type event. And a little less than 1% that something in your hardware will start generating repeatable errors.

How much of my memory is hosting critical data? On something like a NAS the answer is "not much". Your critical data is sitting on a drive. It only passes through memory when you're reading, writing, or the FS is doing scrubs or other maintenance. (And if you read but don't save, an error doesn't matter.)

What does that mean? The chance that the unpredictable error strikes at just the right time and location to corrupt your data is low. For something like a home NAS, what you are worried about is the failure that generates frequent ongoing errors, because those make lots of errors. Now you have reasonable odds of an error hitting something you care about.
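
Quick back-of-the-envelope on those figures, treating the roughly-1%-per-month number above as independent monthly rolls (a simplification, but it shows the scale):

code:

# Back-of-the-envelope on the ~1%/month figure above (assumes independent months).
p_month = 0.01
p_year = 1 - (1 - p_month) ** 12
print(f"~{p_year:.1%} chance of at least one spurious error in a year")  # ~11.4%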


Therefore, testing your memory on a regular (1-3 times per year) basis is pretty good protection for most home user purposes. Not absolute: get ECC if you're really paranoid, or if your home server is more than a home server. But it is well after stuff like offsite backups as far as data protection, and IMO unnecessary.

Also if you spring for ECC memory on your NAS but not your desktop, and aren't doing those periodic memtests on the desktop, you probably have just as good odds of corrupting your data.



*these studies are using ECC ram to see errors, but I see no reason the results would not apply to normal ram

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Klyith posted:

Actual studies of memory errors* in giant-scale production environments have shown that uncorrelated one-off errors are uncommon. In a Facebook study, about 2% of servers had an error per month. Half of the servers with errors are strongly predicted by other repeating errors -- i.e. a hardware fault. (And of course hardware faults generate the vast majority of errors by absolute count.)

Another important element of the paper, if I remember it correctly (I haven't read it in years and I only re-read the abstract and intro), is

quote:

non-DRAM memory failures from the memory controller and memory channel cause the majority of errors,

which means that ECC wouldn't even necessarily protect you AIUI.

On the other hand, that was DDR3, and the susceptibility of DRAM to corruption increases in DDR5 (hence the on-chip, non-signalling ECC), right?

BlankSystemDaemon
Mar 13, 2009



Klyith posted:

Actual studies of memory errors* in giant-scale production environments have shown that uncorrelated one-off errors are uncommon. In a Facebook study, about 2% of servers had an error per month. Half of the servers with errors are strongly predicted by other repeating errors -- i.e. a hardware fault. (And of course hardware faults generate the vast majority of errors by absolute count.)

ECC is definitely nice insurance, but as a home user it's good to think about the probabilities involved:

How often will my memory have errors? A bit more than 1% chance per month that you'll have a spurious error somewhere in your memory, the cosmic ray bit-flip type event. And a little less than 1% that something in your hardware will start generating repeatable errors.

How much of my memory is hosting critical data? On something like a NAS the answer is "not much". Your critical data is sitting on a drive. It only passes through memory when you're reading, writing, or the FS is doing scrubs or other maintenance. (And if you read but don't save, an error doesn't matter.)

What does that mean? The chance that the unpredictable error strikes at just the right time and location to corrupt your data is low. For something like a home NAS, what you are worried about is the failure that generates frequent ongoing errors, because those make lots of errors. Now you have reasonable odds of an error hitting something you care about.


Therefore, testing your memory on a regular (1-3 times per year) basis is pretty good protection for most home user purposes. Not absolute: get ECC if you're really paranoid, or if your home server is more than a home server. But it is well after stuff like offsite backups as far as data protection, and IMO unnecessary.

Also if you spring for ECC memory on your NAS but not your desktop, and aren't doing those periodic memtests on the desktop, you probably have just as good odds of corrupting your data.



*these studies are using ECC ram to see errors, but I see no reason the results would not apply to normal ram
If it were only about data corruption, that'd be one thing - but if you read the Microsoft study linked, you'll see how many BSODs Microsoft attributes to a lack of ECC - and if you extrapolate that to the entire fleet of computers without ECC everywhere, it's still an absolutely mind-staggering amount of productivity that's been lost over the decades.
All so that some PC clone makers could get a bit more money out of the purchasers.

Eletriarnation
Apr 6, 2005

People don't appreciate the substance of things...
objects in space.


Oven Wrangler
I run ECC on my NAS because:

1) It greatly reduces the chance of outages due to soft errors.
2) It greatly increases the chance that I will notice if the RAM actually starts to fail before I start getting mysterious outages.
3) The cost increase vs. non-ECC DDR4 (which is the only downside of ECC) was only about $60 for 32GB, less than 5% of the total cost of the system after I include drives.

I know that there are hypothetical benefits for file data integrity, but it's a distant fourth as priorities go - I expect ZFS to play a greater role there since file data spends a lot more time on the disk than it does transitioning through system memory. I have found ZFS to be very good at handling the aftermath of sudden outages, so to me the importance of this system's reliability is more about uptime and avoiding the work of recovery than avoiding data loss. Of course, if your data is all easily replaceable then this is even more true. On the other hand, if you are using something less resilient than ZFS then you might really want to avoid sudden outages.

I think it's totally valid for a home user to not give ECC serious consideration, if your preferred platform doesn't support it (e.g. Intel) or if you aren't very concerned about the possibility of soft error outages. Soft error outages don't seem to be especially common in my experience even now that we are getting into the tens of gigabytes of memory, so that's fine.

However, if your platform does support it then the cost is not bad. For that reason, I think it's a bit apples vs. oranges to say "you should have an offsite backup first" considering that could easily double the cost of the overall solution. You should have an offsite backup for anything critical or irreplaceable, but that is not most of the contents of many home NAS systems.

I also don't understand the claim that it's comparably important to have ECC (or periodic testing) on your desktop vs. your NAS. It would be nice, sure, but a soft error on my desktop only affects me and maybe the data on its one SSD. Most of that data is Steam games and can be replaced in a heartbeat, and that which isn't easily replaceable is backed up... mostly to the NAS. The vast majority of the data on the NAS isn't even transferred through the desktop, I download it through VMs also hosted on the NAS.

Eletriarnation fucked around with this message at 15:35 on Feb 28, 2024

Anime Schoolgirl
Nov 28, 2002

and if you're talking about theoretical performance losses from going to ECC, even registered ECC ram has XMP and EXPO now.

granted, DDR5 now properly silos off registered and unregistered RAM.

Klyith
Aug 3, 2007

GBS Pledge Week

Subjunctive posted:

Another important element of the paper, if I remember it correctly (I haven't read it in years and I only re-read the abstract and intro), is

which means that ECC wouldn't even necessarily protect you AIUI.

ECC bits are carried over the bus, so yes, true ECC would protect you from controller and bus problems. Otherwise it would just be the DDR5 form of ECC, protecting against errors on the module itself.

Subjunctive posted:

On the other hand, that was DDR3, and the susceptibility of DRAM to corruption increases in DDR5 (hence the on-chip, non-signalling ECC), right?

Mostly the susceptibility to errors increases with density. I think DDR5 getting on-chip ECC wasn't really about the DDR4->5 transition, it just happens to be a convenient place where they could change the spec to mandate it.

Which means late-generation high-density DDR4, where they fit 16GB on a single-side stick, is the worst case for uncorrelated memory errors. That may be the place where you have the most incentive to use ECC. (Or some old sticks of low-density ddr4 in your parts bin.)



Eletriarnation posted:

I think it's totally valid for a home user to not give ECC serious consideration, if your preferred platform doesn't support it (e.g. Intel) or if you aren't very concerned about the possibility of soft error outages. Soft error outages don't seem to be especially common in my experience even now that we are getting into the tens of gigabytes of memory, so that's fine.

However, if your platform does support it then the cost is not bad. For that reason, I think it's a bit apples vs. oranges to say "you should have an offsite backup first" considering that could easily double the cost of the overall solution. You should have an offsite backup for anything critical or irreplaceable, but that is not most of the contents of many home NAS systems.

Good points, there's definitely a lot of personal priority stuff going on. To me a 1% chance to crash isn't really worth paying extra to avoid, even if I have to get up and walk to reset it. Repeated crashes because something went wrong -- such as the memory going bad -- are more annoying but eminently solvable problems. (Edit: I do agree that if the platform supports it, ECC is not a big splurge. But that's a big if, as we started this convo with "what's a low-cost CPU that supports ECC?")

Crashes are temporary, data loss or corruption is forever. My house burning down would suck for many reasons, and if that included losing everything I've ever done with bits it would suck worse. That's why offsite / cloud backup is more important than ECC to me.

Eletriarnation posted:

I also don't understand the claim that it's comparably important to have ECC on your desktop. ... The vast majority of the data on the NAS isn't even transferred through the desktop, I download it through VMs also hosted on the NAS.

Again, in my mind the thing that's important is personal and irreplaceable data. Whatever bullshit that I download from the internet isn't that. Some of it would be annoying to lose, but not important.

If I'm editing a document or photo or whatever and it gets silent corruption in flight, that could easily become irreplaceable. That could happen just as easily on my desktop as my NAS (if I had a NAS).

Combat Pretzel
Jun 23, 2004

No, seriously... what kurds?!

Klyith posted:

Your critical data is sitting on a drive. It only passes through memory when you're reading, writing, or the FS is doing scrubs or other maintenance. (And if you read but don't save, an error doesn't matter.)
I know I keep harping on this one, but are we again ignoring that disk caching is a thing in this type of discussion? Personally, I currently have 52GB of (meta)data hot in RAM on my NAS.

Klyith
Aug 3, 2007

GBS Pledge Week

Combat Pretzel posted:

I know I keep harping on this one, but are we again ignoring that disk caching is a thing in this type of discussion? Personally, I currently have 52GB of (meta)data hot in RAM on my NAS.

Ok. You got 00 on this month's roll of the d100, and now have a bit-flip error somewhere in that 52GB cache. For that error to matter, you would need to:
1) access the file containing the error before the cache gets refreshed or the machine reboots
2) use / modify the file and save it back to disk including the error

1 isn't that unlikely, the cache is full of stuff the machine thinks you'll use. But 2 needs a little thinking:
Did the error strike a file you'll modify, or just a video you might watch?
Does the file format have any independent checksumming / integrity / features that will make corruption obvious?
Will the entire file be written back to disk whole?

If you're working with a big file -- the ones most likely to be hit by the error -- you'll probably only write a portion back to the drive. Take a large file with data blocks like "ABCDEFGH" that I modify to "ABCDEXGH": only the X (and maybe bits of the E and G blocks) gets rewritten on disk. If there was a memory error in A, it doesn't matter.
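
To make that concrete, here's a toy sketch (POSIX-only; the file name and block size are made up for the example) showing that rewriting one block of a file only touches that block on disk, so a corrupted byte sitting in a different cached block never gets written out:

code:

# Toy illustration of partial write-back: only the modified block reaches the disk.
import os

BLOCK = 4096
path = "example.bin"

with open(path, "wb") as f:           # create a file of 8 blocks: A..H
    for ch in b"ABCDEFGH":
        f.write(bytes([ch]) * BLOCK)

fd = os.open(path, os.O_WRONLY)       # rewrite only block 5 ("F" -> "X")
os.pwrite(fd, b"X" * BLOCK, 5 * BLOCK)
os.close(fd)

with open(path, "rb") as f:
    firsts = bytes(f.read(BLOCK)[0] for _ in range(8))
print(firsts)  # b'ABCDEXGH' -- block "A" on disk was never rewritten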


I know I keep harping on this, but is there any explanation for why ECC is super important that accounts for how constant file corruption is not a thing we observe on the zillions of PCs that don't have ECC? (And when you do see file corruption it is generally paired with non-mysteries like a drive that has bad sectors all over.)


Desuwa
Jun 2, 2011

I'm telling my mommy. That pubbie doesn't do video games right!
Especially with desktop systems running overclocked RAM that pushes the limits of stability, I am not convinced RAM errors are that rare. All the studies have been of servers running ECC in-spec. Even when the RAM itself is fine, the memory controller can be unstable, and that can pass memtest for extended periods. I made the questionable choice of going with 4x32GB DDR5 this time around, and even with BIOS updates I got roughly weekly corrected errors (across all sticks) as low as 4400MHz, and was able to pass short memtest checks at 4800. While I'm pushing the capacity that Zen 4 is willing to tolerate, I assume a lot of these just-barely-stable overclocks with two DIMMs are also not actually that stable on the memory-controller side.

I've also had two DIMMs go bad after multiple years of stability even without overclocks. One of those was ECC which was at least easier to conclusively narrow down.
