  • Locked thread
Crystael
Mar 19, 2010
Problem description: I built my own PC about a year ago, and have been plagued intermittently with blue screens, ranging from NTFS_FILE_SYSTEM to INACCESSIBLE_BOOT_DEVICE and PAGE_FAULT_IN_NONPAGED_AREA, almost always during the Windows boot, or sometimes just after the desktop has loaded.

These blue screens always start a couple of days after a clean install of Windows and get progressively more frequent (often requiring multiple restarts until eventually Windows boots) until I get fed up and do a clean install. Fortunately, this isn't the chore it used to be - with OneDrive and an unaffected data drive I can be up and running again within a couple of hours.

My question is - what is causing this? I have run MemTest and Disk Check several times over the last few months and no issues are ever flagged. CrystalDiskInfo never finds anything either. If I leave a completely fresh install of Windows for a few days (i.e. don't update any drivers or install anything) the errors start to creep in, so I'm pretty sure it's not a software issue.

The most likely candidate is clearly my SSD (120GB Samsung 840 Pro), but why can't I find any errors on it, even when Windows is refusing to boot?

Bizarrely, taking one stick of RAM out, and just leaving 4GB in one slot achieved stability for a few months; introducing the other stick brought the errors back. Swapping the sticks around made no difference, leaving me to think it could be an issue on the motherboard, maybe?

Anyway, I'm quite happy to drop £60 on a new SSD to eliminate that as a possibility, ditto with the RAM. Is there any chance the motherboard could be at fault? If so, I might consider a premature full rebuild.

Attempted fixes: Ran MemTest and Disk Check, performed a clean install of Windows several times, tried the RAM in every possible combination, tried leaving Windows for a few boot cycles with no driver updates

Recent changes: On-going issue

Operating system: Windows 8.1 64 bit

System specs: Intel i5 3570k, Asus P8Z77-M motherboard, ATI 7770 graphics card, 8GB Crucial Ballistix DDR3 RAM, Samsung 840 Pro SSD, Seagate Barracuda 1TB HDD, 450W XFX Pro Core PSU

Location: UK

I have Googled and read the FAQ: Yes

r0ck0
Sep 12, 2004
r0ck0s p0zt m0d3rn lyf
First place to look is in the windows event viewer. Look for errors related to the BSOD. You can usually google the error codes or other info from the event viewer logs to get a better idea.

Have you updated your motherboard BIOS? Did you update your drivers? See if you can get the correct drivers directly from the hardware manufacturer rather than your motherboard vendor, e.g. go to Intel for the chipset drivers.

Crystael
Mar 19, 2010
The motherboard BIOS and drivers are all up to date. It's the frequency of the BSODs that is worrying; within a week or so of installing Windows it's taking multiple attempts to fully boot, and the errors are usually different, although they seem to be focused on problems in the boot partition.

r0ck0
Sep 12, 2004
r0ck0s p0zt m0d3rn lyf
Did you look at the event viewer?

CaptainSarcastic
Jul 6, 2013



Were you running Memtest 86+ or some other memory diagnostic? If you were running memtest86, did you let it complete at least one full pass?

What you describe reminds me most of a case I had where my main machine would slowly get more and more error-prone, then start blue-screening, and finally I'd reinstall, only to have the process repeat. This was years ago and Windows XP, and memtest86 actually came back clean on that machine. I finally found another more burn-in type test, and I had some slightly bad RAM in that machine, and eventually errors written to the HDD would pile up and cause system failure.
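To illustrate how a drive can look healthy while its files rot, here is a toy Python model. The names (`flaky_copy`, `ToyDrive`, `simulate`) and the flip rate are invented for illustration, and the drive's CRC only loosely stands in for the error correction a real SSD does internally. The point is ordering: the drive checksums data after it has already passed through (possibly faulty) RAM, so it faithfully stores the damaged bytes and its own self-test still passes.

```python
import random
import zlib


def flaky_copy(data: bytes, flip_prob: float, rng: random.Random) -> bytes:
    """Pass a buffer through simulated flaky RAM: each byte may get one bit flipped."""
    out = bytearray(data)
    for i in range(len(out)):
        if rng.random() < flip_prob:
            out[i] ^= 1 << rng.randrange(8)
    return bytes(out)


class ToyDrive:
    """Stores blocks with a CRC of whatever it received (loosely like on-drive ECC)."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba: int, payload: bytes) -> None:
        self.blocks[lba] = (payload, zlib.crc32(payload))

    def self_test(self) -> bool:
        # Only checks that stored data still matches what the drive was handed.
        return all(zlib.crc32(p) == crc for p, crc in self.blocks.values())


def simulate(n_writes=200, block_size=4096, flip_prob=1e-5, seed=1):
    """Return (drive self-test passed?, blocks that differ from what the OS meant to write)."""
    rng = random.Random(seed)
    drive = ToyDrive()
    intended = {}
    for lba in range(n_writes):
        data = rng.randbytes(block_size)  # Python 3.9+
        intended[lba] = zlib.crc32(data)  # checksum taken *before* the data crosses RAM
        drive.write(lba, flaky_copy(data, flip_prob, rng))
    corrupted = sum(
        zlib.crc32(drive.blocks[lba][0]) != crc for lba, crc in intended.items()
    )
    return drive.self_test(), corrupted


# The self-test passes even though some blocks no longer hold what was intended.
print(simulate(flip_prob=1e-4))
```

Because the damage happens before the data reaches the drive, tools that only ask the drive about its own health, such as SMART readers, would have nothing to flag, which would be consistent with CrystalDiskInfo coming back clean.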

When you said swapping the sticks of RAM did nothing, did you try running the machine with just one stick of RAM for a period of time, then try running it with just the other stick of RAM for a period of time? Or did it run successfully with one stick of RAM for a while, and then started degrading once the other was added back in?

Crystael
Mar 19, 2010
This sounds pretty similar to the issues you had way back when. I didn't run Memtest 86 - just the one built into Windows - so that's the next thing to try.

I had the machine running stably with one stick of RAM in for a few months - introducing the other started this latest catalogue of problems. I was working under the assumption that faulty RAM would cause problems instantly (rather than gradually shafting my SSD over time) so I assumed that when I swapped over the sticks of RAM and nothing immediately changed that the RAM couldn't be the issue.

I'll run some overnight memory tests tonight and see if that finds anything.

Crystael
Mar 19, 2010
Just to update you on this (sorry it's taken a while - work got in the way):

I ran MemTest86 overnight and no errors were found. I reinstalled Windows yesterday without any problems (restarting plenty of times during driver updates, Windows updates etc.), but this morning I got my first BSOD (System Service Exception) immediately after entering my password when logging on to Windows. I hadn't shut the PC down since reinstalling Windows yesterday (only restarted it), in case that makes any difference, so this was my first cold boot.

So if it isn't my RAM (which it looks like it isn't - if there were any issues there, an overnight MemTest would find them, correct?) surely it has to be the SSD? What diagnostics can I run there? So just to clarify, this is on a fresh Windows install, with all main drivers updated, and I got a BSOD upon my first cold boot since reinstalling Windows.

Because I built the PC myself I am worried that I have hosed something else up or the motherboard is somehow shagged, but from the problems I've been having I don't think this could be the case... still, it's worrying!

EDIT: Just got home and looked at my Event Viewer. So when booting this morning I got multiple copies of this error, at which point my PC blue screened, as above:

Log Name: Application
Source: ESENT
Date: 01/12/2014 08:25:16
Event ID: 399
Task Category: (2)
Level: Warning
Keywords: Classic
User: N/A
Computer: TOM
Description:
wuaueng.dll (936) SUS20ClientDataStore: The database page read from the file "C:\WINDOWS\SoftwareDistribution\DataStore\DataStore.edb" at offset 98304 (0x0000000000018000) (database page 2 (0x2)) for 32768 (0x00008000) bytes failed verification. Bit 88913 was corrupted and has been corrected. This problem is likely due to faulty hardware and may continue. Transient failures such as these can be a precursor to a catastrophic failure in the storage subsystem containing this file. Please contact your hardware vendor for further assistance diagnosing the problem.
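As a sanity check on what this warning says, the offsets can be worked out by hand. This sketch assumes an ESE database file begins with two header pages, so logical page N starts at (N + 1) * page size, and that the reported bit index counts from the start of that page. Both are assumptions, although the first one does reproduce the 0x18000 offset Windows printed. The helper names are hypothetical.

```python
def ese_page_offset(pgno: int, page_size: int) -> int:
    # Assumption: two header pages precede page 1, so page N starts at (N + 1) * size.
    return (pgno + 1) * page_size


def locate_bit(page_offset: int, bit_index: int) -> tuple:
    # Assumption: the bit index in the event counts from the first bit of the page.
    byte_in_page, bit_in_byte = divmod(bit_index, 8)
    return page_offset + byte_in_page, bit_in_byte


# Values copied from the ESENT event 399 above.
offset = ese_page_offset(pgno=2, page_size=0x8000)
print(offset == 0x18000)  # → True, matches the offset in the event
file_byte, bit = locate_bit(offset, 88913)
print(hex(file_byte), bit)  # → 0x1ab6a 1
```

A single corrected bit in a 32 KiB page is the kind of damage a one-off bit flip would cause, which is consistent with (though not proof of) a transient error somewhere between RAM and the drive.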

Upon reboot I got this critical error, which I guess was just reporting the reboot:

Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 01/12/2014 08:26:01
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (2)
User: SYSTEM
Computer: TOM
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

I left my PC on all day while at work - seems there was another BSOD just after I left:

Log Name: Application
Source: Microsoft-Windows-Wininit
Date: 01/12/2014 08:32:22
Event ID: 1015
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: TOM
Description:
A critical system process, C:\WINDOWS\system32\lsass.exe, failed with status code c0000409. The machine must now be restarted.
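For reference, c0000409 is an NTSTATUS value. Microsoft's NTSTATUS list names it STATUS_STACK_BUFFER_OVERRUN, and the same code is also reported when a process deliberately terminates itself after detecting corrupted state. NTSTATUS values have a documented bit layout (2-bit severity, customer flag, reserved bit, 12-bit facility, 16-bit code); the hypothetical helper below just splits those fields out.

```python
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS into its documented bit fields."""
    return {
        "severity": SEVERITY[(status >> 30) & 0x3],  # bits 31-30
        "customer": bool((status >> 29) & 0x1),      # bit 29: customer-defined code?
        "facility": (status >> 16) & 0xFFF,          # bits 27-16
        "code": status & 0xFFFF,                     # bits 15-0
    }


print(decode_ntstatus(0xC0000409))
# → {'severity': 'error', 'customer': False, 'facility': 0, 'code': 1033}
```

On a freshly installed system, lsass.exe bailing out over corrupted state fits the memory-corruption theory, but the code alone cannot say where the corruption came from.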

While rebooting from this BSOD (hope you're following!) I got this one:

Log Name: System
Source: Microsoft-Windows-WLAN-AutoConfig
Date: 01/12/2014 08:33:52
Event ID: 10000
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: TOM
Description:
WLAN Extensibility Module has failed to start.

Module Path: C:\WINDOWS\system32\Rtlihvs.dll
Error Code: 126

This is followed by a load of DeviceSetupManager warnings due to not being connected to the internet (expected, I guess, due to the previous error).

Now this is all a bit beyond my knowledge, but judging from the first error, and given that this is a clean Windows install, it rather looks like my SSD (i.e. where C:\WINDOWS is) is hosed, doesn't it? Could it be anything else? These were just Administrative Events, by the way - should I be looking somewhere else?
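One low-tech way to triage pastes like these is to split them into fields and filter by level. This sketch assumes the plain "Key: Value" layout pasted above, with a free-form body after "Description:"; the function names are hypothetical, and an XML or .evtx export would need a different parser.

```python
import re


def parse_events(text: str) -> list:
    """Split pasted Event Viewer text into one dict per event (layout assumed as above)."""
    events = []
    # Zero-width split just before each 'Log Name:' line (Python 3.7+).
    for chunk in re.split(r"(?m)^(?=Log Name:)", text):
        if not chunk.strip():
            continue
        head, _, description = chunk.partition("Description:")
        event = {}
        for line in head.splitlines():
            key, sep, value = line.partition(":")  # split on the first colon only
            if sep:
                event[key.strip()] = value.strip()
        event["Description"] = description.strip()
        events.append(event)
    return events


def serious(events: list, levels=frozenset({"Error", "Critical"})) -> list:
    """Keep events whose Level is in `levels`, as (date, source, event id) tuples."""
    return [
        (e.get("Date"), e.get("Source"), e.get("Event ID"))
        for e in events
        if e.get("Level") in levels
    ]


# Two of the events quoted above, abridged.
SAMPLE = """Log Name: Application
Source: ESENT
Date: 01/12/2014 08:25:16
Event ID: 399
Level: Warning
Description:
Bit 88913 was corrupted and has been corrected.
Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 01/12/2014 08:26:01
Event ID: 41
Level: Critical
Description:
The system has rebooted without cleanly shutting down first.
"""

for row in serious(parse_events(SAMPLE)):
    print(row)
```

Note that the most telling entry here, ESENT 399, is only logged as a Warning, so filtering on Error/Critical alone would hide it; passing a wider `levels` set is the safer choice when hunting for the first sign of trouble before a crash.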

Crystael fucked around with this message at 20:51 on Dec 1, 2014

Zogo
Jul 29, 2003

Run http://www.techpowerup.com/realtemp/ and see what it says your CPU temperature is at.


Crystael posted:

So if it isn't my RAM (which it looks like it isn't - if there were any issues there, an overnight MemTest would find them, correct?)

If memtest finds a problem the RAM has most definitely failed but unfortunately it's not a 100% foolproof test. Some RAM is bad but won't register errors.

It could be some kind of issue with one of the RAM slots or that one RAM stick could be faulty.

Crystael posted:

surely it has to be the SSD? What diagnostics can I run there? So just to clarify, this is on a fresh Windows install, with all main drivers updated, and I got a BSOD upon my first cold boot since reinstalling Windows.

Logically if you removed one stick of RAM and had no BSODs for months and then put it back in and are having BSODs every few days I'd think the RAM would be the culprit. You could try removing the one stick again and see if the computer goes back to normal. That'd be the only way to be sure.

Crystael posted:

I was working under the assumption that faulty RAM would cause problems instantly (rather than gradually shafting my SSD over time) so I assumed that when I swapped over the sticks of RAM and nothing immediately changed that the RAM couldn't be the issue.

Completely bad/dead RAM won't allow a computer to POST. However, RAM can have small defects that allow a computer to work stably for days/weeks before the annoyingly random but inevitable BSOD.

Zogo fucked around with this message at 06:00 on Dec 2, 2014

Crystael
Mar 19, 2010

Zogo posted:

Run http://www.techpowerup.com/realtemp/ and see what it says your CPU temperature is at.

If memtest finds a problem the RAM has most definitely failed but unfortunately it's not a 100% foolproof test. Some RAM is bad but won't register errors.

It could be some kind of issue with one of the RAM slots or that one RAM stick could be faulty.

Logically if you removed one stick of RAM and had no BSODs for months and then put it back in and are having BSODs every few days I'd think the RAM would be the culprit. You could try removing the one stick again and see if the computer goes back to normal. That'd be the only way to be sure.

What kind of issue could it be with the RAM slots, and is there a way I could diagnose that?

When the problems first started occurring I tried various combinations of my 2 RAM sticks in the four available motherboard slots, and did achieve relative stability for a few months using just one 4GB stick in slot 1B (I think). Introducing the second 4GB stick and moving them to slots 1A and 2A caused this latest catalogue of problems, eventually resulting in Windows being unable to access the boot device. Since then every clean install of Windows I've tried (always using 8GB RAM in slots 1A and 2A) has given me a BSOD fairly quickly.

When I was testing the RAM I do recall that using different RAM sticks in the same slot didn't help, but moving the RAM to a different slot did (which is why I was able to run stably for a while using one stick in slot 1B). However, I guess that if the RAM had already caused errors on my SSD by that point it could've been a coincidence that using a different slot worked, and actually I'd just chanced upon using the one non-faulty RAM stick?
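The slot-versus-stick question can be framed as a small elimination table. Below, `results` records whether each single-stick configuration ran stably. The function name, the stick labels "A" and "B", and the outcomes in the example are hypothetical (only the slot names come from this thread), and "stable" in practice means days of uptime, not one good boot.

```python
from collections import defaultdict


def isolate(results: dict) -> str:
    """Guess whether instability follows a RAM stick or a motherboard slot.

    `results` maps (stick, slot) -> True if that single-stick setup ran stably.
    Illustrative elimination logic only, not a diagnostic tool.
    """
    by_stick = defaultdict(list)
    by_slot = defaultdict(list)
    for (stick, slot), stable in results.items():
        by_stick[stick].append(stable)
        by_slot[slot].append(stable)
    # Only call something "always bad" if it was tried in more than one pairing.
    bad_sticks = sorted(s for s, runs in by_stick.items() if len(runs) > 1 and not any(runs))
    bad_slots = sorted(s for s, runs in by_slot.items() if len(runs) > 1 and not any(runs))
    if bad_sticks and not bad_slots:
        return "suspect stick(s): " + ", ".join(bad_sticks)
    if bad_slots and not bad_sticks:
        return "suspect slot(s): " + ", ".join(bad_slots)
    return "inconclusive: try more combinations or known-good RAM"


# Hypothetical results: both sticks unstable in slot 1A, both fine in slot 1B.
print(isolate({("A", "1A"): False, ("B", "1A"): False,
               ("A", "1B"): True, ("B", "1B"): True}))
```

As the post above points out, this only works if each trial starts from a clean slate. If the SSD already holds files corrupted by an earlier bad configuration, a failure may be leftover damage rather than the current stick or slot, so each trial would ideally begin with a fresh install.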

Skandranon
Sep 6, 2008
fucking stupid, dont listen to me
RAM is a funny thing, and can fail in subtle ways. It can silently fail to write a value to a bit, but return one reliably. Having 1 bit flipped in 16GB is a tiny, tiny error rate, but if it goes undetected it can lead to problems like CaptainSarcastic described. Memory tests work by writing specific, predictable patterns of data to a stick, and then reading them back to see if they are the same. This catches most subtle errors, but cannot catch all of them. In your case, where we are pretty sure it's the RAM but tests say the RAM is fine, the only real way to test is to use some new RAM and see if the issue goes away. If it does, the old RAM was bad. If it doesn't, the issue is somewhere else. If a NEW issue comes up, the new RAM is probably worse. Whichever way it goes, at least you'll have a next step.
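A minimal sketch of the write-a-pattern-then-read-it-back idea, assuming a made-up fault: one bit at one address that can never be set to 1 (a "stuck-at-0" bit). The class and function names are hypothetical; real testers such as MemTest86 run far more elaborate patterns and access orders.

```python
class FaultyRAM:
    """Toy memory in which one bit at one address is stuck at 0 (hypothetical fault)."""

    def __init__(self, size: int, bad_addr: int, bad_bit: int):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.keep_mask = ~(1 << bad_bit) & 0xFF  # every bit except the stuck one

    def write(self, addr: int, value: int) -> None:
        if addr == self.bad_addr:
            value &= self.keep_mask  # the stuck bit silently refuses to become 1
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]


def pattern_test(ram: FaultyRAM, size: int, patterns: list) -> list:
    """Fill memory with each pattern, read it all back, log (addr, wrote, read) mismatches."""
    errors = []
    for pattern in patterns:
        for addr in range(size):
            ram.write(addr, pattern)
        for addr in range(size):
            got = ram.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors


WALKING_ONES = [1 << i for i in range(8)]  # 0x01, 0x02, ..., 0x80

ram = FaultyRAM(size=64, bad_addr=10, bad_bit=3)
print(pattern_test(ram, 64, [0x00]))                       # → []
print(pattern_test(ram, 64, [0x00, 0xFF] + WALKING_ONES))  # → [(10, 255, 247), (10, 8, 0)]
```

Writing all zeros never exercises the faulty bit, so that pass comes back clean; only patterns that try to set bit 3 at address 10 expose it. Real marginal RAM may only misbehave under particular temperatures, timings or neighbouring-cell contents, which is why a clean overnight run lowers suspicion without fully clearing a stick.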

Unfortunately, unless you happen to have some spare RAM lying around, the test is the same as simply assuming the RAM is bad and buying replacements.

Also, yes, if it was bad RAM that has written changes back to the drive that have been piling up, you'll also need to reinstall Windows with the new RAM to rule that out.

Skandranon fucked around with this message at 20:58 on Dec 2, 2014

Zogo
Jul 29, 2003

Crystael posted:

What kind of issue could it be with the RAM slots, and is there a way I could diagnose that?

Any part of a motherboard can have a failure. Visible physical defects are the easiest way of finding them but there's a lot of things that aren't even visible and a lot of trial and error/troubleshooting is the only way to be sure with something like this.

Crystael
Mar 19, 2010

Zogo posted:

Any part of a motherboard can have a failure. Visible physical defects are the easiest way of finding them but there's a lot of things that aren't even visible and a lot of trial and error/troubleshooting is the only way to be sure with something like this.

Understood. I've got 8GB of fresh RAM arriving tomorrow so I'll reinstall Windows with the new sticks in the evening and see what happens. If no change then at the very least I'll have 16GB RAM (I work with audio a lot so this isn't actually a waste), even if I'll be back at square one. I'll also check the CPU temperature as advised above.

I'm so glad that between OneDrive, File History and having Windows 8 on a USB stick, it doesn't take long to get up and running again after a fresh install!

EDIT: Early signs are good. Reinstalled Windows with the new RAM last night and haven't encountered any issues so far. Fingers crossed.

Crystael fucked around with this message at 12:14 on Dec 4, 2014
