|
Problem description: I built my own PC about a year ago, and have been plagued intermittently with blue screens, ranging from NTFS_FILE_SYSTEM to INACCESSIBLE_BOOT_DEVICE and PAGE_FAULT_IN_NON_PAGE_AREA, almost always during the Windows boot, or sometimes just after I've loaded to desktop. These blue screens always start a couple of days after a clean install of Windows and get progressively more frequent (often requiring multiple restarts until eventually Windows boots) until I get fed up and do a clean install. Fortunately, this isn't the chore it used to be - with OneDrive and an unaffected data drive I can be up and running again within a couple of hours.

My question is - what is causing this? I have run MemTest and Disk Check several times over the last few months and no issues are ever flagged. CrystalDiskInfo never finds anything either. If I leave a completely fresh install of Windows for a few days (i.e. don't update any drivers or install anything) the errors start to creep in, so I'm pretty sure it's not a software issue.

The most likely candidate is clearly my SSD (120GB Samsung 840 Pro), but I don't understand why I can't find any errors on there, even when Windows is refusing to boot. Bizarrely, taking one stick of RAM out and just leaving 4GB in one slot achieved stability for a few months; introducing the other stick brought the errors back. Swapping the sticks around made no difference, leading me to think it could be an issue on the motherboard, maybe?

Anyway, I'm quite happy to drop £60 on a new SSD to eliminate that as a possibility, ditto with the RAM. Is there any chance the motherboard could be at fault? If so, I might consider a premature full rebuild.

Attempted fixes: Ran MemTest and Disk Check, performed a clean install of Windows several times, tried RAM in every possible combination, tried leaving Windows for a few boot cycles with no driver updates
Recent changes: On-going issue
Operating system: Windows 8.1 64-bit
System specs: Intel i5 3570k, Asus P8Z77-M motherboard, ATI 7770 graphics card, 8GB Crucial Ballistix DDR3 RAM, Samsung 840 Pro SSD, Seagate Barracuda 1GB HDD, 450W XFX Pro Core PSU
Location: UK
I have Googled and read the FAQ: Yes
|
# ? Nov 20, 2014 16:35 |
|
The first place to look is the Windows Event Viewer. Look for errors related to the BSOD. You can usually Google the error codes or other info from the Event Viewer logs to get a better idea. Have you updated your motherboard BIOS? Did you update your drivers? See if you can get the correct drivers directly from the hardware manufacturer and not your motherboard vendor, e.g. go to Intel to get the chipset drivers.
|
# ? Nov 20, 2014 19:38 |
|
The motherboard BIOS and drivers are all up to date. It's just the frequency of the BSODs that is worrying; within a week or so of installing Windows it's taking multiple attempts to fully boot, and the errors are usually different, although they seem to be focused on problems in the boot partition.
|
# ? Nov 20, 2014 22:28 |
|
Did you look at the event viewer?
|
# ? Nov 21, 2014 00:14 |
|
Were you running Memtest 86+ or some other memory diagnostic? If you were running memtest86, did you let it complete at least one full pass?

What you describe reminds me most of a case I had where my main machine would slowly get more and more error-prone, then start blue-screening, and finally I'd reinstall, only to have the process repeat. This was years ago on Windows XP, and memtest86 actually came back clean on that machine. I finally found another, more burn-in-style test, which showed I had some slightly bad RAM in that machine; errors written to the HDD would eventually pile up and cause system failure.

When you said swapping the sticks of RAM did nothing, did you try running the machine with just one stick of RAM for a period of time, then try running it with just the other stick of RAM for a period of time? Or did it run successfully with one stick of RAM for a while, and then start degrading once the other was added back in?
|
# ? Nov 21, 2014 09:22 |
|
This sounds pretty similar to the issues you had way back when. I didn't run Memtest 86 - just the one built in to Windows - so that's the next thing to try. I had the machine running stably with one stick of RAM in for a few months - introducing the other started this latest catalogue of problems. I was working under the assumption that faulty RAM would cause problems instantly (rather than gradually shafting my SSD over time), so when I swapped over the sticks of RAM and nothing immediately changed I assumed that the RAM couldn't be the issue. I'll run some overnight memory tests tonight and see if that finds anything.
|
# ? Nov 21, 2014 12:54 |
|
Just to update you on this (sorry it's taken a while - work got in the way): I ran MemTest86 overnight and no errors were found. I reinstalled Windows yesterday without any problems (restarting plenty of times during driver updates, Windows updates etc.) but this morning I got my first BSOD (System Service Exception) immediately after entering my password when logging on to Windows. I hadn't shut down (only restarted) my PC since reinstalling Windows yesterday, if that could make any difference? This was my first cold boot.

So if it isn't my RAM (which it looks like it isn't - if there were any issues there, an overnight MemTest would find them, correct?) surely it has to be the SSD? What diagnostics can I run there? So just to clarify, this is on a fresh Windows install, with all main drivers updated, and I got a BSOD upon my first cold boot since reinstalling Windows. Because I built the PC myself I am worried that I have hosed something else up or the motherboard is somehow shagged, but from the problems I've been having I don't think this could be the case... still, it's worrying!

EDIT: Just got home and looked at my Event Viewer. So when booting this morning I got multiple copies of this error, at which point my PC blue screened, as above:

Log Name: Application
Source: ESENT
Date: 01/12/2014 08:25:16
Event ID: 399
Task Category: (2)
Level: Warning
Keywords: Classic
User: N/A
Computer: TOM
Description: wuaueng.dll (936) SUS20ClientDataStore: The database page read from the file "C:\WINDOWS\SoftwareDistribution\DataStore\DataStore.edb" at offset 98304 (0x0000000000018000) (database page 2 (0x2)) for 32768 (0x00008000) bytes failed verification. Bit 88913 was corrupted and has been corrected. This problem is likely due to faulty hardware and may continue. Transient failures such as these can be a precursor to a catastrophic failure in the storage subsystem containing this file. Please contact your hardware vendor for further assistance diagnosing the problem.
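For anyone trying to read the ESENT warning above, the numbers in it can be sanity-checked with a bit of arithmetic. This is only a rough sketch: `locate_bit` is a made-up helper, and it assumes the reported bit index is counted from the start of the 32768-byte page that was read, which the event text does not actually state.

```python
# Values copied from the ESENT event 399 text above.
PAGE_SIZE = 32768        # bytes read ("for 32768 (0x00008000) bytes")
FILE_OFFSET = 98304      # offset of the failed read (0x18000)
CORRUPTED_BIT = 88913    # "Bit 88913 was corrupted and has been corrected"


def locate_bit(file_offset, bit_index, page_size):
    """Map a bit index within a page to byte positions.

    Assumption (not stated in the event): the bit index counts from
    the first bit of the page that was read.
    """
    if not 0 <= bit_index < page_size * 8:
        raise ValueError("bit index falls outside the page")
    byte_in_page, bit_in_byte = divmod(bit_index, 8)
    return {
        "pages_before": file_offset // page_size,   # whole pages preceding this read
        "byte_in_page": byte_in_page,
        "bit_in_byte": bit_in_byte,
        "byte_in_file": file_offset + byte_in_page,
        "bits_in_page": page_size * 8,
    }


if __name__ == "__main__":
    for name, value in locate_bit(FILE_OFFSET, CORRUPTED_BIT, PAGE_SIZE).items():
        print(f"{name}: {value}")
```

Under that assumption, the warning describes a single flipped bit out of 262,144 in one page. As far as I can tell, the event on its own cannot say whether the flip happened on the SSD, in RAM, or somewhere in between, so it fits either suspect.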
Upon reboot I got this critical error, which I guess was just reporting the reboot:

Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 01/12/2014 08:26:01
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (2)
User: SYSTEM
Computer: TOM
Description: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

I left my PC on all day while at work - seems there was another BSOD just after I left:

Log Name: Application
Source: Microsoft-Windows-Wininit
Date: 01/12/2014 08:32:22
Event ID: 1015
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: TOM
Description: A critical system process, C:\WINDOWS\system32\lsass.exe, failed with status code c0000409. The machine must now be restarted.

While rebooting from this BSOD (hope you're following!) I got this one:

Log Name: System
Source: Microsoft-Windows-WLAN-AutoConfig
Date: 01/12/2014 08:33:52
Event ID: 10000
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: TOM
Description: WLAN Extensibility Module has failed to start. Module Path: C:\WINDOWS\system32\Rtlihvs.dll Error Code: 126

This is followed by a load of DeviceSetupManager warnings due to not being connected to the internet (expected, I guess, due to the previous error).

Now this is all a bit beyond my knowledge, but judging from the first error, and given that this is a clean Windows install, it rather looks like my SSD (i.e. where C:\WINDOWS is) is hosed, doesn't it? Could it be anything else? These were just Administrative Events, by the way - should I be looking somewhere else?

Crystael fucked around with this message at 20:51 on Dec 1, 2014 |
# ? Dec 1, 2014 10:36 |
|
Run http://www.techpowerup.com/realtemp/ and see what it says your CPU temperature is.

Crystael posted: So if it isn't my RAM (which it looks like it isn't - if there were any issues there, an overnight MemTest would find them, correct?)

If memtest finds a problem, the RAM has most definitely failed, but unfortunately it's not a 100% foolproof test. Some RAM is bad but won't register errors. It could be some kind of issue with one of the RAM slots, or one RAM stick could be faulty.

Crystael posted: surely it has to be the SSD? What diagnostics can I run there? So just to clarify, this is on a fresh Windows install, with all main drivers updated, and I got a BSOD upon my first cold boot since reinstalling Windows.

Logically, if you removed one stick of RAM and had no BSODs for months, and then put it back in and are having BSODs every few days, I'd think the RAM would be the culprit. You could try removing the one stick again and see if the computer goes back to normal. That'd be the only way to be sure.

Crystael posted: I was working under the assumption that faulty RAM would cause problems instantly (rather than gradually shafting my SSD over time) so I assumed that when I swapped over the sticks of RAM and nothing immediately changed that the RAM couldn't be the issue.

Completely bad/dead RAM won't allow a computer to POST. However, RAM can have small defects that allow a computer to work stably for days/weeks before the annoyingly random but inevitable BSOD.

Zogo fucked around with this message at 06:00 on Dec 2, 2014 |
# ? Dec 2, 2014 05:58 |
|
Zogo posted: Run http://www.techpowerup.com/realtemp/ and see what it says your CPU temperature is at.

What kind of issue could it be with the RAM slots, and is there a way I could diagnose that? When the problems first started occurring I tried various combinations of my 2 RAM sticks in the four available motherboard slots, and did achieve relative stability for a few months using just one 4GB stick in slot 1B (I think). Introducing the second 4GB stick and moving them to slots 1A and 2A caused this latest catalogue of problems, eventually resulting in Windows being unable to access the boot device. Since then every clean install of Windows I've tried (always using 8GB RAM in slots 1A and 2A) has given me a BSOD fairly quickly.

When I was testing the RAM I do recall that using different RAM sticks in the same slot didn't help, but moving the RAM to a different slot did (which is why I was able to run stably for a while using one stick in slot 1B). However, I guess that if the RAM had already caused errors on my SSD by that point, it could've been a coincidence that using a different slot worked, and actually I'd just chanced upon using the one non-faulty RAM stick?
|
# ? Dec 2, 2014 10:59 |
|
RAM is a funny thing, and can fail in subtle ways. A cell can silently fail to store the value written to it, yet still return a value reliably when read. One flipped bit in 16 GB is a tiny, tiny error rate, but if it goes undetected it can lead to problems like the ones CaptainSarcastic described. Memory tests work by writing specific, predictable patterns of data to a stick, and then reading them back to see if they are the same. This catches most subtle errors, but cannot catch all of them.

In your case, where we are pretty sure it's the RAM but tests say the RAM is fine, the only real way to test is to use some new RAM and see if the issue goes away. If it does, the old RAM was bad. If it doesn't, the issue is somewhere else. If a NEW issue comes up, the new RAM is probably worse. Whichever way it goes, at least you'll have a next step. Unfortunately, unless you happen to have some spare RAM lying around, the test is the same as simply assuming the RAM is bad and buying replacements.

Also, yes, if it was bad RAM that has written changes back to the drive that have been piling up, you'll also need to reinstall Windows with the new RAM to rule that out.

Skandranon fucked around with this message at 20:58 on Dec 2, 2014 |
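As a rough illustration of the write-a-pattern, read-it-back idea described above, here is a toy simulation. Everything in it is made up for the example (`ToyRam`, `pattern_test`, the four patterns, the `fail_every` knob); it is not how MemTest86 or any real tester is implemented, and real tests run against physical memory rather than a Python bytearray. It only shows why a permanently stuck bit is caught while a fault that triggers rarely can slip through a short test.

```python
# Hypothetical test patterns: all zeros, all ones, and two alternating-bit bytes.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


class ToyRam:
    """Toy byte-addressable memory with one optional faulty cell.

    The faulty cell drops one bit to 0 on every `fail_every`-th write to it.
    fail_every=1 models a permanently stuck bit; a large value models a
    fault that only shows up occasionally.
    """

    def __init__(self, size, bad_addr=None, bad_bit=0, fail_every=1):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.bad_mask = 1 << bad_bit
        self.fail_every = fail_every
        self.writes_to_bad = 0

    def write(self, addr, value):
        if addr == self.bad_addr:
            self.writes_to_bad += 1
            if self.writes_to_bad % self.fail_every == 0:
                value &= ~self.bad_mask & 0xFF  # silently lose the bit
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def pattern_test(ram, size):
    """Write each pattern to every address, read it back, collect mismatches."""
    errors = []
    for pattern in PATTERNS:
        for addr in range(size):
            ram.write(addr, pattern)
        for addr in range(size):
            got = ram.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors


if __name__ == "__main__":
    size = 1024
    stuck = ToyRam(size, bad_addr=100, bad_bit=3, fail_every=1)
    flaky = ToyRam(size, bad_addr=100, bad_bit=3, fail_every=1000)
    print("stuck bit errors:", len(pattern_test(stuck, size)))
    print("rare fault errors:", len(pattern_test(flaky, size)))
```

In this toy model the stuck bit is flagged on the patterns that set that bit, while the rare fault passes because one run only writes to the bad cell four times. That is the same reason a clean overnight test does not prove the RAM is good.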
# ? Dec 2, 2014 20:55 |
|
Crystael posted: What kind of issue could it be with the RAM slots, and is there a way I could diagnose that?

Any part of a motherboard can have a failure. Visible physical defects are the easiest to find, but there are a lot of things that aren't even visible, and a lot of trial and error/troubleshooting is the only way to be sure with something like this.
|
# ? Dec 3, 2014 00:49 |
|
Zogo posted: Any part of a motherboard can have a failure. Visible physical defects are the easiest way of finding them but there's a lot of things that aren't even visible and a lot of trial and error/troubleshooting is the only way to be sure with something like this.

Understood. I've got 8GB of fresh RAM arriving tomorrow, so I'll reinstall Windows with the new sticks in the evening and see what happens. If there's no change then at the very least I'll have 16GB RAM (I work with audio a lot so this isn't actually a waste), even if I'll be back at square one. I'll also check the CPU temperature as advised above. I'm so glad that between OneDrive, File History and having Windows 8 on a USB stick, it doesn't take long to get up and running again after a fresh install!

EDIT: Early signs are good. Reinstalled Windows with the new RAM last night and haven't encountered any issues so far. Fingers crossed.

Crystael fucked around with this message at 12:14 on Dec 4, 2014 |
# ? Dec 3, 2014 01:01 |