  • Locked thread
Coca Koala
Nov 28, 2005

ongoing nowhere
College Slice
Problem description: A couple of days ago, my computer started shutting down randomly; it can be on, idling, with only Firefox open, and it shuts down. It reboots immediately after. After the most recent shutdown, though, it tried to turn on again and failed, and the button on the case stopped working. Turning it on via the button on the motherboard got it to boot.

About a month ago, I noticed in the BIOS that "PWR_FAN 1" was running pretty slowly, around 600 RPM. When I opened up the case to blow out the dust, I moved the fan that was hooked up to the PWR_FAN 1 header to a different header on the motherboard, and now that fan appears to be running at a more reasonable speed. I don't know if the crashes are related to permanent damage from having that fan run slowly; the fan in question was the top fan on the case.

On the suggestion of a friend of mine, I've been running Speccy to get a look at what temp the CPU is running at, to see if it's overheating. I haven't been looking at it immediately prior to a crash, but I've noticed that it usually runs at ~30 C while idling, and ~40 C if I'm using Firefox or idling in Steam. Right now, it's hovering at around 47 C, and I've seen it spike up to 60 or 62 C for no apparent reason. So it seems like overheating is definitely a potential cause; I'm not sure if this means I should take it to pieces and do a more thorough job blowing out dust, move everything over to a new case with better ventilation, or if it's possible my CPU is just on its last legs because of prior heat damage from the fan running slow and I need to start thinking about upgrading.

Attempted fixes: I took the side off and blew all the dust out of the CPU, the GPU, the PSU, and the ventilation fans. I'm going to grab some thermal paste from Fry's on my way home from work tomorrow, and I'm going to see if any of my coworkers have a known-good PSU I can borrow for the day to rule that out as a cause.

Recent changes: None; I put in a second hard drive and set it up to dual-boot Windows and Ubuntu, but that was four or five months ago now.

--

Operating system: Windows 7 64 bit.

System specs: CPU: Intel i5-2500
GPU: Asus GTX 560ti
Motherboard: Asus P8Z68-V Pro
PSU: Antec HCG M-Series 620 watt
Case: Antec Nine Hundred Black Steel ATX Mid Tower

Location: USA

I have Googled and read the FAQ: Yes. By its nature, this is sort of a rough problem to google for, but I've done my best to figure it out.

Edit: I have three hypotheses about what the issues are, and hopefully somebody here can help me eliminate some of them.

A) The CPU is dying, and needs to be replaced. When the fan was running slowly, the CPU overheated, is now permanently damaged, and this damage is manifesting itself in shutdowns and a failure to regulate its temperature. Replace the CPU, and the problem goes away.

B) The PSU is dying, and needs to be replaced. Explains the shutdowns, does not explain the CPU temperature.

C) There's an issue with the case; the fans are busted, so I'm getting poor ventilation and that's why the CPU is getting hot. Some connection is failing, cutting power briefly to the motherboard, and that's why it shuts down randomly (and why it failed to boot from the case switch, but successfully booted from the motherboard button). The case was bought from Amazon back in January, so it might be under some form of warranty.

Coca Koala fucked around with this message at 16:06 on Sep 8, 2014


Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
62 is a perfectly fine temperature for an i5-2500. The spikes are likely Turbo Boost activating to complete a short task. There's no issue of basic stability for a Core i5 until 95 C, unless you are overclocking and more than temperature is changing. I cannot overstate how trivial it is that the temperature reaches 62 C.

No damage is caused to the system by running a fan slowly. It might impact the life of the fan, but even that's fairly remote and requires the cheapest poo poo of cheap-poo poo fans.

Coca Koala posted:

A) The CPU is dying, and needs to be replaced. When the fan was running slowly, the CPU overheated, is now permanently damaged, and this damage is manifesting itself in shutdowns and a failure to regulate its temperature. Replace the CPU, and the problem goes away.

No. As a rule, CPUs are never the problem. You are not overclocking, so that removes the one exception. If the chip ever worked, it will continue to work for, like, a decade unless something external like a power surge goes wrong and murders it.

quote:

B) The PSU is dying, and needs to be replaced. Explains the shutdowns, does not explain the CPU temperature.

Possible, but PSU failures tend to have patterns, and the PSU tends to stay off rather than proceed to an immediate reboot.

quote:

C) There's an issue with the case; the fans are busted, so I'm getting poor ventilation and that's why the CPU is getting hot. Some connection is failing, cutting power briefly to the motherboard, and that's why it shuts down randomly (and why it failed to boot from the case switch, but successfully booted from the motherboard button). The case was bought from Amazon back in January, so it might be under some form of warranty.

Case switches are extremely simple devices and they tend not to break unless abused. When they break, they tend to just not work, rather than trip randomly. As well, a power switch cannot cause a reboot, even one that is malfunctioning. If anything is case-related (and I doubt it), it sounds like the reset button is being tripped.

The case will be under warranty if it's only a year old.

--

More likely: A hard drive or RAM failure. Run and screenshot CrystalDiskInfo standard edition for every drive, then let the computer run memtest86+ overnight. If any errors are reported in memtest, then your RAM has gone bad. Bad sectors on your hard drive could also produce this behavior.

Zarc
Jul 25, 2014
Is there anything in the system logs about shutdown causes? Any BSOD dumps? Try running memtest86 to check the RAM and CrystalDiskInfo (sorry, don't have links, though I've seen them in other threads) to look for any memory/HD issues that could be contributing.

Coca Koala
Nov 28, 2005

Hard drive 1: [CrystalDiskInfo screenshot not preserved]

Hard drive 2: [CrystalDiskInfo screenshot not preserved]

I'll run memtest86 tonight and post the results tomorrow. A coworker is bringing in a known-good PSU for me to borrow to try and rule that out as a cause definitively.

I looked through the system logs, but didn't see anything about shutdown causes; just a lot of "The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly".
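For anyone who wants to pull those entries without clicking through Event Viewer, here is a rough sketch. It assumes the message quoted above is the standard Kernel-Power event (ID 41) in the Windows System log, and it shells out to the built-in `wevtutil` tool. The function names `build_query` and `recent_unclean_reboots` are made up for this example.

```python
import subprocess

# Assumption: the "rebooted without cleanly shutting down" message is
# Event ID 41 (source Kernel-Power) in the Windows System log.
KERNEL_POWER_EVENT_ID = 41


def build_query(event_id=KERNEL_POWER_EVENT_ID, count=10):
    """Build a wevtutil command that lists the newest matching System events."""
    return [
        "wevtutil", "qe", "System",
        "/q:*[System[(EventID={})]]".format(event_id),  # XPath filter on event ID
        "/c:{}".format(count),                          # maximum number of events
        "/rd:true",                                     # newest events first
        "/f:text",                                      # human-readable output
    ]


def recent_unclean_reboots(count=10):
    """Run the query and return wevtutil's text output (Windows only)."""
    result = subprocess.run(
        build_query(count=count), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `print(recent_unclean_reboots())` from a Python prompt on the affected machine should list the timestamps of the unclean reboots, which helps when matching crashes against what the machine was doing at the time. It only says that a crash happened, not why.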

Factory Factory
Mar 19, 2010

The SSD is logging a lot of CRC errors, but not the hard drive. Could be RAM, but it's unlikely.

Do you have a spare SATA cable? Try replacing the SSD's cable. If you don't have a spare, swap the one from the hard drive onto the SSD (and leave the hard drive unplugged for the moment).

It's also worth downloading and running SeaTools on the hard drive. There are two warning signs:

1) the reallocated sector. It's not a big warning sign, since it's just the one, but if there is ever a second, you should consider the drive failed and replace it.

2) the raw read error rate and ECC rate are very high. That suggests that the drive is loving up pretty much constantly, but in a way that shouldn't be causing any problems to reach Windows. Still, SMART is not the most reliable thing in the world, and "shouldn't" be causing problems may not mean that it isn't causing problems anyway.

My conclusion is that your storage be a wide variety of messed up and I'd work on that first. Swap the SSD's cable, and unplug the hard drive.
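The triage rules above can be sketched as a small function. This is only an illustration of the reasoning in this post, not a real SMART parser: the attribute keys (`crc_error_count`, `reallocated_sectors`) are hypothetical labels, and the numbers in the usage note are invented, since the screenshot values aren't reproduced here. The read error and ECC rates are left out on purpose because no threshold was given for them.

```python
def triage_smart(attrs):
    """Return a list of findings for one drive's SMART raw values.

    attrs maps attribute labels (hypothetical names) to raw integer counts.
    """
    findings = []

    # Any interface CRC errors point at the link, so check the cable first.
    if attrs.get("crc_error_count", 0) > 0:
        findings.append("CRC errors: suspect the SATA cable or port first")

    # One reallocated sector is a minor warning; a second means replace.
    reallocated = attrs.get("reallocated_sectors", 0)
    if reallocated >= 2:
        findings.append("2+ reallocated sectors: treat the drive as failed")
    elif reallocated == 1:
        findings.append("1 reallocated sector: minor warning, watch for a second")

    return findings
```

For example, `triage_smart({"crc_error_count": 120})` returns only the cable finding, while `triage_smart({"reallocated_sectors": 1})` returns only the minor warning.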

Factory Factory fucked around with this message at 03:25 on Sep 9, 2014

Coca Koala
Nov 28, 2005

As in, swap the cable from the HDD to the SSD, leave the hard drive unplugged for a bit, and let the computer sit for a day to see if it crashes? I can do that.

Memtest is running right now; halfway through the third of four passes on stick 1 of RAM. When I go to bed, I'll put in the other stick and let it run overnight on that.

Coca Koala
Nov 28, 2005


Factory Factory posted:

Could be RAM, but it's unlikely.

RAM checked out after the overnight test; zero errors. So you were correct in that it's unlikely to be the problem.

I'll swing by the store and grab some spare SATA cables; I'm pretty sure the one the SSD is using is the cheapest I could find, so I'll go a step above that and figure that cables are still cheaper than basically any other component.

Zarc
Jul 25, 2014
I would agree with Factory that the HD stuff looks a little fishy. If something bad were to reach Windows from the swap file, registry, or something like that, it could mess Windows up. It would likely be incredibly inconsistent as well.

Coca Koala
Nov 28, 2005

I swapped the SSD SATA cable with a better one, and am running SeaTools basic tests on both drives; I'll update with any failures.

Coca Koala
Nov 28, 2005

Both drives passed all the basic tests in SeaTools.

In other news, I've been running for about 5 hours with no crashes yet after swapping the cables; I'll leave my computer on tonight and if it hasn't crashed by the time I'm done with work, I'm probably calling that fixed for now; since the crashes started I've had multiples per day, so hours without seems like a good sign.
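A rough way to gauge how much a crash-free stretch actually proves: if the fault were still there and crashes arrived independently at a steady rate (a Poisson assumption, which real hardware faults may not follow), the chance of seeing zero crashes in a window of t hours is exp(-rate * t / 24). The 3 crashes per day below is a placeholder for "multiples per day", not a measured figure.

```python
import math


def prob_no_crash(crashes_per_day, hours):
    """P(zero crashes in `hours`) if crashes still arrived at the old rate.

    Assumes a Poisson process: independent crashes at a constant average rate.
    """
    return math.exp(-crashes_per_day * hours / 24.0)


ASSUMED_RATE = 3  # placeholder for "multiples per day"

for hours in (5, 24, 24 * 6):
    p = prob_no_crash(ASSUMED_RATE, hours)
    print("{:>4} h crash-free: {:.2g} chance if nothing was fixed".format(hours, p))
```

Under those assumptions, 5 quiet hours would happen by luck roughly half the time, so it's a good sign but not proof. A full day without a crash is much stronger evidence, and most of a week makes a coincidence essentially impossible.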

Factory Factory
Mar 19, 2010

Make sure you have backups of everything important to you. That hard drive is still a ticking time bomb. Every hard drive is a ticking time bomb.


Coca Koala
Nov 28, 2005

Been running for a little less than a week with no crashes, after multiple crashes per day when there was a problem. It seems like the cable was the issue; thanks for suggesting that, because I definitely would not have come up with that one on my own.

I'll grab another hard drive and transfer stuff over soon; the drive in question was pretty old when I got it, and it's gotten four years older since, so it's probably time to replace it anyways.
