Determining the cause of a hard system freeze

cpu crash freeze gpu motherboard

Two years ago I built a new gaming PC, something I've done before without issue. Ever since then, it will freeze randomly, sometimes after an hour, sometimes not for days.

On freeze, the system becomes unresponsive, whatever was on screen stays frozen, and I hear a horrendous screeching noise on my TV (connected via HDMI). The sound is an awful jumble of high-pitched noises. It's loud and quite annoying, and it continues until I shut off power to my PC.

The keyboard/mouse become unresponsive, the caps lock key no longer lights, the power/reset buttons won't work – the only solution is shutting off the power via the power supply.

I use this PC only for gaming, so it may not be accurate to say this happens only when in a game, but so far, that's how it happens.

Specs:

  • Windows 10, kept updated
  • Intel Core i5-6600K 3.50 GHz, 6 MB cache, LGA 1151
  • nVidia GeForce GTX 970
  • Mushkin MKNSSDRE1TB Reactor 1TB SATA III 6Gb 2.5inch SSD
  • ASUS H170 ATX Motherboard
  • Crucial 16GB Kit (4GBx4) DDR4-2133 MT/s (PC4-17000) SR x8 Non-ECC UDIMM 288-Pin Desktop Memory
  • Cooler Master Hyper 212 EVO – CPU Cooler with 120mm PWM Fan (RR-212E-20PK-R2)
  • EVGA 600 B1 80+ BRONZE, 600W Continuous Power, 3 Year Warranty Power Supply 100-B1-0600-KR

Troubleshooting so far:

  • I've updated Windows and the NVIDIA drivers
  • I've updated my BIOS to v3403 (first update since purchase)
  • I've run Windows Memory Diagnostics – no issues.
  • I've looked at the reliability history and event logs. Only related issues are "previous system shutdown was unexpected".
  • Memtest86, all tests (took a few hours, zero errors)
  • I've run GPU stress testing (FurMark); the GPU also never has an issue when coin mining
  • I've tested the CPU with Prime95 while monitoring temps. I let it run longer than my games ever hold the CPU at those temps (high 60s Celsius). No issues.
  • I've run the Intel Processor Diagnostic Tool, no issues

I have no idea what else to do. I need logging – something that gives me an indication of what went wrong but survives past hard reboots.

It happens randomly – sometimes once every few hours, sometimes only once, sometimes not at all. It's hard to say, but I believe it only occurs with about 50% of my games. For example it happens most with Rocket League but has never once happened with any of the Lego series games. It's happened with Kerbal Space Program but never heavily-modded Minecraft. It's happened with Just Cause 3 but never Fallout 4, etc.

Best Answer

Sadly, these problems are terrible to track down, as the cause could be hardware or software. A game I ran recently has "anti-cheat" SW installed (it installs a driver and 2 monitor processes). One of the things it does is hook the keyboard interrupt. Unfortunately, it's buggy and doesn't always remove itself properly on game shutdown -- this causes an eventual lockup -- either on shutdown OR upon restarting the game.

The bit about terrible sound coming out -- does that only happen when sound is already coming out of the speaker? If it starts up on freeze, that's even weirder. Usually, though, if sound is playing and the process supplying the sound to the card stops providing new data to the sound channel, whatever is in the sound-output buffer is often repeated ad nauseam, causing a NON-random repeating sound of some sort.
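To show what I mean by the buffer being repeated, here is a toy simulation in Python. It is only a sketch: the function name, buffer size and integer "samples" are all made up for illustration, and real audio hardware is far more involved. The point is just that the reader keeps cycling through the ring buffer after the writer stops.

```python
def simulate_playback(buffer_size, total_reads, stall_after):
    """Return the samples a pretend sound card would play.

    The producer (the game) writes one new sample per tick until
    `stall_after` ticks have passed, then stops, as a frozen process would.
    The consumer (the sound hardware) keeps reading the ring buffer anyway,
    so it replays whatever was left in it.
    """
    ring = [0] * buffer_size
    played = []
    for tick in range(total_reads):
        slot = tick % buffer_size
        if tick < stall_after:
            ring[slot] = tick      # fresh audio data from the game
        played.append(ring[slot])  # the hardware reads the slot either way
    return played


# Example: a 4-slot buffer whose producer stalls after 6 samples.
# simulate_playback(4, 12, 6) -> [0, 1, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3]
```

After the stall, the same four stale samples loop forever. With real audio and a short buffer, that tends to come out as a loud buzz or screech rather than silence.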

Since you note that the keyboard light doesn't toggle, this points to a software problem -- as the only IRQ with a higher priority than the keyboard is the system timer, which I THINK is used for scheduling. However, since drivers also deal with HW, it could still be HW giving some SW driver bogus info that causes it to lock up.

On the HW side, it could be a power spike (not real likely, but as someone mentioned, a power-conditioning UPS, one that emits a clean sine wave, would be one 'test', as well as a good addition for protecting against power spikes). A second guess would be temperature related. Do you do any temperature monitoring? You might try the free util "Open Hardware Monitor" from http://openhardwaremonitor.org/. It only really does GPU+CPU, but it should give you an idea of temps and whether or not hangs happen when temps are up.
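Since you asked for logging that survives a hard power-off, here is a rough Python sketch of a "heartbeat" logger. It appends a timestamped line every few seconds and asks the OS to flush it to disk, so the last line in the file should usually tell you roughly when the freeze hit. Everything named here (`write_heartbeat`, `run`, `heartbeat.log`, and especially `read_sensors`) is my own invention for illustration. `read_sensors` is a placeholder that returns nothing; you'd have to fill it in with whatever temperature source you actually have.

```python
import os
import time
from datetime import datetime


def read_sensors():
    """Hypothetical placeholder: return a dict of readings to log.

    Nothing here talks to real hardware; swap in your own temperature
    source if you have one.
    """
    return {}


def write_heartbeat(path, readings=None):
    """Append one timestamped line to `path` and force it to disk."""
    stamp = datetime.now().isoformat(timespec="seconds")
    fields = " ".join(f"{key}={value}" for key, value in (readings or {}).items())
    line = f"{stamp} {fields}".rstrip() + "\n"
    with open(path, "a", encoding="utf-8") as log:
        log.write(line)
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to push it to the drive
    return line


def run(path="heartbeat.log", interval_s=5.0, beats=None):
    """Write a heartbeat every `interval_s` seconds (forever if beats is None)."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path, read_sensors())
        count += 1
        time.sleep(interval_s)
```

Start it with `run()` before launching a game. After the next freeze, the last timestamp in `heartbeat.log` (plus any readings on that line) gives you a data point to compare between the games that hang and the ones that don't. No guarantees, though: the drive's own cache can still lose the final line.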

But on SW side, besides making sure you have latest drivers for your HW, you might try disconnecting any HW periphs you don't need while you are playing a game where hanging occurs, as well as shutting down all possible background SW and services.

Has this thing hung from day 1? Too bad you can't easily try Win7, as Win10 has been implicated in lots of SW-compat probs. If the hang has become more frequent over time, have you cleaned the dust out of the inside of the PC (the cooler and anywhere a fan might blow)? Be sure to ground yourself before taking off parts, and be sure to unplug and bleed off capacitors. I've killed more than one piece of HW due to either static or not ensuring there was no residual power. I assume there is no time of day, day of week (or day of month) when hanging happens more often? Can I also assume that it doesn't matter what game you run?

The fact that it happens more on some games than others -- and, from your description, on the more demanding games -- makes me think graphics card power+temp. Are you able to try a newer graphics card? Specifically a GTX1070 or GTX1080. Don't laugh just yet... the reason I ask: I had a GTX980 and had more flaky probs related to graphics.

Side note: I had to buy a new power supply because the 1100W included by Dell didn't have the right hookups to support two full 8-pin connectors for extra power. (It was a design flaw in their T7500 that you could even see in their maintenance manual photos. Two of the 12V 75W pins available for extra graphics card power were on the same RAIL!) I had to replace it on my dime w/a 1300W -- and that eliminated a bunch of probs. However, the card was a dual-GPU card that ran hot -- and that added its own flakiness.

ANYWAY -- Nvidia's newer cards -- like the 1080 -- take less power! -- only 2 6-pin connectors -- and it runs cooler! So... if your current card is running toward the hot side, I found the 10XX series to be pin-compatible w/the older models (it will auto-adapt to older PCIe standards at some perf loss).

When I had the Dell power supply in my unit, things would usually run fine, except under certain types of graphical load. Like when I did the Win7 hardware-rating test -- one of those tests caused the machine to reliably reset with a 980 card.

The fact that you are noticing a pattern is GREAT -- it gives some hope of figuring this out. Hope I gave you some ideas, since these issues are notoriously hard to pin down. Good luck!