Cpu fan error checking nvram


CPU IERR

IERR is issued when an internal CPU operation error occurs. The BIOS logs the event and identifies which CPU issued it. See “Jumper and Connector Information” in Appendix A for the location of the CPU.

Check CPU.

CPU Thermal Trip

Thermtrip is issued when the internal CPU temperature is too high. The BIOS logs this event and identifies which CPU issued it. See “Jumper and Connector Information” in Appendix A for the location of the CPU.

Restart system.

Check CPU fan connector.

Check CPU fan.

Check CPU.

CPU Processor Disabled

The CPU was disabled because detection returned an abnormal result. This could be a CPU thermal issue.

Check CPU fan.

Check CPU.

CPU Temperature

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

CPU temperature is abnormal. See note for threshold setting.

Restart system.

Check Fan connection.

Check Fan.

Check CPU.

Check System board.
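The threshold events listed for each sensor (Lower Critical, Upper Non-critical, Upper Critical, each Going Low or Going High) can be pictured as a reading crossing a configured limit in one direction or the other. The following is a minimal sketch of that idea in Python, not the BIOS's actual logic. The `Thresholds` class, the `threshold_events` function, and the numeric limits are all hypothetical; the real limits come from the threshold setting note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical limits for one sensor (units depend on the sensor)."""
    lower_critical: float
    upper_non_critical: float
    upper_critical: float


def threshold_events(previous: float, current: float, t: Thresholds) -> list[str]:
    """Name every threshold crossed between two consecutive readings.

    Simplified assumption: a rising reading that reaches a limit is
    "Going High", a falling reading that drops below it is "Going Low".
    """
    named_limits = [
        ("Lower Critical", t.lower_critical),
        ("Upper Non-critical", t.upper_non_critical),
        ("Upper Critical", t.upper_critical),
    ]
    events = []
    for name, limit in named_limits:
        if previous < limit <= current:
            events.append(f"{name} Going High")
        elif current < limit <= previous:
            events.append(f"{name} Going Low")
    return events


# Illustrative CPU temperature limits in degrees Celsius (assumed values).
cpu_temp = Thresholds(lower_critical=5, upper_non_critical=70, upper_critical=85)
print(threshold_events(60, 90, cpu_temp))
```

A jump from 60 to 90 crosses both upper limits at once, so two events are reported for the single change in reading.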

CPU Voltage

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

CPU voltage is abnormal. See note for threshold setting.

Check CPU.

Check power supply.

Check system board.

CPU Fan

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

CPU fan is abnormal. See note for threshold setting.

Check CPU fan connector.

Check CPU fan.

System Board Temperature

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

System board temperature is abnormal. See note for threshold setting.

Restart system.

Check system fans.

Room Temperature

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

Room temperature is abnormal. See note for threshold setting.

Adjust room temperature.

System Board Voltage

1.5V/1.8V/2.5V/2.85V for SCSI 1/3.3V/3.3V Standby/5V/2.85V for SCSI 2/12V/-12V/Cache 1/2/Cache 3/4/5V Standby

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-Critical Going High

Upper Critical Going Low
Upper Critical Going High

System board voltage is abnormal. See note for threshold setting.

Check power supply.

Remove some devices which are using voltage to reduce system loading.

Check system board.

Keyboard/Mouse Fuse

Current of keyboard/mouse is over the system limit.

Check keyboard/mouse.

USB1/USB2/USB3/USB4 Fuse

 Current of devices connected in USB1/USB2/USB3/USB4 is over the system limit.

Check devices connected in the designated USB port.

Power Supply Predictive Failure

Power supply is dead.

Check power supply.

Chassis Fan Assertion

Chassis fan is dead.

Check chassis fan.

Watchdog BIOS/POST

POST is not completed.

Check BIOS checkpoints list.

Watchdog OS/Load

Problem in loading OS.

Check hard disk.

Watchdog SMS/OS

OS hangs, after loaded.

Check BIOS event log.

Check OS.

Watchdog No Action

System hangs, setting is No Action.

Revise the Watchdog settings if you prefer that the system carry out an action automatically.

Watchdog Hard Reset

System hangs, auto Reset.

Revise the Watchdog settings if you prefer an automatic action other than Reset.

Watchdog Power Off

System hangs, auto Power Off.

Revise the Watchdog settings if you prefer an automatic action other than Power Off.

Watchdog Power Cycle

System hangs, auto Power Cycle.

Revise the Watchdog settings if you prefer an automatic action other than Power Cycle.
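The four Watchdog entries above differ only in the action configured for when the system stops responding. As a rough illustration, the sketch below models a watchdog timer that must be "kicked" periodically and reports the configured action once the timeout passes. The `Watchdog` class, its method names, and the 30-second timeout are assumptions made for this example, not part of any real BMC interface.

```python
import enum


class WatchdogAction(enum.Enum):
    """The four timeout actions named in the event list."""
    NO_ACTION = "No Action"
    HARD_RESET = "Hard Reset"
    POWER_OFF = "Power Off"
    POWER_CYCLE = "Power Cycle"


class Watchdog:
    """Toy watchdog: the monitored software must call kick() before the timeout."""

    def __init__(self, timeout_s: float, action: WatchdogAction):
        self.timeout_s = timeout_s
        self.action = action
        self.last_kick = 0.0

    def kick(self, now: float) -> None:
        # Record that the monitored software was alive at time `now`.
        self.last_kick = now

    def check(self, now: float):
        """Return the event name if the timer expired, otherwise None."""
        if now - self.last_kick >= self.timeout_s:
            return f"Watchdog {self.action.value}"
        return None


wd = Watchdog(timeout_s=30, action=WatchdogAction.HARD_RESET)
wd.kick(now=100)
print(wd.check(now=120))  # still within the timeout
print(wd.check(now=130))  # timeout reached
```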

NVRAM SDR Checksum Error

NVRAM SDR data was damaged.

Rewrite NVRAM.

Replace NVRAM.

NVRAM SEL Checksum Error

NVRAM SEL data was damaged.

Replace NVRAM.

NVRAM FRU Checksum Error

NVRAM FRU data was damaged.

Rewrite NVRAM.

Replace NVRAM.
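A checksum error of this kind means the stored bytes no longer add up to the value recorded alongside them. The source does not say which checksum the firmware uses, so the sketch below assumes a simple 8-bit zero checksum (all bytes, including the checksum byte, sum to 0 modulo 256). The function names and the sample record are hypothetical.

```python
def zero_checksum(payload: bytes) -> int:
    """Return the byte that makes the sum of payload plus checksum equal 0 mod 256."""
    return (-sum(payload)) & 0xFF


def is_intact(record: bytes) -> bool:
    """A record (payload followed by its checksum byte) is intact if it sums to 0 mod 256."""
    return sum(record) & 0xFF == 0


# Made-up 7-byte payload followed by its checksum byte.
payload = bytes([0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00])
record = payload + bytes([zero_checksum(payload)])
print(is_intact(record))

# Corrupting a single byte makes the check fail.
damaged = bytes([0x03]) + record[1:]
print(is_intact(damaged))
```

When the check fails, the stored data cannot be trusted, which is why the listed actions are to rewrite or replace the NVRAM.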

EMP Remote Login Password Fail

Password error.

Get the correct password.

EMP BMC Disable CPU

BMC disables CPU, after detecting abnormal status of CPU.

Reset the system. If the problem remains, replace CPU.

BIOS Post (Event Data 2)

The event data 2 is POST error code. Setup has to find the POST message table for displaying the message.

Check POST message table.

Secure Model Violation

Unauthorized access.

Follow the correct procedure to access units.

Pre-boot Password Violation-User Password

Incorrect user password.

Get the correct user password.

Pre-boot Password Violation-Setup Password

Incorrect setup password.

Get the correct setup password.

DIMM/RIMM Correctable ECC Error

Memory has an ECC (error check and correction) error, but the system is able to correct it automatically.

No action required, but if errors reoccur, check the memory.

DIMM/RIMM Uncorrectable ECC Error

Memory has ECC (error check and correction) error, and system is unable to fix it.

Check memory.

PCI PERR (Parity Error)

An error occurs on the PCI-related on-board chipset during parity checking. This error message indicates the on-board chipset location, which is bus 1, device 0, function 1. See the schematics for the location of the chipset.

Check system board.
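For readers unfamiliar with parity checking, here is a small sketch of how a parity bit flags a corrupted transfer. It assumes the conventional PCI scheme in which one parity bit makes the total number of 1 bits across the 32 address/data lines and 4 command/byte-enable lines even. The function names are hypothetical, and this illustrates the idea only; it does not reproduce the chipset's logic.

```python
def pci_par(ad: int, cbe: int) -> int:
    """Even-parity bit over 32 address/data bits and 4 command/byte-enable bits."""
    bits = (ad & 0xFFFFFFFF) | ((cbe & 0xF) << 32)
    # The parity bit is 1 exactly when the covered bits hold an odd number of 1s.
    return bin(bits).count("1") & 1


def parity_error(ad: int, cbe: int, par: int) -> bool:
    """True if the received parity bit disagrees with the recomputed one."""
    return pci_par(ad, cbe) != (par & 1)


sent_ad, sent_cbe = 0x00000003, 0x0
par = pci_par(sent_ad, sent_cbe)

# A single flipped data bit in transit is detected as a parity error.
received_ad = sent_ad ^ 0x00000100
print(parity_error(received_ad, sent_cbe, par))
```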

PCI SERR (System Error)

Error occurs on device or add on card of PCI slot.

Check add-on card.

Check PCI device.

Hard Disk Drive Fault

Errors occur in hard disk drive.

Check HDD.

Drive Backplane Fan Fault Assertion

Errors occur in drive backplane fan.

Check drive backplane fan.

Support Articles

NOTE: If the System will not power on, skip to the end of this article.

If the system boots, but takes a long time to boot, crashes, or reports other random, hard to track down errors, then the individual hardware components can be checked for failure.

Memory

We can test memory in your running OS with the 'memtest' package. You want to put most of your memory under test but still leave enough space for your normal workload and the OS to continue running. On an 8 GB system, testing 6 GB would look like this:

A memory test can take a number of hours. Although this does not put all of memory under test, a memory fault will likely either cause instability (if it lies outside the tested region) or show up clearly as errors in the run.

Memtest86+ also offers ISO downloads for personal use. You would boot from a USB drive made with the ISO. Right as memtest loads (blue screen), press the key that enables multi-core mode. Wait at least 20 minutes for the tests to run, or until any errors are shown in red. If any errors are found, please run it again in single-core mode, and let it run overnight to check for any memory errors. A minimum of 6 to 8 passes is recommended. If memory errors show up, the memory stick should be replaced.
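To give a feel for what a memory tester does during those passes, the sketch below writes a few bit patterns into a buffer and reads them back, reporting any offsets that do not match. This is a toy illustration in Python over an ordinary `bytearray`; it does not test physical RAM, and the `pattern_test` function and its pattern list are assumptions chosen for the example.

```python
def pattern_test(buffer: bytearray, patterns=(0x00, 0xFF, 0x55, 0xAA)) -> list[int]:
    """Write each pattern to every byte, read it back, and return mismatching offsets."""
    bad_offsets = set()
    for pattern in patterns:
        # Write pass: fill the whole buffer with the current pattern.
        for offset in range(len(buffer)):
            buffer[offset] = pattern
        # Verify pass: every byte should read back as the pattern.
        for offset in range(len(buffer)):
            if buffer[offset] != pattern:
                bad_offsets.add(offset)
    return sorted(bad_offsets)


# A healthy buffer reports no bad offsets.
print(pattern_test(bytearray(64)))
```

Using several patterns matters because a bit that is stuck at 0 passes an all-zeros pattern and only fails when a 1 is written to it.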

Hard Drive

To check the hard drive for disk failures, start the program Disks, select the hard drive on the left, then click the icon in the top right, and choose SMART Data and Self-Tests, and then click Start Self-test and choose the Extended test. This test takes a few hours to run and will give you a large amount of info about the health of the drive.

All of the values start at 100, and work their way down to 0. The terms "old-age" and "pre-fail" are normal. Pay attention to the overall assessment, and to how close the values are working towards the failure point, which is typically 0.
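One way to read the results is to compare each attribute's current normalized value with its failure threshold and look at the smallest margins first. The sketch below does that for a hand-written table; the `smart_margins` function and the numbers are made up for illustration, and real values would come from the self-test report.

```python
def smart_margins(attributes: dict) -> dict:
    """Map each attribute to (value - threshold), smallest margin first.

    `attributes` maps an attribute name to a (normalized value, failure threshold) pair.
    """
    margins = {name: value - threshold for name, (value, threshold) in attributes.items()}
    return dict(sorted(margins.items(), key=lambda item: item[1]))


# Illustrative numbers only, not taken from a real drive.
sample = {
    "Reallocated_Sector_Ct": (100, 36),
    "Seek_Error_Rate": (42, 30),
}
for name, margin in smart_margins(sample).items():
    print(f"{name}: {margin} above threshold")
```

An attribute whose margin is shrinking toward 0 over time deserves attention even if the overall assessment still passes.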

NVMe Drive

NVMe drives can't be checked with a SMART Test through the Disks application but the package smartmontools can be used for this. It can be installed with this command:

First, let's list the NVMe's that are installed:

Under 'Node' you will see a device path for each drive, something like '/dev/nvme0n1'. To access the smart-log, you would type in the following:

Testing the CPU

Using the stress-ng program

Run this command to install stress-ng:

Using the s-tui program

Now this command:

From here, switch from Monitor to Stress. Now watch the CPU temperatures rise as the system's CPU is tested.

Testing the GPU

Benchmarking

We can confirm whether there is an issue with the GPU in your system by using a benchmarking tool called Unigine Heaven.

Click the 'Free Download' button and choose the Linux option in the dropdown. Once the download is complete, there should be a file in the Downloads directory.

From a terminal, navigate to the folder with the Unigine Heaven download:

Run the following command:

Then, the application can be extracted:

Next, let's move to the new directory that was created:

Now, the application can be started:

Click the 'Run' button to begin the program.

GPU Burn (for NVIDIA GPU's only)

We can also test the GPU by using GPU Burn; first, if we're on Ubuntu, we'll need to install git and CUDA with this command:

Then, we will create the symlink. After that, we can clone the repository with this command:

Now that we have cloned it, we can move into that directory like so:

Now we'll compile it:

And now we can run it like so (this example will run it for 60 minutes/1 hour):

Test CPU thermals

If the CPU fan is spinning erratically, or you are experiencing random shutdowns, this may be the result of a thermal issue. To investigate this, we'll use tools that can display CPU temperatures.

Modern hardware is designed to shut systems down when they reach temperatures that may be damaging to the internal components. Typically, these thresholds are in the upper 80s or 90s Celsius, depending on hardware.

If your system is spontaneously shutting down, this may be caused by overheating. Systems with dedicated GPUs tend to run hot under normal circumstances, so noticing an overheating problem can be challenging from ambient temperature alone.

The temperatures of your CPU cores and GPU card can be checked through software.
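Once temperature readings are available as text, picking out the sensors that are close to shutdown territory is straightforward. The sketch below parses lines in a `label: +NN.N°C` layout and lists those at or above a limit. The sample text, the `hot_sensors` function, and the 85 °C default are assumptions for illustration; the actual output format and safe limits depend on your tool and hardware.

```python
import re

# Matches lines such as "Core 0:   +88.0°C  (high = +100.0°C, crit = +100.0°C)".
LINE = re.compile(r"^(?P<label>[^:]+):\s+\+?(?P<temp>-?\d+(?:\.\d+)?)°C")


def hot_sensors(text: str, limit_c: float = 85.0) -> list:
    """Return (label, temperature) pairs for readings at or above limit_c."""
    hot = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match and float(match.group("temp")) >= limit_c:
            hot.append((match.group("label"), float(match.group("temp"))))
    return hot


# Hypothetical sample readings, not captured from a real system.
SAMPLE = """\
Package id 0:  +91.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +88.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +62.5°C  (high = +100.0°C, crit = +100.0°C)
"""
print(hot_sensors(SAMPLE))
```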

Run (command line tool)

This command-line tool (installed by default on Pop!_OS) is text-based and runs in a terminal.

  1. Install (If not installed)

    Open a terminal and run the following commands:

  2. Get Sensor Output

This command will generate output like this:

Psensor (GUI)

If you prefer a GUI tool which provides graphing over time, the application Psensor can be installed from the Pop!_Shop, or through the terminal with this command:

  1. Install ():

  2. Install Through Pop!_Shop

  3. Run Psensor:

    In a terminal, run:

Or, to launch through the OS interface, click on "Activities" in the top-left (Pop!_OS 20.04 LTS, or Ubuntu), or "Applications" (Pop!_OS COSMIC) and search for "Psensor".

High Temperatures

If the system temperatures are abnormally high, the fans may need to be replaced, and/or the thermal compound may need to be re-applied to the CPU and GPU cores.

Specific instructions for working on your hardware model can be found here

Quotes for replacement fans and thermal paste can be generated on open support tickets. To open a support ticket, visit this link

Machine Check Exceptions

Machine Check Exceptions are hardware failure events and can be logged with rasdaemon.service to journalctl. On Ubuntu based systems (and Pop!_OS) you can install via:

Verify that rasdaemon is active:

Then, after the system has crashed or been used for a period of time, take a look at the log:

If there is no log or the log is empty, no MCE was recorded, so the crash is probably not related to a hardware failure of this kind. The log will stay empty until an MCE happens. Look for "uncorrected" errors, as most "corrected" errors can be ignored. If "uncorrected" errors show up consistently, the hardware should be examined.
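If the log is long, a quick tally of corrected versus uncorrected entries shows whether there is a pattern worth investigating. The sketch below counts lines by keyword. The `count_mce` function and the sample lines are hypothetical, and real log lines will look different, so treat the keyword match as a rough filter rather than a parser.

```python
def count_mce(log_text: str) -> dict:
    """Count log lines mentioning corrected and uncorrected errors."""
    counts = {"corrected": 0, "uncorrected": 0}
    for line in log_text.splitlines():
        lowered = line.lower()
        # Check "uncorrected" first because it contains the word "corrected".
        if "uncorrected" in lowered:
            counts["uncorrected"] += 1
        elif "corrected" in lowered:
            counts["corrected"] += 1
    return counts


# Made-up log lines for illustration only.
sample_log = """\
mce: Corrected error, no action required
mce: Uncorrected error on CPU 0
mce: Corrected error, no action required
unrelated line
"""
print(count_mce(sample_log))
```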

Won't Power On

NOTE: If the system fails to power on, please use the following articles to troubleshoot:

Desktops
Laptops

Support

Please contact support by opening a ticket to get the system repaired or to have failed components replaced.

[[[ SOLVED! ]]] NF750-G55 CMOS Checksum Error

So I just got my NF750-G55 motherboard, Phenom II x6 1090T, and G.SKILL Ripjaws Series 8GB (2 x 4GB) memory in. I hooked up a Zumax ZU-500W power supply, a hard drive, a DVD burner, and used onboard DVI real quick just to get started. I didn't put anything in a case, just laid it out on my table to get it set up real quick. Sorry for the crappy pics. Photos taken with my cell phone.

I powered it on and it just displayed the MSI NF750-G55 logo and wouldn't progress further. I had the memory in the 2 right BLUE modules so I switched them over to the two left BLACK modules. Powered the system on and same thing. Just displayed the NF750-G55 logo screen and wouldn't go past the logo.

So I switched PSU's to a Dell 450 watt that I know for a fact works. Same Result. Then I took everything out and just had the CPU/Fan, memory, and monitor hooked up. Same result just the logo screen. So then I had the idea to clear the CMOS with the jumpers and it finally booted past the logo screen into a checksum screen. It said CMOS checksum BAD press F1 to run setup. Press F2 to load default values and continue. If I go into settings and save them and exit, it boots back into the logo screen and will not continue.

So if I clear the cmos and go back to that screen and just hit F2 to save default settings and continue, it says checking NVRAM and will not continue past that screen. I have to clear the CMOS again otherwise it will not boot past the logo screen.

So if I clear the cmos and go back to that screen and hit F11 to enter the boot menu, it says entering boot menu and will not continue past that screen.  I have to clear the CMOS again otherwise it will not boot past the logo screen.

If I clear the CMOS and boot up with just one stick of RAM, CPU, and onboard graphics, it shows this screen.

If I do F1 and make changes and save, it boots back to logo screen and won't continue. If I hit F2 to load defaults and continue it does nothing. If I hit F11 to go to boot menu it does nothing. If I attach hard drive and DVD drive, it is the same results as before. Switching to other stick of RAM returns the same results as well.

Now I am all out of ideas. I have tried a brand new motherboard battery for the CMOS but that didn't help either. Do I need to install everything in a case and continue from there? Motherboard DOA? Thanks for all your assistance!


EDIT: SOLVED! The mother board kept getting stuck at Checking NVRAM. For whatever reason the NVRAM was corrupt from the factory so flashing the BIOS rewrote over the NVRAM and it is working just fine now!

HELP!!!! "Checking NVRAM error"

Hi Dave,
I checked the specs on your motherboard. They are here:
http://www.msicomputer.com/product/chipset.asp?chipset=via_kt400

First thing: hook up your in-case speaker, if you haven't already. It will give you beeps when there are errors.

It's an AMI BIOS, so the information on POST sequence and beep codes is here:
http://www.ami.com/support/doc/AMIBIOS-codes.pdf

You should be able to tell from the AMIBIOS docs, where your system is bombing out.

From your description, it sounds like it could be bad ram. The BIOS needs a minimum of 64k of good ram to start up. I noticed, while checking your motherboard specs, that they say very clearly that it only supports a very short list of memory modules. You may have to buy from their list to get it to work.

Hope this helps.

(It ain't better if it don't work.)

Thread: Bios refusing to post, O4 error.

For some time I have been getting random POST errors.
These errors have become more frequent recently, which started me thinking I had borked the delid. As they had reached 7 failed boots out of 10 attempts, I decided to strip the rig down completely (even redoing the delid) and rebuild it scrupulously carefully.
Upon delidding the CPU, it was a perfect spread, so no problems there! I reapplied the LM and resealed.
So, with the rig completely rebuilt, I powered it up and got exactly the same random errors: D5, O4 "test nvram", "Install GPU BIOS", and "check CPU" errors.
I have flashed the BIOS and cleared the BIOS so many times, so I just continued rebooting in the hope it would settle eventually (sometimes it does), and yes, it settled on "04 test nvram", which according to the manual is "PCH initialisation before microcode loading".
O4 is now a permanent boot fail which, at this minute, I don't have a clue how to rectify.
I think D5 is "out of resources, not enough space".
I have taken the bottom shield off the mobo but can't find the CMOS battery.
Any suggestions, guys?
Thanks in advance.
Spec as below.
All latest drivers and BIOS.

System hangs at "Checking NVRAM."

I'm trying to figure out which MSI motherboard I have, since it's encountering problems. I know that I could just load up SiSoft Sandra to find out, but...the problem is that I can't boot into Windows  :undecided:  Or, at the least, the system boots into Windows only on rare occasions.

At first what it would do is just hang at that "Corecell" screen and not do anything. The odd thing is that it still seemed to be booting behind that screen...I could see hard drive activity, and the mouse and keyboard would activate. Anyways, after about 20 restarts, it finally decided to boot and I went into the BIOS and disabled the logo screen.

So, I thought the problem was fixed when it immediately booted thereafter, but this morning it refused to start. However, at least now I can see what's going on. It's hanging at the spot where it says, "Checking NVRAM..." So I removed one memory module and tried again, no luck. I put it back in, and removed the remaining memory module, still nothing. Removed the two PCI cards and still it doesn't boot. A quick Google search seems to point to the CMOS battery, but I'll have to identify the motherboard, I believe, before I can get a new one. Unless CMOS batteries are generic. Aside from all this, the system has an Athlon XP 1.15 GHz processor, and I know the motherboard has a VIA chipset, but that's all I know. However, I can look at any numbers on the mainboard to see what kind it is, if it can be identified in such a manner.

Anyways, I know I sneaked in a thinly-disguised request for troubleshooting in there, but any suggestions that can be offered would be appreciated.

 



Note - This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.



This appendix contains information about how the servers process and log errors. See the following sections:

Handling of Uncorrectable Errors

This section lists facts and considerations about how the server handles uncorrectable errors.



Note - The BIOS ChipKill feature must be disabled if you are testing for failures of multiple bits within a DRAM (ChipKill corrects for the failure of a four-bit wide DRAM).



  • The BIOS logs the error to the SP system event log (SEL), through the board management controller (BMC).
  • The SP's SEL is updated with the failing DIMM pair's particular bank address.
  • The system reboots.
  • The BIOS logs the error in DMI.


Note - If the error is in the low 1MB, the BIOS freezes after rebooting. Therefore, no DMI log is recorded. An example of the error as reported by the SEL through IPMI 2.0 is as follows:

  • When low memory is erroneous, the BIOS is frozen on pre-boot low memory test because the BIOS cannot decompress itself into faulty DRAM and execute the following items:

    • When the faulty DIMM is beyond the BIOS's low 1MB extraction space, proper boot happens:

  • Note the following considerations for this revision:
    • Uncorrectable ECC Memory Error is not reported.
    • Multi-bit ECC errors are reported as.
    • On first reboot, BIOS logs a HyperTransport Error in the DMI log.
    • The BIOS disables the DIMM.
    • The BIOS sends the SEL records to the BMC.
    • The BIOS reboots again.
    • The BIOS skips the faulty DIMM on the next POST memory test.
    • The BIOS reports available memory, excluding the faulty DIMM pair.

FIGURE E-1 shows an example of a DMI log screen from BIOS Setup Page.


Graphic showing a sample DMI log screen.


Handling of Correctable Errors

This section lists facts and considerations about how the server handles correctable errors.

  • During BIOS POST:
    • The BIOS polls the MCK registers.
    • The BIOS logs to DMI.
    • The BIOS logs to the SP SEL through the BMC.
  • The feature is turned off at OS boot time by default.
  • The following Linux versions report correctable ECC syndrome and memory fill errors if the kernel flag is indicated at boot time, or if it is enabled through kernel compile or installation:
    • RH3 Update5 single core
    • RH4 Update1+
    • SLES9 SP1+
  • The Linux kernel repeats a report every 30 seconds until another error is encountered and a flag is reset.
  • Solaris support provides full self-healing and automated diagnosis for the CPU and Memory subsystems.
  • FIGURE E-2 shows an example of a DMI log screen from BIOS Setup Page:

Graphic showing a sample DMI log screen, with a correctable error shown.


  • If during any stage of memory testing the BIOS finds itself incapable of reading/writing to the DIMM, it takes the following actions:
    • The BIOS disables the DIMM as indicated by the Memory Decreased message in the example in FIGURE E-3.
    • The BIOS logs an SEL record.
    • The BIOS logs an event in DMI.

Graphic showing a sample DMI log screen, with a correctable error and memory decreased message.


Handling of Parity Errors (PERR)

This section lists facts and considerations about how the server handles parity errors (PERR).

  • The handling of parity errors works through NMIs.
  • During BIOS POST the NMI is logged in the DMI and the SP SEL. See the following example command and output:

  • FIGURE E-4 shows an example of a DMI log screen from BIOS Setup Page, with a parity error.

Graphic showing a sample DMI log screen, with a PCI parity error shown.


  • The BIOS displays the following messages and freezes (during POST or DOS):
  • The Linux NMI trap catches the interrupt and reports the following NMI "confusion report" sequence:



Note - The Linux system reboots, but does not inform the BIOS of this incident.



Handling of System Errors (SERR)

This section lists facts and considerations about how the server handles system errors (SERR).

  • System error handling works through the HyperTransport Synch Flood Error mechanism in the AMD controller.
  • The following events happen during BIOS POST:
    • POST reports of any previous system errors at the bottom of screen. See FIGURE E-5 for an example.

Graphic showing a sample POST screen, with system error listed.


    • SERR and HyperTransport Synch Flood Error are logged in DMI and the SP SEL. See the following sample output:

  • FIGURE E-6 shows an example DMI log screen from the BIOS Setup Page with a system error.

Graphic showing a sample DMI log screen, with a system error listed.


Handling Mismatching Processors

This section lists facts and considerations about how the server handles mismatching processors.

  • The BIOS performs a complete POST.
  • The BIOS displays a report of any mismatching CPUs, as shown in the following example.


Note - In the following example report, the names of the AMD controllers in the original Sun Fire X4100/X4200 are used.



  • No SEL or DMI event is recorded.
  • The system enters Halt mode and the following message is displayed.

Hardware Error Handling Summary

This section contains a table that summarizes the most-common hardware errors that you might encounter with these servers.

 


Error

Description

Handling

Logged (DMI Log or SP SEL)

Fatal?

SP failure

The SP fails to boot upon application of system power.

The SP controls the system reset so the system may power on but will not come out of reset.

  • During power up, the SP's boot loader turns on the power LED.
  • During SP boot, Linux startup, and SP sanity check, the power LED blinks.
  • The LED is turned off when SP management code (the IPMI stack) is started.
  • At exit of BIOS POST the LED goes to STEADY ON state.

Not logged

Fatal

SP failure

SP boots but fails POST.

The SP controls the system RESET so the system will not come out of reset.

Not logged

Fatal

BIOS POST failure

Server BIOS does not pass POST.

There are fatal and non-fatal errors in POST. The BIOS does detect some errors that are announced during POST as POST codes on the bottom right corner of the display on the serial console and on the video display. Some POST codes are forwarded to the SP for logging.

The POST codes described above do not come out in sequential order and some are repeated, because some POST codes are issued by code in add-in card BIOS expansion ROMs.

In the case of early POST failures (for example, the BSP fails to operate correctly) BIOS just halts without logging.

For some other POST failures subsequent to memory and SP initialization, the BIOS logs a message to the SP's SEL.

 

 

Single-bit DRAM ECC error

With ECC enabled in the BIOS Setup, the CPU detects and corrects a single-bit error on the DIMM interface.

The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts, and is done by the BIOS SMI handler.

The BIOS SMI handler starts logging each detected error, and stops logging when the limit for the same error is reached. The BIOS's polling is disablable through a software interface.

SP SEL

Normal operation

Single four-bit DRAM error

With CHIP-KILL enabled in the BIOS Setup, the CPU detects and corrects for the failure of a four-bit-wide DRAM on the DIMM interface.

The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts, and is done by the BIOS SMI handler.

The BIOS SMI handler starts logging each detected error, and stops logging when the limit for the same error is reached. The BIOS's polling is disablable via a software interface.

SP SEL

Normal operation

Uncorrectable DRAM ECC error

The CPU detects an uncorrectable multiple-bit DIMM error.

The "sync flood" method of handling this is used to prevent the erroneous data from being propagated across the HyperTransport links. The system reboots, the BIOS recovers the machine check register information, maps this information to the failing DIMM (when CHIPKILL is disabled) or DIMM pair (when CHIPKILL is enabled), and logs that information to the SP.

The BIOS will halt the CPU.

SP SEL

Fatal

Unsupported DIMM configuration

Unsupported DIMMs are used, or supported DIMMs are loaded improperly.

The BIOS displays an error message, logs an error, and halts the system.

DMI Log
SP SEL

Fatal

HyperTransport link failure

CRC or link error on one of the HyperTransport Links

Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset.

The BIOS reports.

DMI Log
SP SEL

Fatal

PCI SERR, PERR

System or parity error on a PCI bus.

Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset.

The BIOS reports.

DMI Log
SP SEL

Fatal

BIOS POST Microcode Error

The BIOS could not find or load the CPU Microcode Update to the CPU. The message most likely appears when a new CPU is installed in a motherboard with an outdated BIOS. In this case, the BIOS must be updated.

The BIOS displays an error message, logs the error to DMI, and boots.

DMI Log

Non-fatal

BIOS POST CMOS Checksum Bad

CMOS contents failed the Checksum check.

The BIOS displays an error message, logs the error to DMI, and boots.

DMI Log

Non-fatal

Unsupported CPU configuration

The BIOS supports mismatched frequency and steppings in CPU configuration, but some CPUs might not be supported.

The BIOS displays an error message, logs the error, and halts the system.

DMI Log

Fatal

Correctable error

The CPU detects a variety of correctable errors in the MCi_STATUS registers.

The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts, and is done by the BIOS SMI handler.

The SMI handler logs a message to the SP SEL if the SEL is available, otherwise SMI logs a message to DMI. The BIOS's polling is disablable through software SMI.

DMI Log
SP SEL

Normal operation

Single fan failure

Fan failure is detected by reading tach signals.

The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.

SP SEL

Non-fatal

Multiple fan failure

Fan failure is detected by reading tach signals.

The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.

SP SEL

Fatal

Single power supply failure

Any of the AC/DC PS_VIN_GOOD or PS_PWR_OK signals is deasserted.

Service Action Required, and Power Supply/Rear Fan Tray Fault LEDs are lit.

SP SEL

Non-fatal

DC/DC power converter failure

Any POWER_GOOD signal is deasserted from the DC/DC converters.

The Service Action Required LED is lit, the system is powered down to standby power mode, and the Power LED enters standby blink state.

SP SEL

Fatal

Voltage above/below Threshold

The SP monitors system voltages and detects voltage above or below a given threshold.

The Service Action Required LED and Power Supply/Rear Fan Tray Fault LED blink.

SP SEL

Fatal

High temperature

The SP monitors CPU and system temperatures, and detects temperature above a given threshold.

The Service Action Required LED and System Overheat Fault LED blink. The motherboard is shut down above the specified critical level.

SP SEL

Fatal

Processor thermal trip

The CPU drives the THERMTRIP_L signal upon detecting an overtemp condition.

CPLD shuts down power to the CPU. The Service Action Required LED and System Overheat Fault LED blink.

SP SEL

Fatal

Boot device Failure

The BIOS is not able to boot from a device in the boot device list.

The BIOS goes to the next boot device in the list. If all devices in the list fail, an error message is displayed and the BIOS retries from the beginning of the list. The SP can control or change the boot order.

DMI Log

Non-fatal


 

Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.


[[[ SOLVED! ]]] NF750-G55 CMOS Checksum Error

So I just got my NF750-G55 motherboard, Phenom II x6 1090T, and G.SKILL Ripjaws Series 8GB (2 x 4GB) memory in. I hooked up a Zumax ZU-500W power supply, a hard drive, a DVD burner, and used onboard DVI real quick just to get started. I didn't put anything in a case, just laid it out on my table to get it set up real quick. Sorry for the crappy pics. Photos taken with my cell phone.

I powered it on and it just displayed the MSI NF750-G55 logo and wouldn't progress further. I had the memory in the 2 right BLUE slots, so I switched them over to the two left BLACK slots. Powered the system on and same thing. Just displayed the NF750-G55 logo screen and wouldn't boot past the logo.

So I switched PSUs to a Dell 450 watt that I know for a fact works. Same result. Then I took everything out and just had the CPU/fan, memory, and monitor hooked up. Same result, just the logo screen. So then I had the idea to clear the CMOS with the jumpers, and it finally booted past the logo screen into a checksum screen. It said CMOS checksum BAD, press F1 to run setup, press F2 to load default values and continue. If I go into settings, save them, and exit, it boots back into the logo screen and will not continue.

So if I clear the CMOS and go back to that screen and just hit F2 to load default settings and continue, it says checking NVRAM and will not continue past that screen. I have to clear the CMOS again, otherwise it will not boot past the logo screen.

So if I clear the CMOS and go back to that screen and hit F11 to enter the boot menu, it says entering boot menu and will not continue past that screen. I have to clear the CMOS again, otherwise it will not boot past the logo screen.

If I clear the CMOS and boot up with just one stick of RAM, CPU, and onboard graphics, it shows this screen.

If I hit F1 and make changes and save, it boots back to the logo screen and won't continue. If I hit F2 to load defaults and continue, it does nothing. If I hit F11 to go to the boot menu, it does nothing. If I attach the hard drive and DVD drive, I get the same results as before. Switching to the other stick of RAM returns the same results as well.

Now I am all out of ideas. I have tried a brand new motherboard battery for the CMOS, but that didn't help either. Do I need to install everything in a case and continue from there? Motherboard DOA? Thanks for all your assistance!


EDIT: SOLVED! The motherboard kept getting stuck at Checking NVRAM. For whatever reason the NVRAM was corrupt from the factory, so flashing the BIOS overwrote the NVRAM and it is working just fine now!
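For readers wondering what a "CMOS checksum BAD" message is checking, here is a minimal sketch in Python. It assumes the classic IBM PC/AT layout, in which the 16-bit sum of CMOS bytes 0x10 through 0x2D is stored in bytes 0x2E (high) and 0x2F (low). The board in this thread uses vendor firmware that may store settings elsewhere, so the offsets and the function names below are illustrative assumptions, not the vendor's implementation.

```python
# Sketch of the classic IBM PC/AT CMOS checksum. The layout is an assumption
# and may not match what a given vendor BIOS actually stores.
CHECKSUM_START = 0x10  # first byte covered by the checksum
CHECKSUM_END = 0x2D    # last byte covered by the checksum (inclusive)
CHECKSUM_HIGH = 0x2E   # stored checksum, high byte
CHECKSUM_LOW = 0x2F    # stored checksum, low byte


def compute_cmos_checksum(cmos: bytes) -> int:
    """Return the 16-bit sum of the bytes in the checksummed range."""
    return sum(cmos[CHECKSUM_START:CHECKSUM_END + 1]) & 0xFFFF


def cmos_checksum_ok(cmos: bytes) -> bool:
    """Compare the computed checksum with the value stored in the image."""
    if len(cmos) <= CHECKSUM_LOW:
        raise ValueError("CMOS image is too short")
    stored = (cmos[CHECKSUM_HIGH] << 8) | cmos[CHECKSUM_LOW]
    return compute_cmos_checksum(cmos) == stored
```

Clearing the CMOS zeroes or randomizes these bytes, which is why a checksum failure right after a clear is expected and is normally resolved by loading and saving defaults.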

Note - This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.



This appendix contains information about how the servers process and log errors. See the following sections:

Handling of Uncorrectable Errors

This section lists facts and considerations about how the server handles uncorrectable errors.



Note - The BIOS ChipKill feature must be disabled if you are testing for failures of multiple bits within a DRAM (ChipKill corrects for the failure of a four-bit wide DRAM).



  • The BIOS logs the error to the SP system event log (SEL), through the board management controller (BMC).
  • The SP's SEL is updated with the failing DIMM pair's particular bank address.
  • The system reboots.
  • The BIOS logs the error in DMI.


Note - If the error is in the low 1 MB, the BIOS freezes after rebooting. Therefore, no DMI log is recorded.



  • An example of the error, as reported by the SEL through IPMI 2.0, is as follows:
    • When low memory is erroneous, the BIOS is frozen on pre-boot low memory test because the BIOS cannot decompress itself into faulty DRAM and execute the following items:

    • When the faulty DIMM is beyond the BIOS's low 1MB extraction space, proper boot happens:

  • Note the following considerations for this revision:
    • Uncorrectable ECC Memory Error is not reported.
    • Multi-bit ECC errors are reported as .
    • On first reboot, BIOS logs a HyperTransport Error in the DMI log.
    • The BIOS disables the DIMM.
    • The BIOS sends the SEL records to the BMC.
    • The BIOS reboots again.
    • The BIOS skips the faulty DIMM on the next POST memory test.
    • The BIOS reports available memory, excluding the faulty DIMM pair.

FIGURE E-1 shows an example of a DMI log screen from BIOS Setup Page.


Graphic showing a sample DMI log screen.


Handling of Correctable Errors

This section lists facts and considerations about how the server handles correctable errors.

  • During BIOS POST:
    • The BIOS polls the MCK registers.
    • The BIOS logs to DMI.
    • The BIOS logs to the SP SEL through the BMC.
  • The feature is turned off at OS boot time by default.
  • The following Linux versions report correctable ecc syndrome and memory fill errors in , if kernel flag is indicated at boot time, or if is enabled through kernel compile or installation:
    • RH3 Update5 single core
    • RH4 Update1+
    • SLES9 SP1+
  • The Linux kernel () repeats a report every 30 seconds until another error is encountered and a flag is reset.
  • Solaris support provides full self-healing and automated diagnosis for the CPU and Memory subsystems.
  • FIGURE E-2 shows an example of a DMI log screen from BIOS Setup Page:

Graphic showing a sample DMI log screen, with a correctable error shown.


  • If during any stage of memory testing the BIOS finds itself incapable of reading/writing to the DIMM, it takes the following actions:
    • The BIOS disables the DIMM as indicated by the Memory Decreased message in the example in FIGURE E-3.
    • The BIOS logs an SEL record.
    • The BIOS logs an event in DMI.

Graphic showing a sample DMI log screen, with a correctable error and memory decreased message.


Handling of Parity Errors (PERR)

This section lists facts and considerations about how the server handles parity errors (PERR).

  • The handling of parity errors works through NMIs.
  • During BIOS POST the NMI is logged in the DMI and the SP SEL. See the following example command and output:

  • FIGURE E-4 shows an example of a DMI log screen from BIOS Setup Page, with a parity error.

Graphic showing a sample DMI log screen, with a PCI parity error shown.


  • The BIOS displays the following messages and freezes (during POST or DOS):
  • The Linux NMI trap catches the interrupt and reports the following NMI "confusion report" sequence:



Note - The Linux system reboots, but does not inform the BIOS of this incident.



Handling of System Errors (SERR)

This section lists facts and considerations about how the server handles system errors (SERR).

  • System error handling works through the HyperTransport Synch Flood Error mechanism in the AMD controller.
  • The following events happen during BIOS POST:
    • POST reports any previous system errors at the bottom of the screen. See FIGURE E-5 for an example.

Graphic showing a sample POST screen, with system error listed.


    • SERR and HyperTransport Synch Flood Error are logged in DMI and the SP SEL. See the following sample output:

  • FIGURE E-6 shows an example DMI log screen from the BIOS Setup Page with a system error.

Graphic showing a sample DMI log screen, with a system error listed.


Handling Mismatching Processors

This section lists facts and considerations about how the server handles mismatching processors.

  • The BIOS performs a complete POST.
  • The BIOS displays a report of any mismatching CPUs, as shown in the following example.


Note - In the following example report, the names of the AMD controllers in the original Sun Fire X4100/X4200 are used.



  • No SEL or DMI event is recorded.
  • The system enters Halt mode and the following message is displayed.

Hardware Error Handling Summary

This section contains a table that summarizes the most-common hardware errors that you might encounter with these servers.

 


Error

Description

Handling

Logged (DMI Log or SP SEL)

Fatal?

SP failure

The SP fails to boot upon application of system power.

The SP controls the system reset so the system may power on but will not come out of reset.

  • During power up, the SP's boot loader turns on the power LED.
  • During SP boot, Linux startup, and SP sanity check, the power LED blinks.
  • The LED is turned off when SP management code (the IPMI stack) is started.
  • At exit of BIOS POST the LED goes to STEADY ON state.

Not logged

Fatal

SP failure

SP boots but fails POST.

The SP controls the system RESET so the system will not come out of reset.

Not logged

Fatal

BIOS POST failure

Server BIOS does not pass POST.

There are fatal and non-fatal errors in POST. The BIOS does detect some errors that are announced during POST as POST codes on the bottom right corner of the display on the serial console and on the video display. Some POST codes are forwarded to the SP for logging.

The POST codes described above do not come out in sequential order and some are repeated, because some POST codes are issued by code in add-in card BIOS expansion ROMs.

In the case of early POST failures (for example, the BSP fails to operate correctly) BIOS just halts without logging.

For some other POST failures subsequent to memory and SP initialization, the BIOS logs a message to the SP's SEL.

 

 

Single-bit DRAM ECC error

With ECC enabled in the BIOS Setup, the CPU detects and corrects a single-bit error on the DIMM interface.

The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts, and is done by the BIOS SMI handler.

The BIOS SMI handler starts logging each detected error, and stops logging when the limit for the same error is reached. The BIOS polling can be disabled through a software interface.

SP SEL

Normal operation

Single four-bit DRAM error

With ChipKill enabled in the BIOS Setup, the CPU detects and corrects for the failure of a four-bit-wide DRAM on the DIMM interface.

The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts, and is done by the BIOS SMI handler.

The BIOS SMI handler starts logging each detected error, and stops logging when the limit for the same error is reached. The BIOS polling can be disabled through a software interface.

SP SEL

Normal operation

Uncorrectable DRAM ECC error

The CPU detects an uncorrectable multiple-bit DIMM error.

The "sync flood" method of handling this is used to prevent the erroneous data from being propagated across the HyperTransport links. The system reboots, the BIOS recovers the machine check register information, maps this information to the failing DIMM (when ChipKill is disabled) or DIMM pair (when ChipKill is enabled), and logs that information to the SP.

The BIOS will halt the CPU.

SP SEL

Fatal

Unsupported DIMM configuration

Unsupported DIMMs are used, or supported DIMMs are loaded improperly.

The BIOS displays an error message, logs an error, and halts the system.

DMI Log
SP SEL

Fatal

HyperTransport link failure

CRC or link error on one of the HyperTransport links.

Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset.

The BIOS reports the error.

DMI Log
SP SEL

Fatal

PCI SERR, PERR

System or parity error on a PCI bus

Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset.

The BIOS reports the error.

DMI Log
SP SEL

Fatal

BIOS POST Microcode Error

The BIOS could not find or load the CPU Microcode Update to the CPU. The message most likely appears when a new CPU is installed in a motherboard with an outdated BIOS. In this case, the BIOS must be updated.

The BIOS displays an error message, logs the error to DMI, and boots.

DMI Log

Non-fatal

BIOS POST CMOS Checksum Bad

CMOS contents failed the Checksum check.

The BIOS displays an error message, logs the error to DMI, and boots.

DMI Log

Non-fatal

Unsupported CPU configuration

The BIOS supports mismatched frequency and steppings in CPU configuration, but some CPUs might not be supported.

The BIOS displays an error message, logs the error, and halts the system.

DMI Log

Fatal

Correctable error

The CPU detects a variety of correctable errors in the MCi_STATUS registers.

The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts, and is done by the BIOS SMI handler.

The SMI handler logs a message to the SP SEL if the SEL is available; otherwise, it logs a message to DMI. The BIOS polling can be disabled through a software SMI.

DMI Log
SP SEL

Normal operation

Single fan failure

Fan failure is detected by reading tach signals.

The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.

SP SEL

Non-fatal

Multiple fan failure

Fan failure is detected by reading tach signals.

The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.

SP SEL

Fatal

Single power supply failure

Any of the AC/DC PS_VIN_GOOD or PS_PWR_OK signals is deasserted.

Service Action Required, and Power Supply/Rear Fan Tray Fault LEDs are lit.

SP SEL

Non-fatal

DC/DC power converter failure

Any POWER_GOOD signal is deasserted from the DC/DC converters.

The Service Action Required LED is lit, the system is powered down to standby power mode, and the Power LED enters standby blink state.

SP SEL

Fatal

Voltage above/below Threshold

The SP monitors system voltages and detects voltage above or below a given threshold.

The Service Action Required LED and Power Supply/Rear Fan Tray Fault LED blink.

SP SEL

Fatal

High temperature

The SP monitors CPU and system temperatures, and detects temperature above a given threshold.

The Service Action Required LED and System Overheat Fault LED blink. The motherboard is shut down above the specified critical level.

SP SEL

Fatal

Processor thermal trip

The CPU drives the THERMTRIP_L signal upon detecting an overtemp condition.

CPLD shuts down power to the CPU. The Service Action Required LED and System Overheat Fault LED blink.

SP SEL

Fatal

Boot device Failure

The BIOS is not able to boot from a device in the boot device list.

The BIOS goes to the next boot device in the list. If all devices in the list fail, an error message is displayed and the BIOS retries from the beginning of the list. The SP can control or change the boot order.

DMI Log

Non-fatal
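The logging policy for correctable errors in the table above can be modeled in a few lines. The sketch below is in Python and combines two behaviors the table describes separately: logging stops once a limit for the same error is reached, and the SP SEL is preferred, with DMI as the fallback when the SEL is unavailable. The class name, the limit value, and the error identifiers are all hypothetical. The actual BIOS SMI handler is firmware and is not shown in the source.

```python
from collections import Counter
from typing import List, Optional


class CorrectableErrorLogger:
    """Toy model of the correctable-error logging policy (not the real BIOS code)."""

    def __init__(self, limit_per_error: int = 3, sel_available: bool = True):
        self.limit_per_error = limit_per_error  # hypothetical limit; the source gives no number
        self.sel_available = sel_available
        self.counts: Counter = Counter()
        self.sel: List[str] = []
        self.dmi: List[str] = []

    def report(self, error_id: str) -> Optional[str]:
        """Log one detected error; return the log used, or None if suppressed."""
        self.counts[error_id] += 1
        if self.counts[error_id] > self.limit_per_error:
            return None  # limit for this error reached, stop logging it
        if self.sel_available:
            self.sel.append(error_id)
            return "SP SEL"
        self.dmi.append(error_id)  # fall back to DMI when the SEL is unavailable
        return "DMI"
```

Capping repeated entries keeps one flaky DIMM from filling the event log and crowding out other events.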


 


Thread: Bios refusing to post, O4 error.

For some time I have been getting random POST errors.
These errors have become more frequent recently, which got me thinking I had borked the delid. With about 7 out of 10 boot attempts now failing, I decided to strip the rig down completely (even redo the delid) and rebuild it scrupulously carefully.
Upon delidding the CPU, it was a perfect spread, so no problems there! I reapplied the LM and resealed.
So, with the rig completely rebuilt, I powered it up and got exactly the same random errors: D5, O4 test NVRAM, install GPU BIOS, and check CPU errors.
I have flashed the BIOS and cleared the BIOS so many times, so I just continued rebooting in the hope it would settle eventually (sometimes it does), and yes, it settled on ‘04 test NVRAM’, which according to the manual is “PCH initialisation before microcode loading”.
O4 is now a permanent boot fail, which at this minute I don’t have a clue how to rectify.
I think D5 is out of resources, not enough space.
I have taken the bottom shield off the mobo but can’t find the CMOS battery.
Any suggestions, guys?
Thanks in advance.
Spec as below.
All latest drivers and BIOS.

HELP!!!! "Checking NVRAM error"

Hi Dave,
I checked the specs on your motherboard... They are here:
http://www.msicomputer.com/product/chipset.asp?chipset=via_kt400


First thing... hook up your in-case speaker, if you haven't already. It will give you beeps when there are errors.

It's an AMI bios, so the information on POST sequence and BEEP codes is here:
http://www.ami.com/support/doc/AMIBIOS-codes.pdf

You should be able to tell from the AMIBIOS docs, where your system is bombing out.

From your description, it sounds like it could be bad RAM. The BIOS needs a minimum of 64 KB of good RAM to start up. I noticed, while checking your motherboard specs, that they say very clearly that it only supports a very short list of memory modules... You may have to buy from their list to get it to work.

Hope this helps...




(It ain't better if it don't work.)

Support Articles

NOTE: If the System will not power on, skip to the end of this article.

If the system boots, but takes a long time to boot, crashes, or reports other random, hard to track down errors, then the individual hardware components can be checked for failure.

Memory

We can test memory in your running OS with the 'memtest' package. You want to put most of your memory under test but still leave enough space for your normal workload and the OS to continue running. On an 8 GB system, testing 6 GB would look like this:

The memory test can take a number of hours. Although this does not put all of the memory under test, a memory error is likely either to show up clearly as an error in the run or, if it lies outside the tested region, to cause instability.
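The sizing rule above (test most of the memory, leave headroom for the OS and workload) can be written as a small helper. This Python sketch returns a size string such as `6G`. That is the form accepted by userspace memory testers such as `memtester`, though which tool the article meant is an assumption here. The function names and the 2 GiB default reserve are illustrative, chosen to match the 8 GB example.

```python
def memory_test_size_gib(total_gib: int, reserve_gib: int = 2) -> int:
    """Return how many GiB to test, leaving reserve_gib for the OS and workload."""
    if reserve_gib < 0:
        raise ValueError("reserve_gib must not be negative")
    size = total_gib - reserve_gib
    if size < 1:
        raise ValueError("not enough memory left to test")
    return size


def size_argument(total_gib: int, reserve_gib: int = 2) -> str:
    """Format the test size with a G suffix, e.g. '6G' for an 8 GiB system."""
    return f"{memory_test_size_gib(total_gib, reserve_gib)}G"
```

On a machine with a heavy normal workload, raise the reserve so the test does not push the system into swapping.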

Memtest86+ also has ISO downloads for personal use. You would boot from a USB drive made with the ISO. Right as memtest loads (blue screen), press to enable multi-core mode. Wait at least 20 minutes for the tests to run, or until any errors are shown in red. If any errors are found, please run it again in single core mode, and let it run overnight to check for any memory errors. A minimum of 6 to 8 passes is recommended. If memory errors show up, the memory stick should be replaced.

Hard Drive

To check the hard drive for disk failures, start the program Disks, select the hard drive on the left, then click the icon in the top right, choose SMART Data and Self-Tests, and then click Start Self-test and choose the Extended test. This test takes a few hours to run and will give you a large amount of info about the health of the drive.

All of the values start at 100, and work their way down to 0. The terms "old-age" and "pre-fail" are normal. Pay attention to the overall assessment, and to how close the values are working towards the failure point, which is typically 0.
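To make "how close the values are to the failure point" concrete, here is a minimal Python sketch that flags attributes whose normalized value is within a chosen margin of its threshold. The `SmartAttribute` type, the margin of 10, and the function name are assumptions for illustration. Real drives report vendor-specific attributes, and some start from values other than 100.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SmartAttribute:
    name: str
    value: int      # normalized current value (higher is healthier)
    threshold: int  # failure point for this attribute, typically 0


def attributes_near_failure(attrs: Iterable[SmartAttribute], margin: int = 10) -> List[str]:
    """Return names of attributes within `margin` points of their failure threshold."""
    return [a.name for a in attrs if a.value - a.threshold <= margin]
```

An empty result does not prove the drive is healthy. The overall assessment reported by the tool should still be read alongside it.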

NVMe Drive

NVMe drives can't be checked with a SMART Test through the Disks application but the package smartmontools can be used for this. It can be installed with this command:

First, let's list the NVMe drives that are installed:

Under 'Node' you will see a device path for each drive, something like '/dev/nvme0n1'. To access the smart-log, you would type in the following:

Testing the CPU

Using the stress-ng program

Run this command to install stress-ng:

Using the s-tui program

Now this command:

From here, switch from Monitor to Stress mode. Now watch the CPU temperatures rise as the system's CPU is tested.

Testing the GPU

Benchmarking

We can confirm whether there is an issue with the GPU in your system by using a benchmarking tool called Unigine Heaven.

Click the 'Free Download' button and choose the Linux option in the dropdown. Once the download is complete, there should be a file in the Downloads directory.

From a terminal, navigate to the folder with the Unigine Heaven download:

Run the following command:

Then, the application can be extracted:

Next, let's move to the new directory that was created:

Now, the application can be started:

Click the 'Run' button to begin the program.

GPU Burn (for NVIDIA GPUs only)

We can also test the GPU by using GPU Burn; first, if we're on Ubuntu, we'll need to install git and CUDA with this command:

Then, we will create the symlink for gpu-burn:

Next, we can clone the repository with this command:

Now that we have cloned it, we can move into that directory like so:

Now we'll compile it:

And now we can run it like so (this example will run it for 60 minutes/1 hour):

Test CPU thermals

If the CPU fan is spinning erratically, or you are experiencing random shutdowns, this may be the result of a thermal issue. To investigate this, we'll use tools that can display CPU temperatures.

Modern hardware is designed to shut systems down when they reach temperatures that may be damaging to the internal components. Typically, these thresholds are in the upper 80s or 90s Celsius, depending on hardware.

If your system is spontaneously shutting down, this may be caused by overheating. Systems with dedicated GPUs tend to run hot under normal circumstances, so noticing an overheating problem can be challenging from ambient temperature alone.

The temperatures of your CPU cores and GPU card can be checked through software.
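As one way to read temperatures in software, the Python sketch below walks the Linux hwmon sysfs tree, where `temp*_input` files hold readings in millidegrees Celsius, and reports any sensor at or above a limit. The 85 °C default is an assumption taken from the "upper 80s or 90s" range mentioned above, not a specification for any particular CPU. The function names are illustrative.

```python
from pathlib import Path
from typing import Dict


def read_hwmon_temps(base: str = "/sys/class/hwmon") -> Dict[str, float]:
    """Return {sensor: degrees Celsius} for every readable temp*_input file."""
    temps: Dict[str, float] = {}
    for sensor_file in sorted(Path(base).glob("hwmon*/temp*_input")):
        try:
            millidegrees = int(sensor_file.read_text().strip())
        except (OSError, ValueError):
            continue  # skip sensors that cannot be read or parsed
        temps[f"{sensor_file.parent.name}/{sensor_file.name}"] = millidegrees / 1000.0
    return temps


def sensors_over_limit(temps: Dict[str, float], limit_c: float = 85.0) -> Dict[str, float]:
    """Return only the sensors reading at or above limit_c."""
    return {name: t for name, t in temps.items() if t >= limit_c}
```

Calling `sensors_over_limit(read_hwmon_temps())` while the system is under load gives a quick view of which sensors are running hot.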

Run (command line tool)

(installed by default on Pop!_OS) is a text-based tool that runs in a .

  1. Install (If not installed)

    Open a with + (Pop!_OS) or ++ (Ubuntu) run the following commands:

  2. Get Sensor Output

This command will generate output like this:

Psensor (GUI)

If you prefer a GUI tool which provides graphing over time, the application Psensor can be installed from the Pop!_Shop, or through the with this command:

  1. Install ():

  2. Install Through Pop!_Shop

  3. Run Psensor:

    In a run:

Or, to launch through the OS interface, click on "Activities" in the top-left (Pop!_OS 20.04 LTS, or Ubuntu), or "Applications" (Pop!_OS COSMIC) and search for "Psensor"

High Temperatures

If the system temperatures are abnormally high, the fans may need to be replaced, and/or the thermal compound may need to be reapplied to the CPU and GPU cores.

Specific instructions for working on your hardware model can be found here

Quotes for replacement fans and thermal paste can be generated on open support tickets. To open a support ticket, visit this link

Machine Check Exceptions

Machine Check Exceptions are hardware failure events and can be logged with rasdaemon.service to journalctl. On Ubuntu based systems (and Pop!_OS) you can install via:

Verify that rasdaemon is active:

Then, after the system has crashed or been used for a period of time, take a look at the log:

If there is no log or the log is empty, then the crash is probably not related to a hardware failure of the kind that raises a Machine Check Exception. The log will stay empty until an MCE happens. Look for "uncorrected" errors, as most "corrected" errors can be ignored. If "uncorrected" errors appear consistently, the hardware should be examined.
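The triage rule above (focus on "uncorrected", mostly ignore "corrected") can be applied to saved log text with a short Python sketch. It only counts lines by keyword. The exact wording and format of rasdaemon's journal entries are not shown in this article, so the keyword match is an assumption and the function name is illustrative.

```python
from typing import Dict


def count_mce_severity(log_text: str) -> Dict[str, int]:
    """Count log lines mentioning uncorrected vs. corrected errors (case-insensitive)."""
    counts = {"uncorrected": 0, "corrected": 0}
    for line in log_text.splitlines():
        lowered = line.lower()
        # Check "uncorrected" first, because it contains the substring "corrected".
        if "uncorrected" in lowered:
            counts["uncorrected"] += 1
        elif "corrected" in lowered:
            counts["corrected"] += 1
    return counts
```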

Won't Power On

NOTE: If the system fails to power on, please use the following articles to troubleshoot:

Desktops, Laptops

Support

Please contact support by opening a ticket to get the system repaired or to have failed components replaced.

CPU IERR

IERR is issued when an internal CPU operation error occurs. BIOS logs the event and identifies which CPU issued it. See “Jumper and Connector Information” in Appendix A for the location of the CPU.

Check CPU.

CPU Thermal Trip

Thermtrip is issued when the internal CPU temperature is too high. BIOS logs this event and identifies which CPU issued it. See “Jumper and Connector Information” in Appendix A for the location of the CPU.

Restart system.

Check CPU fan connector.

Check CPU fan.

Check CPU.

CPU Processor Disabled

The CPU was disabled because detection returned an abnormal result. The cause may be a CPU thermal issue.

Check CPU fan.

Check CPU.

CPU Temperature

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

CPU temperature is abnormal. See note for threshold setting.

Restart system.

Check Fan connection.

Check Fan.

Check CPU.

Check System board.

CPU Voltage

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

CPU voltage is abnormal. See note for threshold setting.

Check CPU.

Check power supply.

Check system board.

CPU Fan

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

CPU fan is abnormal. See note for threshold setting.

Check CPU fan connector.

Check CPU fan.

System Board Temperature

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

System board temperature is abnormal. See note for threshold setting.

Restart system.

Check system fans.

Room Temperature

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-critical Going High

Upper Critical Going Low
Upper Critical Going High

Room temperature is abnormal. See note for threshold setting.

Adjust room temperature.

System Board Voltage

1.5V/1.8V/2.5V/2.85V for SCSI 1/3.3V/3.3V Standby/5V/2.85V for SCSI 2/12V/-12V/Cache 1/2/Cache 3/4/5V Standby

Lower Critical Going Low
Lower Critical Going High

Upper Non-critical Going Low
Upper Non-Critical Going High

Upper Critical Going Low
Upper Critical Going High

System board voltage is abnormal. See note for threshold setting.

Check power supply.

Remove some devices which are using voltage to reduce system loading.

Check system board.

Keyboard/Mouse Fuse

Current of keyboard/mouse is over the system limit.

Check keyboard/mouse.

USB1/USB2/USB3/USB4 Fuse

Current drawn by devices connected to USB1/USB2/USB3/USB4 is over the system limit.

Check devices connected in the designated USB port.

Power Supply Predictive Failure

Power supply is dead.

Check power supply.

Chassis Fan Assertion

Chassis fan is dead.

Check chassis fan.

Watchdog BIOS/POST

POST is not completed.

Check BIOS checkpoints list.

Watchdog OS/Load

Problem in loading OS.

Check hard disk.

Watchdog SMS/OS

OS hangs, after loaded.

Check BIOS event log.

Check OS.

Watchdog No Action

System hangs, setting is No Action.

Revise the Watchdog settings if you prefer that the system take an action automatically.

Watchdog Hard Reset

System hangs, auto Reset.

Revise the Watchdog settings if you prefer that the system automatically take an action other than Reset.

Watchdog Power Off

System hangs, auto Power Off.

Revise the Watchdog settings if you prefer that the system automatically take an action other than Power Off.

Watchdog Power Cycle

System hangs, auto Power Cycle.

Revise the Watchdog settings if you prefer that the system automatically take an action other than Power Cycle.

NVRAM SDR Checksum Error

NVRAM SDR data was damaged.

Rewrite NVRAM.

Replace NVRAM.

NVRAM SEL Checksum Error

NVRAM SEL data was damaged.

Replace NVRAM.

NVRAM FRU Checksum Error

NVRAM FRU data was damaged.

Rewrite NVRAM.

Replace NVRAM.

EMP Remote Login Password Fail

Password error.

Get the correct password.

EMP BMC Disable CPU

BMC disables CPU, after detecting abnormal status of CPU.

Reset the system. If the problem remains, replace CPU.

BIOS Post (Event Data 2)

Event data 2 is the POST error code. Setup must look up the code in the POST message table to display the message.

Check POST message table.

Secure Model Violation

Unauthorized access.

Follow the correct procedure to access units.

Pre-boot Password Violation-User Password

Incorrect user password.

Get the correct user password.

Pre-boot Password Violation-Setup Password

Incorrect setup password.

Get the correct setup password.

DIMM/RIMM Correctable ECC Error

Memory has ECC (error check and correction) error, but system is able to correct it automatically.

No action needed, but if errors reoccur, check the memory.

DIMM/RIMM Uncorrectable ECC Error

Memory has ECC (error check and correction) error, and system is unable to fix it.

Check memory.

PCI PERR (Parity Error)

Error occurs on PCI-related on-board chipset while doing parity checking. This error message indicates the on-board chipset location which is bus 1 device 0 and function 1. See schematics for location of the chipset.

Check system board.

PCI SERR (System Error)

Error occurs on device or add on card of PCI slot.

Check add-on card.

Check PCI device.

Hard Disk Drive Fault

Errors occur in hard disk drive.

Check HDD.

Drive Backplane Fan Fault Assertion

Errors occur in drive backplane fan.

Check drive backplane fan.
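The threshold events listed for temperature, voltage, and fan sensors in the table above ("Lower Critical Going Low", "Upper Critical Going High", and so on) describe a reading crossing a configured limit. The Python sketch below covers only the three crossings that signal a worsening condition and uses strict comparisons. The dictionary keys, the function name, and the comparison details are assumptions, since the threshold note the table refers to is not included here.

```python
from typing import Dict, List


def asserted_threshold_events(reading: float, thresholds: Dict[str, float]) -> List[str]:
    """Return the worsening threshold events that a single reading would assert."""
    events: List[str] = []
    lower_critical = thresholds.get("lower_critical")
    upper_non_critical = thresholds.get("upper_non_critical")
    upper_critical = thresholds.get("upper_critical")
    if lower_critical is not None and reading < lower_critical:
        events.append("Lower Critical Going Low")
    if upper_non_critical is not None and reading > upper_non_critical:
        events.append("Upper Non-critical Going High")
    if upper_critical is not None and reading > upper_critical:
        events.append("Upper Critical Going High")
    return events
```

The opposite-direction events in the table (for example "Upper Critical Going Low") would require the previous reading as well, so they are left out of this sketch.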
