Grub Load Read Errors; Is the drive salvageable?

Post by chicagocoyote » Fri Dec 24, am

I have a single drive in my system, a Seagate 76GB SCSI drive.
I am running CentOS.

My machine locked up recently with the following message on
all xterms:

journal commit I/O error.

This machine has a separate /boot partition; _however_, that boot
partition is ext2, not ext3 (and therefore has no journal).

On rebooting (multiple times), the machine would fail to boot,
and I would see one of the following two messages:

Grub Loading Stage2 Read Error
Grub Read Error

"smartctl -H /dev/sda" yielded:

SMART Health Status OK

On booting from a CD in "linux rescue" mode, all disk partitions
mounted except /boot.

I ran "fdisk /dev/sda1" (sda1 being the /boot partition).
Although I issued no commands to alter the partition, I
exited with "w" (write changes) rather than "q" (quit
without saving).

Now, however, a "linux rescue" boot finds and mounts all
partitions including the /boot partition.

I tried

grub> root (hd0,0)
grub> setup (hd0)

and then I tried:

grub-install --root-directory=/ /dev/sda1

Both times, the machine still would not boot and again hung
on one of the following messages:

Grub Loading Stage Read Error
Grub Read Error

I deleted the stage files and reran grub. Sure enough, the
"Stage" error message went away, but the machine now hung
on a "Stage2 error".

I finally INSTALLED a new OS, CentOS. The install completed,
but on booting I still get "Grub Stage2" errors.
For this reinstall, I left all partition sizes unchanged, so
/boot kept the same size and the same location.

So I can't boot from this disk. I would normally conclude
that the disk has to be discarded, _except_ that in rescue mode
I can mount all partitions and access all files.
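Being able to mount a partition and open its files only shows that the blocks those particular files occupy could be read at that moment; it says nothing about the rest of the partition. A full sequential read is one way to check every sector. This is a minimal sketch, assuming Python 3 on Linux, root access, and a target such as /dev/sda1; `scan_readable` is a hypothetical helper, not an existing tool, and a clean pass does not by itself prove the drive is healthy.

```python
import os
import sys


def scan_readable(path, chunk=64 * 1024):
    """Read `path` front to back; return (bytes_read, offsets_that_failed).

    Hypothetical helper: works on a regular file or a block device.
    """
    failed = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device/file size in bytes
        offset = 0
        while offset < size:
            try:
                data = os.pread(fd, min(chunk, size - offset), offset)
            except OSError:
                # Unreadable region (e.g. EIO); record it and skip ahead.
                failed.append(offset)
                offset += chunk
                continue
            if not data:
                break
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, failed


if __name__ == "__main__" and len(sys.argv) > 1:
    total, failed = scan_readable(sys.argv[1])
    print(f"read {total} bytes; {len(failed)} unreadable chunk(s)")
    for off in failed:
        print(f"  read error near byte offset {off}")
```

Run as root, for example `python3 scan.py /dev/sda1`; any reported offsets point at regions worth a closer look.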

What more can I do? Is there some way to repair the MBR
that I have not done?
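Before rewriting the MBR yet again, it can help to look at what is actually there. One approach is to copy the first sector (for example with `dd if=/dev/sda of=mbr.bin bs=512 count=1`) and inspect the copy. The Python sketch below is a hypothetical helper, not part of GRUB or CentOS: it checks the 0x55AA boot signature, lists the non-empty primary partition-table entries, and, as a rough heuristic only, looks for the ASCII text "GRUB" in the boot-code area (an assumption based on GRUB Legacy's stage1 printing "GRUB" when it starts).

```python
import struct
import sys

MBR_SIZE = 512
PART_TABLE_OFFSET = 446  # four 16-byte partition entries start here
SIGNATURE_OFFSET = 510   # boot signature bytes 0x55, 0xAA


def parse_mbr(data: bytes) -> dict:
    """Summarise a 512-byte MBR copy: signature, GRUB hint, partitions."""
    if len(data) < MBR_SIZE:
        raise ValueError("need at least 512 bytes")
    signature_ok = data[SIGNATURE_OFFSET:MBR_SIZE] == b"\x55\xaa"
    # Heuristic only: GRUB Legacy's stage1 embeds the ASCII text "GRUB".
    grub_hint = b"GRUB" in data[:PART_TABLE_OFFSET]
    partitions = []
    for i in range(4):
        start = PART_TABLE_OFFSET + 16 * i
        entry = data[start:start + 16]
        status, ptype = entry[0], entry[4]
        # Little-endian starting LBA and sector count at entry bytes 8..15.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype == 0:
            continue  # unused slot
        partitions.append({
            "index": i + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "start_lba": start_lba,
            "sectors": num_sectors,
        })
    return {"signature_ok": signature_ok, "grub_hint": grub_hint,
            "partitions": partitions}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        info = parse_mbr(f.read(MBR_SIZE))
    print("boot signature 0x55AA:", info["signature_ok"])
    print("boot code mentions GRUB:", info["grub_hint"])
    for p in info["partitions"]:
        flag = "*" if p["bootable"] else " "
        print(f"{flag} part {p['index']}: type 0x{p['type']:02x}, "
              f"start LBA {p['start_lba']}, {p['sectors']} sectors")
```

A missing signature or an unexpected partition layout would point at the MBR itself; if both look sane, the read errors are more likely coming from reading the later stage files.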

One more piece of information: in /var/log/messages, for the last
month, I was regularly seeing the following messages, at least
once every two days:

> Dec 21 localhost kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins
> Dec 21 localhost kernel: scsi0: Dumping Card State while idle, at SEQADDR 0x18
> Dec 21 localhost kernel: Card was paused
> Dec 21 localhost kernel: ACCUM = 0x3, SINDEX = 0x48, DINDEX = 0xe4, ARG_2 = 0x0
> Dec 21 localhost kernel: HCNT = 0x0 SCBPTR = 0x2
> Dec 21 localhost kernel: SCSIPHASE[0x0] SCSISIGI[0x4] ERROR[0x0] SCSIBUSL[0x0]
> Dec 21 localhost kernel: LASTPHASE[0x1] SCSISEQ[0x1a] SBLKCTL[0xa] SCSIRATE[0x0]
> Dec 21 localhost kernel: SEQCTL[0x10] SEQ_FLAGS[0xc0] SSTAT0[0x0] SSTAT1[0x0]
> Dec 21 localhost kernel: SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xa4]
> Dec 21 localhost kernel: SXFRCTL0[0x80] DFCNTRL[0x0] DFSTATUS[0x89]
> Dec 21 localhost kernel: STACK: 0xe2 0x 0x 0x17

See the attached file for the complete messages.
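To see whether these controller dumps became more frequent before the lockup, a small script can tally them per day. This is a minimal sketch that assumes syslog-style lines beginning with the month and day, as in the excerpt above, and uses the "Dump Card State Begins" text as the marker; `count_card_dumps` is a hypothetical name.

```python
import sys
from collections import Counter

MARKER = "Dump Card State Begins"


def count_card_dumps(lines) -> Counter:
    """Count SCSI card-state dumps per 'Mon DD' date in syslog-style lines."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        # Tolerate lines pasted with a leading "> " quote marker.
        fields = line.lstrip("> ").split()
        if len(fields) >= 2:
            counts[" ".join(fields[:2])] += 1  # e.g. "Dec 21"
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        # Counter keeps first-seen order, which follows the log's order.
        for day, n in count_card_dumps(f).items():
            print(f"{day}: {n} card-state dump(s)")
```

For example, `python3 dumps.py /var/log/messages` would print one line per day on which a dump appeared.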

Thank you to those who stayed with this post!

