Read error of raw partition

Susan_Dorsey
Level 3
Hey all,

My first post to the forum, and of course like all noobs it's a problem. I'm hoping someone else has some experience with this strange error.

Site background:
NBU 6 MP2
Solaris 9 sparc master/media server
Linux RHEL AS 64-bit media server to handle the raw partition backups, since Solaris wants to write a label on the disk.
L180 tape library
All on a SAN connected to Sun 6920 storage

Problem description:
Carved a test 36 GB LUN out of the 6920 to test our brand spanking new (and not yet a proven concept) backup solution. Put test data on it and ran a backup of the three partitions raw. Raw is the only way the proprietary appliance system can be backed up.
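
In case it matters, there's nothing exotic about how the raw backup is set up: the backup selections for the policy are just the raw device paths themselves, along these lines (the partition names here are illustrative; /dev/sdg3 is the one that shows up in the error further down):

/dev/sdg1
/dev/sdg2
/dev/sdg3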

Backup failed on the third partition, after the first two were successful. Error:

09/08/2006 14:50:31 - Error bpbrm (pid=20876) from client mirabu2: ERR - Read error at block 67850524 reading 262144 bytes in file /dev/sdg3. Errno = 5: Input/output error
09/08/2006 14:50:45 - end writing; write time: 0:21:01 the backup failed to back up the requested files (6)

At first we figured this was a bad block on the disk that simply needed to be written to before the system would recognize it as bad. So we made another 36 GB test LUN from a different segment of the storage, loaded test data, and started a backup. Same error message. Exactly the same, even down to the same "bad" block number.

Meanwhile, we wrote 0s and 1s across the first test LUN, touching every block in order to trigger recognition of the bad disk. But no failure occurred.
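
One check I haven't done yet is reproducing that exact read outside NetBackup with dd. Assuming the block number bpbrm reports is in 512-byte units (a guess on my part, not something I've confirmed), something like this should hit the same 262144-byte stretch of the device:

# 512 blocks of 512 bytes = 262144 bytes, starting where bpbrm complained
dd if=/dev/sdg3 of=/dev/null bs=512 skip=67850524 count=512

If that comes back clean, it points at how NBU sizes its reads rather than at a genuinely bad spot on the disk.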

So now it seems like NetBackup is having a problem with the block size of the read; perhaps a mismatch between the block size presented by the storage and the block size NBU is trying to read is causing this error. My questions are:

1. Has anyone encountered this error while trying to read/backup raw filesystems?

2. Where does one tune the block size for NBU 6 MP2? I know about the buffer size tuning in SIZE_DATA_BUFFERS and NET_BUFFER_SZ (my understanding of those settings is sketched below), but is the block size itself tunable?
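
For anyone following along, this is my current understanding of where those buffer settings live on a UNIX media server; please correct me if I have the paths or semantics wrong:

# size (bytes) and count of the data buffers used when writing to tape,
# set as plain text files on the media server
echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
echo 16 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS

# network buffer size used between client and media server
echo 262144 > /usr/openv/netbackup/NET_BUFFER_SZ

What I can't find is a separate knob for the size of the raw reads themselves. The failed read in the log is 262144 bytes, which makes me suspect the read size is tied to one of these settings, but that's only a guess on my part.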

Thanks in advance for all responses.
3 REPLIES

Susan_Dorsey
Level 3
Apparently this is a pretty unusual situation and error. I opened a case with Symantec and will post the answer to this puzzle if/when it's resolved. Since NetBackup has no problem backing up 700 GB LUNs, we're going to try a different-sized test LUN to determine if it's simply a block size mismatch.

I'm surprised no one out there has any thoughts on the block size question...

Thanks

zippy
Level 6
Susan,

What kind of data is on the raw disk?

Oracle, Informix?

Where do you see the "bad block error"?

NetBackup takes "blocks of data" and dumps those blocks to tape; the "block" you're referring to within NetBackup only pertains to NetBackup and how it sends data to the tape drive.

As for a bad block on your UNIX servers / disk array...

A raw partition is accessed in character mode, so I/O is faster than a filesystem partition, which is accessed in block mode. With raw partitions you can do bulk I/O.

Raw partitions are generally used for databases by RDBMS software.

You can't access individual files on a raw partition, because the files stored there are managed by the software that uses the partition; the OS doesn't know what is stored on it.

You can't fsck a raw partition; fsck is absolutely meaningless there. There is no underlying structure to a raw partition. From the view of the host, it's simply a contiguous bunch of disk blocks. You can dd it to look for bad disk blocks, but that's it.
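
For example, a crude read-only pass over the whole device (device name taken from your error message) looks something like this; dd stops at the first unreadable spot, and the record count it prints tells you roughly where that is:

# read the entire raw partition and throw the data away;
# a hard I/O error here is a real disk/array problem, not a NetBackup problem
dd if=/dev/sdg3 of=/dev/null bs=262144

# on Linux, badblocks can also do a non-destructive read-only scan
badblocks -sv /dev/sdg3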

JD

Dennis_Strom
Level 6
James,
Thank you. I wish I could have posted that. Oh by the way I stole your post and put it in my notes to read again later.

des

Message was edited by:
Dennis Strom