01-01-2012 09:59 AM
Does a cyclic redundancy check (CRC) error in the log mean the media has gone bad?
Generally, in what circumstances do we decide that media has gone bad?
01-02-2012 11:27 PM
From TechNote (TN) http://www.symantec.com/docs/TECH169477, CRC issues are specifically mentioned under the 'Read/Write errors' section:
Troubleshooting Drive/ Library Issues in NetBackup
This document provides information on various tape drive issues that may be encountered while using NetBackup, and how to deal with them.
It is important to understand that NetBackup (NBU) does not write data to a drive itself. For example, on Solaris, NetBackup relies on the operating system to write the data to the tape using the st tape driver. NetBackup's only 'slight' involvement is that it specifies the block size to use, and even this is passed to the operating system. Other operating systems work in a similar manner.
The SCSI pass-through driver (the sg driver on Solaris) allows SCSI commands to be passed directly to the drive. These are SCSI commands such as 'test-unit-ready', which is used, for example, when mounting a tape. On occasion it is necessary to recreate/rebuild the pass-through driver. The common symptom of a pass-through driver problem is that the scan command does not show the devices. Other issues involving the pass-through driver are very rare.
The scan command shows no devices at all, or some (or all) of the devices disappear and reappear when the command is run repeatedly.
Firstly, it must be confirmed that the operating system can see and communicate correctly with the tape drives.
The fact that devices appear in (for example) 'Device Manager' (Windows) or cfgadm (Solaris) is NOT necessarily sufficient confirmation that the devices are correctly configured to the operating system.
Cases have been seen where devices 'appeared' to be visible to the operating system, but SAN issues prevented full/correct communication and, as a result, the scan command failed.
Two things need to be checked before further troubleshooting is carried out:
1/ Check that no backups are running on the drives (only applicable if the drives are shared). A SCSI reservation of a drive by a backup may prevent the drive from responding, and thus from appearing in the output of the scan command.
2/ Rebuild the pass-through driver (Unix only). If the drive/operating system configuration has not changed, this is very unlikely to be the issue, but it can be eliminated as a cause by recreating the pass-through files. See the device configuration guide for information on how to do this.
Aside from the exceptions above, issues with the scan command are not caused by NetBackup; once it is understood how the scan command works, it is clear why the issues lie outside NetBackup.
Although the scan command is supplied by Symantec, it does not issue any NetBackup commands or interact with NetBackup in any way. When run, it issues 'operating system' SCSI commands to the devices configured in the operating system, and the output of the command comes from the devices themselves. There are no settings, 'tuning' or troubleshooting that can be performed on the scan command.
Windows servers do not require a pass-through driver. Provided that no backups are running on other servers that may share the drives, the issue will be caused by a SAN, firmware, hardware or driver issue. Consideration should be given to SAN infrastructure (e.g. switches), HBAs or the physical drive/library.
Unix servers require a pass-through driver; on Solaris, for example, this is the sg driver. It is required because the SCSI commands issued to query the device cannot be passed to the devices via the regular operating system driver.
Once the sg driver is configured, and provided the configuration is not changed, there should be no issue with the pass-through driver. If the scan command shows devices disappearing and reappearing, then the pass-through driver is not the cause. If one or more devices disappear permanently, it may be worth reconfiguring the pass-through driver. If that does not resolve the issue, the possible causes are the same as for Windows servers: SAN infrastructure (e.g. switches), HBAs or the physical drive/library. Consideration should also be given to HBA configuration files, as incorrect settings in these have been seen to prevent the scan command from returning output.
Provided the pass-through driver is configured (Unix only), Symantec recommends that the operating system/SAN administrators or hardware vendors be consulted to investigate scan command issues further.
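The decision rule for repeated scan runs (devices that come and go versus devices that are permanently missing) can be sketched as follows. This is a minimal illustration, not a Symantec tool: the function name `classify_scan_runs` and the device identifiers are hypothetical, and it assumes you have already reduced each scan run's output to a set of device identifiers (for example, serial numbers). Parsing the real scan output is not shown, since its format is not given here.

```python
from typing import Dict, List, Set


def classify_scan_runs(expected: Set[str], runs: List[Set[str]]) -> Dict[str, str]:
    """Classify each expected device by how often it appeared across scan runs.

    expected -- identifiers of the devices you expect the scan to report
    runs     -- one set of reported identifiers per scan run (hypothetical input)
    """
    if not runs:
        raise ValueError("need at least one scan run")
    verdict: Dict[str, str] = {}
    for device in sorted(expected):
        seen = sum(1 for run in runs if device in run)
        if seen == len(runs):
            verdict[device] = "stable"
        elif seen == 0:
            # Permanently missing: on Unix, reconfigure the pass-through driver
            # first; if that fails, look at SAN, HBAs or the drive/library.
            verdict[device] = "missing"
        else:
            # Comes and goes: the pass-through driver is not the cause; look at
            # SAN infrastructure, HBAs, HBA configuration files or the drive/library.
            verdict[device] = "intermittent"
    return verdict


runs = [{"DRV_A", "DRV_B"}, {"DRV_A"}, {"DRV_A", "DRV_B"}]
print(classify_scan_runs({"DRV_A", "DRV_B", "DRV_C"}, runs))
# -> {'DRV_A': 'stable', 'DRV_B': 'intermittent', 'DRV_C': 'missing'}
```

Running the scan several times before deciding matters here: a single run cannot distinguish an intermittent device from a permanently missing one.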
Oct 11 08:59:31 media bptm[3771]: [ID 228150 daemon.warning] TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive TLD0_LTO4_DRIVE1 (index 4), Media Id R0TP01
<2> write_data: block position check: actual 62504, expected 31254
1/11/2010 7:50:13 AM - Error bptm(pid=3364) ioctl (MTREW) failed on media id W00229, drive index 0, The I/O bus was reset. (1111) (bptm.c.8039)
Error bptm (pid=2164) ioctl (MTWEOF) failed on media id V01497, drive index 0, The physical end of the tape has been reached.
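When many TapeAlert messages are logged, it can help to pull out the code, flag, drive and media ID so that you can see which drive or media they cluster on. The sketch below is hypothetical: the regular expression is written against the single TapeAlert line shown above only, and other NetBackup versions or syslog configurations may format the line differently. See TECH124594 (listed below) for what each code means.

```python
import re
from typing import NamedTuple, Optional


class TapeAlert(NamedTuple):
    code: int
    alert_type: str
    flag: str
    drive: str
    drive_index: int
    media_id: str


# Assumed format, based on the one example line in this TechNote.
TAPEALERT_RE = re.compile(
    r"TapeAlert Code: (?P<code>0x[0-9A-Fa-f]+), "
    r"Type: (?P<type>[^,]+), "
    r"Flag: (?P<flag>[^,]+), "
    r"from drive (?P<drive>\S+) \(index (?P<index>\d+)\), "
    r"Media Id (?P<media>\S+)"
)


def parse_tapealert(line: str) -> Optional[TapeAlert]:
    """Return the TapeAlert fields from a log line, or None if it doesn't match."""
    m = TAPEALERT_RE.search(line)
    if m is None:
        return None
    return TapeAlert(
        code=int(m.group("code"), 16),  # hex code such as 0x03
        alert_type=m.group("type"),
        flag=m.group("flag"),
        drive=m.group("drive"),
        drive_index=int(m.group("index")),
        media_id=m.group("media"),
    )


line = ("Oct 11 08:59:31 media bptm[3771]: [ID 228150 daemon.warning] TapeAlert Code: 0x03, "
        "Type: Warning, Flag: HARD ERROR, from drive TLD0_LTO4_DRIVE1 (index 4), Media Id R0TP01")
alert = parse_tapealert(line)
print((alert.code, alert.flag, alert.media_id))  # -> (3, 'HARD ERROR', 'R0TP01')
```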
Example 1
write_data: cannot write image to media id XXXXXX, drive index #, Data error (cyclic redundancy check).
Example 2
io_write_block: write error on media id MIR107, drive index 0, writing header block, 1117
Example 3
Error bptm(pid=5268) cannot read image from media id 500507, drive index 1, err = 234
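All three examples name the media ID and drive index involved, which is exactly what is needed to decide whether errors follow a tape or a drive. A minimal sketch, assuming the "media id ..., drive index ..." wording seen in these example lines (the function `parse_media_error` is hypothetical, not part of NetBackup). The drive index is kept as text because the first example uses a placeholder (`#`).

```python
import re
from typing import NamedTuple, Optional


class MediaError(NamedTuple):
    media_id: str
    drive_index: str  # kept as text: sample lines may contain a placeholder such as '#'
    is_crc: bool      # True if the OS reported a cyclic redundancy check error


# Assumed wording, taken from the example bptm messages above.
_LOC_RE = re.compile(r"media id (?P<media>[^,\s]+), drive index (?P<drive>[^,\s]+)")


def parse_media_error(line: str) -> Optional[MediaError]:
    """Extract media ID and drive index from a bptm error line, or None."""
    m = _LOC_RE.search(line)
    if m is None:
        return None
    return MediaError(
        media_id=m.group("media"),
        drive_index=m.group("drive"),
        is_crc="cyclic redundancy check" in line.lower(),
    )


samples = [
    "write_data: cannot write image to media id XXXXXX, drive index #, Data error (cyclic redundancy check).",
    "io_write_block: write error on media id MIR107, drive index 0, writing header block, 1117",
    "Error bptm(pid=5268) cannot read image from media id 500507, drive index 1, err = 234",
]
for s in samples:
    e = parse_media_error(s)
    print(e.media_id, e.drive_index, e.is_crc)
# -> XXXXXX # True
# -> MIR107 0 False
# -> 500507 1 False
```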
Associated Documentation:
http://www.symantec.com/docs/TECH124594 - "Description of Tape Alerts and code definitions"
http://www.symantec.com/docs/TECH83129 - "Robtest command that can be used to test the SCSI functionality of a robot"
Article URL http://www.symantec.com/docs/TECH169477
01-01-2012 10:58 AM
Some TechNotes, some old, some new, and even some for Backup Exec (BE), as media errors are not unique to NBU:
http://www.symantec.com/docs/TECH35336
http://www.symantec.com/docs/TECH5433
http://www.symantec.com/docs/TECH5325
http://www.symantec.com/docs/TECH139183
http://www.symantec.com/docs/TECH169477
If you Google 'Event ID 23 cyclic redundancy check', you will see that this is not a NetBackup error; it is an error reported by the OS while attempting to write to media. Extract from http://www.symantec.com/docs/TECH43243:
As an application, NetBackup has no direct access to a device; instead, it relies on the operating system (OS) to handle any communication with the device. This means that during a write operation NetBackup asks the OS to write to the device and report back the success or failure of that operation. If there is a failure, NetBackup merely reports that a failure occurred, and any troubleshooting should start at the OS level. If the OS is unable to perform the write, there are three likely causes: OS configuration, a problem on the SCSI path, or a problem with the device.
01-01-2012 01:16 PM
01-01-2012 07:10 PM
In what circumstances do we decide that media has gone bad?
If the error occurs only on a particular piece of media and all other media are working fine, then you can suspect that media.
If the error occurs with all media, then the tape drive could be the culprit. If the OS is Windows prior to 2008, stop and disable the Removable Storage Manager service. Also check the functionality of the tape drive with the vendor-specific diagnostic utility.
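The rule of thumb above (errors confined to one tape point at the media; errors on every tape in a drive point at the drive) can be sketched as a small tally over job outcomes. This is only an illustration under assumptions: the `(media_id, drive, failed)` record shape and the function `find_suspects` are hypothetical, and the result is a hint about where to look, not a diagnosis; the vendor diagnostic utility is still the real test of the drive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Job = Tuple[str, str, bool]  # (media_id, drive, failed): hypothetical record shape


def find_suspects(jobs: Iterable[Job]) -> Dict[str, List[str]]:
    ok: Dict[str, Set[str]] = defaultdict(set)   # drive -> media that worked on it
    bad: Dict[str, Set[str]] = defaultdict(set)  # drive -> media that failed on it
    for media, drive, failed in jobs:
        (bad if failed else ok)[drive].add(media)

    # Suspect a drive when every one of several different media failed on it.
    suspect_drives = sorted(d for d in bad if not ok[d] and len(bad[d]) > 1)

    # Suspect a medium when it failed on a drive that handled other media fine.
    suspect_media = sorted({
        m
        for d in bad if d not in suspect_drives
        for m in bad[d]
        if ok[d] - {m}
    })
    return {"media": suspect_media, "drives": suspect_drives}


jobs = [
    ("M1", "D1", True), ("M2", "D1", False), ("M3", "D1", False),  # only M1 fails on D1
    ("M4", "D2", True), ("M5", "D2", True), ("M6", "D2", True),    # everything fails on D2
]
print(find_suspects(jobs))  # -> {'media': ['M1'], 'drives': ['D2']}
```

Media that failed only on an already-suspect drive are deliberately not flagged, since the drive is the more likely cause until it has been checked.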
Have a read of this: https://www-secure.symantec.com/connect/blogs/facing-any-issues-your-tape-device-try