I found that error code 41 makes the backup fail. Checking /var/adm/, there are a number of I/O errors this month, and because of this error the backup job fails from time to time. I have used different media to run the backup, but I still get the same error on one policy. The policy ran a backup successfully today, but it failed over the last few days. On the server side there are no connectivity errors. I don't think it is related to a timeout, since the backup job works fine today, so I have not changed the timeout parameter from 300 to 1800. This library and its backups have problems from time to time, and it is very depressing and takes a lot of manpower to find a solution. Is there a way to fix this problem? Thank you.
Ignore the status 41.
You have a problem with media and/or tape drive.
You need to check /var/adm/messages and the bptm log to determine whether you have bad media or faulty tape drive(s).
NBU is merely reporting the error.
You need to troubleshoot hardware issues at OS level and with your hardware vendor.
We can try to assist with your troubleshooting if you collect the log files, copy them to .txt files (bptm.txt and messages.txt) and upload them here.
The text files contain a lot of lines, so I can't post them here because of the size limitation.
Btw, I connected another (UAT) library to this server. I followed the steps to reconnect the library, but a test backup shows error code 96. I followed the instructions I used previously, but it doesn't work. I also ran the inventory again, and it always reports some media IDs that are not in the database. I have no idea what is happening.
I followed article ID 100021294 to replace the tape drive in the NetBackup configuration.
Luckily you did not try to paste the contents of a log file into a forum post.
You can copy the log.<date> file to bptm<date>.txt and upload it as an attachment, e.g.:
cp log.031518 bptm031518.txt
The .txt extension ensures that the file is readable on all kinds of devices.
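If you need to upload several days of bptm logs, the same copy can be done for every dated log in one go. This is a minimal sketch, assuming the legacy logs sit in /usr/openv/netbackup/logs/bptm (the usual UNIX location; adjust for your install) and are named log.MMDDYY as in the example above. The helper name copy_logs_as_txt is hypothetical.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path

# ASSUMPTION: default legacy bptm log directory on a UNIX media server.
DEFAULT_BPTM_DIR = Path("/usr/openv/netbackup/logs/bptm")

# Legacy debug logs are named log.MMDDYY, e.g. log.031518.
LOG_NAME = re.compile(r"^log\.(\d{6})$")


def copy_logs_as_txt(log_dir: Path, out_dir: Path) -> list[Path]:
    """Copy each log.MMDDYY in log_dir to out_dir/bptmMMDDYY.txt; return the copies."""
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(log_dir.iterdir()):
        match = LOG_NAME.match(src.name)
        if not src.is_file() or match is None:
            continue  # skip subdirectories and files that are not dated logs
        dest = out_dir / f"bptm{match.group(1)}.txt"
        shutil.copyfile(src, dest)  # copies content only; the original log is untouched
        copied.append(dest)
    return copied
```

For example, copy_logs_as_txt(DEFAULT_BPTM_DIR, Path("/tmp/upload")) leaves bptm031518.txt and friends in /tmp/upload, ready to attach.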
About the replacement library: have you done any troubleshooting to determine that the problem is with the library and/or tape drives, and not perhaps with a batch of faulty media?
Did you first delete the existing tape drives and robot and then re-add devices in NBU for the replacement robot?
A different library may have different settings that will read and report the barcode differently.
Some libraries will read and report the 1st 6 characters only, e.g. CP0015.
Another library will read and report all 8 characters, e.g. CP0015L3.
You need to be aware of this so that you can either prepare NBU (e.g. with a Media ID Generation rule) or change the robot settings.
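To make the 6 vs 8 character difference concrete, here is a small illustration using the CP0015L3 example. The helper names and the "keep the first six characters" rule are hypothetical stand-ins for what a Media ID Generation rule would be configured to do; this does not model NetBackup's own default behaviour.

```python
from __future__ import annotations

# Hypothetical helpers illustrating the barcode behaviour described above.
MEDIA_ID_LEN = 6  # media IDs in this thread are six characters, e.g. CP0015


def reported_label(barcode: str, chars_reported: int) -> str:
    """Return what a library reports for a barcode: the first 6 or all 8 characters."""
    return barcode[:chars_reported]


def media_id_from_label(label: str, keep_first: int = MEDIA_ID_LEN) -> str:
    """Hypothetical rule: keep only the first `keep_first` characters as the media ID."""
    return label[:keep_first]


barcode = "CP0015L3"
old_robot = reported_label(barcode, 6)  # "CP0015"
new_robot = reported_label(barcode, 8)  # "CP0015L3"

# The same physical tape yields two different labels from the two robots.
print(old_robot == new_robot)  # False
# Keeping the first six characters of the new label gives back the old ID.
print(media_id_from_label(new_robot) == old_robot)  # True
```

A label that no longer matches the existing media record is one way tapes can end up reported as not in the database after a library swap.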
We should be able to resolve the library issue if you show us the output for one of the tapes, and then the command-line version of the robot inventory 'check and compare contents':
1. vmquery -m CP0015
2. vmcheckxxx -rn <robot-number> -rt tld
(So, if the robot is TLD(0), the command will be: vmcheckxxx -rn 0 -rt tld)
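Conceptually, 'compare contents' puts what the robot reads next to what the NBU database records. The sketch below is not a parser for vmcheckxxx output (its exact format is not shown in this thread). It assumes you have already written both sides into two hypothetical dictionaries: slot to media ID as read by the robot, and media ID to slot as recorded in NBU.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InventoryDiff:
    not_in_database: list[str] = field(default_factory=list)  # robot sees it, NBU doesn't
    missing_from_robot: list[str] = field(default_factory=list)  # NBU has it, robot doesn't
    slot_mismatch: list[tuple[str, int, int]] = field(default_factory=list)  # (id, db slot, robot slot)


def compare_contents(robot: dict[int, str], database: dict[str, int]) -> InventoryDiff:
    """robot: slot -> media ID read by the robot; database: media ID -> slot recorded in NBU."""
    diff = InventoryDiff()
    robot_by_id = {media_id: slot for slot, media_id in robot.items()}
    for media_id, slot in sorted(robot_by_id.items()):
        if media_id not in database:
            diff.not_in_database.append(media_id)
        elif database[media_id] != slot:
            diff.slot_mismatch.append((media_id, database[media_id], slot))
    for media_id in sorted(database):
        if media_id not in robot_by_id:
            diff.missing_from_robot.append(media_id)
    return diff
```

With robot = {1: "CP0015L3", 2: "CP0040"} and database = {"CP0015": 1, "CP0040": 3}, CP0015L3 shows up as not in the database while CP0015 shows up as missing from the robot: the same tape, read with a different label length.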
I didn't delete all the media records in NBU, as the media are assigned to volume pools (if I did, I would have to reassign the media to the volume pools again). The current production and UAT libraries are the same model, and the tape drives run the same firmware. The UAT library has already been tested in UAT, where it works fine, and it is temporarily being used for the end users' ongoing backups. All backup jobs have failed for a week, so I have to fix it on the UAT library tomorrow.
I will do the following tomorrow:
1. Delete all the media in the NBU console.
2. Delete the drives and robot in NBU.
3. Kill the NBU daemons.
4. Start up the NBU daemons.
5. Run tpconfig and scan -tape to make sure the OS and NBU know about the UAT library.
6. Run the Device Configuration Wizard to scan the robot and media.
7. Set the STU on the policy.
8. Manually run the backup policy to test.
Noooo..... PLEASE do NOT "Delete all the media on the NBU console."
I have posted these steps in your other post:
Delete drives and robot in NBU. (This will change all tapes (media) in robot to Standalone.)
Physically disconnect robot and drives and ensure devices are removed from OS view.
Connect new devices and ensure OS can see devices.
Run device config wizard.
If you will re-use old tapes, put tapes removed from old robot in the newly attached robot.
Run Inventory but do not update yet: do 'Preview recommended changes'. Ensure the new robot reads tape labels the SAME way as the old robot (if the old robot used the 1st 6 chars of the label, ensure that the new robot does the same). If not the same, add a Media ID Generation rule.
Thanks for the .txt files.
Firstly, the logging level is too low - VERBOSE = 0.
This does not give us anything more than the same errors shown in the Activity Monitor.
For the 14th, we see this type of error:
02:33:35.279  <16> io_ioctl: ioctl (MTBSF) failed on media id CP0040, drive index 1, I/O error (bptm.c.8164)
02:33:35.279  <2> send_MDS_msg: DEVICE_STATUS 1 67970 imduarbak01 CP0040 4000231 HP.ULTRIUM3-SCSI.001 2000307 POSITION_ERROR 0 0
02:33:35.279  <4> send_MDS_msg: Called by interrupt, NOP
02:33:35.285  <2> log_media_error: successfully wrote to error file - 03/14/18 02:33:35 CP0040 1 POSITION_ERROR HP.ULTRIUM3-SCSI.001
We see this error for a number of media IDs.
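The log_media_error line quoted above already carries the date, time, media ID, drive index, error type and drive type. Here is a minimal sketch for pulling those fields out. The regex is inferred from that single sample line, so treat it as an assumption: other NetBackup versions may word or order the message differently. MediaError and parse_media_error are hypothetical names.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class MediaError(NamedTuple):
    date: str
    time: str
    media_id: str
    drive_index: int
    error: str
    drive_type: str


# ASSUMPTION: pattern inferred from the one sample line quoted in this thread.
ERROR_FILE_RE = re.compile(
    r"log_media_error: successfully wrote to error file - "
    r"(\S+) (\S+) (\S+) (\d+) (\S+) (\S+)"
)


def parse_media_error(line: str) -> Optional[MediaError]:
    """Return the fields of a log_media_error line, or None for any other line."""
    m = ERROR_FILE_RE.search(line)
    if m is None:
        return None
    date, time, media_id, drive_index, error, drive_type = m.groups()
    return MediaError(date, time, media_id, int(drive_index), error, drive_type)
```

Applied to the sample line, this gives media_id "CP0040", drive_index 1 and error "POSITION_ERROR"; the send_MDS_msg lines around it return None.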
I see 1 TapeAlert that you can pass on to your hardware vendor:
TAPE_ALERT HP.ULTRIUM3-SCSI.001 0x00000000 0x00004000
Please look in /var/adm/messages for hardware errors and pass that onto the hardware vendor.
You can search all of the bptm logs for 'error file' to locate all the tape errors.
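Once the 'error file' lines are collected, grouping them helps answer the "bad media or faulty drive" question. This is a sketch under two assumptions: the logs are the dated log.* files in a bptm log directory, and the fields after "error file - " follow the order seen in this thread (date, time, media ID, drive index, error, drive type). The classification is only a rule of thumb, not NetBackup logic: a tape that fails in several drives points at the media, and a drive that fails with several tapes points at the drive. All function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path

MARKER = "error file - "


def collect_errors(lines):
    """Yield (media_id, drive_index, error) from each 'error file' line."""
    for line in lines:
        _, sep, rest = line.partition(MARKER)
        if not sep:
            continue  # not a media error line
        fields = rest.split()
        # Expected order: date, time, media ID, drive index, error, drive type
        if len(fields) < 5:
            continue
        yield fields[2], fields[3], fields[4]


def summarize(errors):
    """Return (suspect media IDs, suspect drive indexes) using the rule of thumb above."""
    drives_per_media = defaultdict(set)
    media_per_drive = defaultdict(set)
    for media_id, drive, _error in errors:
        drives_per_media[media_id].add(drive)
        media_per_drive[drive].add(media_id)
    suspect_media = sorted(m for m, drives in drives_per_media.items() if len(drives) > 1)
    suspect_drives = sorted(d for d, media in media_per_drive.items() if len(media) > 1)
    return suspect_media, suspect_drives


def scan_log_dir(log_dir: Path):
    """Read every log.* file in log_dir and summarize the media errors found."""
    lines = []
    for path in sorted(log_dir.glob("log.*")):
        with path.open(errors="replace") as fh:
            lines.extend(fh)
    return summarize(collect_errors(lines))
```

The output is a starting point for the conversation with the hardware vendor, not a verdict; /var/adm/messages and higher-level bptm logs are still needed to confirm it.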
Logs at a higher VERBOSE level will give us more useful info.