Error code 41: network timeout

Home_224
Level 6

Error code 41 is causing the backup to fail. When I check /var/adm/, there are a number of I/O errors this month, and with this error the backup job fails from time to time. I have tried running the backup with different media, but I still get the same error on one policy. The policy ran successfully today, but it failed over the last few days. On the server side there are no connectivity errors. I don't think this is related to the timeout, because the backup job is working fine today, so I have not changed the timeout parameter from 300 to 1800. This library and its backups have this problem on and off, and finding a fix is very frustrating and takes a lot of manpower. Is there a way to fix this problem? Thank you.

[5 screenshots attached]

1 ACCEPTED SOLUTION

Marianne
Level 6
Partner    VIP    Accredited Certified

As per my post of a month ago:

Status 41 seems to be a red herring: it has nothing to do with a network read timeout and is rather a device I/O error.
Faulty media or tape drive.

15 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

@Home_224

Could you copy the text and post it here? The photos are not very clear.

Status 41 seems to be a red herring: it has nothing to do with a network read timeout and is rather a device I/O error.
Faulty media or tape drive.

Yes, I will ask the end users to log on and capture it.

I have attached the screenshots: error 41.jpg, error 41_2.jpg

I have no idea why the problem started. Error code 14 now occurs more frequently; it failed on media id, drive index 0 and drive index 1. Do I need to increase the client timeout parameter? Please advise.

Marianne
Level 6
Partner    VIP    Accredited Certified

Ignore the status 41.

You have a problem with media and/or tape drive.

You need to check /var/adm/messages and the bptm log to determine whether you have bad media or faulty tape drive(s).

NBU is merely reporting the error.
You need to troubleshoot hardware issues at OS level and with your hardware vendor. 

We can try to assist with troubleshooting if you collect the log files, copy them to .txt files (bptm.txt and messages.txt), and upload them here.

Hi Marianne,

Thank you for your comment.

I have attached the logs for your review.

Marianne
Level 6
Partner    VIP    Accredited Certified

I cannot open files with unknown extensions on my mobile device.
That is why I asked that you copy the files to .txt.

Hopefully someone else can assist.

Hi Marianne,

The text file contains a lot of rows, so I can't paste it here because of the post length limit.

By the way, I connected another library (the UAT library) to this server and followed the steps to reconnect the library, but a test backup fails with error code 96. I followed the instructions from before, but it doesn't work. When I run the inventory again, it always reports some media IDs as not in the database. I have no idea what is happening.

I followed article ID 100021294 to replace the tape drive in the NetBackup configuration.

Marianne
Level 6
Partner    VIP    Accredited Certified

Luckily you did not try to paste the contents of a log file into a forum post. 
You can copy the log.<date> file to bptm<date>.txt and upload it as an attachment, e.g.:
cp log.031518 bptm031518.txt
The .txt extension ensures that the file is readable on all kinds of devices.
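If there are several days of logs to upload, the same rename can be applied to all of them in one go. This is a minimal sketch: copy_logs_as_txt is a hypothetical helper, not a NetBackup command, and it assumes the legacy log files are named log.<date> as in the cp example above.

```shell
# Hypothetical helper (not part of NetBackup): copy every legacy log file
# named log.<date> in SRC_DIR to DEST_DIR as <prefix><date>.txt,
# so that e.g. log.031518 becomes bptm031518.txt.
copy_logs_as_txt() {
    src_dir=$1
    dest_dir=$2
    prefix=$3
    for f in "$src_dir"/log.*; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        date_part=${f##*/log.}
        cp "$f" "$dest_dir/$prefix$date_part.txt" || return 1
    done
}

# Example (assumes the default UNIX legacy log directory):
# copy_logs_as_txt /usr/openv/netbackup/logs/bptm /tmp bptm
```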

About the replacement library: have you done any troubleshooting to confirm that the problem is with the library and/or tape drives, and not perhaps with a batch of faulty media?

Did you first delete the existing tape drives and robot and then re-add devices in NBU for the replacement robot?

A different library may have different settings that will read and report the barcode differently.
Some libraries will read and report the 1st 6 characters only, e.g. CP0015.
Another library will read and report all 8 characters, e.g. CP0015L3.

You need to be aware of this so that you can either prepare NBU accordingly or change the robot settings.
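To illustrate the difference, the small function below shows which media ID a library would report for the same barcode label depending on whether it passes on the first 6 characters or all 8. It is only an illustration, assuming simple left truncation as in the CP0015 / CP0015L3 example; in NBU itself the mapping is handled with a Media ID Generation rule or in the robot settings, not with a script, and media_id_for_barcode is a hypothetical name.

```shell
# Illustration only: the media ID a library would report for a barcode,
# assuming it passes either the full label or only the first N characters.
# media_id_for_barcode is a hypothetical name, not a NetBackup command.
media_id_for_barcode() {
    barcode=$1
    chars=$2    # 6 or 8 in the examples above
    printf '%s\n' "$barcode" | cut -c "1-$chars"
}

# media_id_for_barcode CP0015L3 6   -> CP0015
# media_id_for_barcode CP0015L3 8   -> CP0015L3
```

If the old robot reported 6 characters and the new one reports 8, the same tape shows up under two different media IDs, which may explain inventory reporting media IDs that are not in the database.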

We should be able to resolve the library issue if you show us the output for one of the tapes, followed by the command-line version of the robot inventory 'check and compare contents':

1. vmquery -m CP0015
2. vmcheckxxx -rn <robot-number> -rt tld
(so, if the robot is TLD(0), the command will be vmcheckxxx -rn 0 -rt tld)
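To run the first command for every tape that reported errors (for example CP0040, CF0128 and CF0129 from the bptm log later in this thread), a loop can save some typing. This is a sketch: it assumes vmquery is in the usual UNIX location /usr/openv/volmgr/bin and uses only the -m option shown above; the VMQUERY override and the query_media name are hypothetical.

```shell
# Sketch: run "vmquery -m <media_id>" for each media ID given as an argument.
# Assumption: default UNIX install path; set VMQUERY if yours differs.
VMQUERY=${VMQUERY:-/usr/openv/volmgr/bin/vmquery}

query_media() {
    for media_id in "$@"; do
        echo "===== $media_id ====="
        "$VMQUERY" -m "$media_id" || echo "vmquery failed for $media_id" >&2
    done
}

# Example: query_media CP0040 CF0128 CF0129
```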

Hi Marianne,

I haven't deleted all the media records in NBU because the media are assigned to a volume pool; in any case, I will have to re-assign the media to the volume pool again. The current production and UAT libraries are the same model, with the same firmware on the tape drives. The UAT library has already been tested in the UAT environment, where it works fine, and it is temporarily being used for the end users' ongoing backups. All the backup jobs have been failing for a week, so I have to fix it tomorrow using the UAT library.

I will do the following tomorrow:

1. Delete all the media in the NBU console.

2. Delete the drives and robot in NBU.

3. Kill the NBU daemons.

4. Start up the NBU daemons.

5. Run tpconfig and scan -tape to make sure both the OS and NBU see the UAT library.

6. Run the Device Configuration Wizard to scan the media and robot.

7. Set the STU in the policy.

8. Manually run the backup policy as a test.

Hi Marianne,

Thank you for your help

Marianne
Level 6
Partner    VIP    Accredited Certified

Noooo..... PLEASE do NOT "Delete all the media on the NBU console."

I have posted these steps in your other post: 

Delete drives and robot in NBU. (This will change all tapes (media) in robot to Standalone.)
Physically disconnect robot and drives and ensure devices are removed from OS view.
Connect new devices and ensure OS can see devices.
Run device config wizard.
If you will re-use old tapes, put tapes removed from old robot in the newly attached robot.
Run Inventory but do not update yet: do 'Preview recommended changes'. Ensure the new robot reads tape labels the SAME way as the old robot (if the old robot used the first 6 characters of the label, ensure that the new robot does the same). If not, add a Media ID Generation rule.
Complete Inventory.

Marianne
Level 6
Partner    VIP    Accredited Certified

Thanks for the .txt files. 

Firstly, the logging level is too low (VERBOSE = 0).
This gives us nothing beyond the same errors already shown in the Activity Monitor.

For the 14th, we see this type of error:

02:33:35.279 [29217] <16> io_ioctl: ioctl (MTBSF) failed on media id CP0040, drive index 1, I/O error (bptm.c.8164)
02:33:35.279 [29217] <2> send_MDS_msg: DEVICE_STATUS 1 67970 imduarbak01 CP0040 4000231 HP.ULTRIUM3-SCSI.001 2000307 POSITION_ERROR 0 0
02:33:35.279 [29217] <4> send_MDS_msg: Called by interrupt, NOP
02:33:35.285 [29217] <2> log_media_error: successfully wrote to error file - 03/14/18 02:33:35 CP0040 1 POSITION_ERROR HP.ULTRIUM3-SCSI.001

for a number of media IDs:
CP0040
CF0128
CF0129

I see 1 TapeAlert that you can pass on to your hardware vendor:
TAPE_ALERT HP.ULTRIUM3-SCSI.001 0x00000000 0x00004000

Please look in /var/adm/messages for hardware errors and pass them on to the hardware vendor.
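As a starting point for that search, the sketch below filters the messages file for lines that mention a tape or SCSI device together with an error-like keyword. The keyword list is an assumed heuristic, not taken from any driver documentation, so anything it prints is a lead to hand to the hardware vendor rather than a diagnosis; the function name and the MESSAGES_FILE variable are hypothetical.

```shell
# Sketch: pull tape/SCSI-related warnings and errors out of the OS log.
# Assumptions: Solaris-style /var/adm/messages; the keyword patterns
# below are a rough heuristic and will not catch every driver message.
MESSAGES_FILE=${MESSAGES_FILE:-/var/adm/messages}

# Print lines mentioning a tape/SCSI device AND an error-like keyword.
# Reads the files given as arguments, or stdin if none are given.
tape_errors_in_messages() {
    grep -Ei 'tape|scsi|st[0-9]+' "$@" | grep -Ei 'error|warning|fail|retry'
}

# Example: tape_errors_in_messages "$MESSAGES_FILE"
```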

You can search all of the bptm logs for 'error file' to locate all the tape errors.
Higher logging levels will give us more useful information.
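To see at a glance whether the 'error file' entries cluster on particular tapes (pointing at bad media) or on one drive index (pointing at a faulty drive), they can be tallied. This is a sketch that assumes the default UNIX legacy log directory /usr/openv/netbackup/logs/bptm and the field order of the log_media_error line quoted above (date, time, media ID, drive index, error type, drive name); summarise_media_errors is a hypothetical name.

```shell
# Sketch: tally bptm "error file" entries per media ID and per drive index.
# Assumes each entry looks like the log_media_error line quoted above:
#   ... wrote to error file - <date> <time> <media_id> <drive_index> <error> <drive_name>
# Reads the log files given as arguments, or stdin if none are given.
summarise_media_errors() {
    awk '/error file - / {
        # Find the "-" separator; the fields after it are in a fixed order.
        for (i = 1; i <= NF; i++)
            if ($i == "-") break
        media[$(i + 3)]++
        drive[$(i + 4)]++
    }
    END {
        for (id in media) print "media", id, media[id]
        for (idx in drive) print "drive", idx, drive[idx]
    }' "$@" | LC_ALL=C sort
}

# Example (default UNIX legacy log directory assumed):
# summarise_media_errors /usr/openv/netbackup/logs/bptm/log.*
```

Errors spread across many media IDs but concentrated on one drive index would suggest the drive; repeated errors on the same few media IDs across both drives would suggest the tapes.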

I increased the network timeout from 300 to 1400 on the master server, and since then this error code has not happened again.

Marianne
Level 6
Partner    VIP    Accredited Certified

As per my post of a month ago:

Status 41 seems to be a red herring: it has nothing to do with a network read timeout and is rather a device I/O error.
Faulty media or tape drive.

Hi Marianne,

After a long time, we found that the problem was with the hardware and the tape media.

All the backup-related problems are now fixed.

Thank you very much for your help.