Backups failing with 23

Venki009 · ‎09-01-2014

Hi Team,

All backups for one client are failing with status 23. When I start them again, they run for a while (about 30 minutes) and then fail with status 23 again.

Please help.

Thanks in advance

Marianne · ‎09-01-2014

Please start by showing us all the text in the Details tab of the failed job.

You have selected Solaris as the OS. Is that the master, the client, or the media server?
Tell us the OS and NBU version on the master, media server and client.

Is there a firewall anywhere in the picture?

Firewall timeouts are known to cause status 23/24.

Handy NetBackup Links

NBU35 · ‎09-01-2014

In addition to Marianne's post, kindly share the bpcd logs from the client.

Venki009 · ‎09-05-2014

Hi,

09/05/2014 18:35:58 - Info bpbrm (pid=23327) spawning a brm child process

09/05/2014 18:35:58 - Info bpbrm (pid=23327) child pid: 18097

09/05/2014 18:35:58 - Info bpbrm (pid=23327) sending bpsched msg: CONNECTING TO CLIENT FOR erpdb-madg_1409922357

09/05/2014 18:35:58 - Info bpbrm (pid=23327) start bpbkar on client

09/05/2014 18:35:58 - Info bpbkar (pid=20605) Backup started

09/05/2014 18:35:58 - Info bpbrm (pid=23327) Sending the file list to the client

09/05/2014 18:35:58 - connecting

09/05/2014 18:35:58 - connected; connect time: 0:00:00

09/05/2014 18:35:58 - begin writing

09/05/2014 20:13:18 - current media 0057L5 complete, requesting next media Any

09/05/2014 20:13:18 - current media -- complete, awaiting next media Any. Waiting for resources.

Reason: Drives are in use, Media server: SRILANKA,

Robot Type(Number): TLD(0), Media ID: N/A, Drive Name: N/A,

Volume Pool: 3week, Storage Unit: SRILANKA -HCART2-TLD-0, Drive Scan Host: N/A,

Disk Pool: N/A, Disk Volume: N/A

09/05/2014 20:14:52 - current media -- complete, awaiting next media Any. Waiting for resources.

Reason: Media is in use, Media server: SRILANKA,

Robot Type(Number): TLD(0), Media ID: N/A, Drive Name: N/A,

Volume Pool: 3week, Storage Unit: SRILANKA -HCART2-TLD-0, Drive Scan Host: N/A,

Disk Pool: N/A, Disk Volume: N/A

09/05/2014 20:35:58 - Info bpbrm (pid=23327) sending message to media manager: STOP BACKUP erpdb-madg_1409922357

09/05/2014 20:36:00 - Info bpbrm (pid=23327) media manager for backup id erpdb-madg_1409922357 exited with status 150: termination requested by administrator

09/05/2014 20:36:00 - end writing; write time: 2:00:02

socket read failed (23)

Venki009 · ‎09-05-2014

Hi,

The master and media server are the same host.

Master: NBU 7.5, OS = Solaris 10

Client: NBU 7.5, OS = Solaris 10 (5.10)

Marianne · ‎09-06-2014

Is there a firewall between the server and the client?

Maybe a firewall timeout?

Try setting the 'keep alive' settings on the client and the server to 5 minutes.

See http://www.symantec.com/docs/TECH188129

Michael_G_Ander · ‎10-26-2014

I agree with Marianne, it looks like a timeout. Increasing CLIENT_READ_TIMEOUT to 1800 seconds (30 minutes) might also be worth trying, as the gap is 22 minutes.

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue
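To put a number on the gap mentioned above, here is a small Python sketch that measures the time between entries in the job details posted earlier. The timestamps are copied from that output; the labels, the `EVENTS` list and the helper names are only for illustration. It is a rough check, not a diagnosis.

```python
from datetime import datetime

# Timestamps copied from the job details posted earlier in this thread.
# The labels are informal descriptions added for this example.
EVENTS = [
    ("begin writing", "09/05/2014 18:35:58"),
    ("requesting next media", "09/05/2014 20:13:18"),
    ("last 'waiting for resources' entry", "09/05/2014 20:14:52"),
    ("STOP BACKUP sent", "09/05/2014 20:35:58"),
]

# Value suggested above for CLIENT_READ_TIMEOUT, in seconds.
PROPOSED_TIMEOUT = 1800


def parse(stamp):
    """Parse a job-details timestamp (month/day/year hour:minute:second)."""
    return datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")


def gaps(events):
    """Return (from_label, to_label, timedelta) for each consecutive pair."""
    result = []
    for (a_label, a_time), (b_label, b_time) in zip(events, events[1:]):
        result.append((a_label, b_label, parse(b_time) - parse(a_time)))
    return result


for a_label, b_label, delta in gaps(EVENTS):
    print(f"{a_label} -> {b_label}: {delta}")

# Time spent waiting for media before the job was stopped.
wait_seconds = int((parse(EVENTS[3][1]) - parse(EVENTS[1][1])).total_seconds())
print(f"waited {wait_seconds} s; proposed timeout {PROPOSED_TIMEOUT} s")
```

The wait between the request for the next media and the STOP BACKUP message comes out at 1360 seconds (22 minutes 40 seconds), which is shorter than the proposed 1800 seconds.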

jmontagu · ‎10-29-2014

The detailed status indicates that 'media is in use' and 'drives are in use'. NetBackup timed out waiting for resources.

When a backup job is run, NetBackup reserves resources in the EMM database. This includes media and drives. If there has been some sort of hardware failure, or the Device Manager went down, these resources get hung up in the EMM database and NetBackup thinks that they are still in use even though there are no jobs running.

There is a wonderful utility for troubleshooting resource allocations, nbrbutil. It lives in /usr/openv/netbackup/bin/admincmd on a UNIX master and in <install path>\Program Files\Veritas\NetBackup\bin\admincmd on a Windows master.

The following documentation describes the many uses of this utility:

http://www.symantec.com/docs/HOWTO43779

In your situation I would recommend the following actions:

1. When no jobs are running, run nbrbutil -dump.

If any allocations are listed, they are 'hung' in EMM. Run nbrbutil -resetall to clear them.

2. Run nbrbutil -dump again to verify that the hung resources have been cleared. If there are many, the -resetall command may need to be run again.

If there are no hung resources then I would recommend continued troubleshooting for possible hardware errors.
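
For anyone who wants to script the steps above, here is a minimal Python sketch. It only uses the two options already mentioned (nbrbutil -dump and nbrbutil -resetall). It assumes nbrbutil is on the PATH of the user running it (otherwise set `NBRBUTIL` to the full path), and the function names are made up for this example. The decision to reset is left to the operator, because a reset should only be done when no jobs are running.

```python
import subprocess

# Assumption: nbrbutil is on PATH; otherwise put the full path here.
NBRBUTIL = "nbrbutil"


def nbrbutil(option, runner=subprocess.run):
    """Run `nbrbutil <option>` and return its standard output as text."""
    done = runner([NBRBUTIL, option], capture_output=True, text=True, check=True)
    return done.stdout


def clear_hung_allocations(confirm, runner=subprocess.run):
    """Dump allocations, reset them if the operator confirms, then dump again.

    `confirm` receives the dump text and returns True to run -resetall.
    The final dump is returned so the caller can verify the result.
    """
    dump = nbrbutil("-dump", runner)
    if not confirm(dump):
        return dump
    nbrbutil("-resetall", runner)
    return nbrbutil("-dump", runner)


def ask_operator(dump):
    """Show the dump and ask whether the listed allocations should be reset."""
    print(dump)
    answer = input("No jobs running and allocations listed? Reset all [y/N]: ")
    return answer.strip().lower() == "y"
```

Calling `print(clear_hung_allocations(ask_operator))` on the master walks through the same sequence: dump, optional reset, dump again. If allocations are still listed in the final dump, repeat the call.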