cancel
Showing results for 
Search instead for 
Did you mean: 

Backups failing with error 6, 41 and 196

NetPaul
Level 3

Last week we upgraded our Master and Media servers to Netbackup 7.6.0.4 from 7.5.0.7 completing on the 17th April. Everything worked fine until 19th April when the we noticed Oracle and SQL backups were failing with an error 6 and an Exchange backup which normally takes 2.5 hours was still running after 2 days. Client servers are running a variety of versions from 7.1 to 7.5.0.7. We have two media servers and we only have this problem on one media server and the problem only affects some of the servers.

Error codes we are getting are 6, 41 and 196.

 

Investigating the problem the we found that the problem is the backups to the MSDP storage times out or is very slow in the case of the Exchange backup. Looking in the logs I have seen the following errors:

 

bptm

00:00:56.003 [7068.8068] <16> 86851:bptm:7068:hpwp-med01-bkp.hcuk.pri: [ERROR] PDVFS: pdvfs_lib_log: mtstrm_is_server_started: Cr_ErrnoException: Get Process Status command failed: Cr_ErrnoException: Timed out after waiting 1200s to send command Get Process Status to mtstrmd
00:00:56.003 [7068.8068] <16> 86851:bptm:7068:hpwp-med01-bkp.hcuk.pri: [ERROR] PDVFS: PdvfsMtstrmStartDaemon: The multi-threaded streaming daemon is not ready to receive requests even after 4 attempts.  Check if mtstrmd is running and look for errors in mtstrmd.log.
00:00:56.003 [7068.8068] <16> 86851:bptm:7068:hpwp-med01-bkp.hcuk.pri: [ERROR] PDSTS: start_mtstrm_session: PdvfsMtstrmStartDaemon failed (18:Improper link)

00:30:28.676 [8116.13800] <16> 84077:bptm:8116:hpwp-med01-bkp.hcuk.pri: libsts openp() 15/04/21 00:30:28: stsm_open_module C:\Program Files\Veritas\NetBackup\bin\\ost-plugins\libstspizapi.dll failed err 2060053 platerr 193: LoadLibrary

 

dbclient

00:45:29.702 [11258] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/11258.0.1429572612

01:00:30.755 [11258] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/11258.0.1429572612

01:00:30.756 [11258] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/11258.0.1429572612>

01:00:30.756 [11258] <16> CreateNewImage: ERR - serverResponse() failed

01:00:30.756 [11258] <16> VxBSACreateObject: ERR - Could not create new image with file /arch_ultq4t6kk_s17085_p1_t877566612.

 

bpdm

00:00:17.860 [900.10140] <16> 87153:bptm:900:hpwp-med01-bkp.hcuk.pri: libsts openp() 15/04/22 00:00:17: stsm_open_module C:\Program Files\Veritas\NetBackup\bin\\ost-plugins\libstspizapi.dll failed err 2060053 platerr 193: LoadLibrary failed

 

These errors seem to show the backups are unable to write to the storage as the backups seem to go to the media server ok and then fail because they cannot write to the storage. Apart from the upgrade the Netbackup configuration has not changed.

 

My gut feeling is that the failures are not caused by the upgrade and possibly not by Netbackup as these errors seem to be a consquence rather than a cause and we had at least a days worth of successful backups beform we started to have problems but as always for a Netbackup Administrator the problem is proving it.

 

Is anyone able give me any ideas? I've worked on this for 2 days and I am out of them.

6 REPLIES 6

sdo
Moderator
Moderator
Partner    VIP    Certified

The err 2060053 looks like:

https://www-secure.symantec.com/connect/forums/nbu-7602-windows-media-server-zero-length-libstspizapidll

sdo
Moderator
Moderator
Partner    VIP    Certified

In the first line of bptm error messages we see:

PDVFS: pdvfs_lib_log: mtstrm_is_server_started

...which appears to be bptm checking to see whether the CR (Content Router) is up and started, but it timed out after 20 minutes.

Maybe check your MSDP pool on the problematic media server - it could be struggling for some reason.

Is there a CR queue processing back log?

Run the 'queueinfo' type commands from here:

https://www-secure.symantec.com/connect/forums/netbackup-7104-msdp-storaged-critical-error

 

NetPaul
Level 3

Thanks sdo

 

That is what I am seeing. The output of the queueinfo commando is:

 

c:\Program Files\Veritas\pdde>crcontrol --queueinfo
total queue size : 3781148
creation date of oldest tlog : Wed Apr 22 12:29:24 2015

sdo
Moderator
Moderator
Partner    VIP    Certified

What does this show:

crcontrol --dsstat

...which is the quick version of the command (which I believe is effectively what it showed last time that it ran the longer version itself - or if you ran the longer version).

.

Don't run this next command yet - which is sometimes referred to as teh detailed version, but really it just means get an upto date picture:

crcontrol --dsstat 1

NetPaul
Level 3

Here is the crcontrol --dsstat output:

c:\Program Files\Veritas\pdde>crcontrol --dsstat

************ Data Store statistics ************
Data storage      Raw    Size   Used   Avail  Use%
                  58.2T  55.8T  41.2T  14.7T  74%

Number of containers             : 248264
Average container size           : 183904950 bytes
Space allocated for containers   : 45656978603092
Reserved space                   : 2558972882944 b
Reserved space percentage        : 4.0%

Use "--dsstat 1" to get more accurate statistics

NetPaul
Level 3

We raised a support call and an old fashioned reboot seemed to have got the backups working. Will need to now investigate why the problem occurred.