NDMP backups fail with Status: 99

emartens
Level 3

I have an S10 master server running NBU 7.1.0.4 and three NDMP clients (7210s). I was told that the NDMP backups were working before the upgrade to 7.1.

Full backups fail with status 99: NDMP backup failure. All three clients fail with the same error; I've attached the snapshot from the Activity Monitor. I've been searching and have been unable to find any information relating to the ndmpagent errors I am receiving. The NDMP backups are set up to run across a private network. At first I thought the client couldn't talk back to the master server, but I can see traffic both ways when I snoop the interfaces, and tpautoconf -verify doesn't error out.
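For reference, the checks I ran were along these lines (nxge0 is just a stand-in for the private interface; the actual device name differs):

dutch# snoop -d nxge0 host linz-pvt      # watch for traffic in both directions on the private link
dutch# tpautoconf -verify linz-pvt       # confirm the NDMP login from the master (full output pasted further down)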

I know that there is output from about a dozen other log files that I haven't attached, but I wanted to see if anyone has any idea what the cause might be.

Thanks...

 

06/04/2012 13:37:47 - Info nbjm (pid=5296) starting backup job (jobid=770721) for client linz-pvt, policy linz_NDMP, schedule Full
06/04/2012 13:37:47 - Info nbjm (pid=5296) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=770721, request id:{C31B4A14-AE7C-11E1-B77B-00144F866FDC})
06/04/2012 13:37:47 - requesting resource dutch-tape-hcart2
06/04/2012 13:37:47 - requesting resource dutch.NBU_CLIENT.MAXJOBS.linz-pvt
06/04/2012 13:37:47 - requesting resource dutch.NBU_POLICY.MAXJOBS.linz_NDMP
06/04/2012 13:37:47 - granted resource  dutch.NBU_CLIENT.MAXJOBS.linz-pvt
06/04/2012 13:37:47 - granted resource  dutch.NBU_POLICY.MAXJOBS.linz_NDMP
06/04/2012 13:37:47 - granted resource  TTC151
06/04/2012 13:37:47 - granted resource  t10k-16_0212
06/04/2012 13:37:47 - granted resource  dutch-tape-hcart2
06/04/2012 13:37:47 - estimated 0 kbytes needed
06/04/2012 13:37:47 - Info nbjm (pid=5296) started backup job for client linz-pvt, policy linz_NDMP, schedule Full on storage unit dutch-tape-hcart2
06/04/2012 13:37:48 - Info bpbrm (pid=7035) linz-pvt is the host to backup data from
06/04/2012 13:37:48 - Info bpbrm (pid=7035) reading file list from client
06/04/2012 13:37:48 - Info bpbrm (pid=7035) starting ndmpagent on client
06/04/2012 13:37:49 - Info ndmpagent (pid=7040) Backup started
06/04/2012 13:37:49 - Info bpbrm (pid=7035) bptm pid: 7041
06/04/2012 13:37:50 - Info bptm (pid=7041) start
06/04/2012 13:37:51 - Info bptm (pid=7041) using 64 data buffers
06/04/2012 13:37:51 - Info bptm (pid=7041) using 2097152 data buffer size
06/04/2012 13:37:51 - Info bptm (pid=7041) start backup
06/04/2012 13:37:51 - Info bptm (pid=7041) Waiting for mount of media id TTC151 (copy 1) on server dutch.
06/04/2012 13:37:48 - started process bpbrm (pid=7035)
06/04/2012 13:37:48 - connecting
06/04/2012 13:37:48 - connected; connect time: 0:00:00
06/04/2012 13:37:51 - mounting TTC151
06/04/2012 13:38:21 - Info bptm (pid=7041) media id TTC151 mounted on drive index 13, drivepath /dev/rmt/13cbn, drivename t10k-16_0212, copy 1
06/04/2012 13:38:24 - Error ndmpagent (pid=7040) ndmp_enable_extensions: ndmp_config_get_ext_list failed. status = 30 (NDMP_EXT_DANDN_ILLEGAL_ERR)
06/04/2012 13:38:25 - Error ndmpagent (pid=7040) setsockopt error 132 (No buffer space available)
06/04/2012 13:38:25 - Error ndmpagent (pid=7040) NDMP backup failed, path = /pool-0/local/ngd_digital_t10kd/ngd_digital_t10kd

06/04/2012 13:38:21 - mounted TTC151; mount time: 0:00:30
06/04/2012 13:38:21 - positioning TTC151 to file 1
06/04/2012 13:38:24 - positioned TTC151; position time: 0:00:03
06/04/2012 13:38:24 - begin writing
06/04/2012 13:38:44 - Info bptm (pid=7041) EXITING with status 99 <----------
06/04/2012 13:38:44 - Info ndmpagent (pid=0) done. status: 99: NDMP backup failure
06/04/2012 13:38:44 - end writing; write time: 0:00:20
NDMP backup failure  (99)

ndmpagent log output...

13:38:24.972 [7040] <2> vnet_pbxConnect: pbxConnectEx Succeeded
13:38:24.973 [7040] <2> job_connect: SO_KEEPALIVE set on socket 11 for client dutch
13:38:24.974 [7040] <2> logconnections: BPJOBD CONNECT FROM 10.80.2.43.36737 TO 10.80.2.43.1556 fd = 11
13:38:24.974 [7040] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
13:38:24.976 [7040] <2> job_connect: Connected to the host dutch contype 10 jobid <770721> socket <11>
13:38:24.976 [7040] <2> job_connect: Connected on port 36737
13:38:24.978 [7040] <2> job_monitoring_exex: ACK disconnect
13:38:24.978 [7040] <2> job_disconnect: Disconnected
13:38:25.008 [7040] <2> vnet_pbxConnect: pbxConnectEx Succeeded
13:38:25.009 [7040] <2> job_connect: SO_KEEPALIVE set on socket 11 for client dutch
13:38:25.009 [7040] <2> logconnections: BPJOBD CONNECT FROM 10.80.2.43.36738 TO 10.80.2.43.1556 fd = 11
13:38:25.009 [7040] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
13:38:25.011 [7040] <2> job_connect: Connected to the host dutch contype 10 jobid <770721> socket <11>
13:38:25.012 [7040] <2> job_connect: Connected on port 36738
13:38:25.013 [7040] <2> job_monitoring_exex: ACK disconnect
13:38:25.013 [7040] <2> job_disconnect: Disconnected
13:38:25.019 [7040] <2> vnet_pbxConnect: pbxConnectEx Succeeded
13:38:25.020 [7040] <2> job_connect: SO_KEEPALIVE set on socket 11 for client dutch
13:38:25.020 [7040] <2> logconnections: BPJOBD CONNECT FROM 10.80.2.43.36739 TO 10.80.2.43.1556 fd = 11
13:38:25.020 [7040] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
13:38:25.022 [7040] <2> job_connect: Connected to the host dutch contype 10 jobid <770721> socket <11>
13:38:25.022 [7040] <2> job_connect: Connected on port 36739
13:38:25.024 [7040] <2> job_monitoring_exex: ACK disconnect
13:38:25.024 [7040] <2> job_disconnect: Disconnected

dutch# tpautoconf -verify linz-pvt
Connecting to host "linz-pvt" as user "root"...
Waiting for connect notification message...
Opening session--attempting with NDMP protocol version 4...
Opening session--successful with NDMP protocol version 4
  host supports TEXT authentication
  host supports MD5 authentication
Getting MD5 challenge from host...
Logging in using MD5 method...
Host info is:
  host name "linz"
  os type "SunOS"
  os version "5.11"
  host id "0"
Login was successful
Host supports LOCAL backup/restore
Host supports 3-way backup/restore

 

7 REPLIES

rizwan84tx
Level 6
Certified

Did you try changing the NDMP buffer size to something below 2097152? You can try 262144 (256 KB) or 524288 (512 KB).

Create the file SIZE_DATA_BUFFERS_NDMP if it's not present and enter the value in it.

Path:  /usr/openv/netbackup/db/config/
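For example, something like this on the master/media server sets a 256 KB NDMP buffer size (pick whichever of the two values you want to test):

# echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_NDMP
# cat /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_NDMP
262144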

 

emartens
Level 3

Rizwan,

Thanks for the post. I changed SIZE_DATA_BUFFERS_NDMP as well as SIZE_DATA_BUFFERS to 524288 (512 KB), since I know there could be an issue if the two values differ. After the changes I still get the same status 99.
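For the record, this is how I confirmed both files now hold the same value on the media server:

dutch# cat /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_NDMP
524288
524288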

Yogesh9881
Level 6
Accredited

Which NAS box is it? EMC? NetApp? IBM?

Because we need to check at the NAS level as well.

Approximately how much data are you trying to back up?

emartens
Level 3

The appliances are Sun Storage 7210s.

SunOS 5.11 ak/generic@2010.08.17.4.2,1-1.37 64 bit

On the appliance where I'm doing most of my testing, I'm backing up approximately 1.4 TB.

Yogesh9881
Level 6
Accredited

Was it working before, or is this a new configuration that you are working on?

Also, can you please check the logs on the NAS at the exact time the backup failed, from 06/04/2012 13:38:21 onwards? Can you attach them here in txt format?


emartens
Level 3

It was set up for testing when the master server was at NBU 6.5.3. The initial backups ran without incident. I believe it was after upgrading to 7.1 and then immediately to 7.1.0.4 that the NDMP backups stopped working. I have rolled the master server back to 7.1 and it still wasn't working, so I upgraded back to 7.1.0.4. I wasn't about to try rolling back to 6.5.3. I'm trying to get more logging information from the appliance.

watsons
Level 6

The error message you highlighted looks like something from the filer side. I think it would be helpful to seek vendor help, so at least we know what that error means from the filer's point of view.

Have you tried splitting the backup selection so it isn't pushing that much data (1.4 TB) in one stream?
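For example (the sub-directory names below are made up; substitute whatever actually sits under that share), instead of the single top-level path, the policy's Backup Selections could list the sub-directories individually so each one is written as its own, smaller stream:

/pool-0/local/ngd_digital_t10kd/ngd_digital_t10kd/dir1
/pool-0/local/ngd_digital_t10kd/ngd_digital_t10kd/dir2
/pool-0/local/ngd_digital_t10kd/ngd_digital_t10kd/dir3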

Another thought from what you described, issue surfaced after the upgrade to 7.1. Before the upgrade, was there any EEB (Netbackup bug fix) being applied specifically to address the problem, or it just worked without any EEB? You can consider downgrading to get down to the cause, rollback to 6.5.3 first, then patch to 6.5.6. That way you can tell from which version it stops working with Netbackup, and you can then raise a request to Support to have that fix.