NDMP backups very slow - VERY!

alstarian
Level 3

Hi Guys,

For the last couple of weeks my NDMP backups have been failing; prior to that they were just very, very slow. All this drama started when our NetApp storage was upgraded from 7-Mode to cluster mode by our SAN administrator.

I used to get speeds of 120,000 KB/sec, and now after the upgrade I am lucky if I get 50,000 KB/sec (I mean really lucky). I have noticed that my backups are slow when backing up huge NDMP data sets. For example, my projects folder is about 32 TB, and I know backing that up is going to take time, but in the past it used to take around 3 days and now it takes 7 days. Lately the backups have also been failing at random times; last time it failed at 28 TB!
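As a rough sanity check on those figures, the sketch below estimates the transfer time for 32 TB at each rate. It assumes decimal units (1 TB = 10^9 KB) and a constant sustained rate; `days_to_backup` is a hypothetical helper name, not a NetBackup tool.

```shell
#!/usr/bin/env bash
# Estimate how long a backup takes at a constant sustained rate.
# Assumption: decimal units, 1 TB = 10^9 KB. Rate is in KB/sec as shown in the job details.

# days_to_backup SIZE_TB RATE_KB_PER_SEC -> prints days with one decimal place
days_to_backup() {
  awk -v tb="$1" -v rate="$2" 'BEGIN {
    kb = tb * 1e9                        # total size in KB
    printf "%.1f\n", kb / rate / 86400   # seconds converted to days
  }'
}

echo "32 TB at 120000 KB/sec: $(days_to_backup 32 120000) days"
echo "32 TB at  50000 KB/sec: $(days_to_backup 32 50000) days"
```

Under these assumptions the two rates work out to about 3.1 and 7.4 days, which lines up with the 3 days and 7 days observed.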

I am performing local/direct NDMP backups through a Brocade 300 switch: I have fiber cables from 2 NetApp controllers going to my Brocade 300 SAN switch, one fiber cable from my backup server to that switch, and 4 other fiber cables from my tape drives (we use a Quantum tape library).

I can't figure out why my backups are so slow for this site, because I have another site with a similar setup where I get very good speeds and the data set is as big as this one.

Any ideas?

-A

5 REPLIES

PatS729
Level 5

Hi,

What NBU version are you using?

Also, can you share the detailed job status? It will help us find out whether the backup got stuck at any point.

Additionally, you can enable the NDMP, NDMPAGENT, BPTM and BPBRM logs on the NDMP host.

/usr/openv/netbackup/bin/vxlogcfg -a -p 51216 -o 151 -s DebugLevel=6     <------ enable NDMP log

/usr/openv/netbackup/bin/vxlogcfg -a -p 51216 -o 134 -s DebugLevel=6     <------ enable NDMPAGENT log

For BPTM and BPBRM, just increase VERBOSE to 5.

Review the logs and try to find out where NBU is waiting to read or write data.
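The BPTM and BPBRM step could look roughly like the sketch below. It assumes a UNIX media server with the default install root `/usr/openv/netbackup`, that legacy logs are only written once the `logs/bptm` and `logs/bpbrm` directories exist, and that the level is set through a `VERBOSE = 5` entry in `bp.conf`. `enable_legacy_logs` is a hypothetical helper name, and the example run targets a scratch directory rather than a live install.

```shell
#!/usr/bin/env bash
# Sketch: prepare legacy bptm/bpbrm logging under a NetBackup install root.
# Assumption: UNIX layout with bp.conf and logs/ under the install root.

# enable_legacy_logs [NB_ROOT]   (NB_ROOT defaults to /usr/openv/netbackup)
enable_legacy_logs() {
  local nb_root="${1:-/usr/openv/netbackup}"
  local conf="$nb_root/bp.conf"

  # Create the per-process legacy log directories.
  mkdir -p "$nb_root/logs/bptm" "$nb_root/logs/bpbrm"

  # Replace an existing VERBOSE entry, or append one if none is present.
  touch "$conf"
  if grep -q '^VERBOSE[[:space:]]*=' "$conf"; then
    sed -i.bak 's/^VERBOSE[[:space:]]*=.*/VERBOSE = 5/' "$conf"
  else
    echo 'VERBOSE = 5' >> "$conf"
  fi
}

# Dry run against a scratch directory instead of the live install.
scratch="$(mktemp -d)"
enable_legacy_logs "$scratch"
cat "$scratch/bp.conf"
```

The function is idempotent: running it a second time rewrites the existing `VERBOSE` line instead of appending a duplicate.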

alstarian
Level 3

Hi Pat,

The NetBackup version I have is 7.6.1. The last time my backup failed, the job log showed the error below.

===========

08/05/2016 00:59:45 - Error bptm(pid=1000) io_ioctl_ndmp (MTBSF) failed on media id XA1040, drive index 3, return code 19 (NDMP_ILLEGAL_STATE_ERR) (bptm.c.8897) 
08/05/2016 01:03:56 - Info ndmpagent(pid=6556) zaedxa1601: DUMP: Sun May 8 00:53:17 2016 : We have written 28200030469 KB. 
08/05/2016 01:09:02 - Info ndmpagent(pid=6556) zaedxa1601: MOVER: signaling backup EOW at 28902317424640 (441014365 records) 
08/05/2016 01:09:02 - Info bptm(pid=1000) EXITING with status 23 <---------- 
08/05/2016 01:09:02 - Error ndmpagent(pid=6556) send error status = 18 (NDMP_XDR_DECODE_ERR) 
08/05/2016 01:09:02 - Error ndmpagent(pid=6556) SendControlMessage failed, disabling connection 00000000012257C0 and exiting 
08/05/2016 01:09:02 - Error ndmpagent(pid=6556) Connection was closed but has not yet been destroyed. 
08/05/2016 01:09:02 - Error ndmpagent(pid=6556) Connection was closed but has not yet been destroyed. 
08/05/2016 01:09:02 - Error ndmpagent(pid=6556) MoverGetState called with no session 
08/05/2016 01:09:02 - Error ndmpagent(pid=6556) NDMP backup failed, path = /zaedxa1601/AEDXAProjects/ 
08/05/2016 01:09:02 - Error ndmpagent(pid=6556) Connection was closed but has not yet been destroyed. 
08/05/2016 01:09:02 - Error ndmpagent(pid=6556) Connection was closed but has not yet been destroyed. 

I had some socket read errors as well.
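Two figures can be read straight out of the DUMP and MOVER lines in that log. The sketch below copies the numbers from the log and derives the amount written and the record size the mover was using. The helper names are hypothetical, and the TB conversion assumes decimal units (1 TB = 10^9 KB).

```shell
#!/usr/bin/env bash
# Figures copied from the job log above.
dump_kb=28200030469          # DUMP: "We have written 28200030469 KB"
mover_bytes=28902317424640   # MOVER: EOW byte offset
mover_records=441014365      # MOVER: record count at EOW

# record_size BYTES RECORDS -> bytes per record (integer division)
record_size() { echo $(( $1 / $2 )); }

# kb_to_tb KB -> TB with one decimal place, assuming 1 TB = 10^9 KB
kb_to_tb() { awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1e9 }'; }

echo "record size: $(record_size "$mover_bytes" "$mover_records") bytes"
echo "dump size:   $(kb_to_tb "$dump_kb") TB"
```

The division is exact at 65536 bytes per record, which suggests a 64 KB record size, and the dump had written about 28.2 TB when the job failed.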

======================

I have now restarted the backup and it is stuck in the Pass III phase, where it dumps all the directories. If I enable the BPTM and BPBRM logs now, will they take effect, or will I have to restart the backups? I suppose I can give it a shot: after enabling the logs, can I run the command "bprdreq -rereadconfig"?

PatS729
Level 5

Hi Alstarian,

Yes, you would need to restart the backup, as just running bprdreq -rereadconfig will not apply the new logging level to backups that are already in progress.

However, looking at the failed job status, it could be a performance issue with "bpbrm" on the media server (I don't rule out a network problem between the data mover and the media server). You can try implementing the resolution from article https://www.veritas.com/support/en_US/article.TECH47412

See if this helps, or post the logs from the failed attempt.

Thanks.

alstarian
Level 3

Hi Pat,

As a matter of fact, I had created the touch file MAX_FILES_PER_ADD with a value of 25000 but then removed it. I can try adding it again. Also, when they say a touch file, we create a file without an extension, correct?

Also, it really sucks that I have to restart my backups after enabling the logging, because I have 2 backups that have been running for 10 hours and are still stuck in the Pass III phase, which means I will lose all that time again!

PatS729
Level 5

Yes, the touch file should be created without an extension...
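A minimal sketch of creating such a file on a UNIX server follows. `write_touch_file` is a hypothetical helper, the value 25000 is the one mentioned earlier in the thread, and the target directory on a real server is whatever the article specifies (the comment assumes `/usr/openv/netbackup`, which should be checked against the article). The example run writes into a scratch directory only.

```shell
#!/usr/bin/env bash
# Sketch: create a touch file, i.e. a plain file whose name has no extension
# and whose content is a single value.

# write_touch_file DIR NAME VALUE
write_touch_file() {
  local dir="$1" name="$2" value="$3"
  case "$name" in
    *.*)
      echo "touch file name must not have an extension: $name" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$value" > "$dir/$name"
}

# Dry run in a scratch directory. On a real server DIR would be the NetBackup
# directory named in the article (assumed here: /usr/openv/netbackup).
scratch="$(mktemp -d)"
write_touch_file "$scratch" MAX_FILES_PER_ADD 25000
cat "$scratch/MAX_FILES_PER_ADD"
```

On Windows the usual trap is an editor silently saving the file as `MAX_FILES_PER_ADD.txt`; the name check in the helper mirrors that concern.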