SAN Client backups hang over FT media server

PrathapReddy
Level 4
Certified

Hi All,

My SAN Client backup jobs over Fibre Transport (FT) are hanging. A hung job stays active for hours and writes nothing, but when I cancel it and restart it manually, the backup completes fine over Fibre Transport. I have also noticed that my backups sometimes fail over to LAN rather than FT; they are supposed to use FT, since the SAN Client has been configured.

My master server runs Solaris 10, the FT media server runs Solaris 10, and the clients are Windows 2008.

The NetBackup version on all the servers is 7.6.1.1.

Please provide your thoughts on this.

 

Regards,

Reddy

 

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

How many streams are you pushing to the FT media server at the same time? FT can't handle a large number of connections; a maximum of 32 is recommended.

PrathapReddy
Level 4
Certified

Hi Riaan,

Thanks.

The number of streams at any one time is less than 32. If I cancel the job, it works fine on the next attempt.

As they are SQL backups, they run one by one, and the maximum across all the policies at any one time is fewer than 20 streams.

 

Regards,

Reddy.

 

 

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

OK, that sounds fine. Do you ever let the jobs fail before canceling and restarting them? It would be good to get an error code.

 

I had an issue last year on 2.6.0.4 where the jobs would also fail (status 13/41) and then work fine when rerun. In the end I got an EEB from support that resolved the issue.

 

 

PrathapReddy
Level 4
Certified

Hi Riaan,

Thanks for the update.

Yes, the backups eventually fail with error code 13 after running for more than 12 hours and writing nothing. I have raised a support case and have been through multiple engineers, because they were not very strong in the FT area. However, another BL engineer was assigned to the case today, and I am waiting for the call back. Let's see what he suggests.

 

pats_729
Level 6
Employee

Hi,

Could you paste the failed job details here?

PrathapReddy
Level 4
Certified

Hi All,

Below are the job details:

 

11/25/2015 9:33:37 AM - Info nbjm(pid=680) starting backup job (jobid=325363) for client holon-ns01v0, policy win_sql_spsql_sites1_san_client_holon-ns01v0_incre_ns, schedule Default-Application-Backup 

11/25/2015 9:33:37 AM - Info nbjm(pid=680) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=325363, request id:{22221260-9379-11E5-99A6-002128100696}) 

11/25/2015 9:33:37 AM - requesting resource msdp_nbuft_ns-stu

11/25/2015 9:33:37 AM - requesting resource nbumaster-ns00.NBU_CLIENT.MAXJOBS.holon-ns01v0

11/25/2015 9:33:37 AM - requesting resource nbumaster-ns00.NBU_POLICY.MAXJOBS.win_sql_spsql_sites1_san_client_holon-ns01v0_incre_ns

11/25/2015 9:33:40 AM - granted resource nbumaster-ns00.NBU_CLIENT.MAXJOBS.holon-ns01v0

11/25/2015 9:33:40 AM - granted resource nbumaster-ns00.NBU_POLICY.MAXJOBS.win_sql_spsql_sites1_san_client_holon-ns01v0_incre_ns

11/25/2015 9:33:40 AM - granted resource MediaID=@aaaaj;DiskVolume=PureDiskVolume;DiskPool=msdp_nbuft_ns;Path=PureDiskVolume;StorageServer=nbuft-ns00;MediaServer=nbuft-ns00

11/25/2015 9:33:40 AM - granted resource msdp_nbuft_ns-stu

11/25/2015 9:33:40 AM - granted resource TRANSPORT

11/25/2015 9:33:42 AM - estimated 0 Kbytes needed

11/25/2015 9:33:42 AM - Info nbjm(pid=680) started backup (backupid=holon-ns01v0_1448458421) job for client holon-ns01v0, policy win_sql_spsql_sites1_san_client_holon-ns01v0_incre_ns, schedule Default-Application-Backup on storage unit msdp_nbuft_ns-stu

11/25/2015 9:33:44 AM - started process bpbrm (23789)

11/25/2015 9:33:45 AM - Info bpbrm(pid=23789) holon-ns01v0 is the host to backup data from    

11/25/2015 9:33:45 AM - Info bpbrm(pid=23789) reading file list for client       

11/25/2015 9:33:45 AM - connecting

11/25/2015 9:33:46 AM - Info bpbrm(pid=23789) listening for client connection        

11/25/2015 9:33:49 AM - Info bpbrm(pid=23789) INF - Client read timeout = 1800     

11/25/2015 9:33:49 AM - Info bpbrm(pid=23789) accepted connection from client        

11/25/2015 9:33:49 AM - Info dbclient(pid=8912) Backup started          

11/25/2015 9:33:49 AM - Info bpbrm(pid=23789) bptm pid: 23795         

11/25/2015 9:33:49 AM - connected; connect time: 0:00:04

11/25/2015 9:33:51 AM - Info bptm(pid=23795) start           

11/25/2015 9:33:51 AM - Info bptm(pid=23795) using 262144 data buffer size       

11/25/2015 9:33:52 AM - Info bptm(pid=23795) using 20 data buffers        

11/25/2015 9:33:52 AM - Info bptm(pid=23795) USING 262144 data buffer size for FT     

11/25/2015 9:33:53 AM - Opening Fibre Transport Connection, Backup Id: holon-ns01v0_1448458421

11/25/2015 9:34:00 AM - Info bptm(pid=23795) start backup          

11/25/2015 9:34:06 AM - begin writing

11/25/2015 9:34:10 AM - Info dbclient(pid=8912) dbclient(pid=8912) wrote first buffer(size=65536) 

 

 

 

 

The job gets stuck at the above point and waits for hours, writing nothing.
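For anyone who wants to spot this kind of stall without watching the Activity Monitor, here is a minimal Python sketch. It assumes the job details have been saved as plain text in the same "date time AM/PM - message" layout as the excerpt above. The function names (`parse_job_details`, `is_stalled`) and the check time are hypothetical, and the 1800-second threshold is only borrowed from the "Client read timeout = 1800" line in the log; none of this is a NetBackup API.

```python
import re
from datetime import datetime, timedelta

# Matches job detail lines such as "11/25/2015 9:34:06 AM - begin writing".
# Assumption: timestamps are US-style month/day/year with a 12-hour clock.
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - (.*)$"
)


def parse_job_details(text):
    """Return a list of (timestamp, message) tuples from job detail text."""
    events = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
            events.append((stamp, match.group(2).strip()))
    return events


def is_stalled(events, now, threshold=timedelta(seconds=1800)):
    """True if the newest logged event is older than the threshold."""
    if not events:
        return False
    return now - events[-1][0] > threshold


# The last few lines of the job details posted above.
SAMPLE = """\
11/25/2015 9:33:53 AM - Opening Fibre Transport Connection, Backup Id: holon-ns01v0_1448458421
11/25/2015 9:34:00 AM - Info bptm(pid=23795) start backup
11/25/2015 9:34:06 AM - begin writing
11/25/2015 9:34:10 AM - Info dbclient(pid=8912) dbclient(pid=8912) wrote first buffer(size=65536)
"""

events = parse_job_details(SAMPLE)
last_time, last_message = events[-1]
checked_at = datetime(2015, 11, 25, 12, 0, 0)  # hypothetical time of the check

print("last activity:", last_time, "-", last_message)
print("stalled:", is_stalled(events, checked_at))
```

A check like this only tells you that nothing new has been logged; it says nothing about why, so the FT logs on the media server and client are still needed for the actual diagnosis.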

PrathapReddy
Level 4
Certified

Even Veritas support is struggling to fix this issue.

Veritas experts: please provide your input on this.

sdo
Moderator
Partner    VIP    Certified

Did you check the FT logs on both media server and client?

.

Also I saw this:

NetBackup 7.6 EEB Guide

http://www.veritas.com/docs/000003552

...says on page 37:

3053482 A Fibre Transport (FT) media server can encounter a problem with the FT server process (nbftsrvr) stopping with the following error: "SCPALDMA Assertion failed". 

...but it's not clear to me whether this issue (noted in v7.6.1) is still present in v7.6.1.1 - or whether it is fixed in any particular version.  In which case, it's probably best to open an official case with Veritas Support.

.

This doc describes how to collect FT logs:

NetBackup 7.6 SAN Client and Fibre Transport Guide

http://www.veritas.com/docs/000003727

...I think you need to collect and review the FT related logs from media server and client.

PrathapReddy
Level 4
Certified

Hi Sdo/Riaan,

Very good information.

I have already raised a support case, and they have been reviewing the logs. But I am seeing very poor support from Veritas at the moment, which might be due to the transition phase from Symantec.

However, I will raise this angle with the Veritas engineer and keep you posted.

 

 

 

 

PrathapReddy
Level 4
Certified

Hi All,

 

Even Veritas is struggling to fix the issue. Any expertise from others, please.

Finally, the issue has been resolved. Veritas identified that a Solaris FT media server running on SPARC cannot handle the heavier load, and hence they recommended a Linux machine with good memory and processor. We replaced the Solaris server with Linux and the issue is resolved.

Regards,

Prathap