11-23-2015 10:23 AM
Hi All,
My SAN Client backup jobs over Fibre Transport (FT) are hanging. A job stays active for hours and writes nothing, but when I cancel it and restart it manually, the backup runs fine over FT. I have also noticed that my backups sometimes fail over to LAN instead of FT, although they are supposed to use FT because the SAN Client has been configured.
My master server is Solaris 10, the FT media server is Solaris 10, and the clients are Windows 2008.
The NetBackup version on all the servers is 7.6.1.1.
Please provide your thoughts on this.
Regards,
Reddy
11-24-2015 05:22 AM
How many streams are you pushing to the FT media server at the same time? FT can't handle a large number of connections; a maximum of 32 is recommended.
11-24-2015 06:12 AM
Hi Riaan,
Thanks.
The number of streams at any one time is fewer than 32. If I cancel the job, it works fine on the next attempt.
As they are SQL backups, they run one by one, and the maximum across all the policies at any one time is fewer than 20 streams.
Regards,
Reddy.
11-24-2015 09:19 AM
Hi,
OK, that sounds fine. Do you ever let the jobs fail before cancelling and restarting them? It would be good to get an error code.
I had an issue last year on 2.6.0.4 where the jobs would also fail (status 13/41) and then work fine when rerun. In the end I got an EEB from support that resolved the issue.
11-24-2015 10:38 AM
Hi Riaan,
Thanks for the update.
Yes, the backups eventually fail with error code 13 after running for more than 12 hours and writing nothing. I have raised a support case, and it has been passed between several engineers because they were not strong in the FT area. Today another BL engineer has been assigned to it, and I am waiting for the call back. Let's see what he suggests.
11-24-2015 10:26 PM
Hi,
Could you paste the details of a failed job here?
11-25-2015 05:47 AM
Hi All,
Below are the job details:
11/25/2015 9:33:37 AM - Info nbjm(pid=680) starting backup job (jobid=325363) for client holon-ns01v0, policy win_sql_spsql_sites1_san_client_holon-ns01v0_incre_ns, schedule Default-Application-Backup
11/25/2015 9:33:37 AM - Info nbjm(pid=680) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=325363, request id:{22221260-9379-11E5-99A6-002128100696})
11/25/2015 9:33:37 AM - requesting resource msdp_nbuft_ns-stu
11/25/2015 9:33:37 AM - requesting resource nbumaster-ns00.NBU_CLIENT.MAXJOBS.holon-ns01v0
11/25/2015 9:33:37 AM - requesting resource nbumaster-ns00.NBU_POLICY.MAXJOBS.win_sql_spsql_sites1_san_client_holon-ns01v0_incre_ns
11/25/2015 9:33:40 AM - granted resource nbumaster-ns00.NBU_CLIENT.MAXJOBS.holon-ns01v0
11/25/2015 9:33:40 AM - granted resource nbumaster-ns00.NBU_POLICY.MAXJOBS.win_sql_spsql_sites1_san_client_holon-ns01v0_incre_ns
11/25/2015 9:33:40 AM - granted resource MediaID=@aaaaj;DiskVolume=PureDiskVolume;DiskPool=msdp_nbuft_ns;Path=PureDiskVolume;StorageServer=nbuft-ns00;MediaServer=nbuft-ns00
11/25/2015 9:33:40 AM - granted resource msdp_nbuft_ns-stu
11/25/2015 9:33:40 AM - granted resource TRANSPORT
11/25/2015 9:33:42 AM - estimated 0 Kbytes needed
11/25/2015 9:33:42 AM - Info nbjm(pid=680) started backup (backupid=holon-ns01v0_1448458421) job for client holon-ns01v0, policy win_sql_spsql_sites1_san_client_holon-ns01v0_incre_ns, schedule Default-Application-Backup on storage unit msdp_nbuft_ns-stu
11/25/2015 9:33:44 AM - started process bpbrm (23789)
11/25/2015 9:33:45 AM - Info bpbrm(pid=23789) holon-ns01v0 is the host to backup data from
11/25/2015 9:33:45 AM - Info bpbrm(pid=23789) reading file list for client
11/25/2015 9:33:45 AM - connecting
11/25/2015 9:33:46 AM - Info bpbrm(pid=23789) listening for client connection
11/25/2015 9:33:49 AM - Info bpbrm(pid=23789) INF - Client read timeout = 1800
11/25/2015 9:33:49 AM - Info bpbrm(pid=23789) accepted connection from client
11/25/2015 9:33:49 AM - Info dbclient(pid=8912) Backup started
11/25/2015 9:33:49 AM - Info bpbrm(pid=23789) bptm pid: 23795
11/25/2015 9:33:49 AM - connected; connect time: 0:00:04
11/25/2015 9:33:51 AM - Info bptm(pid=23795) start
11/25/2015 9:33:51 AM - Info bptm(pid=23795) using 262144 data buffer size
11/25/2015 9:33:52 AM - Info bptm(pid=23795) using 20 data buffers
11/25/2015 9:33:52 AM - Info bptm(pid=23795) USING 262144 data buffer size for FT
11/25/2015 9:33:53 AM - Opening Fibre Transport Connection, Backup Id: holon-ns01v0_1448458421
11/25/2015 9:34:00 AM - Info bptm(pid=23795) start backup
11/25/2015 9:34:06 AM - begin writing
11/25/2015 9:34:10 AM - Info dbclient(pid=8912) dbclient(pid=8912) wrote first buffer(size=65536)
The job gets stuck at the above point, waits for hours, and writes nothing.
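For anyone who wants to spot this kind of stall without watching the Activity Monitor, here is a minimal sketch in Python. It only parses job detail text in the format pasted above and reports how long ago the last timestamped line was written. The function names (`last_activity`, `is_stalled`) and the 30-minute threshold are my own assumptions for illustration, not anything provided by NetBackup; the threshold simply mirrors the "Client read timeout = 1800" shown in the log.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the job details above, e.g. "11/25/2015 9:34:10 AM".
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def last_activity(job_details: str) -> datetime:
    """Return the newest timestamp found at the start of any job detail line."""
    latest = None
    for line in job_details.splitlines():
        # Each detail line looks like "<timestamp> - <message>".
        parts = line.split(" - ", 1)
        if len(parts) != 2:
            continue
        try:
            stamp = datetime.strptime(parts[0].strip(), STAMP_FORMAT)
        except ValueError:
            continue  # not a timestamped line, skip it
        if latest is None or stamp > latest:
            latest = stamp
    if latest is None:
        raise ValueError("no timestamped lines found")
    return latest


def is_stalled(job_details: str, now: datetime,
               threshold: timedelta = timedelta(minutes=30)) -> bool:
    """True if nothing has been logged for longer than the (assumed) threshold."""
    return now - last_activity(job_details) > threshold


SAMPLE = """\
11/25/2015 9:34:06 AM - begin writing
11/25/2015 9:34:10 AM - Info dbclient(pid=8912) dbclient(pid=8912) wrote first buffer(size=65536)
"""

if __name__ == "__main__":
    check_time = datetime(2015, 11, 25, 12, 0, 0)
    print(last_activity(SAMPLE))
    print(is_stalled(SAMPLE, check_time))
```

This only tells you that the job log has gone quiet; it does not say why. The FT logs on the media server and client are still where the cause would have to be found.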
11-27-2015 07:40 AM
Even Veritas support is struggling to fix this issue.
Veritas experts: please provide your input on this.
11-27-2015 11:14 AM
Did you check the FT logs on both media server and client?
.
Also I saw this:
NetBackup 7.6 EEB Guide
http://www.veritas.com/docs/000003552
...says on page 37:
3053482 A Fibre Transport (FT) media server can encounter a problem with the FT server process (nbftsrvr) stopping with the following error: "SCPALDMA Assertion failed".
...but it's not clear to me whether this issue (noted in v7.6.1) is still present in v7.6.1.1 - or whether it is fixed in any particular version. In which case, it's probably best to open an official case with Veritas Support.
.
This doc describes how to collect FT logs:
NetBackup 7.6 SAN Client and Fibre Transport Guide
http://www.veritas.com/docs/000003727
...I think you need to collect and review the FT related logs from media server and client.
11-27-2015 12:18 PM
Hi Sdo/Riaan,
Very good information.
I have already raised the support case, and they have been reviewing the logs. However, support from Veritas seems very poor at the moment, possibly because of the transition from Symantec.
I will raise this angle with the Veritas engineer and keep you posted.
12-21-2015 11:00 AM
Hi All,
Even Veritas is struggling to fix the issue. Can anyone else offer their expertise, please?
07-18-2017 11:08 PM
The issue has finally been resolved. Veritas identified that a Solaris FT media server running on SPARC cannot handle the higher load, and recommended a Linux machine with plenty of memory and processing power. We replaced the Solaris server with Linux and the issue is resolved.
Regards,
Prathap