12-13-2012 01:41 AM
Netbackup version : 7.1.0.4
Master OS : windows 2008
Backup is failing with error code 25 (cannot connect to socket) for windows client [Multistream enabled].
The connectivity between the master and client is OK.
I have tried the backup of the same client without Multistream and it completed successfully.
Why the backup is failing??
12-13-2012 01:54 AM
hi
it seems there is a reverse communication issue.that is from client to Master server.
what is the netbackup and OS versions of client.
and give the output of bpclntcmd -pn
bpcltcmd -self from the client..
and also you need to enable the bpfis logs in the client and need to look for error message. provide the log also., if log dir is not avaliable, please create it and trigger the backup.
installpath\veritas\netbackup/logs/bpfis.
12-13-2012 02:05 AM
I would doubt it's a config issue if it works one way but not the other....
What's the OS of the client?
Is it purely load on the client when multi-streaming is enabled?
Can you check resource utlization during a multistreamed backup?
i.e. instead of the single job utilising x% of resources you now have 'y' jobs using ~y*x% of resources.
You could leave multiple data streams enabled but reduce max jobs per client (or policy) to see if this alleviates the issue - this could essentially be the same as switching multiple data streams off tho'
12-14-2012 05:02 AM
Client Netbackup version : 7.1.0.4
OS: windows 2003
During the backup the cpu utilization is below 50% and memory utilization is around 50%
there is no much load on the client.
I have attached the bpfis logs from client.
Failed job logs.
13/12/2012 4:19:05 PM - Info nbjm(pid=5080) starting backup job (jobid=214265) for client HSLCP2, policy HSLCP2, schedule CC-CP_Wed
13/12/2012 4:19:05 PM - estimated 0 Kbytes needed
13/12/2012 4:19:05 PM - Info nbjm(pid=5080) started backup job for client HSLCP2, policy HSLCP2, schedule CC-CP_Wed on storage unit symantec-hcart-robot-tld-0
13/12/2012 4:19:27 PM - Info bpbrm(pid=2556) HSLCP2 is the host to backup data from
13/12/2012 4:19:27 PM - Info bpbrm(pid=2556) telling media manager to start backup on client
13/12/2012 4:19:27 PM - Info bptm(pid=8092) using 65536 data buffer size
13/12/2012 4:19:27 PM - Info bptm(pid=8092) using 256 data buffers
13/12/2012 4:21:32 PM - Info bpbrm(pid=10096) sending bpsched msg: CONNECTING TO CLIENT FOR HSLCP2_1355395745
13/12/2012 4:21:32 PM - connecting
13/12/2012 4:22:14 PM - Error bpbrm(pid=10096) cannot create data socket, The operation completed successfully. (0)
13/12/2012 4:23:18 PM - Info bpbrm(pid=2556) child done, status 25
13/12/2012 4:23:18 PM - Info bpbrm(pid=2556) sending message to media manager: STOP BACKUP HSLCP2_1355395745
13/12/2012 4:23:19 PM - Info bpbrm(pid=2556) media manager for backup id HSLCP2_1355395745 exited with status 150: termination requested by administrator
13/12/2012 4:23:19 PM - end writing
cannot connect on socket(25)
================================================================================================================================
13/12/2012 4:19:05 PM - Info nbjm(pid=5080) starting backup job (jobid=214266) for client HSLCP2, policy HSLCP2, schedule CC-CP_Wed
13/12/2012 4:19:05 PM - Info nbjm(pid=5080) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=214266, request id:{1B990197-4962-48DB-87E2-1F79A4C1848B})
13/12/2012 4:19:05 PM - requesting resource symantec-hcart-robot-tld-0
13/12/2012 4:19:05 PM - requesting resource symantec.NBU_CLIENT.MAXJOBS.HSLCP2
13/12/2012 4:19:05 PM - requesting resource symantec.NBU_POLICY.MAXJOBS.HSLCP2
13/12/2012 4:19:05 PM - granted resource symantec.NBU_CLIENT.MAXJOBS.HSLCP2
13/12/2012 4:19:05 PM - granted resource symantec.NBU_POLICY.MAXJOBS.HSLCP2
13/12/2012 4:19:05 PM - granted resource DC2WD2
13/12/2012 4:19:05 PM - granted resource IBM.ULTRIUM-TD4.007
13/12/2012 4:19:05 PM - granted resource symantec-hcart-robot-tld-0
13/12/2012 4:19:06 PM - estimated 0 Kbytes needed
13/12/2012 4:19:06 PM - Info nbjm(pid=5080) started backup job for client HSLCP2, policy HSLCP2, schedule CC-CP_Wed on storage unit symantec-hcart-robot-tld-0
13/12/2012 4:21:32 PM - Info bpbrm(pid=2556) HSLCP2 is the host to backup data from
13/12/2012 4:21:32 PM - Info bpbrm(pid=2556) telling media manager to start backup on client
13/12/2012 4:21:32 PM - Info bptm(pid=8092) using 65536 data buffer size
13/12/2012 4:21:32 PM - Info bptm(pid=8092) using 256 data buffers
13/12/2012 4:23:41 PM - Info bpbrm(pid=9272) sending bpsched msg: CONNECTING TO CLIENT FOR HSLCP2_1355395746
13/12/2012 4:23:41 PM - connecting
13/12/2012 4:24:22 PM - Error bpbrm(pid=9272) cannot create data socket, The operation completed successfully. (0)
13/12/2012 4:25:46 PM - Info bpbrm(pid=2556) child done, status 25
13/12/2012 4:25:46 PM - Info bpbrm(pid=2556) sending message to media manager: STOP BACKUP HSLCP2_1355395746
13/12/2012 4:25:47 PM - Info bpbrm(pid=2556) media manager for backup id HSLCP2_1355395746 exited with status 150: termination requested by administrator
13/12/2012 4:25:47 PM - end writing
cannot connect on socket(25)
=====================================================================================================================================
3/12/2012 4:19:05 PM - Info nbjm(pid=5080) starting backup job (jobid=214267) for client HSLCP2, policy HSLCP2, schedule CC-CP_Wed
13/12/2012 4:19:05 PM - Info nbjm(pid=5080) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=214267, request id:{A4B0AD11-E6F4-4C74-9D6E-8180EA355058})
13/12/2012 4:19:05 PM - requesting resource symantec-hcart-robot-tld-0
13/12/2012 4:19:05 PM - requesting resource symantec.NBU_CLIENT.MAXJOBS.HSLCP2
13/12/2012 4:19:05 PM - requesting resource symantec.NBU_POLICY.MAXJOBS.HSLCP2
13/12/2012 4:19:05 PM - granted resource symantec.NBU_CLIENT.MAXJOBS.HSLCP2
13/12/2012 4:19:05 PM - granted resource symantec.NBU_POLICY.MAXJOBS.HSLCP2
13/12/2012 4:19:05 PM - granted resource DC2WD2
13/12/2012 4:19:05 PM - granted resource IBM.ULTRIUM-TD4.007
13/12/2012 4:19:05 PM - granted resource symantec-hcart-robot-tld-0
13/12/2012 4:19:07 PM - estimated 0 Kbytes needed
13/12/2012 4:19:07 PM - Info nbjm(pid=5080) started backup job for client HSLCP2, policy HSLCP2, schedule CC-CP_Wed on storage unit symantec-hcart-robot-tld-0
13/12/2012 4:24:03 PM - Info bpbrm(pid=2556) HSLCP2 is the host to backup data from
13/12/2012 4:24:03 PM - Info bpbrm(pid=2556) telling media manager to start backup on client
13/12/2012 4:24:04 PM - Info bptm(pid=8092) using 65536 data buffer size
13/12/2012 4:24:04 PM - Info bptm(pid=8092) using 256 data buffers
13/12/2012 4:26:09 PM - Info bpbrm(pid=6288) sending bpsched msg: CONNECTING TO CLIENT FOR HSLCP2_1355395747
13/12/2012 4:26:09 PM - connecting
13/12/2012 4:26:50 PM - Error bpbrm(pid=6288) cannot create data socket, The operation completed successfully. (0)
13/12/2012 4:28:15 PM - Info bpbrm(pid=2556) child done, status 25
13/12/2012 4:28:15 PM - Info bpbrm(pid=2556) sending message to media manager: STOP BACKUP HSLCP2_1355395747
13/12/2012 4:28:16 PM - Info bpbrm(pid=2556) media manager for backup id HSLCP2_1355395747 exited with status 150: termination requested by administrator
13/12/2012 4:28:16 PM - end writing
cannot connect on socket(25)
===================================================================================================================================
13/12/2012 4:16:32 PM - Info nbjm(pid=5080) starting backup job (jobid=214263) for client HSLCP2, policy HSLCP2, schedule CC-CP_Wed
13/12/2012 4:16:32 PM - Info nbjm(pid=5080) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=214263, request id:{C0B97312-C3E4-4688-8F99-76CA89051FD2})
13/12/2012 4:16:32 PM - requesting resource symantec-hcart-robot-tld-0
13/12/2012 4:16:32 PM - requesting resource symantec.NBU_CLIENT.MAXJOBS.HSLCP2
13/12/2012 4:16:32 PM - requesting resource symantec.NBU_POLICY.MAXJOBS.HSLCP2
13/12/2012 4:16:32 PM - granted resource symantec.NBU_CLIENT.MAXJOBS.HSLCP2
13/12/2012 4:16:32 PM - granted resource symantec.NBU_POLICY.MAXJOBS.HSLCP2
13/12/2012 4:16:32 PM - granted resource DC2WD2
13/12/2012 4:16:32 PM - granted resource IBM.ULTRIUM-TD4.007
13/12/2012 4:16:32 PM - granted resource symantec-hcart-robot-tld-0
13/12/2012 4:16:32 PM - estimated 0 Kbytes needed
13/12/2012 4:16:32 PM - begin Parent Job
13/12/2012 4:16:32 PM - begin Snapshot, Start Notify Script
13/12/2012 4:16:32 PM - started
13/12/2012 4:16:33 PM - Info RUNCMD(pid=8552) started
13/12/2012 4:16:33 PM - Info RUNCMD(pid=8552) exiting with status: 0
Status 0
13/12/2012 4:16:33 PM - end Snapshot, Start Notify Script; elapsed time: 00:00:01
13/12/2012 4:16:33 PM - begin Snapshot, Step By Condition
Status 0
13/12/2012 4:16:33 PM - end Snapshot, Step By Condition; elapsed time: 00:00:00
13/12/2012 4:16:33 PM - begin Snapshot, Stream Discovery
Status 0
13/12/2012 4:16:33 PM - end Snapshot, Stream Discovery; elapsed time: 00:00:00
13/12/2012 4:16:33 PM - begin Snapshot, Read File List
Status 0
13/12/2012 4:16:33 PM - end Snapshot, Read File List; elapsed time: 00:00:00
13/12/2012 4:16:33 PM - begin Snapshot, Create Snapshot
13/12/2012 4:16:33 PM - started process bpbrm (9676)
13/12/2012 4:17:19 PM - Info bpbrm(pid=9676) HSLCP2 is the host to backup data from
13/12/2012 4:17:19 PM - Info bpbrm(pid=9676) reading file list from client
13/12/2012 4:17:19 PM - Info bpbrm(pid=9676) start bpfis on client
13/12/2012 4:17:19 PM - begin Create Snapshot
13/12/2012 4:17:43 PM - Info bpfis(pid=928) Backup started
13/12/2012 4:18:43 PM - Info bpfis(pid=928) done. status: 0
13/12/2012 4:18:43 PM - end Create Snapshot; elapsed time: 00:01:24
13/12/2012 4:19:04 PM - end writing
Status 0
13/12/2012 4:19:04 PM - end Snapshot, Create Snapshot; elapsed time: 00:02:31
13/12/2012 4:19:04 PM - begin Snapshot, Policy Execution Manager Preprocessed
13/12/2012 4:19:09 PM - Info bpfis(pid=0) done. status: 0: the requested operation was successfully completed
Status 25
13/12/2012 4:31:48 PM - end Snapshot, Policy Execution Manager Preprocessed; elapsed time: 00:12:44
13/12/2012 4:31:48 PM - begin Snapshot, Stop On Error
Status 0
13/12/2012 4:31:48 PM - end Snapshot, Stop On Error; elapsed time: 00:00:00
13/12/2012 4:31:48 PM - begin Snapshot, Delete Snapshot On Exit
13/12/2012 4:31:48 PM - begin Snapshot, Delete Snapshot On Exit
13/12/2012 4:31:48 PM - started process bpbrm (10100)
13/12/2012 4:33:29 PM - end writing
Status 0
13/12/2012 4:33:29 PM - end Snapshot, Delete Snapshot On Exit; elapsed time: 00:01:41
13/12/2012 4:33:29 PM - begin Snapshot, End Notify Script
13/12/2012 4:33:29 PM - Info RUNCMD(pid=10208) started
13/12/2012 4:33:30 PM - Info RUNCMD(pid=10208) exiting with status: 0
Status 0
13/12/2012 4:33:30 PM - end Snapshot, End Notify Script; elapsed time: 00:00:01
Status 25
13/12/2012 4:33:30 PM - end Snapshot, Delete Snapshot On Exit; elapsed time: 00:01:42
cannot connect on socket(25)
12-14-2012 05:10 AM
hi,
what is bpclntcmd -pn saying when you run this on client side..
12-14-2012 05:24 AM
Is Master server also the media server?
Do you have bpcd log on the client?
Can you see that bpcd log is written to when backup fails with status 25?
12-14-2012 05:25 AM
hi ,
below is the Critical message from bpfis log that you have attached...
16:34:49.937 [4736.4884] <32> fis_rebuild_from_db: FTL - cannot open C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\bpfis.fim.HSLCP2_1355395592.0
below is the discussion
https://www-secure.symantec.com/connect/forums/failed-snapshot
copied form above discussion
#####
19:22:53.559 [6052.5744] <32> fis_rebuild_from_db: cannot open C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\bpfis.fim.Client.com_1311903706.0
... which compared to yours ...
18:06:42.368 [6632.8844] <32> fis_rebuild_from_db: FTL - cannot open C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\bpfis.fim.tus3monclupib02_1340154395.0
... is very similar.
Here was the solution :
"I raised a case with Symantec, took the Vxms logs and found that, they receiving a null on some data index, it could be caused by corruption.
So, adviced to run chkdsk/defrag command in volume D drive.
After doing a chkdsk/defrag on the drive, the backups are happening and no failures till date."
##########
you can also check
http://www.symantec.com/docs/TECH168824
and sometimes reboot of the client serves also would help..
hope this helps.
12-14-2012 07:26 AM
C:\Program Files\VERITAS\NetBackup\bin>Bpclntcmd.exe -pn
expecting response from server 10.60.0.120
hslcp2 hslcp2 124.30.56.102 1541
12-14-2012 07:47 AM
I ahve attached the bpcd logs.
C:\Program Files\VERITAS\NetBackup\bin>Bpclntcmd.exe -pn
expecting response from server 10.60.0.120
hslcp2 hslcp2 124.30.56.102 1541
12-14-2012 10:53 AM
Errors in bpcd are similar to those seen in TN http://www.symantec.com/docs/TECH162303
Note that this delay will not occur;
· If a firewall is not present and PBX is not running on the remote client, because the TCP stack on the remote host will return a TCP RST immediately upon receipt of the TCP SYN for port 1556.
· If a firewall is present, and does not permit connection requests to port 1556, and returns a TCP RST to the originator when it blocks connection to the port.
Solution
Other possible workarounds include:
A) Disabling multiplexing in the backup policy. Because bptm is started much later in the process flow, there are fewer cumulative delays and the vnetd process is less likely to timeout.
B) Extending the vnetd timeout. While this will prevent this particular failure, it will allow these delays to accumulate which may have undesirable effects on concurrent processes or later events in this process. This should only be used of if other options are not available.