NetBackup for VMware error 13

Iwan_Tamimi
Level 6

Hi All, 

Our master server runs on RHEL 6.1 with NetBackup 7.7.1, and the media server is Windows 2008 with NetBackup 7.7.1. VMware is 5.5 (the client is RHEL 5.6). We have a couple of hundred VMware clients that were already running fine. A little over a week ago, some of the clients began to fail (other clients in the same folder still run fine; the failure rate is only around 1% or below, but it still has to be resolved).

During the error 13 failure, some files appear to have been backed up successfully (?), judging by the log:

02/15/2016 11:36:05 - Info bpbkar32 (pid=8236) 285000 entries sent to bpdbm
02/15/2016 11:36:11 - Info bpbkar32 (pid=8236) 301604 entries sent to bpdbm
02/15/2016 11:36:11 - Info bpbkar32 (pid=8236) 301605 entries sent to bpdbm
02/15/2016 11:36:15 - Error bpbrm (pid=8132) socket read failed, An existing connection was forcibly closed by the remote host.  (10054)
02/15/2016 11:36:23 - Error bpbrm (pid=8132) could not send server status message
02/15/2016 11:36:25 - Info bpbkar32 (pid=0) done. status: 13: file read failed
02/15/2016 11:36:25 - end writing; write time: 0:01:44
file read failed  (13)

 

(It fails after "Current File Written" 148817. The funny thing is that after I consolidated the snapshot and reran the job, it failed at exactly 148817 again, even on the third attempt. I don't know if that means something.)

I checked /var/log/hostd.log on the ESX server.

There are some errors:


2016-02-15T03:34:12.993Z [3B580B70 verbose 'Vmsvc.vm:/vmfs/volumes/561361d9-cf3232d4-10da-0017a4770404/vascasms01s/vascasms01s.vmx' opID=4978794e-b6 user=vpxuser] Create Snapshot: NBU_SNAPSHOT ebs5-bck 1455507248, memory=false, quiescent=true state=4
2016-02-15T03:34:12.993Z [3B580B70 info 'Vmsvc.vm:/vmfs/volumes/561361d9-cf3232d4-10da-0017a4770404/vascasms01s/vascasms01s.vmx' opID=4978794e-b6 user=vpxuser] State Transition (VM_STATE_ON -> VM_STATE_CREATE_SNAPSHOT)
2016-02-15T03:34:15.070Z [3B580B70 info 'Vimsvc.ha-eventmgr'] Event 73292 : The dvPort 68 link was down in the vSphere Distributed Switch  in ha-datacenter
2016-02-15T03:34:15.071Z [3B580B70 info 'Vimsvc.ha-eventmgr'] Event 73293 : The dvPort 68 was not in passthrough mode in the vSphere Distributed Switch  in ha-datacenter.
2016-02-15T03:34:15.507Z [39DC2B70 info 'Vimsvc.ha-eventmgr'] Event 73294 : The dvPort 68 was not in passthrough mode in the vSphere Distributed Switch  in ha-datacenter.
2016-02-15T03:34:15.508Z [39DC2B70 info 'Vimsvc.ha-eventmgr'] Event 73295 : The dvPort 68 was unblocked in the vSphere Distributed Switch  in ha-datacenter.
2016-02-15T03:34:15.509Z [39DC2B70 info 'Vimsvc.ha-eventmgr'] Event 73296 : The dvPort 68 was not in passthrough mode in the vSphere Distributed Switch  in ha-datacenter.
2016-02-15T03:34:15.510Z [39DC2B70 info 'Vimsvc.ha-eventmgr'] Event 73297 : The dvPort 68 link was up in the vSphere Distributed Switch  in ha-datacenter
2016-02-15T03:34:15.512Z [39DC2B70 info 'Vimsvc.ha-eventmgr'] Event 73298 : The dvPort 68 was not in passthrough mode in the vSphere Distributed Switch  in ha-datacenter.
2016-02-15T03:34:16.148Z [FFCEEB70 info 'Hostsvc' opID=hostd-ab43] VsanSystemVmkProvider : GetConfig: Start
2016-02-15T03:34:16.148Z [FFCEEB70 info 'Hostsvc' opID=hostd-ab43] VsanSystemVmkProvider : GetConfig: Complete

Does this mean anything?

Does anyone know how to solve this?

 

Regards, 

Iwan 

 

 

 

7 REPLIES

sdo
Level 6
Partner    VIP    Certified

Do you have VMware support?

I would look to them first, because you have hundreds of working backups, so it would not appear to be a configuration or NetBackup issue per se, but rather something quite odd with one or two VMs.

Did you upgrade anything recently, and did the problem appear after the upgrade? If so, what was upgraded, and from what original version to what new version?

Iwan_Tamimi
Level 6

Hi sdo, 

Thank you for your response. 

Yes, we upgraded the media server to 7.7.1 (from 7.6.0.4), but the problem affects only a few VMs, and all of the affected VMs are Red Hat Linux. We have VMs with different OSes (well, other than Red Hat we only have Windows servers) on the same ESX host and the same datastore.

I have escalated to Veritas, but we still cannot find the cause.

Any suggestion?

 

Regards, 

 

Iwan 

sdo
Level 6
Partner    VIP    Certified

Without digging through detailed logs, we won't be able to fathom what is happening.

Are you sure there is nothing "odd" or different about the VMs that fail? Something about them that differs from the others?

Do they have SYMCquiesce installed?

What file system types (Ext2/3/4, XFS, other?) are used inside the VMs which have failing backups?

Maybe the VMs need to have fsck run?
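If filesystem damage is suspected, a read-only check from inside the VM is a low-risk first step. A sketch only - the device name below is an example, so take the real ones from the VM's /etc/fstab, and unmount (or boot to rescue mode) before running any repairing fsck:

```shell
# Read-only filesystem check inside the failing RHEL VM.
# -n answers "no" to every repair prompt, so nothing is modified.
# /dev/sda1 is an example device - list the real ones from /etc/fstab.
fsck -n /dev/sda1

# For an ext2/3/4 filesystem, e2fsck gives more detail:
# -n = read-only, -v = verbose statistics
e2fsck -n -v /dev/sda1
```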

Do you know how to create the NetBackup VxMS logs, and other traditional normal NetBackup logs?
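For reference, a rough sketch of enabling VxMS and the related legacy logs on a UNIX/Linux backup host. The paths and the VXMS_VERBOSE level are from memory of the 7.7.x defaults, so verify them against the NetBackup for VMware guide; on a Windows media server the log directory is under install_path\NetBackup\logs and the verbosity is set in the registry instead of bp.conf:

```shell
# On the VMware backup host, create the VxMS log directory
# (logs are only written if the directory exists):
mkdir -p /usr/openv/netbackup/logs/vxms

# Raise VxMS verbosity (0-9; 5 is usually enough for support cases):
echo "VXMS_VERBOSE = 5" >> /usr/openv/netbackup/bp.conf

# The traditional legacy logs worth having for a status 13:
mkdir -p /usr/openv/netbackup/logs/bpbkar
mkdir -p /usr/openv/netbackup/logs/bpfis
mkdir -p /usr/openv/netbackup/logs/bpbrm
```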

Suchet_Siddu_J_
Level 3
Certified

02/15/2016 11:36:15 - Error bpbrm (pid=8132) socket read failed, An existing connection was forcibly closed by the remote host.  (10054)

Winsock errors 10053 and 10054 are TCP/IP errors that occur at the networking layer of the OSI (Open Systems Interconnection) model.

Refer :  https://www.veritas.com/support/en_US/article.TECH37372

 

sdo
Level 6
Partner    VIP    Certified

@Iwan - it is highly likely that the Winsock errors are an "artefact" of whatever the real problem is. When other processes related to the backup die unexpectedly, the surviving processes can report Winsock errors, because the 'far side' of the TCP conversation has died unexpectedly - the process no longer exists. Personally, I wouldn't look for TCP issues right now.

So, Iwan, there are plenty of topics in this forum regarding how to capture the logs.


areznik
Level 5

You're attempting to take a snapshot with memory=false, quiescent=true, and it's only failing for RHEL VMs - I'd say the most likely culprit is SYMCquiesce (as sdo mentioned). Check whether it's installed (it's required if you want to quiesce on Linux) and see if you can get any logs from it. By default the SYMCquiesce logs are in /opt/SYMCquiesce/logs
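A quick way to confirm this from inside one of the failing RHEL guests (a sketch - the package name and log path assume the defaults mentioned above):

```shell
# Is the SYMCquiesce package installed in the guest?
rpm -qa | grep -i symcquiesce

# If it is, check its most recent logs for quiesce failures
# around the time of the backup:
ls -lt /opt/SYMCquiesce/logs | head
tail -n 50 /opt/SYMCquiesce/logs/*
```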

nbutech
Level 6
Accredited Certified

 

About the SYMCquiesce utility

https://www.veritas.com/support/en_US/article.HOWTO70978