07-22-2015 01:37 AM
Hi,
Bit of a strange one, this!
7.6.0.3 HP-UX master, VMware backups using SAN transport to 5230s as the backup hosts.
All was working OK.
A new LUN was added and made visible to the appliances, but we can't back up any VM on the new LUN; we get a status 23 error.
My storage guy assures me the LUN has the same access as all the other LUNs, so it should be OK.
Now a few other previously working backups have started to fail, ones residing on the old LUNs!
I have enabled VxMS logging up to 8, but all I see in the logs is below.
It says disk open error. I know that; I just need to know why!
Any help much appreciated - Darren
07/21/2015 16:13:19 : setTimeouts:VMWareVIClient.cpp:110 <TRACE> : out
07/21/2015 16:13:19 : connectToHost:VixGuest.cpp:937 <TRACE> : out
07/21/2015 16:13:19 : openLeafSnapshotDisks:VixGuest.cpp:462 <TRACE> : connectToHost() success
07/21/2015 16:13:19 : openLeafSnapshotDisks:VixGuest.cpp:483 <DEBUG> : Calling vdOpen(Disk: [3par_Live_LUN20] VM_NAME/VM_NAME-000002.vmdk, Flags: 0x4)
07/21/2015 16:13:19 : vdOpen:VixInterface.cpp:437 <TRACE> : in
07/21/2015 16:13:19 : vdOpen:VixInterface.cpp:464 <INFO> : Calling VixDiskLib_Open()
07/21/2015 16:13:28 : vdOpen:VixInterface.cpp:471 <DEBUG> : Done with VixDiskLib_Open(): 0
07/21/2015 16:13:28 : vdOpen:VixInterface.cpp:492 <ERROR> : VixDiskLib_Open() error: 13
07/21/2015 16:13:28 : g_vixInterfaceLogger:libvix.cpp:1825 <DEBUG> : [VFM_ESINFO] 2015-07-21T16:13:28.476+01:00 [7FB0F153E700 trivia 'HttpConnectionPool-000000'] [DecConnectionCount] Number of connections to <cs p:00007fb0e402d950, TCP:vcenter-b:443> decremented to 0
07/21/2015 16:13:28 : vdOpen:VixInterface.cpp:437 <TRACE> : out
07/21/2015 16:13:28 : openLeafSnapshotDisks:VixGuest.cpp:494 <DEBUG> : vdOpen() error = 13. Calling closeLeafSnapshotDisks()
07/21/2015 16:13:28 : g_vixInterfaceLogger:libvix.cpp:1825 <DEBUG> : [VFM_ESINFO] 2015-07-21T16:13:28.476+01:00 [7FB0EBFFF700 trivia 'ThreadPool'] HandleWork() leaving
07/21/2015 16:13:28 : closeLeafSnapshotDisks:VixGuest.cpp:382 <TRACE> : in
07/21/2015 16:13:28 : closeLeafSnapshotDisks:VixGuest.cpp:382 <TRACE> : out
07/21/2015 16:13:28 : openLeafSnapshotDisks:VixGuest.cpp:407 <TRACE> : out
07/21/2015 16:13:28 : g_vixInterfaceLogger:libvix.cpp:1825 <DEBUG> : [VFM_ESINFO] 2015-07-21T16:13:28.476+01:00 [7FB0F157F700 trivia 'ThreadPool'] ThreadPool[idle:3, busy_io:1, busy_long:0] HandleWork(type: 0, fun: N5boost3_bi6bind_tIvNS_4_mfi3mf3IvN7Vmacore6System14ThreadPoolAsioERKNS_10shared_ptrINS4_9ExceptionEEEiRKNS4_7FunctorIvPS8_iNS4_3NilESE_SE_SE_SE_EEEENS0_5list4INS0_5valueINS4_3RefIS6_EEEENSK_IS9_EENSK_IiEENSK_ISF_EEEEEE)
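For reference, the VxMS logging mentioned above is controlled roughly like this on the backup host (a sketch; `/usr/openv/netbackup` is the standard NetBackup location, parameterized here via `NB_DIR` so the snippet can be exercised anywhere, and you should confirm the exact verbosity entry against your NBU version's documentation):

```shell
# NB_DIR would be /usr/openv/netbackup on a real backup host;
# it defaults to a scratch directory here for illustration.
NB_DIR="${NB_DIR:-$(mktemp -d)}"

# 1. Create the VxMS log directory on the VMware backup host.
mkdir -p "$NB_DIR/logs/vxms"

# 2. Raise VxMS verbosity in bp.conf (8 is the most verbose level).
grep -q '^VXMS_VERBOSE' "$NB_DIR/bp.conf" 2>/dev/null \
  || echo 'VXMS_VERBOSE = 8' >> "$NB_DIR/bp.conf"
```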
08-03-2015 08:39 AM
This has now been fixed in conjunction with Support.
There were conflicts between the SCSI layer and the VxDMP layer, and we were seeing some disks twice and some disks not at all.
To clear these conflicts we had to rebuild the device tree.
The commands performed in conjunction with Support are below.
mv /etc/vx/array.info /etc/vx/array.info.old
mv /etc/vx/disk.info /etc/vx/disk.info.old
mv /etc/vx/jbod.info /etc/vx/jbod.info.old
mv /etc/vx/dmppolicy.info /etc/vx/dmppolicy.info.old --> only if this file exists; otherwise skip it.
Remove all entries from the /dev/dsk, /dev/rdsk, /dev/vxdmp, and /dev/vx/rdmp directories except the entries for the boot disk or boot mirror:
rm -rf /dev/vx/dmp/*
rm -rf /dev/vx/rdmp/*
rm -rf /dev/sd*
vxconfigd -k -x cleartempdir
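After the rebuild, a quick sanity check is to confirm that no WWID is reported twice any more. A minimal sketch (`check_dup` is a hypothetical helper, fed with the same "device WWID path-count" lines produced by the vxdmpadm pipeline shown later in this thread):

```shell
# check_dup: read "name WWID path-count" lines on stdin and fail
# (exit 1) if any WWID appears more than once, i.e. the DMP layer
# still reports the same disk twice.
check_dup() {
  awk '{print $2}' | sort | uniq -d | grep -q . && return 1 || return 0
}

# Against a live system it would be fed from the vxdmpadm listing,
# e.g.:  vxdmpadm list dmpnode | <awk filter> | check_dup && echo clean
```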
We are now back with successful SAN based backups.
This fix applied to us and may not apply to your issue, so I would not recommend running these commands without Support in attendance.
- Darren
07-22-2015 01:57 AM
Have you tried using the NBD transport method, and also snapshotting the client from the vSphere client?
07-22-2015 01:58 AM
Could you attach the VxMS log to this post? Make sure it's an attachment.
What is the vCenter version?
07-22-2015 04:20 AM
Hi Revaroo, NBD works but isn't really an option moving forward. The snapshot is working in both NBU and vSphere.
RamNagall, log attached.
07-22-2015 04:46 AM
07/21/2015 16:13:28 : log:Error.cpp:265 <WARN> : Error: 0x00000017 occured in file VixGuest.cpp, at line 640
07/21/2015 16:13:28 : log:Error.cpp:265 <WARN> : Error: 0x00000017 occured in file VixGuest.cpp, at line 640
07/21/2015 16:13:28 : vixMapObjCtl:VixCoordinator.cpp:987 <ERROR> : Returning: 23
Have a look at the tech note below:
https://support.symantec.com/en_US/article.TECH225372.html
What permissions does the account being used for backup have on vCenter? Does it have full admin privileges?
You could try some other account that has full privileges and see how it goes.
07-22-2015 06:00 AM
Hi RamNagalla,
I have lots of other SAN transport backups working using exactly the same mechanisms, so I don't think it's privileges.
07-22-2015 06:31 AM
Hi Darren,
I certainly understand that. However, when a new datastore is created on the VMware end, if the privileges are not properly defined it can cause these types of issues.
To eliminate this, I asked about the type of privileges set on the user account: if it has full privileges across vCenter, privileges cannot be the cause.
But if the privileges are defined separately for each and every component, we cannot overlook them.
07-22-2015 09:15 AM
Ok thanks RamNagalla, will confirm with my VCenter guy and post back.
07-23-2015 03:03 AM
Hi RamNagalla, full access is granted to the new datastore.
I also have an existing backup that has started failing.
07-23-2015 03:46 AM
Get your storage guy to double-check zoning as well as LUN assignment.
Compare LUN zoning and LUN assignment, as well as permissions, with other working LUNs.
We are not going to solve this from the NBU point of view...
07-23-2015 05:57 AM
Check for stuck "deltas" or snapshots on the storage.
Like Marianne says, this error is usually not NBU related. The last time I had this, there was a snapshot stuck on the virtual machine (snapshotting from vCenter still worked).
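To make the stuck-delta check concrete, here is a small sketch. The filename pattern matches the delta disk named in the log above (`VM_NAME-000002.vmdk`); `is_delta` is a hypothetical helper, and the `find` path assumes ESXi's standard /vmfs layout:

```shell
# is_delta: true if a vmdk filename matches the NAME-NNNNNN.vmdk
# pattern that vSphere uses for snapshot delta disks.
is_delta() {
  case "$1" in
    *-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk) return 0 ;;
    *) return 1 ;;
  esac
}

# On an ESXi host shell, leftover deltas could be hunted with e.g.:
#   find /vmfs/volumes -name '*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk'
```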
07-23-2015 09:03 AM
Hi Marianne,
Thanks, yes, I don't think it's an NBU issue, but I don't have the ammunition to prove it.
I have a call open with Symantec and now have an EEB to put the full VxMS logging back when VERBOSE = 8.
Apparently there was previously an issue with sensitive info (usernames/passwords, etc.) being displayed when VxMS logging was set to VERBOSE = 8, and this was taken out in 2603/2604.
The EEB puts logging back to the old "full" level.
Will post back what happens.
Thanks - Darren
07-24-2015 08:45 AM
My sysadmin guy has done a bit of poking around on the appliance and has come up with the following comment:
We can see 23 unique SCSI IDs, but only 19 unique ones from the vxdmpadm command.
vxdmpadm list dmpnode | awk '/^dmpdev/{d=$3}/^scsi3_vpd/{w=$3}/^num_paths/{print d,w,$3 ; d="" ; w=""}' | grep 3par
3pardata0_0 60002AC0000000000E00071200004D62 2
3pardata0_0_1 60002AC0000000000D000E1700004D62 2
3pardata0_0_2 60002AC000000000040011BC00004D62 2
3pardata0_0_3 60002AC000000000040011BD00004D62 2
3pardata0_0_4 60002AC000000000040011BE00004D62 2
3pardata0_0_5 60002AC00000000000001AF700004D62 2
3pardata0_0_7 60002AC00000000000001AF900004D62 2
3pardata0_0_8 60002AC00000000000001AFA00004D62 2
3pardata0_0_9 60002AC00000000000001AFB00004D62 2
3pardata0_0_10 60002AC00000000000001AFC00004D62 2
3pardata0_0_11 60002AC00000000000001AFD00004D62 2
3pardata0_0_12 60002AC00000000000001AFE00004D62 2
3pardata0_0_13 60002AC00000000000001AFF00004D62 2
3pardata0_0_14 60002AC00000000000001B0B00004D62 2
3pardata0_0_15 60002AC000000000570031AB00004D62 2
3pardata0_0_15 60002AC000000000570031AB00004D62 2
3pardata0_0_16 60002AC000000000570031AC00004D62 2
3pardata0_0_16 60002AC000000000570031AC00004D62 2
3pardata0_0_17 60002AC000000000FA00600900004D62 2
3pardata0_0_17 60002AC000000000FA00600900004D62 2
3pardata0_0_17 60002AC000000000FA00600900004D62 2
3pardata0_0_18 60002AC000000000FA00602200004D62 2
3pardata0_0_19 60002AC000000000FA00603100004D62 2
From this we see there are 23 DMP nodes, however there is duplication. If we sort and count these: -
1 3pardata0_0 60002AC0000000000E00071200004D62 2
1 3pardata0_0_1 60002AC0000000000D000E1700004D62 2
1 3pardata0_0_10 60002AC00000000000001AFC00004D62 2
1 3pardata0_0_11 60002AC00000000000001AFD00004D62 2
1 3pardata0_0_12 60002AC00000000000001AFE00004D62 2
1 3pardata0_0_13 60002AC00000000000001AFF00004D62 2
1 3pardata0_0_14 60002AC00000000000001B0B00004D62 2
2 3pardata0_0_15 60002AC000000000570031AB00004D62 2
2 3pardata0_0_16 60002AC000000000570031AC00004D62 2
3 3pardata0_0_17 60002AC000000000FA00600900004D62 2
1 3pardata0_0_18 60002AC000000000FA00602200004D62 2
1 3pardata0_0_19 60002AC000000000FA00603100004D62 2
1 3pardata0_0_2 60002AC000000000040011BC00004D62 2
1 3pardata0_0_3 60002AC000000000040011BD00004D62 2
1 3pardata0_0_4 60002AC000000000040011BE00004D62 2
1 3pardata0_0_5 60002AC00000000000001AF700004D62 2
1 3pardata0_0_7 60002AC00000000000001AF900004D62 2
1 3pardata0_0_8 60002AC00000000000001AFA00004D62 2
1 3pardata0_0_9 60002AC00000000000001AFB00004D62 2
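The sorted counts above can be reproduced with a standard sort/uniq pipeline. A runnable sketch against a sample of the listing (on the appliance the input would come from the vxdmpadm command shown earlier):

```shell
# A small sample of the listing (device, WWID, path count); the full
# input comes from the vxdmpadm pipeline shown above.
listing='3pardata0_0_14 60002AC00000000000001B0B00004D62 2
3pardata0_0_15 60002AC000000000570031AB00004D62 2
3pardata0_0_15 60002AC000000000570031AB00004D62 2
3pardata0_0_17 60002AC000000000FA00600900004D62 2
3pardata0_0_17 60002AC000000000FA00600900004D62 2
3pardata0_0_17 60002AC000000000FA00600900004D62 2'

# Sort and count occurrences; a count above 1 means the same DMP
# node line was reported more than once.
printf '%s\n' "$listing" | sort | uniq -c | sort -rn

# Compare total lines with unique WWIDs (column 2).
total=$(printf '%s\n' "$listing" | wc -l | tr -d ' ')
unique=$(printf '%s\n' "$listing" | awk '{print $2}' | sort -u | wc -l | tr -d ' ')
echo "total=$total unique=$unique"   # prints total=6 unique=3
```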
It appears as though something is odd between the raw SCSI layer on the appliance and the VxDMP layer.
Perhaps there's some LUN persistence cached somewhere?
Any ideas?
Thanks - Darren
08-03-2015 08:40 AM
Thanks for all your help.