http://sunsolve.sun.com/search/document.do?assetkey=1-62-210997-1

sunguru · 11-20-2009

Hi

I have a Sun M5000 running Solaris 10 Generic_141414-01 (Solaris 10 10/08 s10s_u6wos_07b SPARC).

Foundation Suite and cluster versions:

36 1346648 3a728 289   1 vxdmp (VxVM 5.0MP1-2007-01-22a: DMP Dr)
38 7c002000 33fd48 290   1 vxio (VxVM 5.0MP1-2007-01-22a I/O dri)
40 137cf10    d48 291   1 vxspec (VxVM 5.0MP1-2007-01-22a control)
211 7b37b2a0    c30 292   1 vxportal (VxFS 5.0_REV-5.0MP1u_sol portal)
212 7ae00000 1bc8a0 21   1 vxfs (VxFS 5.0_REV-5.0MP1u_sol SunOS )
215 7b620000 4fe28 296   1 vxfen (VRTS Fence 5.0MP1)
230 7afd4000   a2c8 293   1 fdd (VxQIO 5.0_REV-5.0MP1u_sol Quick)

Recently a disk failed in the testdg disk group. The Sun engineer came in, and before he physically replaced the drive I ran the command below to remove the disk from Solaris:

cfgadm -c unconfigure c1::dsk/c1t0d0

I got a message that the device was being used by VxVM.

So I removed the failed disk's plexes and the disk itself completely from Veritas, and now the volume is associated with only one disk.

Sun then replaced the disk, and I ran cfgadm -al, devfsadm -Cv and vxdctl enable, but I am not able to see the new disk.
I tried many commands with no luck. Has anyone come across a similar problem before, and is there any way to fix it without rebooting the box?

cfgadm does list the replaced drive c1t0d0, but configuring it fails:
cfgadm -f -c configure c1::dsk/c1t0d0
cfgadm: Hardware specific failure: failed to configure SCSI device: I/O error
c1::sd3                        disk         connected    unconfigured unknown

g_lee · 11-20-2009

Since cfgadm -c configure is complaining, this sounds more like a hardware/Solaris issue than a Veritas one (you mentioned you removed the disk from Veritas completely before physically replacing it). I suggest following up with Sun to resolve the cfgadm problem.

Note: for future disk replacements, rather than deleting the config details from the dg, remove the disk for replacement:
- run vxdiskadm -> option 4 (Remove a disk for replacement) [follow prompts]
- then run the OS steps (cfgadm -c unconfigure... etc.)
- physically replace the disk
- run the OS commands to pick up the replaced disk (cfgadm -c configure ...)
- then replace the disk in VxVM with vxdiskadm -> option 5 (Replace a failed or removed disk)
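The point of the list above is the order: VxVM is told first, then the OS. As a rough aid, that order can be captured as a printable checklist. This is a hypothetical helper (print_replace_checklist is not a Sun or Veritas tool); it only echoes the steps from the list and runs nothing, and the vxdiskadm steps stay as menu notes because vxdiskadm is interactive.

```shell
# Hypothetical helper (not a Sun or Veritas tool): print g_lee's disk
# replacement order as a checklist. It only echoes text and runs nothing.
# $1 is the cfgadm attachment point of the disk, e.g. c1::dsk/c1t0d0.
print_replace_checklist() {
    if [ -z "$1" ]; then
        echo "usage: print_replace_checklist <ap_id>" >&2
        return 2
    fi
    ap_id=$1
    # VxVM is told about the replacement first...
    echo "1. vxdiskadm -> option 4 (Remove a disk for replacement)"
    # ...then the OS releases the device and later picks up the new one...
    echo "2. cfgadm -c unconfigure $ap_id"
    echo "3. physically replace the disk"
    echo "4. cfgadm -c configure $ap_id"
    # ...and finally VxVM takes the new disk in place of the removed one
    echo "5. vxdiskadm -> option 5 (Replace a failed or removed disk)"
}
```

For the disk in this thread, print_replace_checklist c1::dsk/c1t0d0 would list the five steps with that attachment point filled in.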

Unfortunately, as the disk has already been replaced and you are already getting cfgadm errors, you need to resolve those first before putting the disk back into VxVM (and you will need to recreate the config manually if you deleted it, etc.).

sunguru · 11-20-2009

Thanks g_lee. I just found the article below on sunsolve.sun.com. Can you please let me know whether this can be tried on Solaris 10, and whether it is going to help?

http://sunsolve.sun.com/search/document.do?assetkey=1-62-210997-1

The following example assumes the failed boot disk is c0t0d0.
1. Change the dump device from c0t0d0s1 to alternate boot disk.
In this example, the swap device of the alternate boot disk is c1t1d0s1:
# dumpadm -d /dev/dsk/c1t1d0s1
2. Execute /usr/sbin/vxdiskadm option 4, then execute /usr/sbin/vxdiskadm option 11 for the failed boot disk.
3. Run cfgadm -c unconfigure c#::dsk/c#t#d0. If you are not sure of the "c#::dsk/c#t#d0" name, identify it first with the following command.
Example:
# /usr/sbin/cfgadm -al | grep c0t0d0
c0::dsk/c0t0d0 disk connected configured unknown
Then execute cfgadm -c unconfigure c0::dsk/c0t0d0. However, if the failed boot disk is a parallel SCSI disk and the VERITAS Volume Manager[TM] version is 4.0 or above, the following error may occur.
# cfgadm -c unconfigure c0::dsk/c0t0d0
cfgadm: Component system is busy, try again: failed to offline:
    Resource              Information
------------------ -------------------------
/dev/dsk/c0t0d0s2   Device being used by VxVM
This error prevents you from proceeding to the next step of the disk replacement procedure.
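The lookup in step 3 can be reduced to a filter that prints just the Ap_Id, so it can be passed straight to cfgadm -c unconfigure. This is a minimal sketch: find_ap_id is a hypothetical helper, and it assumes the usual cfgadm -al layout shown above, with the Ap_Id in the first column and disk attachment points named like c0::dsk/c0t0d0.

```shell
# Hypothetical helper: print the cfgadm attachment point (Ap_Id) for a disk
# such as c0t0d0, reading `cfgadm -al` output on stdin. Assumes the Ap_Id is
# the first column and looks like cN::dsk/cNtNdN. Exits 1 if nothing matches.
find_ap_id() {
    disk=$1
    awk -v disk="$disk" '
        # Match on the first field only, anchored at the end of the name
        $1 ~ ("::dsk/" disk "$") { print $1; found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}
```

For the infodoc example, cfgadm -al | find_ap_id c0t0d0 should print c0::dsk/c0t0d0. Note that, as the cfgadm output in the original question shows, a disk that is left unconfigured can be listed under a different Ap_Id (c1::sd3), which this filter will not match.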

Resolution

If the "Device being used by VxVM" error occurs while executing "cfgadm -c unconfigure c0::dsk/c0t0d0", it can be avoided with the following procedure.

1) Run the following command to rename the 'es_rcm.pl' script.
# mv /etc/rcm/scripts/es_rcm.pl /etc/rcm/scripts/DONTUSE
NOTE: for VxVM 4.1 and above, the 'es_rcm.pl' script is under /usr/lib/rcm/scripts:
# mv /usr/lib/rcm/scripts/es_rcm.pl /usr/lib/rcm/scripts/DONTUSE
2) Run "/usr/sbin/cfgadm -c unconfigure ......" again. This command will finish successfully.
Example:
# /usr/sbin/cfgadm -c unconfigure c0::dsk/c0t0d0
3) After that, rename the file "DONTUSE" back to its original name.
# mv /etc/rcm/scripts/DONTUSE /etc/rcm/scripts/es_rcm.pl
NOTE: for VxVM 4.1 and above, the 'es_rcm.pl' script is under /usr/lib/rcm/scripts:
# mv /usr/lib/rcm/scripts/DONTUSE /usr/lib/rcm/scripts/es_rcm.pl
4) Then physically replace the failed boot disk with a new one.
5) Run the following command.
# cfgadm -c configure c0::dsk/c0t0d0
6) Run the usual VERITAS Volume Manager[TM] procedure.
# /usr/sbin/vxdctl enable
Then run "/usr/sbin/vxdiskadm" option 10 for c0t0d0 disk. Finally, run "/usr/sbin/vxdiskadm" option 5 for c0t0d0 disk.
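Steps 1 to 3 are a rename, run, restore pattern, and the risky part is forgetting step 3. A wrapper can make sure the script is always renamed back, even if cfgadm fails. This is a hypothetical sketch (with_es_rcm_disabled is not part of VxVM or Solaris); the RCM scripts directory is passed in because the infodoc gives two locations (/etc/rcm/scripts, or /usr/lib/rcm/scripts for VxVM 4.1 and above). As g_lee points out later in the thread, the infodoc targets Solaris 8/9 and the es_rcm.pl issue is believed fixed in newer VxVM releases, so this would be a last resort.

```shell
# Hypothetical wrapper (not part of VxVM or Solaris) for the infodoc
# workaround: rename es_rcm.pl to DONTUSE in the given RCM scripts directory,
# run one command, then rename it back whether or not the command succeeded.
# Usage: with_es_rcm_disabled <rcm_scripts_dir> <command> [args...]
with_es_rcm_disabled() {
    rcm_dir=$1
    shift
    script="$rcm_dir/es_rcm.pl"
    parked="$rcm_dir/DONTUSE"
    if [ ! -f "$script" ]; then
        # The infodoc suggests the vxdmp path issue when the file is absent
        echo "$script not found; nothing to disable" >&2
        return 1
    fi
    if [ -e "$parked" ]; then
        # Refuse to overwrite an existing DONTUSE file
        echo "$parked already exists; not renaming" >&2
        return 1
    fi
    mv "$script" "$parked" || return 1
    "$@"
    rc=$?
    # Always restore the original name so RCM behaves normally afterwards
    if ! mv "$parked" "$script"; then
        echo "WARNING: rename $parked back to $script by hand" >&2
        return 1
    fi
    return "$rc"
}
```

For example: with_es_rcm_disabled /usr/lib/rcm/scripts /usr/sbin/cfgadm -c unconfigure c0::dsk/c0t0d0. The wrapper returns the command's own exit status; it does not restore the file if the shell itself is interrupted, so check the directory afterwards.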
If you are dealing with a single-path internal mirror disk (not the boot disk), and/or the file above does not exist, the issue may be due to vxdmp not letting go of the only path to the disk. In that case, do the following (see the example below):
hostname:/etc/vx/bin# vxdmpadm getsubpaths ctlr=c1
NAME         STATE      PATH-TYPE[M] DMPNODENAME ENCLR-TYPE   ENCLR-NAME   ATTRS
================================================================================
c1t0d0s2     ENABLED       -          c1t0d0s2     Disk         Disk           -
c1t1d0s2     ENABLED       -          c1t1d0s2     Disk         Disk           -
hostname:/etc/vx/bin# vxdmpadm -f disable path=c1t1d0s2
hostname:/etc/vx/bin# vxdmpadm getsubpaths ctlr=c1
NAME         STATE      PATH-TYPE[M] DMPNODENAME ENCLR-TYPE   ENCLR-NAME   ATTRS
================================================================================
c1t0d0s2     ENABLED      -          c1t0d0s2     Disk         Disk           -
c1t1d0s2     DISABLED     -          c1t1d0s2     Disk         Disk           -
hostname:/etc/vx/bin# vxdctl enable
hostname:/etc/vx/bin# cfgadm -c unconfigure c1::dsk/c1t1d0 <<< worked now
Afterwards, vxdmpadm getsubpaths ctlr=c1 showed the path as ENABLED again; after running vxdctl enable, the customer proceeded with vxdiskadm option 5.
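In the example above the path state is read by eye from vxdmpadm getsubpaths. A small filter can pull out one path's STATE so that a script can confirm the path really is DISABLED before running cfgadm. This is a minimal sketch: path_state is a hypothetical helper, not a VxVM command, and it assumes the column layout shown above, with NAME and STATE as the first two columns.

```shell
# Hypothetical helper: print the STATE column for one subpath (e.g. c1t1d0s2)
# from `vxdmpadm getsubpaths ctlr=...` output read on stdin. Assumes NAME and
# STATE are the first two columns. Exits 1 if the path is not listed.
path_state() {
    awk -v path="$1" '
        # Header and separator lines never equal a path name, so they are skipped
        $1 == path { print $2; found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, after the vxdmpadm -f disable path=c1t1d0s2 step above, vxdmpadm getsubpaths ctlr=c1 | path_state c1t1d0s2 should print DISABLED, and a script could run cfgadm -c unconfigure c1::dsk/c1t1d0 only when it does.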

Additional Information

Note: Symantec considers this an RFE and will not fix this issue in VxSF 4.0, but will consider fixing it in a future release (VxSF 4.1x and 5.x), since it needs a major redesign rather than a trivial bug fix.

As of May 29th 2007, the VxSF 4.1x and 5.x patches for this issue are still pending.

Product
VERITAS Storage Foundation 4.0 Software (Localized)
VERITAS Storage Foundation 4.0 Software
VERITAS Volume Manager 4.0 Software
VERITAS Volume Manager 4.0 Software (Localized)
VERITAS Volume Manager 4.1 Software
VERITAS Storage Foundation 5.0 Software
VERITAS Storage Foundation 4.1 Software

Gaurav_S · 11-22-2009

Hello,

I don't see any harm in following the above procedure on Solaris 10, since most of the commands and options are the same, and the Solaris cfgadm functionality is the same as well.

Apart from the above solution, I was wondering: does the Sun M5000 use an internal FC disk? If so, there could be extra steps involved, such as luxadm. I would consider the following steps:

a) Remove disk from Veritas (vxdiskadm option 4)
b) Offline disk from OS (luxadm offline)
c) Remove disk from OS (luxadm remove)
d) Replace the disk
e) Scan in OS (devfsadm)
f) Scan in Veritas (vxdctl enable)
g) vxdiskadm option 5 to add back in DG

Gaurav

g_lee · 11-23-2009

sunguru,

Per the Sunsolve document, that procedure is only applicable to Solaris 8 and 9, so although you can attempt it, it may not work or be effective on Solaris 10. Moreover, although I can't find the exact reference to confirm it, I am fairly sure the es_rcm.pl issue was resolved by 5.0MP1 (which you have), so it still wouldn't apply even if you did have Solaris 8/9. (I realise the infodoc says it's not resolved, but the date given is 2007! My recollection is that it was fixed in 4.1MP2, so I imagine the same fix made it into 5.0.)

Have you tried unconfiguring the disk again and then reconfiguring it (and/or unconfiguring it, reseating the disk, then configuring it)? If this still doesn't work, you can try the Infodoc as a last resort; otherwise it appears you may need to reboot to resolve the issue this time (and next time, use vxdiskadm before running the OS commands to remove the disk, etc.).

FYI Gaurav:

A Solaris 10 M5000 would use cfgadm. On most newer machines/models running Solaris 10, cfgadm is used rather than luxadm for disk replacement (i.e., cfgadm does the tasks that luxadm would have done on older versions/models; luxadm can still be used to display path details for applicable FC-AL and similar devices).

sunguru · 11-23-2009

Thanks Lee and Gaurav,

It's not a Fibre Channel disk. The problem was resolved by rebooting the disk. I agree with Lee that Veritas might have already overcome the es_rcm.pl problem, since it's quite an old article, but I will try it for future disk failures.

Thanks to you both for your help.

g_lee · 11-24-2009

sunguru,

Just to clarify: was the problem resolved by rebooting the server or by reseating the disk? (You mentioned "rebooting the disk" above, so I'm not sure exactly what was meant.)

Regarding future disk replacements, please use the vxdiskadm steps (i.e., remove the disk for replacement with option 4) and see if they work before attempting to modify any of the es_rcm.pl parts, as the issue was most likely caused in the first place by those steps not being run before cfgadm -c unconfigure.

sunguru · 11-24-2009

Lee, sorry for the typo. I just rebooted the server; no disk reseating was done, and this cleared all the underlying problems.
I am sure this problem will occur again, and I will give it a try through vxdiskadm. Also, I didn't notice this problem on other Sun models such as the T5120, Fujitsu PrimePower, Sun 240, etc. running Solaris 10, so it looks like this could be unique to Sun M-series boxes. Let's see how it goes in future.
I will keep posting questions here.

g_lee · 11-24-2009

Thanks for clarifying.

Best practice when replacing a disk in a dg is to always use vxdiskadm to remove the disk before running the OS commands to replace it, regardless of model/hardware type. See here for an overview of the steps:
http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/html/vxvm_admin/ch02s18.htm
There are also several Sun Infodocs that mention this, with detailed instructions depending on the VxVM version, SCSI versus FC drives, etc.

If this hasn't been done before, it's extremely surprising that you haven't encountered issues, as you would have problems putting the disk back into the dg once it was replaced (even if the OS replacement went OK). Were the disks replaced earlier also under VxVM control, or were they SDS/SVM/other disks?

sunguru · 11-24-2009

In most cases, once a (mirrored) disk has failed, it is almost out of Veritas control (technically not).
In most cases, standard practice is to replace the disk, run devfsadm, label it, and use vxdiskadm option 5, which replaces the disk automatically and resyncs it,
unless it is Fibre Channel, in which case we use luxadm commands, etc.

This process still works pretty well on Solaris 8 and Solaris 10 boxes and on most Sun hardware, but this is the first time I have come across this kind of problem on Solaris 10; I have never seen it since the release of Foundation Suite 4.x.
When we try to remove the failed disk for replacement using vxdiskadm option 4, I have seen VxVM say the disk is already removed, or that there is nothing to remove, or similar messages (since the disk has already failed, there is nothing for VxVM to remove), so the choice was to go to option 5 and replace the disk.

Until the reboot was done, I am not sure why cfgadm/VxVM was holding the devid info in this particular OS even though the disk was replaced and the device paths had been cleaned. I think this could be a Sun bug, but I didn't find any results with a Google search.