Forum Discussion

symsonu
Level 6
13 years ago

vxdisk list showing dg disabled

Hello Friends,

 

I issued the vxdctl enable command on a two-node VCS cluster and it left all the disk groups disabled, as shown below:

[root@ruabon1 ~]# /opt/scripts/stgmgr/qlogic_scan_v23.sh
Issuing LIP on host0
Scanning HOST: host0
.............
Issuing LIP on host1
Scanning HOST: host1
.............
No changes discovered in existing devices.
[root@ruabon1 ~]# vxdctl enable

You have new mail in /var/spool/mail/root
[root@ruabon1 ~]#
[root@ruabon1 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
cciss/c0d0   auto:none       -            -            online invalid
sda          auto:cdsdisk    POEBS_ORAAPPL_01  POEBS_ORAAPPL_DG online dgdisabled
sdaa         auto:cdsdisk    POEBS_ORADATA01_15  POEBS_ORA_IMP_EXP_DG online dgdisabled
sdah         auto:cdsdisk    POEBS_ORAAPPL_03  POEBS_ORAAPPL_NFS_DG online dgdisabled
sdb          auto:cdsdisk    POEBS_ORADATA01_01  POEBS_ORADATA01_DG online dgdisabled
sdbq         auto:cdsdisk    POEBS_ORADATA01_17  POEBS_ORADATA01_DG online dgdisabled
sdc          auto:cdsdisk    POEBS_ORADATA01_02  POEBS_ORADATA01_DG online dgdisabled
sdd          auto:cdsdisk    POEBS_ORAREDO01_01  POEBS_ORAREDO01_DG online dgdisabled
sde          auto:cdsdisk    POEBS_ORALOGS_01  POEBS_ORALOGS_DG online dgdisabled
sdf          auto:cdsdisk    POEBS_ORAREDO02_01  POEBS_ORAREDO02_DG online dgdisabled
sdg          auto:cdsdisk    POEBS_ORADATA01_09  POEBS_ORADATA01_DG online dgdisabled
sdh          auto:cdsdisk    POEBS_ORADATA01_16  POEBS_ORADATA01_DG online dgdisabled
sdi          auto:cdsdisk    POEBS_ORAREDO01_02x  POEBS_ORAREDO01_DG online dgdisabled
sdj          auto:cdsdisk    POEBS_ORAREDO02_02x  POEBS_ORAREDO02_DG online dgdisabled
sdk          auto:cdsdisk    POEBS_ORA_IMP_EXP_DG01  POEBS_ORA_IMP_EXP_DG online dgdisabled
sdl          auto:cdsdisk    POEBS_ORAAPPL_02  POEBS_ORAAPPL_NFS_DG online dgdisabled
sdm          auto:cdsdisk    POEBS_MQ_01  POEBS_MQ_DG  online dgdisabled
sdn          auto:cdsdisk    POEBS_ORADATA01_03  POEBS_ORADATA01_DG online dgdisabled
sdo          auto:cdsdisk    POEBS_ORADATA01_04  POEBS_ORADATA01_DG online dgdisabled
sdp          auto:cdsdisk    POEBS_ORADATA01_05  POEBS_ORADATA01_DG online dgdisabled
sdq          auto:cdsdisk    POEBS_ORADATA01_06  POEBS_ORADATA01_DG online dgdisabled
sdr          auto:cdsdisk    POEBS_ORADATA01_07  POEBS_ORADATA01_DG online dgdisabled
sds          auto:cdsdisk    POEBS_ORADATA01_08  POEBS_ORADATA01_DG online dgdisabled
sdt          auto:cdsdisk    POEBS_ORADATA01_10  POEBS_ORADATA01_DG online dgdisabled
sdu          auto:cdsdisk    POEBS_ORADATA01_11  POEBS_ORADATA01_DG online dgdisabled
sdv          auto:cdsdisk    POEBS_ORADATA01_12  POEBS_ORADATA01_DG online dgdisabled
sdw          auto:cdsdisk    POEBS_ORADATA01_13  POEBS_ORADATA01_DG online dgdisabled
sdx          auto:cdsdisk    POEBS_ORADATA01_14  POEBS_ORADATA01_DG online dgdisabled
sdy          auto            -            -            error
sdz          auto            -            -            error

 

[root@ruabon1 ~]# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen             

A  ruabon1              RUNNING              0                   
A  ruabon2              RUNNING              0                   

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State         

B  POEBS_DB        ruabon1              Y          N               PARTIAL       
B  POEBS_DB        ruabon2              Y          N               OFFLINE       
B  POEBS_MQ        ruabon1              Y          N               PARTIAL       
B  POEBS_MQ        ruabon2              Y          N               OFFLINE       

-- RESOURCES FAILED
-- Group           Type                 Resource             System             

C  POEBS_DB        Volume               mozart01_vol         ruabon1            
C  POEBS_DB        Volume               mozart02vol          ruabon1            
C  POEBS_DB        Volume               oraappl_nfs_vol      ruabon1            
C  POEBS_DB        Volume               oraappl_vol          ruabon1            
C  POEBS_DB        Volume               oradata01_vol        ruabon1            
C  POEBS_DB        Volume               oraexport_vol        ruabon1            
C  POEBS_DB        Volume               oraimport_vol        ruabon1            
C  POEBS_DB        Volume               oralogs_vol          ruabon1            
C  POEBS_DB        Volume               oraredo01_vol        ruabon1            
C  POEBS_DB        Volume               oraredo02_vol        ruabon1            
C  POEBS_MQ        Application          poebs_mq             ruabon1            

-- RESOURCES OFFLINING
-- Group           Type            Resource             System               IState

F  POEBS_DB        Mount           oraappl_nfs_fs       ruabon1              W_OFFLINE_PATH

 

How do I get rid of this situation in the cluster status output? The mount resource is not going offline.

I am looking for the best solution apart from a reboot.

 

Regards

5 Replies

  • What to do when "vxdisk list" shows status of 'online dgdisabled'
    http://www.symantec.com/business/support/index?page=content&id=TECH7407

    Per the above technote, you need to deport the diskgroup and reimport it to resolve the dgdisabled state.

    Offline the application(s) and unmount the filesystems that use the volumes (you will probably have to do this manually). Freeze the SG first to prevent it from failing over to the other node.
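
    For illustration, a rough sketch of that preparation, using a group name from the hastatus output above; the mount point and diskgroup shown are placeholders, so substitute the values from your own Mount resources:

    # hagrp -freeze POEBS_DB
    ### stop the application manually, then unmount each VxFS filesystem on the dg, e.g.:
    # umount /oradata01                  ### placeholder mount point
    # vxdg deport POEBS_ORADATA01_DG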

    Once the dg has been deported, verify all the disks are visible, then manually import it to check that it imports OK:

    # vxdg -t import <diskgroup>

    #### should import without errors

    If this seems OK, deport again, clear the resource faults and probe the resources to ensure VCS sees them offline; then unfreeze the SG and online in VCS:

    # vxdg deport <dg>

    # hares -clear <resname> ### repeat for resources as needed

    # hares -probe <resname> -sys <system>

    # hastatus -sum

    # hagrp -unfreeze <SG>

    # hagrp -online <SG> -sys <system>

  •  

    Hi Lee,

     

    Thank You very much for the reply.

    Would a reboot of the node be a simpler approach than this, since there will be an outage in both cases?

     

    Regards

    Sumit Gupta

  • If all the diskgroups on the node are in the dgdisabled state, then technically you could reboot the node, as this would deport the diskgroups on shutdown; you could then online the group to re-import them once the node rejoins the cluster.

    However, if there are still problems on reboot (e.g. the system cannot see all the disks in the diskgroup) then the diskgroup will still not import and you will have to troubleshoot further.

    If you still want to use the reboot approach instead of just doing the deport/reimport, you will still need to freeze the service group to ensure it doesn't come up on the other node(s) while the original node is rebooting. Once the node is back up and in the cluster, unfreeze the group and online it to re-import the dg.

    It would also be a good idea to offline the applications before rebooting, to at least try to bring them down cleanly (i.e. you're basically doing the same steps anyway, you're just adding a reboot in the middle which isn't technically needed!).
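
    For reference, an illustrative outline of that reboot approach, using the group and system names from the hastatus output above (the exact application shutdown steps depend on your resources):

    # hagrp -freeze POEBS_DB
    # hagrp -freeze POEBS_MQ
    ### (add -persistent if the freeze must survive a full cluster restart)
    ### stop the applications and unmount what you can, then reboot the node
    # shutdown -r now
    ### once the node has rejoined the cluster:
    # hastatus -sum
    # hagrp -unfreeze POEBS_DB
    # hagrp -online POEBS_DB -sys ruabon1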

    nb: moved discussion to Storage Foundation forum as the issue is SF-related (diskgroup level), rather than SFHA management.

  • When the "dgdisabled" flag is displayed like that for your diskgroup, it means that vxconfigd lost access to all enabled configuration copies for the disk group.  Despite the loss of access to the configuration copies, file systems remain enabled because no IO resulted in fatal errors.

    This is typical for a brief SAN outage on the order of a few seconds, allowing file system IO to complete after retrying.

    The condition itself means that the in-memory disk group is no longer connected to the underlying disk group configuration copies. VM will not allow any further modification to the disk group. This condition exists to allow you to sanely bring down file systems / applications.

    If you have a high confidence that your disks are all in a good state and no corruption has occurred, you can attempt to restart vxconfigd.  When vxconfigd starts back up, it will scan the underlying disks and if everything is clean and correct, it will reattach those devices to the disk group.

    NOTE however that this procedure will further degrade your environment if it fails.

    1. Freeze all Service Groups with VM resources:

    # hagrp -freeze <group> -sys <system>

    2. Restart vxconfigd

    # vxconfigd -k -x syslog

    3. Confirm resulting status:

    # vxdg list

    # vxdisk list

    4. Unfreeze Service Groups if DG is now corrected

    # hagrp -unfreeze <group> -sys <system>

     

    After correcting the disk group condition, you will need to look at your cluster configuration for the "oraappl_nfs_fs" resource and determine what its mount point and block device are.  From the block device you can determine disk group and volume name.  
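
    For example, the resource's attributes can be queried directly (assuming the standard VCS Mount agent attribute names; the block device path then gives you the diskgroup and volume):

    # hares -value oraappl_nfs_fs MountPoint
    # hares -value oraappl_nfs_fs BlockDevice
    ### BlockDevice should look like /dev/vx/dsk/<diskgroup>/<volume>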

    Unmount the mount point and remount it.

    Verify that the volume is in a good state in vxprint -htg <diskgroup>.
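
    A minimal sketch of that remount and check, with placeholder mount point, diskgroup, and volume names:

    # umount /oraappl_nfs                 ### placeholder mount point
    # mount -t vxfs /dev/vx/dsk/<diskgroup>/<volume> /oraappl_nfs
    # vxprint -htg <diskgroup>
    ### the volume should show ENABLED / ACTIVE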

     

    A node reboot could potentially correct all of this automatically as well.

  • One of the options below can be applied.

    1. Try vxconfigd -k, which resolves the issue in rare cases.

    2. Deport and re-import the dg after unmounting the filesystems, as advised by g_lee.

    3. If unmounting or deporting is not possible, go for a reboot.