SFHA Solutions 6.0.1 (Solaris): Troubleshooting the dgdisabled error flag
The dgdisabled error flag indicates that configuration changes on a disk group are disabled. This can occur due to any error that prevents further configuration changes on the disk group. For example, it can occur if no good disks are found during the disk group import operation, if no valid configuration copies are found on the disks in the disk group, or if writes to all configuration copies fail during an update to the disk group configuration.

The dgdisabled error flag is displayed when the Veritas Volume Manager configuration daemon, vxconfigd, loses access to all enabled configuration copies for the disk group. Configuration copies let you back up and restore all configuration data for disk groups, and for objects such as volumes that are configured within the disk groups. Loss of access can occur if power is disrupted or a network cable is disconnected. To recover from loss of access, fix any disk connectivity issues, then deport and re-import the disk group.

Beginning with the Storage Foundation and High Availability (SFHA) Solaris 6.0 release, a node can join the cluster even if there is a shared disk group that is in the DGDISABLED state. In earlier releases the node failed to join the cluster.

For more information on troubleshooting the dgdisabled error flag, see:
Removing the error state for simple or nopriv disks in non-boot disk groups
vxdarestore(1m) 6.0.1 manual page: Solaris

For more information on using the vxdisk list command to display status and troubleshoot disk errors, see the following Symantec Connect article:
SFHA Solutions 6.0.1: Using the vxdisk list command to display status and to recover from errors on Veritas Volume Manager disks

Veritas Storage Foundation and High Availability documentation for other releases and platforms can be found on the SORT website.

356 Views, 5 likes, 0 Comments

vxdisk list
Dear all,

I am getting the following output from vxdisk list:

DEVICE       TYPE         DISK    GROUP    STATUS
c0t1d0s2     auto:none    -       -        online invalid

Kindly advise what to do to make the disk valid. I installed Veritas Volume Manager on Solaris 10, which is running on VMware.

Thanks in advance.

Regards,
ritchie james

693 Views, 2 likes, 3 Comments

SFHA Solutions 6.0.1: About GAB seeding and its role in VCS and other SFHA products
Group Membership and Atomic Broadcast (GAB) is a kernel component of Veritas Cluster Server (VCS) that provides globally ordered messages that keep nodes synchronized. GAB maintains the cluster state information and the correct membership of the cluster. However, GAB needs another kernel component, Low Latency Transport (LLT), to send messages to the nodes and keep the cluster nodes connected.

How do GAB and LLT function together in a VCS cluster?

VCS uses GAB and LLT to share data among nodes over private networks. LLT is the transport protocol responsible for fast kernel-to-kernel communications. GAB carries the state of the cluster and the cluster configuration to all the nodes in the cluster. These components provide the performance and reliability that VCS requires. In a cluster, nodes must share the groups, resources, and resource states. LLT and GAB help the nodes communicate.

For information on LLT, GAB, and private networks, see:
About LLT and GAB
About network channels for heartbeating

GAB seeding

The GAB seeding function ensures that a new cluster starts with an accurate membership count of the number of nodes in the cluster. It protects your cluster from a preexisting network partition upon initial start-up. A preexisting network partition refers to a failure in the communication channels that occurs while the nodes are down and VCS cannot respond. When the nodes start, GAB seeding reduces the vulnerability to network partitioning, regardless of the cause of the failure. GAB services are used by all Veritas Storage Foundation and High Availability (SFHA) products.

For information about preexisting network partitions, and how seeding functions in VCS, see:
About preexisting network partitions
About VCS seeding

Enabling automatic seeding of GAB

If I/O fencing is configured in the enabled mode, you can edit the /etc/vxfenmode file to enable automatic seeding of GAB. If the cluster is stuck with a preexisting split-brain condition, I/O fencing allows automatic seeding of GAB. You can set the minimum number of nodes that must be present for GAB to seed by configuring the Control port seed and Quorum flag parameters in the /etc/gabtab file. Quorum is the number of nodes that need to join a cluster for GAB to complete seeding.

For information on configuring the autoseed_gab_timeout parameter in the /etc/vxfenmode file, see:
About I/O fencing configuration files

For information on configuring the Control port seed and the Quorum flag parameters in GAB, see:
About GAB run-time or dynamic tunable parameters

For information on split-brain conditions, see:
About the Steward process: Split-brain in two-cluster global clusters
How I/O fencing works in different event scenarios
Example of a preexisting network partition (split-brain)

Role of GAB seeding in cluster membership

For information on how the nodes gain cluster membership, seeding a cluster, and manual seeding of a cluster, see:
About cluster membership
Initial joining of systems to cluster membership
Seeding a new cluster
Seeding a cluster using the GAB auto-seed parameter through I/O fencing
Manual seeding of a cluster

Troubleshooting issues that are related to GAB seeding and preexisting network partitions

For information on the issues that you may encounter when GAB seeds a cluster, and on preexisting network partitions, see:
Examining GAB seed membership
Manual GAB membership seeding
Waiting for cluster membership after VCS start-up
Summary of best practices for cluster communications
System panics to prevent potential data corruption
Fencing startup reports preexisting split-brain
Clearing preexisting split-brain condition
Recovering from a preexisting network partition (split-brain)
Example Scenario I – Recovering from a preexisting network partition
Example Scenario II – Recovering from a preexisting network partition
Example Scenario III – Recovering from a preexisting network partition
gabconfig (1M) 6.0.1 manual pages: AIX, Solaris

For more information on seeding clusters to prevent preexisting network partitions, see:
Veritas Cluster Server Administrator's Guide
Veritas Cluster Server Installation Guide

Veritas Cluster Server documentation for other releases and platforms can be found on the SORT website.

750 Views, 2 likes, 0 Comments

Service group shows online and cluster service offline
Hi Team,

I have attached the output of hastatus. It shows ClusterService online on system 35 and offline on the rest of the systems. However, the service group POWERCENTERSERVICEMANAGER shows online on all systems: 35, 36, 37, 38, 39, and 40.

I am unable to understand why the service group shows online on all the other systems when ClusterService is online only on system 35 and offline on the others. Is this the scenario of an Active-Active cluster? Please help me understand this scenario.

Solved, 2.2K Views, 1 like, 2 Comments

mnt_app resource failover
Hi,

Below are the resource dependencies in my environment. Somebody unmounted the /app file system, so the resource went into the faulted state. When I checked the resource criticality, mnt_app is set as a non-critical resource (0), while vol_app and app_dg are set as critical (1).

The dependencies below show a parent-child relationship, with mnt_app as the parent and vol_app as the child. If the parent is set as critical and the child is non-critical (0), does it fail over to another node? Or, if the child is set as critical and the parent as non-critical (0), does it fail over then?

Please assist as soon as possible.

root@lyle# hares -dep |grep app
PDM_PRD_MG APP_aphelion mnt_app
PDM_PRD_MG APP_tibjmsd mnt_app
PDM_PRD_MG Blind_check_stopDB mnt_app
PDM_PRD_MG IPMultB_pdmprdappdb MNicB_DB
PDM_PRD_MG appdg SRDF_app
PDM_PRD_MG mnt_app vol_app
PDM_PRD_MG vol_activelogs appdg
PDM_PRD_MG vol_app appdg
PDM_PRD_MG vol_archivelogs appdg
PDM_PRD_MG vol_index appdg

Solved, 1.7K Views, 1 like, 5 Comments

node freeze v/s service group freeze
Hi Team,

I came across a DIMM replacement activity on one of our Solaris servers, which are in a cluster. Please advise before I proceed with the activity: do I have to freeze the service group, or do I need to freeze the node?

Also, kindly explain in which scenarios we should freeze the service group and in which we should freeze the node.

Thanks.

Solved, 6.8K Views, 1 like, 1 Comment

Oracle Data Base Replication with VVR under SFCFSHA/DR
Hi All,

We are looking into whether VVR can be used for Oracle database replication instead of the Oracle Data Guard solution. If it is used, do you know whether Veritas provides support for any problems encountered? Even though VVR preserves write-order fidelity, it is not certain that database integrity will be preserved at the disaster site. Do you have any best practices, white papers, experience, or anything else to suggest for this deployment?

1.1K Views, 1 like, 4 Comments

Plan for DIMM replacement activity (VCS nodes)
Hi Team,

We have to replace a DIMM on the passive node, on which VCS services are currently not running. The crossover LLT cables are badly tangled with each other. The Symantec engineer told us that he will manage the cable issue and that there is no need to bring down the active node.

Kindly provide a step-by-step procedure for this activity, and please suggest the prerequisites before starting it. This is a very crucial activity because a Class A application is running on the active node.

Currently, the SG is running on the Sydney server. We have to perform the activity on the Madagascar server.

-- SYSTEM STATE
-- System          State      Frozen
A  Sydney          RUNNING    0
A  Madagascar      RUNNING    0

-- GROUP STATE
-- Group           System       Probed    AutoDisabled    State
B  ClusterService  Sydney       Y         N               ONLINE
B  ClusterService  Madagascar   Y         N               OFFLINE
B  ORA_SG_Group    Sydney       Y         N               ONLINE
B  ORA_SG_Group    Madagascar   Y         N               OFFLINE

Kindly suggest as soon as possible.

Thanks in advance.
Allaboutunix

Solved, 2.1K Views, 1 like, 6 Comments

Service group concurrency violation
Hi Team,

We have alerts of a concurrency violation. We have two servers in the cluster: mapibm625 and mapibm626. The logs are:

2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)
2014/12/26 19:37:03 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sapgtsprd
2014/12/26 19:37:03 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group sapgtsprd on all nodes
2014/12/26 19:37:04 VCS WARNING V-16-6-15034 (mapibm625) violation:Offlining group sapgtsprd on system mapibm625
2014/12/26 19:37:04 VCS INFO V-16-1-50135 User root fired command: hagrp -offline sapgtsprd mapibm625 from localhost
2014/12/26 19:37:04 VCS NOTICE V-16-1-10167 Initiating manual offline of group sapgtsprd on system mapibm625
2014/12/26 19:37:04 VCS NOTICE V-16-1-10300 Initiating Offline of Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) on System mapibm625
2014/12/26 19:37:04 VCS INFO V-16-6-15002 (mapibm625) hatrigger:hatrigger executed /opt/VRTSvcs/bin/internal_triggers/violation mapibm625 sapgtsprd successfully
2014/12/26 19:37:04 VCS INFO V-16-10011-306 (mapibm625) Application:App_saposcol:offline:Execution of Stop Program (/opt/VRTSvcs/bin/Saposcol/offline) returned (0).
2014/12/26 19:37:05 VCS INFO V-16-2-13716 (mapibm625) Resource(App_saposcol): Output of the completed operation (offline)
==============================================
2014/12/26 19:37:06 VCS INFO V-16-1-10305 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is offline on mapibm625 (VCS initiated)
2014/12/26 19:37:06 VCS NOTICE V-16-1-10446 Group sapgtsprd is offline on system mapibm625
========================================================================================

I asked the application team whether they were working on the servers, because the resource is a SAP resource (App_saposcol). However, the application team replied that they were not working on it, and that App_saposcol might have been online on both servers, which caused the issue.

I then checked the status of the resource on both servers:

[root@mapibm626]: # hares -state
#Resource       Attribute    System       Value
App_saposcol    State        mapibm625    OFFLINE
App_saposcol    State        mapibm626    ONLINE

[root@mapibm625]: # hares -state
#Resource       Attribute    System       Value
App_saposcol    State        mapibm625    OFFLINE
App_saposcol    State        mapibm626    ONLINE

I also checked the current logs on the server, but found only:

2014/12/27 13:03:42 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 17:03:43 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 21:03:44 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 01:03:45 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 05:03:46 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 09:03:47 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 10:56:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 11:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 13:03:48 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 14:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 60%
2014/12/28 17:03:49 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 21:03:50 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 01:03:51 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 05:03:52 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 09:03:53 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 13:03:55 VCS INFO V-16-1-53504 VCS Engine Alive message!!
==========================================================================

Please advise what the possible reasons for this could be, and how to avoid it in the future.

Thanks,
Allaboutunix

Solved, 2.6K Views, 1 like, 7 Comments
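
For anyone triaging a similar alert, the engine log in the concurrency violation post can be scanned for the event that precedes each violation. The following is only a rough sketch in Python, assuming the log lines follow the same layout as the excerpt in that post (timestamp, "VCS", severity, message ID, text). The function names `parse_line` and `find_violations` and the `SAMPLE` list are made up for this illustration and are not part of any VCS tooling.

```python
import re

# Layout assumed from the excerpt in the post:
#   YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <MESSAGE-ID> <text>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<sev>[A-Z]+) (?P<msgid>V-\d+-\d+-\d+) (?P<text>.*)$"
)

# A resource reported online without VCS having started it.
ONLINE_RE = re.compile(
    r"Resource (?P<res>\S+) \(Owner: [^,]*, Group: (?P<grp>[^)]+)\) "
    r"is online on (?P<sys>\S+) \(Not initiated by VCS\)"
)

# The concurrency violation message for a failover group.
VIOLATION_RE = re.compile(
    r"Concurrency Violation:\s*CurrentCount increased above \d+ "
    r"for failover group (?P<grp>\S+)"
)


def parse_line(line):
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def find_violations(lines):
    """Pair each violation with the latest out-of-band online event
    seen earlier in the log for the same group (if any)."""
    last_online = {}  # group name -> details of the last such event
    findings = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # separators, prompts, blank lines
        online = ONLINE_RE.search(rec["text"])
        if online:
            last_online[online["grp"]] = {
                "resource": online["res"],
                "system": online["sys"],
                "ts": rec["ts"],
            }
            continue
        viol = VIOLATION_RE.search(rec["text"])
        if viol:
            cause = last_online.get(viol["grp"], {})
            findings.append({
                "ts": rec["ts"],
                "group": viol["grp"],
                "trigger_resource": cause.get("resource"),
                "trigger_system": cause.get("system"),
            })
    return findings


# Three lines copied from the post, used here as sample input.
SAMPLE = [
    "2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol "
    "(Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 "
    "(Not initiated by VCS)",
    "2014/12/26 19:37:03 VCS ERROR V-16-1-10214 Concurrency Violation:"
    "CurrentCount increased above 1 for failover group sapgtsprd",
    "2014/12/26 19:37:03 VCS NOTICE V-16-1-10233 Clearing Restart "
    "attribute for group sapgtsprd on all nodes",
]

if __name__ == "__main__":
    for finding in find_violations(SAMPLE):
        print(finding)
```

On the sample lines this pairs the violation for group sapgtsprd with App_saposcol being reported online on mapibm625 outside VCS control. It only correlates log lines; it does not show who or what started the process on that node, so that still has to be checked on the server itself.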