All disks went into error state

Weny
Level 3


Hi,

Here's the situation:
Two nodes share the same LUNs. One node runs Solaris 10 with SFCFSHA 6.0, and some DGs/volumes were created and configured in that 6.0 cluster.
The other node was reinstalled with RHEL 6 and SFHA 6.2; after that, all the disks went into error state on the Solaris 10 node.

Sol 10 + SFCFSHA 6.0 :
[root@sol25 /]#vxdisk -oalldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_0       auto:none       -            -            online invalid
emc_clariion0_1 auto            -            -            error
emc_clariion0_2 auto            -            -            error
emc_clariion0_3 auto            -            -            error
emc_clariion0_4 auto            -            -            error
emc_clariion0_5 auto            -            -            error
emc_clariion0_6 auto            -            -            error

 

RHEL 6 + SFHA 6.2 :
[root@rhel26 ~]# vxdisk -oalldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
emc_clariion0_1 auto:none       -            -            online invalid
emc_clariion0_2 auto:none       -            -            online invalid
emc_clariion0_3 auto:none       -            -            online invalid
emc_clariion0_4 auto:none       -            -            online invalid
emc_clariion0_5 auto:none       -            -            online invalid
emc_clariion0_6 auto:none       -            -            online invalid

How can I get the DGs back on the Solaris system? Thanks in advance!


11 REPLIES

Weny
Level 3

#cat /etc/release
                    Oracle Solaris 10 9/10 s10x_u9wos_14a X86
     Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
                            Assembled 11 August 2010

And before the reinstall, the two nodes were in one cluster.

rsharma1
Level 5
Employee Accredited Certified

Are you currently seeing the LUNs alright at the OS level on Solaris? (cfgadm -al)

Can you clarify more on "before the reinstall, the two nodes were in one cluster" - what was the OS in this 2-node cluster initially, and what was the SF version initially on the two nodes?


Weny
Level 3
At the OS level, those LUNs are connected.

#cfgadm -al
Ap_Id             Type       Receptacle   Occupant     Condition
c1                scsi-bus   connected    configured   unknown
c1::dsk/c1t0d0    disk       connected    configured   unknown
c1::dsk/c1t1d0    disk       connected    configured   unknown
c1::dsk/c1t2d0    disk       connected    configured   unknown
c1::dsk/c1t3d0    disk       connected    configured   unknown
c1::dsk/c1t4d0    disk       connected    configured   unknown
c1::dsk/c1t5d0    disk       connected    configured   unknown
c1::dsk/c1t6d0    disk       connected    configured   unknown
c1::dsk/c1t8d0    disk       connected    configured   unknown

The two nodes were initially installed with Sol 10 + SFCFSHA 6.0, and the shared DG/volumes were created there.

Marianne
Moderator
Partner    VIP    Accredited Certified

I see the following in your opening post:

The other node was reinstalled with RHEL 6 and SFHA 6.2; after that, all the disks went into error state on the Solaris 10 node.

You cannot cluster nodes or share disks between servers that run different operating systems.

Was anything done at OS level on RHEL server to the luns?
I am scared that they may have been re-labelled or something to that effect...

Have you checked output of 'vxdisk -o alldgs list' on RHEL server?

Weny
Level 3

I'm not trying to cluster the two nodes into one cluster.

I assumed that reinstalling node2 would be treated the same as node2 simply being down in the CFS cluster, so I reinstalled it with RHEL without disconnecting the shared LUNs.

On RHEL, the disks look fine, but the DG should be visible in deported status, and it is not.

[root@rhel26 ~]# vxdisk -oalldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
emc_clariion0_1 auto:none       -            -            online invalid
emc_clariion0_2 auto:none       -            -            online invalid
emc_clariion0_3 auto:none       -            -            online invalid
emc_clariion0_4 auto:none       -            -            online invalid
emc_clariion0_5 auto:none       -            -            online invalid
emc_clariion0_6 auto:none       -            -            online invalid
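
For reference (this is not output from this system, and the DG name is made up): when a disk group is deported but its disks are visible, "vxdisk -o alldgs list" normally shows the DG name in parentheses under the GROUP column, roughly like this:

DEVICE          TYPE            DISK         GROUP        STATUS
emc_clariion0_1 auto:cdsdisk    -            (mydg)       online

An empty GROUP column with status "online invalid" suggests the VxVM private regions on the disks are no longer being recognised.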

mikebounds
Level 6
Partner Accredited

Can you give more details about what you did - it seems you did something like:

  1. You originally had 2 Solaris nodes in a 6.0 SFCFS cluster - sol25 and sol26
  2. You took down one of the Solaris nodes - sol26
  3. You left the DG imported on the remaining Solaris node sol25 - or did you deport it?
  4. You reinstalled sol26 as rhel26 with RHEL 6 and SFHA 6.2, non-CFS
  5. Did you run any "vx" commands on rhel26?
  6. Do you know when the disks went into the error state on Solaris, and was the DG imported at the time?

Mike

 

mikebounds
Level 6
Partner Accredited

Also check that you can read the disks on both systems - first a raw read, and then the private region of the disk - for example, on Solaris:

dd if=/dev/rdsk/emc_clariion0_1 count=8 of=/dev/null

/etc/vx/diag.d/vxprivutil list /dev/rdsk/emc_clariion0_1 

 

Do the same on Linux, using the Linux device path to the disk.
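
For the Linux side, a minimal sketch, assuming the DMP metanode for the same LUN appears as /dev/vx/dmp/emc_clariion0_1 on rhel26 (that path is an assumption - substitute the actual device, e.g. the sdX path reported by "vxdisk list emc_clariion0_1"):

# raw read of the first sectors through the DMP node (path is an assumption, adjust as needed)
dd if=/dev/vx/dmp/emc_clariion0_1 of=/dev/null count=8

# dump the VxVM private region header from the same device
/etc/vx/diag.d/vxprivutil list /dev/vx/dmp/emc_clariion0_1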

Mike

Marianne
Moderator
Partner    VIP    Accredited Certified
".... so just reinstalled it to RHEL without disconnect the shared luns." My concern is about the disks that were probably seen by the Linux server during OS install. I don't know Linux well enough to know if it would want to put Linux partition table on all disks that it can see during OS installation.

Weny
Level 3

Thanks all !!

The issue is resolved.

The steps where I hit the issue:

  1. 2 Solaris nodes in a 6.0 SFCFS cluster - sol25 and sol26
  2. Took down one of the Solaris nodes - sol26
  3. Left the DG imported on the remaining Solaris node sol25
  4. Reinstalled sol26 as rhel26 with RHEL 6 - didn't check the DG status on sol25
  5. Installed SFHA 6.2, non-CFS - checked vxdisk -oalldgs list; all disks were online, but the deported DG was not seen on the shared disks. No other operations.
  6. Ran vxdisk -oalldgs list on sol25 - all disks were in error

On sol25:

# dd if=/dev/rdsk/c1t5d0s2 of=/c1t5d0s2.ddoutput bs=512 count=100000
dd: /dev/rdsk/c1t5d0s2: open: I/O error

The steps to resolve the issue (a command-level sketch follows below):

  1. format -> select the disk -> fdisk (re-create the fdisk partition table)
  2. Check disk status - online
  3. Restore the DG configuration from /etc/vx/cbr with vxconfigrestore
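
A minimal command-level sketch of this recovery, assuming the affected disk is c1t5d0 and the disk group is named mydg (both are placeholders - use your own device and DG names, and only run vxconfigrestore once you are confident the data areas are intact, since it rewrites the private regions):

# 1. Re-create the fdisk partition table / Solaris label that was lost
format
    # select c1t5d0, run the fdisk subcommand, accept the default partitioning

# 2. Rescan and confirm the disks come back online
vxdisk scandisks
vxdisk -oalldgs list

# 3. Restore the DG configuration from the CBR backups kept under /etc/vx/cbr/bk
vxconfigrestore -p mydg     # precommit - review the restored configuration
vxconfigrestore -c mydg     # commit the restore (or "vxconfigrestore -d mydg" to back out)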

 

 

Marianne
Moderator
Partner    VIP    Accredited Certified
Glad vxconfigrestore fixed your issue! Lesson learned: disconnect VxVM disks during OS installation!
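
Related note: vxconfigrestore can only do its job because the vxconfigbackupd daemon keeps copies of the DG configuration under /etc/vx/cbr/bk. A sketch of taking an explicit backup before risky maintenance (the DG name is a placeholder):

# back up the configuration of one disk group (default destination is /etc/vx/cbr/bk)
vxconfigbackup mydg

# or back up all imported disk groups
vxconfigbackup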