Forum Discussion

unkn0wnn
Level 4
2 years ago

Unable to mount file system after storage went offline

Hi there, I hope that someone can help me.

After a storage shelf went offline, I am no longer able to use pdvol or advol. When I try to mount them I get the errors below. I believe there are some commands that need to be run to bring them back to a working state.

I've tried a few commands, and as far as I can tell the storage is still visible to the appliance.

Any help is much appreciated - thanks for understanding.

 


n5220uk:/home/maintenance # mount -F /dev/vx/dsk/nbuapp/advol
UX:vxfs mount.vxfs: ERROR: V-3-20003: Cannot open /dev/vx/dsk/nbuapp/advol: No such device or address
UX:vxfs mount.vxfs: ERROR: V-3-24996: Unable to get disk layout version
n5220uk:/home/maintenance # mount -F /dev/vx/dsk/nbuapp/pdvol
UX:vxfs mount.vxfs: ERROR: V-3-20003: Cannot open /dev/vx/dsk/nbuapp/pdvol: No such device or address
UX:vxfs mount.vxfs: ERROR: V-3-24996: Unable to get disk layout version

n5220uk:/home/maintenance # vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_1       auto:sliced     disk_1       nbuapp       online
sda          auto:none       -            -            online invalid
-            -               disk_2       nbuapp       failed was:disk_2


n5220uk:/home/maintenance # vxprint -rt
Disk group: nbuapp

DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg nbuapp       default   default  12000       1357919097.7.nb-appliance

dm disk_1       disk_1    auto     65535       9755774656  -
dm disk_2       -         -        -           -           NODEVICE

v  advol        -         DISABLED ACTIVE      49392123904 SELECT      -      fsgen
pl advol-01     advol     DISABLED NODEVICE    49392123904 CONCAT      -      RW
sd disk_1-02    advol-01  disk_1   2097152     1560281088  0           disk_1 ENA
sd disk_2-01    advol-01  disk_2   0           41389391872 1560281088  -      NDEV
sd disk_2-03    advol-01  disk_2   43734987072 6442450944  42949672960 -      NDEV

v  catvol       -         ENABLED  ACTIVE      1951154176  SELECT      -      fsgen
pl catvol-01    catvol    ENABLED  ACTIVE      1951154176  CONCAT      -      RW
sd disk_1-01    catvol-01 disk_1   0           2097152     0           disk_1 ENA
sd disk_1-03    catvol-01 disk_1   1562378240  1949057024  2097152     disk_1 ENA

v  pdvol        -         DISABLED ACTIVE      8589934592  SELECT      -      fsgen
pl pdvol-01     pdvol     DISABLED NODEVICE    8589934592  CONCAT      -      RW
sd disk_1-04    pdvol-01  disk_1   3511435264  6244339392  0           disk_1 ENA
sd disk_2-02    pdvol-01  disk_2   41389391872 2345595200  6244339392  -      NDEV

 

20 Replies

  • Apr 1 03:18:18 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974525952: Uncorrectable write error
    Apr 1 03:18:18 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526080: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526208: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526336: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526464: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526592: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526720: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526848: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526976: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527104: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527232: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527360: Uncorrectable write error
    Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527488: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527616: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527744: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527872: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974528000: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307350528: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125709568: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125709696: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307350656: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307350784: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307350912: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125709824: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125709952: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351040: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351168: Uncorrectable write error
    Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710080: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710208: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351296: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351424: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710336: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710464: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351552: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351680: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710592: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710720: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351808: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351936: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710848: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307352064: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307352192: Uncorrectable write error
    Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710976: Uncorrectable write error
    Apr 1 03:18:23 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125711104: Uncorrectable write error
    Apr 1 03:18:23 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125711232: Uncorrectable write error
    Apr 1 03:18:23 n5220uk kernel: Synchronizing SCSI cache for disk sdb:
    Apr 1 05:16:50 n5220uk CLISH[14082]: User admin executed Disk
    Apr 1 08:35:47 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
    Apr 1 08:45:38 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
    Apr 2 11:47:57 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
    Apr 2 11:57:47 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
    Apr 2 19:32:26 n5220uk sudo: maintenance : TTY=pts/0 ; PWD=/ ; USER=root ; COMMAND=/bin/mount /disk
    Apr 3 12:48:13 n5220uk CLISH[18071]: User admin executed Disk
    Apr 4 12:16:55 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
    Apr 4 12:22:26 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
    Apr 4 13:13:23 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
    Apr 4 13:19:58 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk

  • # vxplex -g <diskgroup> det <problem-plex>
    # vxplex -g <diskgroup> att <volume> <problem-plex>

    please advise

  • Hi unkn0wnn 

    Based on the first lot of output, one of the RAID volumes has failed - how badly is unknown. 

    First you have this in the vxdisk list output: 

    n5220uk:/home/maintenance # vxdisk list
    DEVICE       TYPE            DISK         GROUP        STATUS
    disk_1       auto:sliced     disk_1       nbuapp       online
    sda          auto:none       -            -            online invalid
    -            -               disk_2       nbuapp       failed was:disk_2

    This indicates that a VxVM disk which makes up an important part of the nbuapp disk group is missing.

    Then further down in the vxprint output you have this:

    dm disk_2 - - - - NODEVICE

    Again, this indicates that the OS cannot find the device associated with disk_2.

    You need to figure out why the second disk isn't accessible anymore - the kernel messages also indicate a problem with this disk volume. 
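
    To see how much of each volume sits on the missing disk, the vxprint output can be filtered. This is a minimal sketch, assuming the "vxprint -rt" layout shown above (record type in the first column, length in the sixth, mode in the ninth on "sd" lines). The function name missing_by_plex is only an illustration and is not part of VxVM.

```shell
#!/bin/sh
# Report, per plex, the share of its length that lies on subdisks in NDEV mode.
# Reads `vxprint -rt` output on stdin; header lines (upper-case PL/SD) are ignored.
missing_by_plex() {
    awk '
        # pl NAME VOLUME KSTATE STATE LENGTH ...
        $1 == "pl" { plen[$2] = $6; vol[$2] = $3 }
        # sd NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
        $1 == "sd" && $9 == "NDEV" { miss[$3] += $6 }
        END {
            for (p in miss)
                if (plen[p] > 0)
                    printf "%s %s %.1f%%\n", vol[p], p, 100 * miss[p] / plen[p]
        }
    ' | sort
}
```

    Usage: vxprint -g nbuapp -rt | missing_by_plex. Fed the vxprint records from the first post, this reports about 96.8% of advol-01 and 27.3% of pdvol-01 on NDEV subdisks, so neither volume is likely to be usable until disk_2 is back.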

    There are many possible reasons why this may be - power cycling the storage may help (or it may not). I'd suggest opening a support case, if that is possible, to assist in the recovery (but if the appliance is a 5220, as the name suggests, this isn't going to happen). 

    You could try using the command "vxvol -g nbuapp -f start pdvol" to see if you can start the volume - but I suspect the underlying failure of the disk_2 volume will prevent this from succeeding. 
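
    If the device behind disk_2 does come back (for example after the shelf is power cycled), the usual VxVM approach is to rescan and reattach the disk before starting the volumes. Below is a rough sketch of that sequence. It only prints the commands so they can be reviewed first. The function name print_reattach_plan is made up for illustration, and using disk_2 as the disk access name is an assumption taken from the "was:disk_2" field above.

```shell
#!/bin/sh
# Print (do not run) a commonly used VxVM sequence for a disk that has returned.
# $1 = disk group, $2 = disk access name of the returned device (both supplied by you).
print_reattach_plan() {
    dg=$1
    da=$2
    printf '%s\n' \
        "vxdisk scandisks" \
        "vxdctl enable" \
        "vxdisk -o alldgs list" \
        "vxreattach -c $da" \
        "vxreattach $da" \
        "vxrecover -g $dg -s" \
        "vxprint -g $dg -ht"
    # vxreattach -c only checks whether a reattach is possible; it changes nothing.
    # vxrecover -s starts disabled volumes once their plexes are usable again.
}
```

    Usage: print_reattach_plan nbuapp disk_2. Run the printed commands one at a time and check the output of each before moving on. A file system check may still be needed before the volumes will mount.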

    To investigate the RAID volumes try running these commands and see if you can identify the issue:

    1. Run these commands to verify that the RAID controller can see the disks on the expansion shelf.

    # /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aAll
    # /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -i "slot number"
    # /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -aall
    # /opt/MegaRAID/MegaCli/MegaCli64 -Cfgdsply -a0 | grep "RAID Level\|State\|Number Of Drives\|Slot Number\|Firmware state"
    # /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -i " Enclosure Device\|slot number\|firmware state\|foreign"

    Paste the output here if you need help interpreting it. 
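
    The -pdlist output is long on a fully populated shelf. Here is a small sketch that tallies the "Firmware state" lines, assuming they look like "Firmware state: Online, Spun Up" as MegaCli normally prints them. The function name count_fw_states is hypothetical.

```shell
#!/bin/sh
# Tally physical-drive firmware states from `MegaCli64 -pdlist -a0` output on stdin.
# Prints one "<count> <state>" line per distinct state, most common first.
count_fw_states() {
    awk -F': *' '
        {
            key = tolower($1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # tolerate indented pastes
        }
        key == "firmware state" { n[$2]++ }
        END { for (s in n) print n[s], s }
    ' | sort -rn
}
```

    Usage: /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | count_fw_states. Any drive that is not reported as online (or as a hot spare) is worth a closer look.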

    Good luck
    David
     

    • unkn0wnn
      Level 4

      Thank you so far David, this is what I am getting:

      n5220uk:/home/maintenance # vxvol -g nbuapp -f start pdvol
      VxVM vxvol ERROR V-5-1-1201 Volume pdvol has no associated data plexes


      n5220uk:/home/maintenance # /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aAll

      Number of enclosures on adapter 0 -- 2

      Enclosure 0:
      Device ID : 252
      Number of Slots : 8
      Number of Power Supplies : 0
      Number of Fans : 0
      Number of Temperature Sensors : 0
      Number of Alarms : 0
      Number of SIM Modules : 1
      Number of Physical Drives : 0
      Status : Normal
      Position : Unavailable
      Connector Name : Unavailable
      Partner Device Id : 65535

      Inquiry data :
      Vendor Identification : LSI
      Product Identification : SGPIO
      Product Revision Level : N/A
      Vendor Specific :

      Enclosure 1:
      Device ID : 24
      Number of Slots : 16
      Number of Power Supplies : 2
      Number of Fans : 4
      Number of Temperature Sensors : 10
      Number of Alarms : 0
      Number of SIM Modules : 2
      Number of Physical Drives : 16
      Status : Normal
      Position : 1
      Connector Name : Port B
      Partner Device Id : 65535

      Inquiry data :
      Vendor Identification : Promise
      Product Identification : J630s
      Product Revision Level : 060=
      Vendor Specific : TB002B103176 0000

      Number of Voltage Sensors :6

      Voltage Sensor :0
      Voltage Sensor Status :OK
      Voltage Value :1170 milli volts

      Voltage Sensor :1
      Voltage Sensor Status :OK
      Voltage Value :980 milli volts

      Voltage Sensor :2
      Voltage Sensor Status :OK
      Voltage Value :3220 milli volts

      Voltage Sensor :3
      Voltage Sensor Status :OK
      Voltage Value :1170 milli volts

      Voltage Sensor :4
      Voltage Sensor Status :OK
      Voltage Value :970 milli volts

      Voltage Sensor :5
      Voltage Sensor Status :OK
      Voltage Value :3220 milli volts

      Number of enclosures on adapter 1 -- 1

      Enclosure 0:
      Device ID : 0
      Number of Slots : 8
      Number of Power Supplies : 2
      Number of Fans : 0
      Number of Temperature Sensors : 1
      Number of Alarms : 0
      Number of SIM Modules : 0
      Number of Physical Drives : 8
      Status : Normal
      Position : Unavailable
      Connector Name : Unavailable
      Partner Device Id : 65535

      Inquiry data :
      Vendor Identification : ESG-SHV.
      Product Identification : SCA HSBP M9.....
      Product Revision Level : 2.17
      Vendor Specific :

      Number of enclosures on adapter 2 -- 0


      Exit Code: 0x00

      n5220uk:/home/maintenance # /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -i "slot number"

      Slot Number: 1
      Slot Number: 2
      Slot Number: 3
      Slot Number: 4
      Slot Number: 5
      Slot Number: 6
      Slot Number: 7
      Slot Number: 8
      Slot Number: 9
      Slot Number: 10
      Slot Number: 11
      Slot Number: 12
      Slot Number: 13
      Slot Number: 14
      Slot Number: 15
      Slot Number: 16

      n5220uk:/home/maintenance # /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -aall

      Adapter #0

      ==============================================================================
      Versions
      ================
      Product Name : Intel (R) RAID Controller RS2PI008
      Serial No : SV21504201
      FW Package Build: 12.12.0-0048

      Mfg. Data
      ================
      Mfg. Date : 04/09/12
      Rework Date : 00/00/00
      Revision No : 59A
      Battery FRU : N/A

      Image Versions in Flash:
      ================
      FW Version : 2.120.63-1242
      BIOS Version : 3.22.00_4.11.05.00_0x05020000
      Preboot CLI Version: 04.04-017:#%00008
      WebBIOS Version : 6.0-34-e_29-Rel
      NVDATA Version : 2.09.03-0013
      Boot Block Version : 2.02.00.00-0000
      BOOT Version : 09.250.01.219

      Pending Images in Flash
      ================
      None

      PCI Info
      ================
      Vendor Id : 1000
      Device Id : 0079
      SubVendorId : 8086
      SubDeviceId : 9280

      Host Interface : PCIE

      Number of Frontend Port: 0
      Device Interface : PCIE

      Number of Backend Port: 8
      Port : Address
      0 500015554e75723f
      1 0000000000000000
      2 0000000000000000
      3 0000000000000000
      4 0000000000000000
      5 0000000000000000
      6 0000000000000000
      7 0000000000000000

      HW Configuration
      ================
      SAS Address : 500605b00493c1e0
      BBU : Present
      Alarm : Present
      NVRAM : Present
      Serial Debugger : Present
      Memory : Present
      Flash : Present
      Memory Size : 512MB
      TPM : Absent
      On board Expander: Absent
      Upgrade Key : Absent

      Settings
      ================
      Current Time : 23:20:19 4/4, 2023
      Predictive Fail Poll Interval : 300sec
      Interrupt Throttle Active Count : 16
      Interrupt Throttle Completion : 50us
      Rebuild Rate : 30%
      PR Rate : 30%
      BGI Rate : 30%
      Check Consistency Rate : 30%
      Reconstruction Rate : 30%
      Cache Flush Interval : 4s
      Max Drives to Spinup at One Time : 2
      Delay Among Spinup Groups : 2s
      Physical Drive Coercion Mode : 1GB
      Cluster Mode : Disabled
      Alarm : Enabled
      Auto Rebuild : Enabled
      Battery Warning : Enabled
      Ecc Bucket Size : 15
      Ecc Bucket Leak Rate : 1440 Minutes
      Restore HotSpare on Insertion : Enabled
      Expose Enclosure Devices : Disabled
      Maintain PD Fail History : Disabled
      Host Request Reordering : Enabled
      Auto Detect BackPlane Enabled : SGPIO/i2c SEP
      Load Balance Mode : Auto
      Use FDE Only : No
      Security Key Assigned : No
      Security Key Failed : No
      Security Key Not Backedup : No
      Any Offline VD Cache Preserved : No
      Allow Boot with Preserved Cache : No
      Disable Online Controller Reset : No
      PFK in NVRAM : No
      Use disk activity for locate : No

      Looking forward to hearing more from you guys...