Unable to mount file system after storage went offline

unkn0wnn
Level 4

Hi there, I hope that someone can help me.

After the storage shelf went offline I am no longer able to use pdvol / advol. When I try to mount them I get the errors below. I believe there are some commands that need to be run to bring them back to a working state.

I've tried some commands and, as far as I can tell, the storage is visible to the appliance.

Any help is much appreciated - thanks for understanding.

 


n5220uk:/home/maintenance # mount -F /dev/vx/dsk/nbuapp/advol
UX:vxfs mount.vxfs: ERROR: V-3-20003: Cannot open /dev/vx/dsk/nbuapp/advol: No such device or address
UX:vxfs mount.vxfs: ERROR: V-3-24996: Unable to get disk layout version
n5220uk:/home/maintenance # mount -F /dev/vx/dsk/nbuapp/pdvol
UX:vxfs mount.vxfs: ERROR: V-3-20003: Cannot open /dev/vx/dsk/nbuapp/pdvol: No such device or address
UX:vxfs mount.vxfs: ERROR: V-3-24996: Unable to get disk layout version

n5220uk:/home/maintenance # vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_1       auto:sliced     disk_1       nbuapp       online
sda          auto:none       -            -            online invalid
-            -               disk_2       nbuapp       failed was:disk_2


n5220uk:/home/maintenance # vxprint -rt
Disk group: nbuapp

DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg nbuapp default default 12000 1357919097.7.nb-appliance

dm disk_1 disk_1 auto 65535 9755774656 -
dm disk_2 - - - - NODEVICE

v advol - DISABLED ACTIVE 49392123904 SELECT - fsgen
pl advol-01 advol DISABLED NODEVICE 49392123904 CONCAT - RW
sd disk_1-02 advol-01 disk_1 2097152 1560281088 0 disk_1 ENA
sd disk_2-01 advol-01 disk_2 0 41389391872 1560281088 - NDEV
sd disk_2-03 advol-01 disk_2 43734987072 6442450944 42949672960 - NDEV

v catvol - ENABLED ACTIVE 1951154176 SELECT - fsgen
pl catvol-01 catvol ENABLED ACTIVE 1951154176 CONCAT - RW
sd disk_1-01 catvol-01 disk_1 0 2097152 0 disk_1 ENA
sd disk_1-03 catvol-01 disk_1 1562378240 1949057024 2097152 disk_1 ENA

v pdvol - DISABLED ACTIVE 8589934592 SELECT - fsgen
pl pdvol-01 pdvol DISABLED NODEVICE 8589934592 CONCAT - RW
sd disk_1-04 pdvol-01 disk_1 3511435264 6244339392 0 disk_1 ENA
sd disk_2-02 pdvol-01 disk_2 41389391872 2345595200 6244339392 - NDEV

 

20 Replies

unkn0wnn
Level 4

Apr 1 03:18:18 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974525952: Uncorrectable write error
Apr 1 03:18:18 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526080: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526208: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526336: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526464: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526592: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526720: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526848: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974526976: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527104: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527232: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527360: Uncorrectable write error
Apr 1 03:18:20 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527488: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527616: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527744: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974527872: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 12974528000: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307350528: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125709568: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125709696: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307350656: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307350784: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307350912: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125709824: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125709952: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351040: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351168: Uncorrectable write error
Apr 1 03:18:21 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710080: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710208: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351296: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351424: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710336: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710464: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351552: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351680: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710592: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710720: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351808: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307351936: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710848: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307352064: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 18307352192: Uncorrectable write error
Apr 1 03:18:22 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125710976: Uncorrectable write error
Apr 1 03:18:23 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125711104: Uncorrectable write error
Apr 1 03:18:23 n5220uk kernel: VxVM vxio V-5-0-1266 Subdisk disk_2-01 block 22125711232: Uncorrectable write error
Apr 1 03:18:23 n5220uk kernel: Synchronizing SCSI cache for disk sdb:
Apr 1 05:16:50 n5220uk CLISH[14082]: User admin executed Disk
Apr 1 08:35:47 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
Apr 1 08:45:38 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
Apr 2 11:47:57 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
Apr 2 11:57:47 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
Apr 2 19:32:26 n5220uk sudo: maintenance : TTY=pts/0 ; PWD=/ ; USER=root ; COMMAND=/bin/mount /disk
Apr 3 12:48:13 n5220uk CLISH[18071]: User admin executed Disk
Apr 4 12:16:55 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
Apr 4 12:22:26 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
Apr 4 13:13:23 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
Apr 4 13:19:58 n5220uk kernel: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
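
For anyone reading a log like this, the repeated lines can be condensed into a count per subdisk. This is only a minimal sketch, assuming a POSIX shell with awk; the function name `count_vxio_errors` is made up for illustration and is not a VxVM command.

```shell
# Hypothetical helper (not part of VxVM): read syslog lines on stdin and
# count the VxVM "Uncorrectable read/write error" messages per subdisk.
count_vxio_errors() {
    awk '
        /VxVM vxio/ && /Uncorrectable (read|write) error/ {
            # The subdisk name is the word that follows "Subdisk".
            for (i = 1; i < NF; i++)
                if ($i == "Subdisk") { count[$(i + 1)]++; break }
        }
        END { for (sd in count) print sd, count[sd] }
    ' | sort
}
```

A possible use is `count_vxio_errors < /var/log/messages` (the log path is an assumption). Fed the excerpt above, it would list only disk_2-01, which is consistent with the failure being confined to disk_2.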

unkn0wnn
Level 4

Would vxvol start help?

unkn0wnn
Level 4

# vxplex -g <diskgroup> det <problem-plex>
# vxplex -g <diskgroup> att <volume> <problem-plex>

Would detaching and reattaching the plex like this help? Please advise.

davidmoline
Level 6
Employee

Hi @unkn0wnn 

Based on the first lot of output, one of the RAID volumes has failed; how badly is unknown.

First you have this in the vxdisk list output: 

n5220uk:/home/maintenance # vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_1       auto:sliced     disk_1       nbuapp       online
sda          auto:none       -            -            online invalid
-            -               disk_2       nbuapp       failed was:disk_2

This indicates a missing VxVM disk that is an important part of the nbuapp disk group.

Then further down in the vxprint output you have this:

dm disk_2 - - - - NODEVICE

Again, this indicates that the OS cannot find the device associated with disk_2.
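
The same check can be done mechanically on a long vxprint listing. The following is a rough sketch, assuming the column layout shown in the output above (state in the last field of dm and sd records, plex state in the fifth field of pl records); `missing_vx_records` is a hypothetical name, not a VxVM utility.

```shell
# Hypothetical helper: read `vxprint -ht` (or `vxprint -rt`) output on stdin
# and print the records that VxVM marks as having no underlying device.
missing_vx_records() {
    awk '
        # dm record: last field is NODEVICE when the disk media has no device
        $1 == "dm" && $NF == "NODEVICE" { print "disk " $2 " has no device" }
        # pl record: fourth field is the kernel state, fifth is the plex state
        $1 == "pl" && $5 == "NODEVICE"  { print "plex " $2 " of volume " $3 " is " $4 " " $5 }
        # sd record: last field is NDEV when the subdisk has no device
        $1 == "sd" && $NF == "NDEV"     { print "subdisk " $2 " of plex " $3 " has no device" }
    '
}
```

It could be used as `vxprint -ht -g nbuapp | missing_vx_records`. The upper-case legend lines (DM, PL, SD) are skipped because the match on the first field is case-sensitive.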

You need to figure out why the second disk isn't accessible anymore - the kernel messages also indicate a problem with this disk volume. 

There are many possible reasons for this. Power cycling the storage may help (or it may not). I'd suggest opening a support case, if this is possible, to assist in the recovery (but if the appliance is a 5220, as the name suggests, this isn't going to happen).

You could try using the command "vxvol -g nbuapp -f start pdvol" to see if you can start the volume, but I suspect the underlying failure of the disk_2 volume will prevent this from succeeding.

To investigate the RAID volumes try running these commands and see if you can identify the issue:

1. Run these commands to verify the RAID controller is able to see the Disk on Expansion shelf.

# /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aAll
# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -i "slot number"
# /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -aall
# /opt/MegaRAID/MegaCli/MegaCli64 -Cfgdsply -a0 | grep "RAID Level\|State\|Number Of Drives\|Slot Number\|Firmware state"
# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -i " Enclosure Device\|slot number\|firmware state\|foreign"

Paste the output if you need help understanding it.
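
If the -pdlist output is long, a small filter can pick out the drives worth a closer look. This is only a sketch under assumptions: it expects each drive block to list "Enclosure Device ID", "Slot Number", "Firmware state" and "Foreign State" in that order, and it treats a firmware state starting with "Online" and a foreign state of "None" as healthy. The function name `suspect_drives` is invented for the example.

```shell
# Hypothetical helper: read `MegaCli64 -pdlist -a0` output on stdin and print
# drives whose firmware state is not Online or whose foreign state is not None.
suspect_drives() {
    awk -F': *' '
        /^Enclosure Device ID:/ { enc = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Firmware state:/      { fw = $2 }
        /^Foreign State:/       {
            # Report once per drive, when the last of the four fields is seen.
            if (fw !~ /^Online/ || $2 !~ /^None/)
                printf "enclosure %s slot %s: firmware state \"%s\", foreign state \"%s\"\n", enc, slot, fw, $2
        }
    '
}
```

For example: `/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | suspect_drives`. A drive reported as "Unconfigured(good)" with a foreign configuration would show up here even though the enclosure itself reports a Normal status.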

Good luck
David
 

Thank you so far David, this is what I am getting:

n5220uk:/home/maintenance # vxvol -g nbuapp -f start pdvol
VxVM vxvol ERROR V-5-1-1201 Volume pdvol has no associated data plexes


n5220uk:/home/maintenance # /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aAll

Number of enclosures on adapter 0 -- 2

Enclosure 0:
Device ID : 252
Number of Slots : 8
Number of Power Supplies : 0
Number of Fans : 0
Number of Temperature Sensors : 0
Number of Alarms : 0
Number of SIM Modules : 1
Number of Physical Drives : 0
Status : Normal
Position : Unavailable
Connector Name : Unavailable
Partner Device Id : 65535

Inquiry data :
Vendor Identification : LSI
Product Identification : SGPIO
Product Revision Level : N/A
Vendor Specific :

Enclosure 1:
Device ID : 24
Number of Slots : 16
Number of Power Supplies : 2
Number of Fans : 4
Number of Temperature Sensors : 10
Number of Alarms : 0
Number of SIM Modules : 2
Number of Physical Drives : 16
Status : Normal
Position : 1
Connector Name : Port B
Partner Device Id : 65535

Inquiry data :
Vendor Identification : Promise
Product Identification : J630s
Product Revision Level : 060=
Vendor Specific : TB002B103176 0000

Number of Voltage Sensors :6

Voltage Sensor :0
Voltage Sensor Status :OK
Voltage Value :1170 milli volts

Voltage Sensor :1
Voltage Sensor Status :OK
Voltage Value :980 milli volts

Voltage Sensor :2
Voltage Sensor Status :OK
Voltage Value :3220 milli volts

Voltage Sensor :3
Voltage Sensor Status :OK
Voltage Value :1170 milli volts

Voltage Sensor :4
Voltage Sensor Status :OK
Voltage Value :970 milli volts

Voltage Sensor :5
Voltage Sensor Status :OK
Voltage Value :3220 milli volts

Number of enclosures on adapter 1 -- 1

Enclosure 0:
Device ID : 0
Number of Slots : 8
Number of Power Supplies : 2
Number of Fans : 0
Number of Temperature Sensors : 1
Number of Alarms : 0
Number of SIM Modules : 0
Number of Physical Drives : 8
Status : Normal
Position : Unavailable
Connector Name : Unavailable
Partner Device Id : 65535

Inquiry data :
Vendor Identification : ESG-SHV.
Product Identification : SCA HSBP M9.....
Product Revision Level : 2.17
Vendor Specific :

Number of enclosures on adapter 2 -- 0


Exit Code: 0x00

/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -i "slot number":

Slot Number: 1
Slot Number: 2
Slot Number: 3
Slot Number: 4
Slot Number: 5
Slot Number: 6
Slot Number: 7
Slot Number: 8
Slot Number: 9
Slot Number: 10
Slot Number: 11
Slot Number: 12
Slot Number: 13
Slot Number: 14
Slot Number: 15
Slot Number: 16

n5220uk:/home/maintenance # /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -aall

Adapter #0

==============================================================================
Versions
================
Product Name : Intel (R) RAID Controller RS2PI008
Serial No : SV21504201
FW Package Build: 12.12.0-0048

Mfg. Data
================
Mfg. Date : 04/09/12
Rework Date : 00/00/00
Revision No : 59A
Battery FRU : N/A

Image Versions in Flash:
================
FW Version : 2.120.63-1242
BIOS Version : 3.22.00_4.11.05.00_0x05020000
Preboot CLI Version: 04.04-017:#%00008
WebBIOS Version : 6.0-34-e_29-Rel
NVDATA Version : 2.09.03-0013
Boot Block Version : 2.02.00.00-0000
BOOT Version : 09.250.01.219

Pending Images in Flash
================
None

PCI Info
================
Vendor Id : 1000
Device Id : 0079
SubVendorId : 8086
SubDeviceId : 9280

Host Interface : PCIE

Number of Frontend Port: 0
Device Interface : PCIE

Number of Backend Port: 8
Port : Address
0 500015554e75723f
1 0000000000000000
2 0000000000000000
3 0000000000000000
4 0000000000000000
5 0000000000000000
6 0000000000000000
7 0000000000000000

HW Configuration
================
SAS Address : 500605b00493c1e0
BBU : Present
Alarm : Present
NVRAM : Present
Serial Debugger : Present
Memory : Present
Flash : Present
Memory Size : 512MB
TPM : Absent
On board Expander: Absent
Upgrade Key : Absent

Settings
================
Current Time : 23:20:19 4/4, 2023
Predictive Fail Poll Interval : 300sec
Interrupt Throttle Active Count : 16
Interrupt Throttle Completion : 50us
Rebuild Rate : 30%
PR Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruction Rate : 30%
Cache Flush Interval : 4s
Max Drives to Spinup at One Time : 2
Delay Among Spinup Groups : 2s
Physical Drive Coercion Mode : 1GB
Cluster Mode : Disabled
Alarm : Enabled
Auto Rebuild : Enabled
Battery Warning : Enabled
Ecc Bucket Size : 15
Ecc Bucket Leak Rate : 1440 Minutes
Restore HotSpare on Insertion : Enabled
Expose Enclosure Devices : Disabled
Maintain PD Fail History : Disabled
Host Request Reordering : Enabled
Auto Detect BackPlane Enabled : SGPIO/i2c SEP
Load Balance Mode : Auto
Use FDE Only : No
Security Key Assigned : No
Security Key Failed : No
Security Key Not Backedup : No
Any Offline VD Cache Preserved : No
Allow Boot with Preserved Cache : No
Disable Online Controller Reset : No
PFK in NVRAM : No
Use disk activity for locate : No

Looking forward to hearing more from you guys...

I am also attaching a text file because of the 20,000-character limit.

Hi @unkn0wnn

The external and internal storage both appear to be okay - no devices offline, failed disks or degraded. 

If you haven't done so recently, try rebooting the appliance head only (leave the external storage powered up) and see if the system can detect the external storage. 

Cheers
David

We did try rebooting, with no success. The storage was visible in the BIOS RAID management, and strangely the GUI shows everything as up and running. Don't we need to reattach or reactivate something else?

n5220uk:/home/maintenance # vxprint -ht -g nbuapp
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg nbuapp default default 12000 1357919097.7.nb-appliance

dm disk_1 disk_1 auto 65535 9755774656 -
dm disk_2 - - - - NODEVICE

v advol - DISABLED ACTIVE 49392123904 SELECT - fsgen
pl advol-01 advol DISABLED NODEVICE 49392123904 CONCAT - RW
sd disk_1-02 advol-01 disk_1 2097152 1560281088 0 disk_1 ENA
sd disk_2-01 advol-01 disk_2 0 41389391872 1560281088 - NDEV
sd disk_2-03 advol-01 disk_2 43734987072 6442450944 42949672960 - NDEV

v catvol - ENABLED ACTIVE 1951154176 SELECT - fsgen
pl catvol-01 catvol ENABLED ACTIVE 1951154176 CONCAT - RW
sd disk_1-01 catvol-01 disk_1 0 2097152 0 disk_1 ENA
sd disk_1-03 catvol-01 disk_1 1562378240 1949057024 2097152 disk_1 ENA

v pdvol - DISABLED ACTIVE 8589934592 SELECT - fsgen
pl pdvol-01 pdvol DISABLED NODEVICE 8589934592 CONCAT - RW
sd disk_1-04 pdvol-01 disk_1 3511435264 6244339392 0 disk_1 ENA
sd disk_2-02 pdvol-01 disk_2 41389391872 2345595200 6244339392 - NDEV

Hi @unkn0wnn 

Yes - the OS is unable to use the external volume (disk_2). As your output shows, the two volumes (advol and pdvol) are both disabled because one of their constituent parts (disk_2) is offline.

Two things to try - first, power everything down and leave off for 5 mins, then power on the storage enclosure and give it time to come up and stabilise (about 5 mins), then power on the appliance. 

If this doesn't help, can you run the "fdisk -l" command and share the output?

David

OK - we will reboot following your sequence later and let you know the results. So far fdisk -l displays the output below. By the way, we have had issues rebooting before: to get around them we had to power the storage off, boot the appliance, and power the storage on afterwards, otherwise a cursor sits there indefinitely...


n5220uk:/home/maintenance # fdisk -l

Disk /dev/sda: 998.9 GB, 998999326720 bytes
255 heads, 63 sectors/track, 121454 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 4178 33559753+ 82 Linux swap / Solaris
/dev/sda2 * 4179 34542 243898830 83 Linux
/dev/sda3 34543 121454 698120640 83 Linux

Disk /dev/sdb: 4994.9 GB, 4994996633600 bytes
255 heads, 63 sectors/track, 607273 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb4 1 1 0+ ee EFI GPT

Disk /dev/VxDMP1: 998.9 GB, 998999326720 bytes
255 heads, 63 sectors/track, 121454 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/VxDMP1p1 1 4178 33559753+ 82 Linux swap / Solaris
/dev/VxDMP1p2 * 4179 34542 243898830 83 Linux
/dev/VxDMP1p3 34543 121454 698120640 83 Linux

Disk /dev/VxDMP2: 4994.9 GB, 4994996633600 bytes
255 heads, 63 sectors/track, 607273 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/VxDMP2p4 1 1 0+ ee EFI GPT

 

 

Hi @unkn0wnn 

What is not showing in the above is the 24TB external storage shelf (possibly /dev/sdc).

Until the OS can see the drive, there is not much else that can be done. 
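
To re-check quickly after each power cycle, the fdisk listing can be reduced to one line per disk. A minimal sketch, assuming the "Disk /dev/...: SIZE UNIT, N bytes" header format shown above; `list_fdisk_disks` is a hypothetical name.

```shell
# Hypothetical helper: read `fdisk -l` output on stdin and print
# "device size unit" for each "Disk /dev/..." header line.
list_fdisk_disks() {
    awk '/^Disk \/dev\// {
        sub(/:$/, "", $2)   # strip the colon after the device name
        sub(/,$/, "", $4)   # strip the comma after the unit
        print $2, $3, $4
    }'
}
```

Running `fdisk -l | list_fdisk_disks` on the output above would list only the 998.9 GB and 4994.9 GB devices (plus their VxDMP paths). A new, much larger entry would be the sign that the OS can see the shelf again.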

If the power-on sequence I suggested doesn't work for you, then use whatever has worked for you in the past (normally the storage is powered on first).

David

Booting with the storage on did not resolve this. I checked some more logs and suspect a hardware issue: a device with ID 0x18 went into a degraded state, as also reported by dmesg (screenshots attached). Would that normally be a disk, or could the SCSI card or something else have failed?

 

[Two screenshots attached: unkn0wnn_0-1680720249317.png, unkn0wnn_1-1680720269795.png]

 

 

 

Enclosure Device ID: 24
Slot Number: 11
Device Id: 18
Sequence Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.817 TB [0xe8b6d000 Sectors]
Firmware state: Unconfigured(good), Spun Up
SAS Address(0): 0x5000cca01c6aa8bd
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: HITACHI HUS723020ALS640 A222YGHWNE2A
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: Foreign
Foreign Secure: Drive is not secured by a foreign lock key
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified

Hi @unkn0wnn 

As per the MegaRAID output, Device ID 24 is the external storage shelf (enclosure 1, the Promise J630s).

You may have a hardware problem. The MegaRAID output did not suggest any issue with the storage itself, though, so possibly either the RAID card or the cabling is the problem.
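
On matching the IDs: if the 0x18 in the logs is hexadecimal (an assumption, since the screenshots are not reproduced here), it converts to decimal 24, the enclosure's Device ID in the -EncInfo output, rather than to the drive listed with "Device Id: 18". A one-line sketch of the conversion; `hex_to_dec` is just an illustrative name.

```shell
# Hypothetical helper: convert a 0x-prefixed hexadecimal ID to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}
```

For example, `hex_to_dec 0x18` gives the value to compare against the "Device ID" lines in the MegaCli output.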

David

 

Hi @davidmoline,

How about the PSU? Can it be monitored in the same way as the battery, etc.?

Hi @unkn0wnn 

The CLISH hardware health menu item can be used to provide a comprehensive status of most of the components (Support -> Test Hardware). This should include the PSU status (which can also be viewed directly by checking the lights on the appliance). 

I'm not sure if the test includes the PSU status of the external storage shelf.

David

The problem is that I monitor most of the hardware manually, as the Perl monitoring scripts no longer work for me: they hang when the ipmi_si module is loaded into the kernel. They are very long scripts too.

 

My colleague changed something in the BIOS and pdvol is now working, but the advol plex is showing as DISABLED RECOVER:

v advol - DISABLED ACTIVE 49392123904 SELECT - fsgen
pl advol-01 advol DISABLED RECOVER 49392123904 CONCAT - RW
sd disk_1-02 advol-01 disk_1 2097152 1560281088 0 disk_1 ENA
sd disk_2-01 advol-01 disk_2 0 41389391872 1560281088 disk_2 ENA
sd disk_2-03 advol-01 disk_2 43734987072 6442450944 42949672960 disk_2 ENA

Can anybody please advise?

OK, the issue is now resolved. As far as I know, our crew changed the RAID settings in the BIOS and installed a new PSU. pdvol started automatically but advol did not; this time vxprint showed the volume as DISABLED ACTIVE with its plex in the DISABLED RECOVER state. To resolve this on the OS side, I followed the tech note below to bring advol back to an enabled active, enabled clean state:

https://www.veritas.com/support/en_US/article.100022863

I then had to run fsck because the file system would not mount; not the default fsck, but fsck_vxfs. Most of the manuals and notes wanted the -F flag set. I think our fsck vxfs binary is not up to date, so I ran the command below with the -t switch, and with -o full as requested by Symantec:

/usr/lib/fs/vxfs/fsck -t vxfs -o full /dev/vx/rdsk/nbuapp/advol

I answered yes to the prompts. There were only 4 messages, asking whether to bring the log back up and clean and to clean the header, which is very good.

Then I did a netbackup stop, just in case, followed by a netbackup start. Everything started OK and some nodes were rebuilt too.

Then I reran some backups and got error 2074, which is easy to resolve if there are no hardware issues:

/usr/openv/netbackup/bin/admincmd/nbdevconfig -changestate -stype AdvancedDisk -dp dp_adv_n5220uk -dv /advdisk -state RESET

and

/usr/openv/netbackup/bin/admincmd/nbdevconfig -changestate -stype AdvancedDisk -dp dp_adv_n5220uk -dv /advdisk -state UP
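
For anyone repeating this on another pool, the two commands can be generated from the disk pool and volume names. This is only a sketch: it prints the commands rather than running them, it uses exactly the flags shown above, and `print_volume_reset` is a made-up function name.

```shell
# Path taken from the commands above.
NBDEVCONFIG=/usr/openv/netbackup/bin/admincmd/nbdevconfig

# Hypothetical helper: print (do not run) the RESET and UP commands
# for one AdvancedDisk volume. $1 = disk pool name, $2 = disk volume.
print_volume_reset() {
    pool=$1
    volume=$2
    for state in RESET UP; do
        echo "$NBDEVCONFIG -changestate -stype AdvancedDisk -dp $pool -dv $volume -state $state"
    done
}
```

Calling `print_volume_reset dp_adv_n5220uk /advdisk` reproduces the two lines above, so they can be reviewed before being run by hand.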

All working now. Thank you David and everybody else.