NVMe drives disappear after upgrade to the RHEL7.7 kernel.

ElCoyote
Level 3

Hi,

I'm using Infoscale 7.4.1.1300 on RHEL 7.x

Tonight, as I was running RHEL7.7 with the latest RHEL7.6 kernel, I decided to upgrade to the RHEL7.7 kernel (the only part of 7.7 that was missing). This had the nasty side effect of making the NVMe drives disappear.

1) before upgrade:

# modinfo vxio
filename: /lib/modules/3.10.0-957.27.2.el7.x86_64/veritas/vxvm/vxio.ko
license: VERITAS
retpoline: Y
supported: external
version: 7.4.1.1300
license: Proprietary. Send bug reports to enterprise_technical_support@veritas.com
retpoline: Y
rhelversion: 7.6
depends: veki
vermagic: 3.10.0-957.el7.x86_64 SMP mod_unload modversions

# vxdmpadm listctlr
CTLR_NAME ENCLR_TYPE STATE ENCLR_NAME PATH_COUNT
=========================================================================
c515 Samsung_NVMe ENABLED daltigoth_samsung_nvme1 1
c0 Disk ENABLED disk 3

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
nvme0n1 auto:cdsdisk - (nvm01dg) online ssdtrim
sda auto:LVM - - LVM
sdb auto:cdsdisk loc01d00 local01dg online
sdc auto:cdsdisk - (ssd01dg) online

 

2) after upgrade:

# modinfo vxio
filename: /lib/modules/3.10.0-1062.1.1.el7.x86_64/veritas/vxvm/vxio.ko
license: VERITAS
retpoline: Y
supported: external
version: 7.4.1.1300
license: Proprietary. Send bug reports to enterprise_technical_support@veritas.com
retpoline: Y
rhelversion: 7.7
depends: veki
vermagic: 3.10.0-1062.el7.x86_64 SMP mod_unload modversions

# vxdmpadm listctlr
CTLR_NAME ENCLR_TYPE STATE ENCLR_NAME PATH_COUNT
=========================================================================
c0 Disk ENABLED disk 3

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
sda auto:LVM - - LVM
sdb auto:cdsdisk loc01d00 local01dg online
sdc auto:cdsdisk - (ssd01dg) online

 

I've reverted to the latest z-stream RHEL7.6 kernel (3.10.0-957.27.2.el7) while I research this issue.

Has this been reported already?

9 REPLIES

ElCoyote
Level 3

On RHEL7.7 using kernel 3.10.0-957.27.2.el7 (7.6.z errata), I see this:

# ls -al /dev/vx/rdmp/nvme0n1 /dev/nvme0n1
brw-rw----. 1 root disk 259, 0 Sep 5 01:30 /dev/nvme0n1
brw-------. 1 root root 201, 64 Aug 9 23:19 /dev/vx/rdmp/nvme0n1

On RHEL7.7 using kernel 3.10.0-1062.1.1.el7 (7.7.z errata), I see this:

# ls -al /dev/vx/rdmp/nvme0n1 /dev/nvme0n1
ls: cannot access /dev/vx/rdmp/nvme0n1: No such file or directory
brw-rw----. 1 root disk 259, 0 Sep 5 01:30 /dev/nvme0n1

To me, this looks like an issue in the DMP layer (confirmed by the absence of the NVMe controller in the vxdmpadm listctlr output).

On 3.10.0-957.27.2.el7, I see:

# vxddladm list devices
DEVICE TARGET-ID STATE DDL-STATUS (ASL)
===============================================================
nvme0n1 - Online CLAIMED (libvxnvme.so)
sdc - Online CLAIMED (Disk)
sdb - Online CLAIMED (Disk)
sda - Online CLAIMED (Disk)

On 3.10.0-1062.1.1.el7, I see:

# vxddladm list devices
DEVICE TARGET-ID STATE DDL-STATUS (ASL)
===============================================================
sda - Online CLAIMED (Disk)
sdb - Online CLAIMED (Disk)
nvme0n1 - Online ERROR (libvxnvme.so)
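
The comparison above can be automated when checking several hosts. The following is only a rough sketch: the function name ddl_unclaimed is made up for illustration, and it assumes the column layout shown in the two listings above (device in column 1, DDL-STATUS in column 4).

```shell
#!/bin/sh
# Hypothetical helper (not part of InfoScale): print every device whose
# DDL-STATUS is anything other than CLAIMED.
# Reads `vxddladm list devices` output on stdin.
ddl_unclaimed() {
    awk '
        # Skip the header row and the ===== separator line.
        $1 == "DEVICE" || /^=+$/ { next }
        # Data rows look like: DEVICE TARGET-ID STATE DDL-STATUS (ASL)
        NF >= 4 && $4 != "CLAIMED" { print $1, $4 }
    '
}

# On a live system (as root, with InfoScale installed):
#   vxddladm list devices | ddl_unclaimed
```

Fed the 3.10.0-1062.1.1.el7 listing above, this should print only the nvme0n1 row with its ERROR status; on the 3.10.0-957.27.2.el7 listing it should print nothing.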

That's interesting because both systems are using the same ASL:

# rpm -qf /etc/vx/lib/discovery.d/libvxnvme.so
VRTSaslapm-7.4.1.1200-RHEL7.x86_64

I upgraded the ASL RPM and, at one point, the NVMe drive showed up after the upgrade, but it was gone again after the next reboot:

 

# rpm -ivh VRTSaslapm-7.4.1.1201-RHEL7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:VRTSaslapm-7.4.1.1201-RHEL7      ################################# [100%]
Installing keys for APMs
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
nvme0n1 auto - - error
sda auto:LVM - - LVM
sdb auto:cdsdisk loc00d00 local00dg online
# vxdisksetup -ivf nvme0n1
VxVM ERROR V-5-1-19068 Failed to read disk /dev/vx/rdmp/nvme0n1 with error 6
VxVM ERROR V-5-1-15405 vxmediadisc: Error (6) returned from vol_media_fmt(), retrying.
VxVM ERROR V-5-1-15405 vxmediadisc: Error (6) returned from vol_media_fmt().
VxVM ERROR V-5-1-15405 vxmediadisc: Error (6) returned from vol_media_fmt().
VxVM ERROR V-5-1-19076 vxmediadisc: Error (6).
VxVM ERROR V-5-3-12153: Can't open device /dev/vx/dmp/nvme0n1
VxVM vxparms ERROR V-5-1-6536 error reading partitions
VxVM vxdisksetup ERROR V-5-2-6546 0

This appears to be caused by an issue with NVMe SCSI VPD information on el7.7 kernels:

On 3.10.0-1062.1.1.el7:
# sg_inq /dev/nvme0n1
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x06  [SPC-4]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  [BQue=0]
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
    length=36 (0x24)   Peripheral device type: disk
 Vendor identification: NVMe    
 Product identification: <==================================
 Product revision level: <==================================
 Unit serial number: <==================================

On 3.10.0-957.27.2.el7:
# sg_inq /dev/nvme0n1
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x06  [SPC-4]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  [BQue=0]
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
    length=36 (0x24)   Peripheral device type: disk
 Vendor identification: NVMe    
 Product identification: Samsung SSD 970  <==================================
 Product revision level: EXE7  <==================================
 Unit serial number: S464NB0M400753T      <==================================
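
To spot this condition without reading the full INQUIRY dump, something like the sketch below could be used. It is an illustration only: inq_missing_fields is a made-up name, and it assumes that the arrows in the listings above are my annotations and that the real sg_inq output simply leaves the value after the colon blank.

```shell
#!/bin/sh
# Hypothetical helper: name the identity fields that are blank in
# `sg_inq` output read from stdin.
# Exit status: 0 if all three fields are populated, 1 if any is blank.
inq_missing_fields() {
    awk -F': *' '
        /Product identification|Product revision level|Unit serial number/ {
            name = $1; value = $2
            sub(/^ +/, "", name)    # drop the leading indent of the label
            sub(/ +$/, "", value)   # drop trailing padding of the value
            if (value == "") { print name; missing = 1 }
        }
        END { exit (missing ? 1 : 0) }
    '
}

# On a live system (as root, with sg3_utils installed):
#   sg_inq /dev/nvme0n1 | inq_missing_fields
```

On an affected kernel this should list all three field names; on a working kernel it should print nothing and return 0.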

 

Marianne
Moderator

Hi Marianne,

I think that link needs an update. Patches for support of Infoscale 7.4.1 on 7.7 were posted by Veritas 24 hours after that RHEL minor update shipped:

https://sort.veritas.com/patch/detail/15132

I wrote (and updated) the above article.

https://access.redhat.com/errata/RHSA-2019:3979

The new Red Hat kernel (kernel-3.10.0-1062.7.1) restores the NVMe VPD information required by Infoscale.
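
For anyone scripting a pre-check, here is a minimal sketch that tests whether a kernel release is at or above the 3.10.0-1062.7.1 build mentioned above. The function name kernel_at_least is made up, the .el7.x86_64 suffix in the usage line is an assumption, and it relies on GNU sort -V. Note that it only compares version strings: the 3.10.0-957.x (7.6) kernels sort lower even though they did not show the problem in this thread.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if kernel release $1 is at or above
# release $2 according to GNU version sort, non-zero otherwise.
kernel_at_least() {
    # The lower of the two releases sorts first; if that is $2,
    # then $1 is equal to or newer than $2.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# On a live system:
#   kernel_at_least "$(uname -r)" 3.10.0-1062.7.1.el7.x86_64 && echo "kernel has the fix"
```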

Hope this helps!

Danny