cfsmount1 &cfsmount2 resource could not offline

I have run into a problem with VCS.

 

Environment:

HW: 2 x T5220 servers + AX4-5;

Problem description:

 

When I run "init 6" or "hastop -all" on the cluster, the cfsmount1 and cfsmount2 resources do not go offline normally.

 

I checked the hardware state (EMC connectivity, disks, system, iostat -En) and the output of vxdisk, vxprint, vxdmpadm, fuser, mount -v, etc.

 

I tried to umount /var/opt/mediation/MMStorage manually, but it did not succeed and the umount process appears to hang.

 

Please see the checklist in the attached files check_point.log, engine_A.log and main.cf.

 

Could you give me some advice about how to fix the problem?

9 Replies

Re: cfsmount1 &cfsmount2 resource could not offline

Hi, 

I can't find the attached files.

If a cfsmount resource can't go offline, you normally need to check whether some application is still accessing the file system, for example with fuser -uc /mountpoint.

Before taking the cfsmount resource offline, make sure the application resources on that file system are already offline.

If you unmount the CFS mount manually, you also need to take the mount lock into account.


Re: cfsmount1 &cfsmount2 resource could not offline

Hi,

I'm not sure where the attachments are; I did see them and then they were gone. From the logs, I see that the cfsmount resources couldn't go offline in time, but the later clean of the resources succeeded:

2016/05/25 16:45:52 VCS WARNING V-16-2-13011 (HBCG14BER) Resource(cfsmount1): offline procedure did not complete within the expected time

2016/05/25 16:45:52 VCS WARNING V-16-2-13011 (HBCG14BER) Resource(cfsmount2): offline procedure did not complete within the expected time.

 

2016/05/25 16:45:53 VCS INFO V-16-2-13068 (HBCG14BER) Resource(cfsmount2) - clean completed successfully.
2016/05/25 16:45:53 VCS INFO V-16-2-13068 (HBCG14BER) Resource(cfsmount1) - clean completed successfully.

 

A few things I can suggest:

1. Check whether the app was really offline and no longer writing to the filesystem.

2. As a test, you can try increasing the timeout of the cfsmount resources (hares -modify MonitorTimeout ..).

3. As clean is able to succeed later, I strongly believe that something is holding the filesystem and is cleared by the clean procedure. You may need to dig further from the filesystem perspective to understand what is holding or accessing the filesystem.
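
To see at a glance which resources hit the offline timeout, the V-16-2-13011 lines can be filtered out of the engine log. This is only a rough sketch: the function name is made up, and the sample input is the four log lines quoted above. On a real node you would feed it the engine log instead (usually /var/VRTSvcs/log/engine_A.log, but check your own installation).

```shell
#!/bin/sh
# List resources whose offline procedure timed out (message ID V-16-2-13011).
# Reads engine log lines on stdin. The function name is hypothetical.
timed_out_resources() {
    sed -n '/V-16-2-13011/s/.*Resource(\([^)]*\)).*/\1/p' | sort -u
}

# Sample input: the log lines quoted in this post.
timed_out_resources <<'EOF'
2016/05/25 16:45:52 VCS WARNING V-16-2-13011 (HBCG14BER) Resource(cfsmount1): offline procedure did not complete within the expected time
2016/05/25 16:45:52 VCS WARNING V-16-2-13011 (HBCG14BER) Resource(cfsmount2): offline procedure did not complete within the expected time.
2016/05/25 16:45:53 VCS INFO V-16-2-13068 (HBCG14BER) Resource(cfsmount2) - clean completed successfully.
2016/05/25 16:45:53 VCS INFO V-16-2-13068 (HBCG14BER) Resource(cfsmount1) - clean completed successfully.
EOF
```

Against a live log the call would look like `timed_out_resources < /var/VRTSvcs/log/engine_A.log`.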

 

G

When the timeout occurred I tried fuser -c /mountpoint, but got no response.

I also tried umount /mountpoint and umount -o mntunlock=vcs /mountpoint. Neither succeeded.


Re: cfsmount1 &cfsmount2 resource could not offline

What was the error that you got when you ran:

umount -o mntunlock=vcs /mountpoint

"mntunlock=vcs" --> Is that a typo? VCS should be in caps.

If you used caps and it still could not unmount, then something is holding the filesystem, and that needs to be figured out.
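
The key given to mntunlock has to match the one the filesystem was locked with, and that key shows up as mntlock=... in the options column of mount -p. A small sketch for pulling it out, assuming the vfstab-style layout that mount -p prints on Solaris (mount point in field 3, type in field 4, options in field 7). The function name and the sample line are only for illustration.

```shell
#!/bin/sh
# Print "<mount point> <mntlock key>" for every vxfs line of `mount -p` output
# read on stdin; "-" is printed when no mntlock option is set.
# The function name is hypothetical.
mntlock_keys() {
    awk '$4 == "vxfs" {
        n = split($7, opts, ",")
        key = "-"
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^mntlock=/)
                key = substr(opts[i], 9)   # strip the 8-character "mntlock=" prefix
        print $3, key
    }'
}

# Illustrative input in mount -p format.
mntlock_keys <<'EOF'
/dev/md/dsk/d0 - / ufs - no rw,intr,largefiles,logging,xattr,onerror=panic
/dev/vx/dsk/mmdatadg/vol01 - /var/opt/mediation/MMStorage vxfs - no rw,suid,delaylog,largefiles,qio,cluster,ioerror=mdisable,crw,mntlock=VCS
EOF
```

On a live system this would be `mount -p | mntlock_keys`, and the key it prints is what goes after mntunlock=.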

 

Hi,

As you said, something is holding the filesystem, so the fuser command should show the process that is using it. In practice, though, when I run fuser -c /mountpoint nothing is shown on screen:

# umount /var/opt/mediation/MMStorage
^C
# fuser -c /var/opt/mediation/MMStorage
/var/opt/mediation/MMStorage: ^C
# fuser -c /var/opt/mediation/MMDB
/var/opt/mediation/MMDB: ^C
#

(Ctrl+C was pressed each time because nothing was returned.)

Do you have any other suggestions? Thank you!

Re: cfsmount1 &cfsmount2 resource could not offline

A hanging fuser does not mean that the FS is not being held. Can you give us the output of the commands below:

uname -a

pkginfo -l VRTSvcs

pkginfo -l VRTSvxvm

pkginfo -l VRTSvxfs

modinfo | grep -i vx

 

df -k

mount -p

fuser -cu /<mount point>

fuser -fu /<mount point>

fuser -ku /<mount point>

 

If fuser hangs, don't kill the process. Follow the technote below and keep your evidence ready in case you need to open a support case to analyze a live core and the truss output of the hung processes.

https://www.veritas.com/support/en_US/article.000020115

 

 

Before executing hastop -all:

# uname -a
SunOS HBCG13BER 5.10 Generic_150400-18 sun4v sparc SUNW,SPARC-Enterprise-T5220
# pkginfo -l VRTSvcs
   PKGINST:  VRTSvcs
      NAME:  Veritas Cluster Server by Symantec
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  5.1
   BASEDIR:  /
    VENDOR:  Symantec Corporation
      DESC:  Veritas Cluster Server by Symantec
    PSTAMP:  5.1.104.000-5.1SP1RP4-2013-08-08_16.00.00
  INSTDATE:  Jan 09 2016 12:44
    STATUS:  completely installed
     FILES:      284 installed pathnames
                  28 shared pathnames
                  61 directories
                 105 executables
              237447 blocks used (approx)

# pkginfo -l VRTSvxvm
   PKGINST:  VRTSvxvm
      NAME:  Binaries for VERITAS Volume Manager by Symantec
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  5.1,REV=10.06.2009.22.05
   BASEDIR:  /
    VENDOR:  Symantec Corporation
      DESC:  Virtual Disk Subsystem
    PSTAMP:  5.1.104.000-5.1SP1RP4-2013-08-07-142629-19
  INSTDATE:  Jan 09 2016 12:35
   HOTLINE:  http://support.veritas.com/phonesup/phonesup_ddProduct_.htm
     EMAIL:  support@veritas.com
    STATUS:  completely installed
     FILES:      955 installed pathnames
                  44 shared pathnames
                 116 directories
                 428 executables
              419176 blocks used (approx)

# pkginfo -l VRTSvxfs
   PKGINST:  VRTSvxfs
      NAME:  VERITAS File System
  CATEGORY:  system,utilities
      ARCH:  sparc
   VERSION:  5.1,REV=7Oct2009
   BASEDIR:  /
    VENDOR:  VERITAS Software
      DESC:  Commercial File System
    PSTAMP:  5.1.104.000-5.1SP1RP4-2013-08-12-FS-142634-13
  INSTDATE:  Jan 09 2016 12:41
   HOTLINE:  (800) 342-0652
     EMAIL:  support@veritas.com
    STATUS:  completely installed
     FILES:      332 installed pathnames
                  35 shared pathnames
                   4 linked files
                  53 directories
                 108 executables
              117239 blocks used (approx)

# modinfo | grep -i vx
 34 7be00000  51628 361   1  vxdmp (VxVM 5.1SP1RP4 DMP Driver)
 35 7ba00000 221b08 362   1  vxio (VxVM 5.1SP1RP4 I/O driver)
 37 7be48df0   11a8 363   1  vxspec (VxVM 5.1SP1RP4 control/status d)
244 7af153c0    d40 364   1  vxportal (VxFS 5.1SP1RP4 portal driver)
245 7a200000 206470  21   1  vxfs (VxFS 5.1SP1RP4 SunOS 5.10)
248 79e00000  6d760 368   1  vxfen (VRTS Fence 5.1SP1RP4)
249 7af16000  24f90 369   1  vxglm (VxGLM 5.1_SP1RP2P1 SunOS 5.10)
250 7a7c2000   48a8 370   1  vxgms (VxGMS 5.1_SP1 (Solaris 5.10))
266 7a7e6000   baa8 365   1  fdd (VxQIO 5.1SP1RP4 Quick I/O drive)
# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d0       108687644 18906254 88694514    18%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                 56583952    1904 56582048     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
/platform/SUNW,SPARC-Enterprise-T5220/lib/libc_psr/libc_psr_hwcap2.so.1
                     108687644 18906254 88694514    18%    /platform/sun4v/lib/libc_psr.so.1
/platform/SUNW,SPARC-Enterprise-T5220/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1
                     108687644 18906254 88694514    18%    /platform/sun4v/lib/sparcv9/libc_psr.so.1
fd                         0       0       0     0%    /dev/fd
swap                 56582312     264 56582048     1%    /tmp
swap                 56582080      32 56582048     1%    /var/run
swap                 56582048       0 56582048     0%    /dev/vx/dmp
swap                 56582048       0 56582048     0%    /dev/vx/rdmp
/dev/odm                   0       0       0     0%    /dev/odm
/dev/vx/dsk/mmdbdg/vol01
                     209673216  329715 196260240     1%    /var/opt/mediation/MMDB
/dev/vx/dsk/mmdatadg/vol01
                     13507384320 3861851 13235095333     1%    /var/opt/mediation/MMStorage
#
#
# mount -p
/dev/md/dsk/d0 - / ufs - no rw,intr,largefiles,logging,xattr,onerror=panic
/devices - /devices devfs - no
ctfs - /system/contract ctfs - no
proc - /proc proc - no
mnttab - /etc/mnttab mntfs - no
swap - /etc/svc/volatile tmpfs - no xattr
objfs - /system/object objfs - no
sharefs - /etc/dfs/sharetab sharefs - no
/platform/SUNW,SPARC-Enterprise-T5220/lib/libc_psr/libc_psr_hwcap2.so.1 - /platform/sun4v/lib/libc_psr.so.1 lofs - no
/platform/SUNW,SPARC-Enterprise-T5220/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1 - /platform/sun4v/lib/sparcv9/libc_psr.so.1 lofs - no
fd - /dev/fd fd - no rw
swap - /tmp tmpfs - no xattr
swap - /var/run tmpfs - no xattr
swap - /dev/vx/dmp tmpfs - no xattr
swap - /dev/vx/rdmp tmpfs - no xattr
/dev/odm - /dev/odm odm - no smartsync
/dev/vx/dsk/mmdbdg/vol01 - /var/opt/mediation/MMDB vxfs - no rw,suid,delaylog,largefiles,qio,cluster,ioerror=mdisable,crw,mntlock=VCS
/dev/vx/dsk/mmdatadg/vol01 - /var/opt/mediation/MMStorage vxfs - no rw,suid,delaylog,largefiles,qio,cluster,ioerror=mdisable,crw,mntlock=VCS
#
#
# fuser -cu /var/opt/mediation/MMStorage/
/var/opt/mediation/MMStorage/:     6583om(root)    5985om(root)    5853o(root)    5729o(root)    5605o(root)
# fuser -fu /var/opt/mediation/MMStorage/
/var/opt/mediation/MMStorage/:
# fuser -ku /var/opt/mediation/MMStorage/
/var/opt/mediation/MMStorage/:

 

After executing hastop -all:

# fuser -cu /var/opt/mediation/MMStorage
/var/opt/mediation/MMStorage: fuser: Invalid argument
#


Re: cfsmount1 &cfsmount2 resource could not offline

# fuser -cu /var/opt/mediation/MMStorage
/var/opt/mediation/MMStorage: fuser: Invalid argument
#

 

A possible reason for the "Invalid argument" error is that the FS is already unmounted. Did you check with the df command whether the FS is already unmounted?
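
If df itself is slow or hangs, another way to check is to look the mount point up in the mount table directly. Below is a minimal sketch that assumes the Solaris /etc/mnttab layout (mount point in the second column). The function name is made up, and the sample table is a made-up one-line file so the example can run anywhere.

```shell
#!/bin/sh
# Return 0 if the mount point is listed in the mount table, 1 otherwise.
# $1 = mount point, $2 = mount table file (defaults to Solaris /etc/mnttab).
# The function name is hypothetical.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit (found ? 0 : 1) }' "${2:-/etc/mnttab}"
}

# Made-up sample table: only MMDB is listed as mounted.
sample=$(mktemp)
printf '%s\n' '/dev/vx/dsk/mmdbdg/vol01 /var/opt/mediation/MMDB vxfs rw,suid,cluster 1464165000' > "$sample"

for mp in /var/opt/mediation/MMDB /var/opt/mediation/MMStorage; do
    if is_mounted "$mp" "$sample"; then
        echo "$mp: mounted"
    else
        echo "$mp: not mounted"
    fi
done
rm -f "$sample"
```

On the node itself you would call it with just the mount point, e.g. `is_mounted /var/opt/mediation/MMStorage`.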


Re: cfsmount1 &cfsmount2 resource could not offline

I would say first let's try to isolate the layer where the issue lies: is it the mount, the app accessing it, or an HA component?

Can you try bringing up CVM, mounting the filesystems with the cfsmount command, and starting the app? Then stop the application and see whether you are able to offline the mount using the cfsumount command. This will help us figure out whether the issue is happening at the HA layer or the FS layer. If the cfsumount succeeds, I would suggest repeating the procedure and timing the cfsumount command.

Based on the time taken by cfsumount, you may need to adjust the MonitorTimeout of the cfsmount resources to ensure VCS is given sufficient time to stop the mount resource.
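
A rough dry-run sketch of what the timeout change could look like. Two assumptions here that are not from the thread: since V-16-2-13011 is about the offline procedure, the attribute that governs it is probably OfflineTimeout rather than MonitorTimeout (please verify against the VCS documentation for your release), and 600 seconds is just an example value. The helper names are made up. By default the script only prints the commands; set DRY_RUN=0 to actually run them on a cluster node.

```shell
#!/bin/sh
# Dry-run sketch: print (or run) the VCS commands that would raise a
# per-resource OfflineTimeout. Nothing is executed unless DRY_RUN=0.
# ASSUMPTION: OfflineTimeout is the relevant attribute and 600 is only an
# example value; neither comes from the thread.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

raise_offline_timeout() {
    res=$1
    secs=$2
    run haconf -makerw                           # open the configuration for writing
    run hares -override "$res" OfflineTimeout    # allow a per-resource value
    run hares -modify "$res" OfflineTimeout "$secs"
    run haconf -dump -makero                     # save and close the configuration
}

raise_offline_timeout cfsmount1 600
raise_offline_timeout cfsmount2 600
```

The value should come from what you measure when timing cfsumount by hand, with some headroom added.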


G