05-25-2016 11:40 PM
I have run into a problem with VCS.
Environment:
HW: 2 x T5220 servers + EMC AX4-5 storage;
Problem description:
When executing the "init 6" or "hastop -all" command on the cluster, the resources cfsmount1 and cfsmount2 could not be taken offline normally.
I checked the hardware state (EMC connectivity, disks, system, iostat -En) and the output of vxdisk, vxprint, vxdmpadm, fuser, mount -v, etc.
I tried to umount /var/opt/mediation/MMStorage manually; it did not succeed and the process appeared to hang.
Please see the checklist in the attached files check_point.log, engine_A.log and main.cf.
Could you give me some advice on how to fix this problem?
05-26-2016 08:17 PM
Hi,
I cannot find the attached files.
If a cfsmount resource can't go offline, you normally need to check whether some application is still accessing the file system,
e.g. fuser -cu /mountpoint.
Before taking a cfsmount resource offline, make sure the application resources on that file system are already offline.
If you umount a CFS mount point manually, you also need to think about the mount lock.
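For example, a file system that VCS brought online is usually mount-locked (you will see mntlock=VCS in the mount options), so a manual unmount has to release that lock. A minimal sketch, assuming the default lock ID VCS:
confirm the file system is mount-locked:
# mount -p | grep mntlock
unmount and release the VCS mount lock in one step:
# umount -o mntunlock=VCS /mountpoint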
05-26-2016 09:41 PM
Hi,
I am not sure where the attachments are; I did see them and then they were gone. From the logs, I can see that the cfsmount resources could not go offline in time, but the later clean of the resources succeeded:
2016/05/25 16:45:52 VCS WARNING V-16-2-13011 (HBCG14BER) Resource(cfsmount1): offline procedure did not complete within the expected time
2016/05/25 16:45:52 VCS WARNING V-16-2-13011 (HBCG14BER) Resource(cfsmount2): offline procedure did not complete within the expected time.
2016/05/25 16:45:53 VCS INFO V-16-2-13068 (HBCG14BER) Resource(cfsmount2) - clean completed successfully.
2016/05/25 16:45:53 VCS INFO V-16-2-13068 (HBCG14BER) Resource(cfsmount1) - clean completed successfully.
A few things I can suggest:
1. Check that the app was offline and was no longer writing to the file system.
2. As a test, you can try increasing the offline timeout of the cfsmount resources (hares -override / hares -modify OfflineTimeout; see the sketch after this list).
3. As clean is able to succeed later, I strongly believe something is holding up the file system and the clean procedure releases it. You may need to dig deeper from the file system perspective to understand what is holding/accessing the file system.
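A rough sketch of item 2, assuming the resource names from the engine log (cfsmount1/cfsmount2) and an illustrative 600-second value; OfflineTimeout is a type-level attribute, so it has to be overridden per resource first:
# haconf -makerw
# hares -override cfsmount1 OfflineTimeout
# hares -modify cfsmount1 OfflineTimeout 600
# hares -override cfsmount2 OfflineTimeout
# hares -modify cfsmount2 OfflineTimeout 600
# haconf -dump -makero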
G
05-27-2016 01:22 AM
When the timeout occurred I tried fuser -c /mountpoint, but there was no response.
I also tried umount /mountpoint and umount -o mntunlock=vcs /mountpoint; neither succeeded.
:(((
05-27-2016 04:44 AM
What was the error that you got when you ran:
umount -o mntunlock=vcs /mountpoint
"mntunlock=vcs" --> is that a typo? VCS should be in caps, matching the lock ID set at mount time.
If you used caps and it still could not unmount, then something is holding up the file system, which needs to be figured out.
05-29-2016 07:12 PM
05-30-2016 09:27 AM
If fuser is hanging, that does not mean the FS is not being held up. Can you give us the outputs below:
uname -a
pkginfo -l VRTSvcs
pkginfo -l VRTSvxvm
pkginfo -l VRTSvxfs
modinfo | grep -i vx
df -k
mount -p
fuser -cu /<mount point>
fuser -fu /<mount point>
fuser -ku /<mount point>
If fuser hangs, don't kill the process; follow the technote below and keep your evidence ready in case you need to open a support case to analyze a live core and the truss output of the hung processes.
https://www.veritas.com/support/en_US/article.000020115
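As an illustration only (the technote above has the authoritative procedure), the hung fuser could be captured roughly like this; the output path is just an example:
run fuser under truss so the point of the hang is recorded:
# truss -f -o /tmp/fuser_hang.truss fuser -cu /var/opt/mediation/MMStorage
from another terminal, grab the user stack of the hung fuser process:
# pstack <pid_of_fuser>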
05-30-2016 08:23 PM
Before executing hastop -all:
# uname -a
SunOS HBCG13BER 5.10 Generic_150400-18 sun4v sparc SUNW,SPARC-Enterprise-T5220
# pkginfo -l VRTSvcs
PKGINST: VRTSvcs
NAME: Veritas Cluster Server by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.1
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Veritas Cluster Server by Symantec
PSTAMP: 5.1.104.000-5.1SP1RP4-2013-08-08_16.00.00
INSTDATE: Jan 09 2016 12:44
STATUS: completely installed
FILES: 284 installed pathnames
28 shared pathnames
61 directories
105 executables
237447 blocks used (approx)
# pkginfo -l VRTSvxvm
PKGINST: VRTSvxvm
NAME: Binaries for VERITAS Volume Manager by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.1,REV=10.06.2009.22.05
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Virtual Disk Subsystem
PSTAMP: 5.1.104.000-5.1SP1RP4-2013-08-07-142629-19
INSTDATE: Jan 09 2016 12:35
HOTLINE: http://support.veritas.com/phonesup/phonesup_ddProduct_.htm
EMAIL: support@veritas.com
STATUS: completely installed
FILES: 955 installed pathnames
44 shared pathnames
116 directories
428 executables
419176 blocks used (approx)
# pkginfo -l VRTSvxfs
PKGINST: VRTSvxfs
NAME: VERITAS File System
CATEGORY: system,utilities
ARCH: sparc
VERSION: 5.1,REV=7Oct2009
BASEDIR: /
VENDOR: VERITAS Software
DESC: Commercial File System
PSTAMP: 5.1.104.000-5.1SP1RP4-2013-08-12-FS-142634-13
INSTDATE: Jan 09 2016 12:41
HOTLINE: (800) 342-0652
EMAIL: support@veritas.com
STATUS: completely installed
FILES: 332 installed pathnames
35 shared pathnames
4 linked files
53 directories
108 executables
117239 blocks used (approx)
# modinfo | grep -i vx
34 7be00000 51628 361 1 vxdmp (VxVM 5.1SP1RP4 DMP Driver)
35 7ba00000 221b08 362 1 vxio (VxVM 5.1SP1RP4 I/O driver)
37 7be48df0 11a8 363 1 vxspec (VxVM 5.1SP1RP4 control/status d)
244 7af153c0 d40 364 1 vxportal (VxFS 5.1SP1RP4 portal driver)
245 7a200000 206470 21 1 vxfs (VxFS 5.1SP1RP4 SunOS 5.10)
248 79e00000 6d760 368 1 vxfen (VRTS Fence 5.1SP1RP4)
249 7af16000 24f90 369 1 vxglm (VxGLM 5.1_SP1RP2P1 SunOS 5.10)
250 7a7c2000 48a8 370 1 vxgms (VxGMS 5.1_SP1 (Solaris 5.10))
266 7a7e6000 baa8 365 1 fdd (VxQIO 5.1SP1RP4 Quick I/O drive)
# df -k
Filesystem kbytes used avail capacity Mounted on
/dev/md/dsk/d0 108687644 18906254 88694514 18% /
/devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 56583952 1904 56582048 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
/platform/SUNW,SPARC-Enterprise-T5220/lib/libc_psr/libc_psr_hwcap2.so.1
108687644 18906254 88694514 18% /platform/sun4v/lib/libc_psr.so.1
/platform/SUNW,SPARC-Enterprise-T5220/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1
108687644 18906254 88694514 18% /platform/sun4v/lib/sparcv9/libc_psr.so.1
fd 0 0 0 0% /dev/fd
swap 56582312 264 56582048 1% /tmp
swap 56582080 32 56582048 1% /var/run
swap 56582048 0 56582048 0% /dev/vx/dmp
swap 56582048 0 56582048 0% /dev/vx/rdmp
/dev/odm 0 0 0 0% /dev/odm
/dev/vx/dsk/mmdbdg/vol01
209673216 329715 196260240 1% /var/opt/mediation/MMDB
/dev/vx/dsk/mmdatadg/vol01
13507384320 3861851 13235095333 1% /var/opt/mediation/MMStorage
#
#
# mount -p
/dev/md/dsk/d0 - / ufs - no rw,intr,largefiles,logging,xattr,onerror=panic
/devices - /devices devfs - no
ctfs - /system/contract ctfs - no
proc - /proc proc - no
mnttab - /etc/mnttab mntfs - no
swap - /etc/svc/volatile tmpfs - no xattr
objfs - /system/object objfs - no
sharefs - /etc/dfs/sharetab sharefs - no
/platform/SUNW,SPARC-Enterprise-T5220/lib/libc_psr/libc_psr_hwcap2.so.1 - /platform/sun4v/lib/libc_psr.so.1 lofs - no
/platform/SUNW,SPARC-Enterprise-T5220/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1 - /platform/sun4v/lib/sparcv9/libc_psr.so.1 lofs - no
fd - /dev/fd fd - no rw
swap - /tmp tmpfs - no xattr
swap - /var/run tmpfs - no xattr
swap - /dev/vx/dmp tmpfs - no xattr
swap - /dev/vx/rdmp tmpfs - no xattr
/dev/odm - /dev/odm odm - no smartsync
/dev/vx/dsk/mmdbdg/vol01 - /var/opt/mediation/MMDB vxfs - no rw,suid,delaylog,largefiles,qio,cluster,ioerror=mdisable,crw,mntlock=VCS
/dev/vx/dsk/mmdatadg/vol01 - /var/opt/mediation/MMStorage vxfs - no rw,suid,delaylog,largefiles,qio,cluster,ioerror=mdisable,crw,mntlock=VCS
#
#
# fuser -cu /var/opt/mediation/MMStorage/
/var/opt/mediation/MMStorage/: 6583om(root) 5985om(root) 5853o(root) 5729o(root) 5605o(root)
# fuser -fu /var/opt/mediation/MMStorage/
/var/opt/mediation/MMStorage/:
# fuser -ku /var/opt/mediation/MMStorage/
/var/opt/mediation/MMStorage/:
After executing hastop -all:
# fuser -cu /var/opt/mediation/MMStorage
/var/opt/mediation/MMStorage: fuser: Invalid argument
#
06-01-2016 05:01 AM
# fuser -cu /var/opt/mediation/MMStorage
/var/opt/mediation/MMStorage: fuser: Invalid argument
#
A possible reason for hitting the "Invalid argument" error is that the FS is already unmounted. Did you check with the df command whether the FS was already unmounted?
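A quick way to confirm, using the mount point from your earlier output:
# df -k /var/opt/mediation/MMStorage
# mount -v | grep MMStorage
If df reports the root file system instead of /dev/vx/dsk/mmdatadg/vol01, or the mount -v line is gone, the file system has already been unmounted and fuser -c on that path will return "Invalid argument".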
06-02-2016 05:20 AM
I would say first let's try to isolate the layer of the issue: is it a problem with the mount, with the application accessing it, or with an HA component?
Can you try bringing up CVM, mounting the file systems using the cfsmount command, and starting the app? Then stop the application and see whether you are able to take the mount offline using the cfsumount command. This will help us figure out whether the issue is happening at the HA layer or the FS layer. If cfsumount succeeds, I would suggest repeating the procedure and timing the cfsumount command (see the sketch below).
Based on the time taken by cfsumount, you may need to adjust the OfflineTimeout of the cfsmount resources to ensure sufficient time is given for VCS to stop the mount resource.
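A rough sketch of that test on one node, using the MMStorage mount point from the earlier output (the timing wrapper is only illustrative):
confirm CVM/CFS is active on the node:
# cfscluster status
mount the shared file system through the cluster framework:
# cfsmount /var/opt/mediation/MMStorage
... start the application, let it run, then stop it cleanly ...
time the unmount so you know how long VCS needs:
# timex cfsumount /var/opt/mediation/MMStorage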
G