Client cannot read/write after SG failover

mrrout
Level 4

Hi All,

I am running into a problem with an NFS service group in my VCS 5.0 MP3 configuration. It is as below:

The cluster is a two-node setup on AIX 5.3 TL09; the contents of the main.cf file are as follows:

include "types.cf"

cluster HDS_CLUS (
        UserNames = { admin = fIJbIDiFJeJJhRJdIG }
        Administrators = { admin }
        UseFence = SCSI3
        )

system AIX-23 (
        )

system AIX-24 (
        )

group SG1 (
        SystemList = { AIX-23 = 0, AIX-24 = 1 }
        AutoStartList = { AIX-23, AIX-24 }
        )

        DiskGroup dg01 (
                DiskGroup = dg01
                )

        IP dNFS_IP (
                Critical = 0
                Device @AIX-23 = en2
                Device @AIX-24 = en2
                Address = "172.17.12.9"
                NetMask = "255.255.254.0"
                )

        Mount dNFS_Mount1 (
                Critical = 0
                MountPoint = "/11"
                BlockDevice = "/dev/vx/dsk/dg01/vol1"
                FSType = vxfs
                FsckOpt = "-y"
                )

        Mount dNFS_Mount2 (
                Critical = 0
                MountPoint = "/12"
                BlockDevice = "/dev/vx/dsk/dg01/vol2"
                FSType = vxfs
                FsckOpt = "-y"
                )

        NFS dNFS_NFS (
                Critical = 0
                NFSv4Root = "/"
                )

        NFSRestart NFSRestart_1 (
                NFSRes = dNFS_NFS
                )

        NIC dNFS_NIC (
                Device @AIX-23 = en2
                Device @AIX-24 = en2
                NetworkHosts = { "172.17.12.1" }
                )

        Share dNFS_Share1 (
                Critical = 0
                PathName = "/11"
                Options = "rw,vers=4"
                )

        Share dNFS_Share2 (
                Critical = 0
                PathName = "/12"
                Options = "rw,vers=4"
                )

        NFSRestart_1 requires dNFS_IP
        dNFS_IP requires dNFS_NIC
        dNFS_IP requires dNFS_Share1
        dNFS_IP requires dNFS_Share2
        dNFS_Mount1 requires dg01
        dNFS_Mount2 requires dg01
        dNFS_Share1 requires dNFS_Mount1
        dNFS_Share1 requires dNFS_NFS
        dNFS_Share2 requires dNFS_Mount2
        dNFS_Share2 requires dNFS_NFS

 

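(For anyone reproducing this, the resource layout and dependency tree above can be checked on a running node with standard VCS commands, for example:)

# hagrp -resources SG1
# hares -dep dNFS_IP
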
Whenever I fail over the SG using "hagrp -switch SG1 -to AIX-24" (assuming the SG is online on AIX-23), the client returns the following when it tries to read from or write to the filesystem after the failover:

AIX91-C2 > touch /fs0/xxx                                        [AIX91-C2 is my client]
touch: 0652-046 Cannot create /fs0/xxx

AIX91-C2 > ls -l /fs0
ls: 0653-341 The file /fs0/xyz does not exist.
total 0
drwxr-xr-x    2 root     system           96 Aug 22 2011  lost+found

I do not see any issues if I offline and online the SG on the same node with an interval of about 10 minutes, i.e. offline the SG, wait for 10 minutes, and then online it on the same node. I suspect this is related to NFS locking or the file handle, but I am not sure how to resolve it.
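
(The offline/online test above maps onto standard hagrp operations; a sketch with the group and node names from this configuration, plus a summary check afterwards:)

# hagrp -offline SG1 -sys AIX-23
(wait about 10 minutes)
# hagrp -online SG1 -sys AIX-23
# hastatus -sum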

Any help is greatly appreciated.

 

Thanks in advance.


 

3 REPLIES

mrrout
Level 4

If I mount the shared filesystem (from the VCS cluster) on a Linux client and then switch over the NFS service group (SG1), the client shows a "Stale file handle" error for the corresponding filesystem. For a standalone NFS server one would unmount the FS on the client and remount it to resolve the issue, but a cluster setup should not require this.

Please let me know if anyone has come across such an issue and knows of a resolution.
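
(For the failover to be transparent, the client has to mount through the service group's virtual IP from the main.cf above, 172.17.12.9, rather than a node's own address. A minimal Linux-client mount sketch under that assumption; vers=4 matches the Share options, hard is illustrative, and /fs0 is the mount point used earlier in the thread:)

# mount -t nfs -o vers=4,hard 172.17.12.9:/11 /fs0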

Thanks in advance.

mikebounds
Level 6
Partner Accredited

Have you done the following from the 5.0MP3 RP5 readme_first (ftp://ftp.veritas.com/pub/support/patchcentral/AIX/5.0_MP3/sfha/sfha-aix-5.0MP3RP5-patches.tar.gz_do...)?

Mandatory configuration change for the NFS and NFSRestart resources

You must perform the following instructions for VCS configurations that have NFSRestart resources. Failure to perform these instructions can result in NFS/NFSRestart resources not functioning correctly. Symantec implemented this change to prevent the invocation of NFSRestart-related triggers when no NFSRestart resources are in the VCS configuration.

To copy the nfs_preonline and nfs_postoffline files
Copy the nfs_preonline and nfs_postoffline files to the /opt/VRTSvcs/bin/triggers directory:

# cp /opt/VRTSvcs/bin/sample_triggers/nfs_preonline /opt/VRTSvcs/bin/triggers
# cp /opt/VRTSvcs/bin/sample_triggers/nfs_postoffline /opt/VRTSvcs/bin/triggers

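(A quick way to confirm the copy on each node, assuming the default trigger directory shown above; both scripts should exist and be executable:)

# ls -l /opt/VRTSvcs/bin/triggers/nfs_preonline /opt/VRTSvcs/bin/triggers/nfs_postoffline
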
You were required to do this from at least RP2.
If you only have RP1 installed, the instructions were different:
 
1469381
Fixed an issue where the Share agent was 10x slower on 5.0 MP1 with 300+ Share resources in a service group.
Note: This fix changes basic VCS functionality; it is critically important for you to implement these changes for all service groups that contain NFSRestart resources.
You must set the value of the PreOnline attribute to 1 for all service groups that contain NFSRestart resources. Failure to set the service group's PreOnline attribute to a value of 1 results in broken NFSRestart resource configurations.
The ha commands to change this attribute are:
# haconf -makerw
# hagrp -modify servicegroup_name PreOnline 1
# haconf -dump -makero
 
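(A quick way to confirm the attribute afterwards, using the group name from the original post:)

# hagrp -value SG1 PreOnline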
 
Mike
 

mrrout
Level 4
(Accepted Solution)

Mike,

Thank you very much for your input; I was missing some mandatory steps.

However, the problem I had (the client failing to read/write to the NFS mounts) was due to different major numbers for the VxVM devices on the two servers (one had 52, the other had 53). After changing one server to use the same major number with haremajor -s <#>, the issue did not reappear.
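
(For anyone hitting the same symptom, a minimal sketch of checking and aligning the major numbers. The device path comes from the main.cf above and the 52/53 values are the ones reported in this thread; the exact ls output format varies by platform. NFS file handles embed the major/minor numbers of the exported device, so they must match on both nodes for a failover to be transparent to clients.)

On each node, check the major number of an exported block device:

# ls -lL /dev/vx/dsk/dg01/vol1

If the nodes differ (for example 52 on one and 53 on the other), re-major one node to match its peer and reboot that node so the change takes effect:

# haremajor -s 53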

Thank you very much for your comments.

Manoranjan.