VCS node's SAN-boot OS becomes read-only in a CFS testbed configuration

dongdong
Hi,

I have a two-node VCS testbed with CFS-NFS configured (tested version: 5.1).

I run I/O from an NFS client.

After running I/O on the CFS-mounted directories for more than 10 days, I found that the SAN-boot OS on the cluster master node (node B in this example), which is encapsulated by VxVM, had become read-only, and I/O failed on the client.

The logs on node A:
Mar 30 01:36:21 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 0 (eth1) node 1 in trouble
Mar 30 01:36:22 arcx3455rmnd2 kernel: LLT INFO V-14-1-10024 link 0 (eth1) node 1 active
Mar 30 01:36:22 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 1 (eth0) node 1 in trouble
Mar 30 01:36:24 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 0 (eth1) node 1 in trouble
Mar 30 01:36:26 arcx3455rmnd2 kernel: LLT INFO V-14-1-10024 link 0 (eth1) node 1 active
Mar 30 01:36:27 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 1 (eth0) node 1 inactive 8 sec (33366296)
Mar 30 01:36:28 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 1 (eth0) node 1 inactive 9 sec (33366308)
Mar 30 01:36:29 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 1 (eth0) node 1 inactive 10 sec (33366320)
Mar 30 01:36:29 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 0 (eth1) node 1 in trouble
Mar 30 01:36:30 arcx3455rmnd2 kernel: LLT INFO V-14-1-10024 link 0 (eth1) node 1 active
Mar 30 01:36:30 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 1 (eth0) node 1 inactive 11 sec (33366333)
Mar 30 01:36:31 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 1 (eth0) node 1 inactive 12 sec (33366346)
Mar 30 01:36:32 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 0 (eth1) node 1 in trouble
Mar 30 01:36:32 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 1 (eth0) node 1 inactive 13 sec (33366358)
Mar 30 01:36:32 arcx3455rmnd2 kernel: LLT INFO V-14-1-10024 link 0 (eth1) node 1 active
Mar 30 01:36:32 arcx3455rmnd2 kernel: LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (eth0) node 1. 4 more to go.
Mar 30 01:36:33 arcx3455rmnd2 kernel: LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (eth0) node 1. 3 more to go.
Mar 30 01:36:33 arcx3455rmnd2 kernel: LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (eth0) node 1. 2 more to go.
Mar 30 01:36:33 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 1 (eth0) node 1 inactive 14 sec (33366369)
Mar 30 01:36:33 arcx3455rmnd2 kernel: LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (eth0) node 1. 1 more to go.
Mar 30 01:36:34 arcx3455rmnd2 kernel: LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (eth0) node 1. 0 more to go.
Mar 30 01:36:34 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 1 (eth0) node 1 inactive 15 sec (33366381)
Mar 30 01:36:34 arcx3455rmnd2 kernel: LLT INFO V-14-1-10509 link 1 (eth0) node 1 expired
Mar 30 01:36:34 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 0 (eth1) node 1 in trouble
Mar 30 01:36:37 arcx3455rmnd2 kernel: LLT INFO V-14-1-10024 link 0 (eth1) node 1 active
Mar 30 01:36:39 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 0 (eth1) node 1 in trouble
Mar 30 01:36:40 arcx3455rmnd2 kernel: LLT INFO V-14-1-10024 link 0 (eth1) node 1 active
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20036 Port a gen   14ec12 membership 01
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20037 Port a gen   14ec12   jeopardy ;1
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20036 Port d gen   14ec10 membership 01
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20037 Port d gen   14ec10   jeopardy ;1
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20036 Port f gen   14ec18 membership 01
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20037 Port f gen   14ec18   jeopardy ;1
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20036 Port h gen   14ec13 membership 01
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20037 Port h gen   14ec13   jeopardy ;1
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20036 Port v gen   14ec14 membership 01
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20037 Port v gen   14ec14   jeopardy ;1
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20036 Port w gen   14ec16 membership 01
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20037 Port w gen   14ec16   jeopardy ;1
Mar 30 01:36:40 arcx3455rmnd2 Had[7371]: VCS INFO V-16-1-10077 Received new cluster membership
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20036 Port b gen   14ec11 membership 01
Mar 30 01:36:40 arcx3455rmnd2 Had[7371]: VCS ERROR V-16-1-10111 System arcx3455vafv4 (Node '1') is in Regular and Jeopardy Memberships - Membership: 0x3, Jeopardy: 0x2
Mar 30 01:36:40 arcx3455rmnd2 kernel: GAB INFO V-15-1-20037 Port b gen   14ec11   jeopardy ;1
Mar 30 01:36:42 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 0 (eth1) node 1 in trouble
Mar 30 01:36:48 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 0 (eth1) node 1 inactive 8 sec (106313653)
Mar 30 01:36:49 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 0 (eth1) node 1 inactive 9 sec (106313675)
Mar 30 01:36:50 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 0 (eth1) node 1 inactive 10 sec (106313700)
Mar 30 01:36:51 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 0 (eth1) node 1 inactive 11 sec (106313723)
Mar 30 01:36:52 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 0 (eth1) node 1 inactive 12 sec (106313749)
Mar 30 01:36:53 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 0 (eth1) node 1 inactive 13 sec (106313773)
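To make the timeline above easier to read, here is a minimal Python sketch (a hypothetical helper, not a Veritas tool) that summarizes the LLT link messages per link and decodes the VCS membership masks. It assumes the syslog line format shown here, and it assumes bit n of each mask corresponds to LLT node ID n, which is consistent with the message naming Node '1' while reporting Jeopardy: 0x2.

```python
import re
from collections import defaultdict

# Hypothetical pattern for lines like:
#   "LLT INFO V-14-1-10032 link 0 (eth1) node 1 inactive 13 sec (106313773)"
# "sent hbreq" lines do not match and are skipped.
LLT_RE = re.compile(
    r"LLT INFO V-14-1-(?P<code>\d+) link (?P<link>\d+) "
    r"\((?P<nic>\w+)\) node (?P<node>\d+) (?P<event>.*)"
)
# Pattern for the VCS "Membership: 0x3, Jeopardy: 0x2" message.
MASK_RE = re.compile(r"Membership: (0x[0-9a-fA-F]+), Jeopardy: (0x[0-9a-fA-F]+)")


def nodes_in_mask(mask):
    """Return node IDs whose bit is set (assumes bit n = node ID n)."""
    return [n for n in range(mask.bit_length()) if (mask >> n) & 1]


def summarize(lines):
    """Return (last event per link, 'in trouble' counts per link, decoded masks)."""
    last_event = {}
    trouble_counts = defaultdict(int)
    masks = []
    for line in lines:
        m = LLT_RE.search(line)
        if m:
            key = (m["link"], m["nic"], m["node"])
            if m["event"].startswith("in trouble"):
                trouble_counts[key] += 1
            # Remember the most recent state message seen for this link.
            last_event[key] = m["event"]
            continue
        m = MASK_RE.search(line)
        if m:
            regular = nodes_in_mask(int(m[1], 16))
            jeopardy = nodes_in_mask(int(m[2], 16))
            masks.append((regular, jeopardy))
    return last_event, dict(trouble_counts), masks


SAMPLE = [
    "Mar 30 01:36:34 arcx3455rmnd2 kernel: LLT INFO V-14-1-10509 link 1 (eth0) node 1 expired",
    "Mar 30 01:36:40 arcx3455rmnd2 Had[7371]: VCS ERROR V-16-1-10111 System arcx3455vafv4 "
    "(Node '1') is in Regular and Jeopardy Memberships - Membership: 0x3, Jeopardy: 0x2",
    "Mar 30 01:36:42 arcx3455rmnd2 kernel: LLT INFO V-14-1-10205 link 0 (eth1) node 1 in trouble",
    "Mar 30 01:36:53 arcx3455rmnd2 kernel: LLT INFO V-14-1-10032 link 0 (eth1) node 1 inactive 13 sec (106313773)",
]

if __name__ == "__main__":
    last, trouble, masks = summarize(SAMPLE)
    for (link, nic, node), event in sorted(last.items()):
        print(f"link {link} ({nic}) node {node}: last event '{event}'")
    for regular, jeopardy in masks:
        print(f"regular membership: {regular}, jeopardy: {jeopardy}")
```

Fed the full excerpt, this would show link 1 (eth0) ending in "expired" and link 0 (eth1) ending "inactive", with node 1 in jeopardy after the 01:36:40 membership change.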

No logs could be found on node B when I checked after the reboot, because its file system had become read-only.

Has anyone seen this problem? Many thanks.