Could not start VxFEN

khemmerl
Level 5

I'm following the steps in the Veritas Cluster Server Installation Guide 5.1 (Solaris) and am up to Chapter 7, the section "Setting up disk-based I/O fencing using installvcs program". All previous steps have completed successfully (although some required multiple attempts). When I run ./installvcs -fencing, I get the following failure, even though all the checks up to that point complete successfully.

     Starting Fencing on st31bbl01 .......................... Failed
    Could not start VxFEN on st31bbl01 
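In case it helps anyone looking at the same failure, these are the standard places on a 5.1 system to sanity-check what the installer wrote before digging into the log (a sketch; the paths below are the documented defaults, so adjust if your install differs):

    # Fencing mode and disk policy written by the installer
    cat /etc/vxfenmode   # expect vxfen_mode=scsi3 and a scsi3_disk_policy line

    # Coordinator disk group name and the generated list of disk paths
    cat /etc/vxfendg
    cat /etc/vxfentab

    # Current state of the fencing driver on this node
    vxfenadm -d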

 

Checking /opt/VRTS/install/logs/installvcs-201109121140VCS/start.vxfen.st31bbl01, I find the following:

11:43:33 VxFEN start failed on system st31bbl01, dumping debug information:
(sys: st31bbl01) VxFEN Logs (last 40 lines):
BEGIN VxFEN LOG {

        (sys: st31bbl01) lbolt: 25921851 vxfen_sun.c ln  145 VXFEN: vxfen_sunclose: end ret: 0
        (sys: st31bbl01) lbolt: 25922848 vxfen_sun.c ln  104 VXFEN: vxfen_sunopen: begin
        (sys: st31bbl01) lbolt: 25922848 vxfen_sun.c ln  121 VXFEN: vxfen_sunopen: end ret: 0
        (sys: st31bbl01) lbolt: 25922848 vxfen_ioctl.c ln 3304 Log Buffer: 0x70324090
        (sys: st31bbl01) lbolt: 25922848 vxfen_sun.c ln  140 VXFEN: vxfen_sunclose: begin
        (sys: st31bbl01) lbolt: 25922848 vxfen_sun.c ln  145 VXFEN: vxfen_sunclose: end ret: 0
        (sys: st31bbl01) lbolt: 25922849 vxfen_sun.c ln  104 VXFEN: vxfen_sunopen: begin
        (sys: st31bbl01) lbolt: 25922849 vxfen_sun.c ln  121 VXFEN: vxfen_sunopen: end ret: 0
        (sys: st31bbl01) lbolt: 25922849 vxfen_ioctl.c ln 1404 VXFEN: vxfen_ioc_conf: begin
        (sys: st31bbl01) lbolt: 25922849 vxfen_ioctl.c ln 1414 VXFEN: vxfen_ioc_conf: 32 bit client detected for copyin
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  143 VXFEN: vxfen_conf: beginning
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  171 VXFEN: vxfen_conf: set fence_mode: SCSI3 from: NONE
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  179 VXFEN: setting vxfen.noreboot: 0
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  191 VXFEN: Unsupported protocal version: 0
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  194 VXFEN: Defaulting vxfen.vxfen_proto_version_cur: 10
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  205 VXFEN: Setting RFSM current prootcol to 10

        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  220 VXFEN: setting vxfen.use_nodeweights: 0
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  234 VXFEN: setting vxfen.my_cluster_id: 0
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  103 VXFEN: vxfen_coord_pt_in: coord_pt_initial begin
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  848 VXFEN: vxfen_coord_disk_in: begin fence_mode: SCSI3 andtarget_list 703246d8
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  865 VXFEN: vxfen_coord_disk_in: singledisk_flag: 0 fence_mode: SCSI3 cnt: 3 uaddr: 0x3f688
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  911 VXFEN: vxfen_coord_disk_in: kernbuf->count is: 3 kernbuf->singledsk_flag is:0
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  918 VXFEN: maj_num: 310, min_num: 50 npaths: 1 serial_num: 600508B40006B5100000800001310000
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  918 VXFEN: maj_num: 310, min_num: 74 npaths: 1 serial_num: 600508B40006B5100000800001360000
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  918 VXFEN: maj_num: 310, min_num: 58 npaths: 1 serial_num: 600508B40006B51000008000013B0000
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  950 VXFEN: device_num: 50 serial_num: 600508B40006B5100000800001310000 npaths: 1dmp: 1 device_key: 13600000032
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  950 VXFEN: device_num: 74 serial_num: 600508B40006B5100000800001360000 npaths: 1dmp: 1 device_key: 1360000004a
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  950 VXFEN: device_num: 58 serial_num: 600508B40006B51000008000013B0000 npaths: 1dmp: 1 device_key: 1360000003a
        (sys: st31bbl01) lbolt: 25922849 vxfen_scsi3.c ln  959 VXFEN: vxfen_coord_disk_in: end return: 0
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  115 VXFEN: vxfen_coord_pt_in: COORD disk stat: 0
        (sys: st31bbl01) lbolt: 25922849 vxfen_init.c ln  127 VXFEN: vxfen_coord_pt_in: end ret=0
        (sys: st31bbl01) lbolt: 25922860 vxfen_sun.c ln  104 VXFEN: vxfen_sunopen: begin
        (sys: st31bbl01) lbolt: 25922860 vxfen_sun.c ln  121 VXFEN: vxfen_sunopen: end ret: 0
        (sys: st31bbl01) VXFEN trace V-11-1-32835 In file vrfsm-fsm.c on line 1035:
        (sys: st31bbl01)        group states pointer is NULL
        (sys: st31bbl01) lbolt: 25922864 vxfen_sun.c ln  104 VXFEN: vxfen_sunopen: begin
        (sys: st31bbl01) lbolt: 25922864 vxfen_sun.c ln  121 VXFEN: vxfen_sunopen: end ret: 0
        (sys: st31bbl01) lbolt: 25922864 vxfen_ioctl.c ln  963 VXFEN: vxfen_ioc_print_deblog: begin
        (sys: st31bbl01) lbolt: 25922864 vxfen_ioctl.c ln  973 VXFEN: Client is 32 Bit ..!!
        (sys: st31bbl01)
END VxFEN LOG }

 

I don't understand how every check can pass and yet fencing still fails to start. Any suggestions would be greatly appreciated.
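I'd also welcome suggestions on checking the coordinator disks themselves. As I understand the 5.1 utilities, the non-destructive options look roughly like this (a sketch; the -r flag keeps vxfentsthdw read-only, and I believe the key-listing flag is -s on 5.1 where older releases used -g):

    # Verify the coordinator disks support SCSI-3 persistent reservations,
    # without touching existing data or registration keys
    /opt/VRTSvcs/vxfen/bin/vxfentsthdw -r

    # List any SCSI-3 registration keys currently on the coordinator disks;
    # stale keys from an earlier failed start could block a new one
    vxfenadm -s all -f /etc/vxfentab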

Ken

3 REPLIES

khemmerl
Level 5

I opened a ticket with Veritas support, and we've spent about five hours on the phone plus a WebEx session where I shared my desktop with the Veritas analyst. The good news is that the problem has been resolved. The bad news is that I'm left with the impression that the entire cluster server installation is a house of cards where one touch can bring down the whole system.

Is there somewhere that customers can discuss their experience with SFHA? To be honest, from what I've seen so far, I have the impression that this software is not ready for prime time, and I'm surprised that anyone would actually use it on a production, mission-critical system.

Ken

mrrout
Level 4

Ken,

What was the issue with your setup? If you can share it, it would help others who run into a similar problem.

 

Thanks.

khemmerl
Level 5

I guess the real problem is that my local vendor told me that SFHA is a much simpler route to high availability than Oracle RAC and that the installation could be completed within a week. Now, after two months of effort, I still don't have HA running, and I can kiss my yearly performance bonus goodbye because there's no way I'll meet my project deadline.

I have encountered a bug where the Symantec software refused to see LUNs larger than 1 TB, despite the vendor's claim that this limitation had been addressed. I've encountered a problem where the local disk group created to hold the Oracle binaries corrupted the OS install. And the problem I described above turned out to be the result of a failed installation leaving files behind that prevented the new installation from working; a sketch of the kind of leftover state worth checking is below.

Overall, I've found Symantec's cluster software very difficult to install and get working. In fact, it is still not working as I write this. As mentioned previously, I find it hard to believe that companies actually rely on this for their production systems.
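This is based on the standard 5.1 file locations and utilities, and I can't say which item was the actual culprit in my case:

    # Stop the fencing driver first (on Solaris 10 it is also an SMF
    # service, so "svcadm disable vxfen" is the alternative)
    /etc/init.d/vxfen stop

    # Configuration files written by installvcs; a half-written set from a
    # failed run can wedge the next attempt
    ls -l /etc/vxfenmode /etc/vxfendg /etc/vxfentab

    # Clear stale SCSI-3 registration keys left on the coordinator disks by
    # a failed start (read the utility's warnings before running it)
    /opt/VRTSvcs/vxfen/bin/vxfenclearpre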