joaotelles
11 years ago

VCS - Error with the MultiNIC resource

Hi,

I got this error when I restarted a box and it rejoined the cluster:

2014/01/28 21:26:37 VCS ERROR V-16-10001-6505 MultiNICB:MultiNICB_Pub:monitor:The mpathd process (/usr/lib/inet/in.mpathd) does not exist
2014/01/28 21:26:37 VCS WARNING V-16-10001-6506 MultiNICB:MultiNICB_Pub:monitor:Will try to restart mpathd with (/usr/lib/inet/in.mpathd)

I just want to check whether this error is relevant and what could cause it.

The node in question is in the cluster, but currently no SG is running on it, only a parallel "nic" SG that holds the MultiNICB resource.

In main.cf it looks like this:

group nic (
        SystemList = { DP-node4 = 0, DP-node5 = 1, DP-node6 = 2, DP-node8 = 3,
                 dp-node9 = 4 }
        Parallel = 1
        )

        MultiNICB MultiNICB_Pub (
                UseMpathd = 1
                ConfigCheck = 0
                Device @DP-node4 = { nxge0 = 0, nxge4 = 1 }
                Device @DP-node5 = { nxge0 = 0, nxge4 = 1 }
                Device @DP-node6 = { nxge0 = 0, nxge4 = 1 }
                Device @DP-node8 = { nxge0 = 0, bge0 = 0 }
                Device @dp-node9 = { igb0 = 0 }
                IgnoreLinkStatus = 0
                NetworkTimeout = 300
                GroupName = Public_Network
                )

        Phantom nic_phantom (
                Critical = 0
                )
 

Currently it is online on the node with the error (dp-node9).

Any suggestions?

Thanks,

Joao

7 Replies

  • Does mpathd exist at /usr/lib/inet/in.mpathd, or is it already running under a different path name?

    If so, you can use the "MpathdCommand" attribute on the MultiNICB_Pub resource to set the correct path.

    Mike 

  • It is running, and it is in that path:

    # ls -la /usr/lib/inet/in.mpathd
    -r-xr-xr-x   1 root     bin        87832 Nov 23  2010 /usr/lib/inet/in.mpathd
    # ps -ef | grep in.mpathd
        root 11088 11069   0 10:31:18 pts/1       0:00 grep in.mpathd
        root  1865     1   0 21:26:38 ?           0:01 /usr/lib/inet/in.mpathd

    In types.cf I have this:

    str MpathdCommand = "/usr/lib/inet/in.mpathd"
     

    I don't have it in main.cf.

    On another node of the cluster, where I don't have this error, mpathd was started like this:

    # ps -ef | grep in.mpathd
        root   488     1   0   Jun 05 ?          88:46 /usr/lib/inet/in.mpathd -a
     

    What does -a mean?
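
To compare how in.mpathd was started on each node, a small helper can reduce the `ps -ef` output to the daemon's command line. This is only a minimal sketch: the function name mpathd_args is made up for illustration, and it assumes the daemon is listed with its full path, as in the output above.

```shell
# mpathd_args: read `ps -ef` output on stdin and print the full command
# line (path plus arguments) of every in.mpathd process, one per line.
# The "grep in.mpathd" line is skipped because its field has no leading path.
mpathd_args() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # The command starts at the first field that ends in /in.mpathd
            if ($i ~ /\/in\.mpathd$/) {
                cmd = $i
                for (j = i + 1; j <= NF; j++) cmd = cmd " " $j
                print cmd
                break
            }
        }
    }'
}
```

Running `ps -ef | mpathd_args` on both nodes and comparing the two results should show at a glance whether one daemon carries -a and the other does not.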

     

  • I'm not sure what "-a" means, but the process should look the same on each node. Since Solaris starts mpathd, the difference suggests that the two nodes are configured differently, so you should try to find that difference (have a look at /etc/default/mpathd as a starting point).

    The VCS bundled agents guide gives some checks:

    Checklist to ensure the proper operation of MultiNICB
    For the MultiNICB agent to function properly, you must satisfy each item in the
    following list:
    ■ Each interface must have a unique MAC address.
    ■ A MultiNICB resource controls all the interfaces on one IP subnet.
    ■ At boot time, you must configure and connect all the interfaces that are under
    the MultiNICB resource and give them base IP addresses.
    ■ All base IP addresses for the MultiNICB resource must belong to the same
    subnet as the virtual IP address.
    ■ Reserve the base IP addresses, which the agent uses to test the link status, for
    use by the agent. These IP addresses do not get failed over.
    ■ The IgnoreLinkStatus attribute is set to 1 (default) when using trunked
    interfaces.
    ■ If you specify the NetworkHosts attribute, then that host must be on the same
    subnet as the base IP addresses for the MultiNICB resource.
    ■ Test IP addresses have "nofailover" and "deprecated" flags set at boot time.
    ■ /etc/default/mpathd has TRACK_INTERFACES_ONLY_WITH_GROUPS=yes.
    ■ If you are not using Solaris in.mpathd, all MultiNICB resources on the system
    have the UseMpathd attribute set to 0 (default). You cannot run in.mpathd on
    this system.
    ■ If you are using Solaris in.mpathd, all MultiNICB resources on the system have
    the UseMpathd attribute set to 1.
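
The /etc/default/mpathd item in the checklist can be checked with a one-line test. This is a minimal sketch, assuming the setting sits on an uncommented line that starts in the first column; check_track_setting is a hypothetical helper name, not a VCS or Solaris command.

```shell
# check_track_setting [FILE]: succeed (exit 0) if FILE contains an
# uncommented TRACK_INTERFACES_ONLY_WITH_GROUPS=yes line.
# FILE defaults to /etc/default/mpathd.
check_track_setting() {
    file=${1:-/etc/default/mpathd}
    # Anchoring at ^ means a commented-out "#TRACK_..." line does not match.
    grep '^TRACK_INTERFACES_ONLY_WITH_GROUPS=yes' "$file" >/dev/null 2>&1
}
```

Something like `check_track_setting || echo "setting missing on $(hostname)"` could then be run on each node to see whether they differ.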

    Mike

  • Can I kill the process and start it again with -a as a workaround?

    # ps -ef | grep in.mpathd
        root 11088 11069   0 10:31:18 pts/1       0:00 grep in.mpathd
        root  1865     1   0 21:26:38 ?           0:01 /usr/lib/inet/in.mpathd

    Something like kill -9 1865,

    and then:

    /usr/lib/inet/in.mpathd -a

  • Hi,

    Yes, you can do that. Just to be on the safe side, I would suggest freezing the service groups before you run it.

    -a is a switch commonly used with in.mpathd; many Symantec articles show it, e.g.:

    http://www.symantec.com/docs/TECH171008

    http://www.symantec.com/docs/TECH137947

     

    G

  • I tried, but no luck:

    root@dp-node9 # ps -ef | grep mpath
        root  4115  2887   0 14:31:14 pts/2       0:00 grep mpath
        root  1914     1   0 13:50:10 ?           0:00 /usr/lib/inet/in.mpathd
    root@dp-node9 #
    root@dp-node9 #
    root@dp-node9 #
    root@dp-node9 # kill -9 1914
    root@dp-node9 # /usr/lib/inet/in.mpathd -a
    root@dp-node9 # ps -ef | grep mpath
        root  4125  2887   0 14:31:31 pts/2       0:00 grep mpath
        root  4124     1   0 14:31:29 ?           0:00 /usr/lib/inet/in.mpathd
     

    =====

    Do you think this can cause problems when, for example, starting this resource?

            IPMultiNICB mdm_PubIP (
                    BaseResName = MultiNICB_Pub
                    Address = "10.129.68.23"
                    NetMask = "255.255.255.192"
                    )
     

     


     

     

  • Are you seeing that error repeatedly?

    In my opinion, that error should have occurred only once, because the in.mpathd daemon was not running at that point. The MultiNICB agent detected this and tried to restart the daemon because the UseMpathd attribute is set to 1.

    Ideally, that should not cause any problem when bringing the corresponding IPMultiNICB resource (in this case, mdm_PubIP) online.
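
One way to answer the "repeatedly?" question is to count how often the message ID shows up in the log. This is a minimal sketch: count_msg is a hypothetical helper, and the log path in the usage line below assumes a default VCS install.

```shell
# count_msg ID FILE: print the number of lines in FILE that contain ID,
# for example the VCS message ID V-16-10001-6505.
count_msg() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the status is masked to keep callers simple.
    grep -c "$1" "$2" || :
}
```

For example, `count_msg V-16-10001-6505 /var/VRTSvcs/log/engine_A.log`. A count of 1 after the reboot would fit a one-time restart of the daemon; a count that keeps growing would suggest in.mpathd is dying repeatedly.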