/system/llt in maintenance state

Fugitive
Level 4
I have a 2-node cluster running VCS 5.1. I added a new node to this cluster, and the add completed successfully. But svcs -xv on the new node shows the following:


:/>svcs -xv
svc:/system/llt:default (Veritas Low Latency Transport (LLT) Init service)
 State: maintenance since Thu Jul 29 13:40:32 2010
Reason: Start method failed repeatedly, last exited with status 2.
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /opt/VRTSllt/man/man1m/ -s 1M lltconfig
   See: /var/svc/log/system-llt:default.log
Impact: 3 dependent services are not running:
        svc:/system/gab:default
        svc:/system/vcs:default
        svc:/system/vxfen:default


And the log for it says this:


[ Jul 29 13:17:00 Method "start" exited with status 2 ]
[ Jul 29 13:40:32 Leaving maintenance because clear requested. ]
[ Jul 29 13:40:32 Enabled. ]
[ Jul 29 13:40:32 Executing start method ("/lib/svc/method/llt start") ]
silent failure
[ Jul 29 13:40:32 Method "start" exited with status 2 ]


Because of this, the GAB memberships are not forming properly. What could be the reason for this, and how can I resolve it?


hasys -state
#System    Attribute          Value
Node1  SysState           RUNNING
Node2  SysState           RUNNING
Node3  SysState           RUNNING



gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   2d880e membership 012
Port b gen   2d880d membership 01
Port b gen   2d880d    visible ; 2
Port h gen   2d8811 membership 012
 
 
  
1 ACCEPTED SOLUTION


Anoop_Kumar1
Level 5

Yes,

I just found it.

If the entries below are set to 0 in /etc/default/llt, it throws the same errors you are receiving on your node.

LLT_START=1
LLT_STOP=1

Please check these in /etc/default/llt, set them to 1 if they are not, and then try again.

~Anoop
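
As a minimal sketch of the check described above, the following shell functions report what a defaults file says about starting LLT. The helper names llt_start_flag and check_llt_defaults are hypothetical and not part of VCS; the only assumption about the file is that it contains plain LLT_START=<value> lines, as in this thread.

```shell
# Hypothetical helpers, not part of VCS.

llt_start_flag() {
    # $1: path to the defaults file, e.g. /etc/default/llt
    # Prints the value of the last uncommented LLT_START= line (empty if none).
    awk -F= '/^LLT_START=/ { v = $2 } END { print v }' "$1"
}

check_llt_defaults() {
    # Returns 0 if LLT_START is 1, otherwise prints a hint and returns 1.
    flag=$(llt_start_flag "$1")
    if [ "$flag" = "1" ]; then
        echo "LLT_START=1: the start method is allowed to start LLT"
        return 0
    fi
    echo "LLT_START=${flag:-unset}: the start method will not start LLT"
    return 1
}
```

On the affected node this might be run as check_llt_defaults /etc/default/llt before looking any further at SMF.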



16 REPLIES

Gaurav_S
Moderator

This is strange...

Your GAB and HAD port memberships are already formed for the 3rd node (you can see 0-1-2 for the 3 nodes). Only VxFEN has not formed its membership.

Did you try starting vxfen on the 3rd node? You can start it via svcadm (search for vxfen) or via the legacy method:

# /etc/init.d/vxfen start

Did you configure vxfen correctly on the new node, that is, populate the /etc/vxfendg and /etc/vxfenmode files?

If the answer is yes and you are still facing issues, can you paste the following from the newly added node:

# lltconfig
# modinfo | grep -i vxfen
# lltstat -vvn | head
# cat /etc/VRTSvcs/conf/config/main.cf | grep -i usefence
# cat /etc/vxfendg
# cat /etc/vxfenmode
# cat /etc/gabtab


Gaurav

Fugitive
Level 4
I have not configured VxFEN, and vxfen is dependent on /system/llt, so it will not start.



#/etc/vxfen.d>cat /etc/vxfenmode
cat: cannot open /etc/vxfenmode
#/etc/vxfen.d>cat /etc/vxfendg
cat: cannot open /etc/vxfendg
#:/etc/vxfen.d>cat /etc/VRTSvcs/conf/config/main.cf | grep -i usefence
#/etc/vxfen.d>cat /etc/gabtab
/sbin/gabconfig -c -n3
#/etc/vxfen.d>ltstat -vvn | head
-bash: ltstat: command not found
#/etc/vxfen.d>lltstat -vvn | head
LLT node information:
    Node                 State    Link  Status  Address
     0 Node1    OPEN
                                  vnet1   UP      00:14:4F:F8:04:D8
                                  vnet2   UP      00:14:4F:FA:52:BB
     1 Node2    OPEN
                                  vnet1   UP      00:14:4F:F9:C1:32
                                  vnet2   UP      00:14:4F:FA:E4:A2
   * 2 Node3    OPEN
                                  vnet1   UP      00:14:4F:F8:33:CE
#/etc/vxfen.d>modinfo | grep -i vxfen
#/etc/vxfen.d>lltconfig
LLT is running

Gaurav_S
Moderator

Hi Fugitive,

The outputs suggest that LLT is running, and the lltstat -vvn output also says that LLT is up. GAB, which depends on LLT, has started too, so the LLT error message seems a bit misleading.
What I understand is that only vxfen has not started on this node, which is making the port "b" membership incorrect.



gabconfig -a

GAB Port Memberships
===============================================================
Port a gen   2d880e membership 012    <<<<<<<<<<<<<<<<<<<<<< GAB formed for 3 nodes
Port b gen   2d880d membership 01
Port b gen   2d880d    visible ; 2  <<<<<<<<<<<<<<<<<<<<<< vxfen not visible for 3rd node
Port h gen   2d8811 membership 012  <<<<<<<<<<<<<<<<<<<<<< HAD formed for 3 nodes
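
The reading of the memberships above can be sketched as a small filter. ports_missing_nodes is a hypothetical helper, not a VCS command; it assumes single-digit node IDs and the gabconfig -a line layout shown in this thread. On Solaris, the -v option may require nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk.

```shell
# Hypothetical helper, not a VCS command.

ports_missing_nodes() {
    # $1: expected membership string, e.g. 012 for node IDs 0, 1 and 2.
    # Reads `gabconfig -a` output on stdin and prints "<port> <membership>"
    # for every port whose membership differs from the expected string.
    awk -v want="$1" '
        $1 == "Port" && $5 == "membership" && $6 != want { print $2, $6 }
    '
}
```

With the output from this thread, gabconfig -a | ports_missing_nodes 012 would list only port b, which matches the observation that fencing has not joined on the third node.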

Can you check /etc/vxfendg and /etc/vxfenmode on the other 2 running nodes and make similar settings here on the new node?
If you are not using fencing, quite possibly you are running fencing in disabled mode.

Once you have configured the above two files, you will need to start vxfen as suggested in my first answer. You can also run the command below on the other nodes to see if fencing is running:

# vxfenadm -d  



Gaurav

rregunta
Level 4
Hello,

First of all, mate, you say you have a 2-node cluster, but I can see 3 nodes in the cluster: nodes 0, 1, and 2.

So...

1.] Did you recently add a 3rd node to the 2-node cluster setup?
2.] Did you make the required modifications to start vxfen on the new node?

The vxfen module does not seem to have started on the 3rd node; port b is missing for that node.

You could verify the same using -
# /etc/init.d/vxfen status
# vxfenadm -d

A few more things you could do:


Verify the name of the coordinator diskgroup.
# cat /etc/vxfendg
 
Verify the number of disks used in the coordinator diskgroup.
# vxfenconfig -l
 
Attempt to restart the vxfen driver with the following commands:
On Solaris 9:
# /etc/init.d/vxfen stop
# /etc/init.d/vxfen start
 
On Solaris 10:
# svcadm disable vxfen
# svcadm enable vxfen

Regards
Rajesh

Fugitive
Level 4
Hello Guys,

Did you check the output I posted for svcs -xv? It says /system/llt is failing, and /system/vxfen is dependent on /system/llt. That's my real question: lltstat -n shows LLT running with state OPEN on all 3 nodes, and still /system/llt is in the maintenance state :(


lltstat -vvn | head
LLT node information:
    Node                 State    Link  Status  Address
     0 Node1    OPEN
                                  vnet1   UP      00:14:4F:F8:04:D8
                                  vnet2   UP      00:14:4F:FA:52:BB
     1 Node2    OPEN
                                  vnet1   UP      00:14:4F:F9:C1:32
                                  vnet2   UP      00:14:4F:FA:E4:A2
   * 2 Node3    OPEN
                                  vnet1   UP      00:14:4F:F8:33:CE


svcs -xv

svc:/system/llt:default (Veritas Low Latency Transport (LLT) Init service)
 State: maintenance since Thu Jul 29 13:40:32 2010
Reason: Start method failed repeatedly, last exited with status 2.
   See: http://sun.com/msg/SMF-8000-K
   See: man -M /opt/VRTSllt/man/man1m/ -s 1M lltconfig
   See: /var/svc/log/system-llt:default.log
Impact: 3 dependent services are not running:
        svc:/system/gab:default
        svc:/system/vcs:default
        svc:/system/vxfen:default

Anoop_Kumar1
Level 5
Fugitive,

Let's talk about the SMF status of llt here, which is your main concern.

Check the llt service using the command below:

# svcs -l llt

e.g.

-bash-3.00# svcs -l llt
fmri         svc:/system/llt:default
name         Veritas Low Latency Transport (LLT) Init service
enabled      true
state        online
next_state   none
state_time   Tue Jul 20 23:36:25 2010
logfile      /var/svc/log/system-llt:default.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/network/initial (online)


In the above, if state = maintenance, then check whether the dependencies, i.e. filesystem/local and network/initial, are online. If any dependency is in maintenance/offline/fault, then the llt service will be in maintenance.
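
The dependency check described above could be scripted roughly as follows. This is a sketch only: list_unmet_deps is a hypothetical helper name, and it assumes that each dependency line of svcs -l output lists FMRI and (state) pairs after the grouping field, as in the sample output in this post.

```shell
# Hypothetical helper, not part of SMF or VCS.

list_unmet_deps() {
    # Reads `svcs -l <service>` output on stdin and prints every
    # dependency FMRI whose reported state is not "(online)".
    awk '$1 == "dependency" {
        for (i = 3; i < NF; i += 2)
            if ($(i + 1) != "(online)")
                print $i, $(i + 1)
    }'
}
```

A possible use is svcs -l llt | list_unmet_deps; empty output means every listed dependency reports online, in which case the cause has to be looked for elsewhere.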

You can clear the maintenance state on llt or its dependencies using the commands below.

#svcadm clear filesystem/local
#svcadm clear llt

Before applying the above, you have to stop the other services, like vcs and vxfen, running on the 3rd node.

I hope it helps.

Regards,
~Anoop

Gaurav_S
Moderator

Hi Fugitive..

Yes, I saw the original post and the LLT status, and that's why I commented that the SMF status seems a bit misleading: LLT is up, the dependent GAB is up, and HAD is up as well. So, to drill down on the two concerns you had:

a) Incorrect GAB port memberships (vxfen port not showing): I think I have already provided the steps to recover that.

b) LLT service not showing online in SMF: this needs a bit of diagnosis.

In addition to what Anoop has commented above:

-- Is lltstat -vvn showing both vnet1 and vnet2 up for the newly added node? (The head output might have cut it off at vnet1; I just want to make sure.)

-- Is it possible to stop had/vxfen/gab on the newly added node and try starting LLT again with SMF? If no luck, try starting LLT manually (/etc/init.d/llt start OR lltconfig -c) and then check the status in SMF.

-- If you still have no luck, I can think of one more thing, but it would be a little difficult for you: stop all the ports on all nodes (had/vxfen/gab) and then start the components manually on all the nodes one by one. As soon as you start GAB on all three (after starting LLT), you should see LLT and GAB up on all 3 nodes with good status.

Before all this, it is worth checking your config files. Can you post them here for review?

# cat /etc/llthosts
# cat /etc/llttab

Gaurav

Fugitive
Level 4
Anoop,

Thanks for the suggestion. The dependencies for llt are already online; you can see it in the following logs:




Node3:/>svcs -l llt
fmri         svc:/system/llt:default
name         Veritas Low Latency Transport (LLT) Init service
enabled      true
state        maintenance
next_state   none
state_time   Fri Jul 30 02:09:03 2010
logfile      /var/svc/log/system-llt:default.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/network/initial (online)
Node3:/>svcadm clear /system/llt
Node3:/>svcs /system/llt
STATE          STIME    FMRI
maintenance     2:13:17 svc:/system/llt:default
Node3:/>svcs -d /system/llt
STATE          STIME    FMRI
online          1:10:56 svc:/network/initial:default
online          1:12:01 svc:/system/filesystem/local:default
Node3:/>tail -5 /var/svc/log/system-llt:default.log
[ Jul 30 02:13:17 Leaving maintenance because clear requested. ]
[ Jul 30 02:13:17 Enabled. ]
[ Jul 30 02:13:17 Executing start method ("/lib/svc/method/llt start") ]
silent failure
[ Jul 30 02:13:17 Method "start" exited with status 2 ]

Anoop_Kumar1
Level 5
Fugitive,

The maintenance state is not cleared, possibly because llt is already running.

Try this:

# gabconfig -U
# svcadm disable gab

# lltconfig              <----- check if llt is running
# lltconfig -U

now,

# svcadm clear llt

or

# svcadm disable llt
# svcadm enable llt

If it works, then start gab, vxfen and vcs using svcadm.

Regards,
~Anoop

Fugitive
Level 4
It didn't help  



Node3:/>hastop -local
Node3:/>gabconfig -U
Node3:/>svcadm disable gab
Node3:/>lltconfig
LLT is running
Node3:/>lltconfig -U
lltconfig: this will attempt to stop and reset LLT. Confirm (y/n)? y
Node3:/>svcadm disable llt
Node3:/>svcs llt
STATE          STIME    FMRI
disabled        2:20:35 svc:/system/llt:default
Node3:/>svcadm enable llt
Node3:/>svcs llt
STATE          STIME    FMRI
maintenance     4:37:45 svc:/system/llt:default
Node3:/>svcs llt
STATE          STIME    FMRI
maintenance     4:37:45 svc:/system/llt:default
and the log says the following:


[ Jul 30 04:38:59 Executing start method ("/lib/svc/method/llt start") ]
silent failure
[ Jul 30 04:38:59 Method "start" exited with status 2 ]
 

Anoop_Kumar1
Level 5

Please paste the output of:

cat /etc/default/llt

~Anoop


Fugitive
Level 4


cat /etc/default/llt
#
# This file is sourced :
#       from /etc/init.d/llt            for Solaris < 2.10
#       from /lib/svc/method/llt        for Solaris 2.10
#
# Set the two environment variables below as follows:
#
#       1 = start or stop llt
#       0 = do not start or stop llt
#
 
LLT_START=0
LLT_STOP=0

Fugitive
Level 4
Thanks Anoop,

I didn't notice the last line. I re-read it and got it working :)
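
For reference, the change described here could be made along these lines. This is a sketch under assumptions: enable_llt_defaults is a hypothetical helper name, the .bak backup suffix is arbitrary, and the file is assumed to contain the LLT_START=0 and LLT_STOP=0 lines exactly as in the pasted /etc/default/llt.

```shell
# Hypothetical helper, not part of VCS.

enable_llt_defaults() {
    # $1: path to the defaults file, e.g. /etc/default/llt
    # Keep a backup copy first, then rewrite the file from that backup
    # so the same file is never read and written at the same time.
    cp "$1" "$1.bak" || return 1
    sed -e 's/^LLT_START=0/LLT_START=1/' \
        -e 's/^LLT_STOP=0/LLT_STOP=1/' "$1.bak" > "$1"
}
```

After a change like this, the svcadm clear llt step used earlier in the thread would be the natural way to ask SMF to retry the start method.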

Anoop_Kumar1
Level 5

I believe you installed that 3rd node as a single-node cluster and then tried to configure it manually by changing the llt and gab files after setting up the llt private links.

If that is the case, you could have tried running the installvcs script again with the -addnode option to add it to the multi-node cluster.

Anyway, congrats!

Regards,
~Anoop

Gaurav_S
Moderator

Nice pick anoop.....

Gaurav