
Solaris 10 + SF 5.0 hang at boot issues

Assaf_Leibovitc
Level 4
Hi all,

I have 2 Solaris 10 11/06 nodes configured with I/O fencing, in an Oracle RAC configuration.

2 issues:

1.
===================================

Both nodes were taken down, then node A was started.
It printed to the console:
LLT INFO V-14-1-10009 LLT Protocol available
GAB INFO V-15-1-20021 GAB available
LMX Multiplexor available

and then the boot process stopped; it seems to hang...

I've tried booting into single-user mode, clearing the vxfen keys with vxfenclearpre, and rebooting, but it still hangs...
vxfendg and vxfenmode are OK, the fencing disk group is configured and deported, and the disks are available.

Only after starting llt, gab and vxfen manually on one node in single-user mode and returning to multi-user mode did the cluster come up OK.
After VCS was up, I booted node B.

2.
=======================
Same problem after trying root disk encapsulation: after the second reboot (VxVM's own reboot) the system hung...


Any ideas?

Thanks
10 REPLIES

neo4897
Level 2
Hi Assaf,

I am seeing similar slow/hanging host reboot issues on a similar setup, but with Solaris 10 6/06. Were you able to resolve this issue?

Thanks, Vinay

Assaf_Leibovitc
Level 4
Hi Vinay,
 
Sorry, but I don't have a solution yet... I'm in touch with Symantec support, but nothing has identified the problem so far...
 
They recommended installing Maintenance Pack 1 and making a change in our configuration, but the problem has not been identified yet.

neo4897
Level 2
Hi Assaf,

Thanks for the info.

I am running 5.0 MP1 and still seeing the host reboot issue. What change did Symantec suggest? What storage are you using?

Thanks, Vinay

Assaf_Leibovitc
Level 4
Hi,

Nothing new yet... they say:

"The hang occurs during the vxdctl enable"

I sent core dumps of the servers that hung and they are checking them.

One more thing: it happened more often after unencapsulating the root disk with the vxunroot command, but sometimes also after a system boot...

I hope they'll find something.

Assaf_Leibovitc
Level 4
Hi,

This is what I got from support. I tried it and it didn't help: my server hung on the second reboot after root disk encapsulation.

Etrack incident 1014903 identifies the deadlock situation caused by vxgms driver.
The workaround is to add the property 'ddi-no-autodetach=1' to all four drivers (vxglm, vxgms, llt, vcsmm) in their /kernel/drv/*.conf files.

For example –
$ cat /kernel/drv/vxglm.conf
name="vxglm" parent="pseudo" instance=0 nodes=32 ports=32;
change this to –
$ cat /kernel/drv/vxglm.conf
name="vxglm" parent="pseudo" instance=0 nodes=32 ports=32 ddi-no-autodetach=1;
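
A minimal sketch of applying that edit to a driver .conf file, assuming each file holds a single name=... line terminated by a semicolon, as in the vxglm example above. The helper name add_no_autodetach and the .orig backup suffix are hypothetical, not something support supplied; check the result by hand before rebooting.

```shell
#!/bin/sh
# Hypothetical helper: append ddi-no-autodetach=1 to the name= entry of a
# driver .conf file, unless the property is already present.
add_no_autodetach() {
    conf="$1"
    # Leave files that already carry the property untouched.
    if grep 'ddi-no-autodetach' "$conf" >/dev/null 2>&1; then
        return 0
    fi
    # Keep a backup copy next to the file (assumed naming: <file>.orig).
    cp "$conf" "$conf.orig" || return 1
    # Insert the property just before the terminating semicolon.
    sed '/^name=/s/; *$/ ddi-no-autodetach=1;/' "$conf.orig" > "$conf"
}

# Process every file named on the command line (does nothing with no arguments).
for f in "$@"; do
    add_no_autodetach "$f"
done
```

Saved under an illustrative name such as add_prop.sh, it could be run as: sh add_prop.sh /kernel/drv/vxglm.conf /kernel/drv/vxgms.conf /kernel/drv/llt.conf /kernel/drv/vcsmm.conf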

neo4897
Level 2
Thank you Assaf..

I will try this out ..

-- Vinay

Derrick_Shultz
Not applicable
I am seeing the same behavior on Solaris 10U3 and 10U4 with 5.0MP1 as well as 5.0MP1 RP3. A reconfiguration reboot (-r) doesn't have the problem; only non-reconfiguration reboots do. I have also seen the issue with "vxdctl enable", as mentioned above.

I am currently trying the fix mentioned above, but it doesn't appear to be helping at all.

ArunSym
Not applicable
Employee
Hi Assaf and Vinay,
 
I am also experiencing the same issue with Sol 10 update 5 and SFRAC 5.0 MP1. A temporary workaround seems to be to boot to single-user mode, run "devfsadm -Cv", touch /reconfigure, and reboot the system. I am working on a 2-node cluster and only one node shows this problem on reboot.
 
Hope this helps until you get a final resolution from support.
 
Thanks,
 
Arun 

bsobek
Level 5
Hi Assaf,
 
do you use PowerPath or any other multipathing product?
 
Greets
Björn

Solaris_Admin
Level 3

Hi,

Can you suggest why this is going into maintenance mode after installing Solaris 10 U4 with SF 5.0 MP1?

 State: maintenance since Mon Jun 30 11:51:40 2008
Reason: Start method died on Killed (9).
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M init
   See: /var/svc/log/milestone-multi-user:default.log
Impact: 12 dependent services are not running:
        svc:/milestone/multi-user-server:default
        svc:/system/basicreg:default
        svc:/site/ypslave-setup:default
        svc:/system/zones:default
        svc:/site/bootnotify:default
        svc:/site/filedist-pullclient:default
        svc:/site/lb-smf-base:default
        svc:/site/opsware-register:default
        svc:/site/disco-on-demand:default
        svc:/site/netbackup-routes:default
        svc:/system/vxvm/vxvm-recover:default
        svc:/application/cde-printinfo:default
----

It's hung at certain processes:

    root  3693     1   0 11:49:05 ?           0:00 /bin/sh /etc/rc2.d/S760vxpal.actionagent start

 

It is not fully reaching the run-level 3 environment.