
Windows services not starting after restarting the servers of a VCS cluster

Rayson_Wong
Level 3

Dear all,

We have run into a strange issue.

Last Sunday, we shut down both Windows servers running VCS to add disk space to the Mail Archive disk.  We did not take the ServiceGroup offline before shutting the servers down.

After that, when the active node was started, it could not be accessed remotely and took a long time to log in (iLO still worked).  We hard-rebooted the server once and were then able to log in, but found that many Windows services had not started, including Terminal Services, Windows Time, and Server, to name a few.  None of the application services (EV, Domino, McAfee, VCS, SFW) had started.

I am not sure what happened.  It is as if the server was waiting for each service to time out.  Manually restarting the services does not succeed.

Does anyone have an idea what might have happened?

Thanks a lot,

Rayson

5 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified

Not taking the Service Group offline first would not cause this kind of problem, although taking it offline is a good idea. See this thread: https://www-secure.symantec.com/connect/forums/shutdown-windows-server-vcs

This seems more like a Microsoft problem than a VCS problem.

Have you checked Event Viewer logs yet?
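
Alongside the event logs, it can help to have a plain list of which services are actually stopped. Below is a minimal sketch, assuming the text output of `sc query state= all` has been captured on the affected server. The function name `stopped_services` and the sample text are hypothetical and only illustrate the idea; real output may contain more fields.

```python
def stopped_services(sc_output):
    """Return the names of services whose STATE line reports STOPPED.

    Expects text in the layout printed by `sc query state= all`.
    """
    stopped = []
    current = None
    for raw in sc_output.splitlines():
        line = raw.strip()
        if line.startswith("SERVICE_NAME:"):
            # Remember which service the following STATE line belongs to.
            current = line.split(":", 1)[1].strip()
        elif line.startswith("STATE") and current is not None:
            # Example line: "STATE              : 1  STOPPED"
            if "STOPPED" in line.split(":", 1)[1]:
                stopped.append(current)
            current = None
    return stopped


# Hypothetical sample, shortened to the fields the parser reads.
SAMPLE = """
SERVICE_NAME: W32Time
DISPLAY_NAME: Windows Time
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 1  STOPPED

SERVICE_NAME: EventLog
DISPLAY_NAME: Event Log
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 4  RUNNING

SERVICE_NAME: LanmanServer
DISPLAY_NAME: Server
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 1  STOPPED
"""

if __name__ == "__main__":
    for name in stopped_services(SAMPLE):
        print(name)
```

On a live server, the captured output of the command would be passed in place of `SAMPLE`. The resulting list can then be compared with the services that are expected to start automatically.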

Rayson_Wong
Level 3

Hi Marianne,

I checked the System Event Logs.  There is nothing special.  The server is not able to find the domain controller, but this is possibly because the Server service is down.  Other than this, nothing special is recorded.

Wally_Heim
Level 6
Employee

Hi Rayson,

SFW can cause long boot times when there are issues with the disk subsystem being used.  You did make changes to your disk configuration, but you mentioned other Windows services having problems starting, so I'm not sure that this is related to the disk subsystem slowdown that I mentioned.

In fact, this does not seem to be related to SFW/SFW-HA/VCS because some of the services that you mentioned are not controlled by SFW/SFW-HA/VCS.  I'm guessing that the SFW and VCS services are being affected by the same issue that is preventing the Terminal Services, Windows Time, and Server services from starting.

Did this happen on both servers or just one?

Just to confirm the issue that I first mentioned: are you able to log in faster when you disable/unplug the fiber connections to the server?

Thanks,

Wally

Rayson_Wong
Level 3

Hi Wally,

The two nodes of the cluster, EV server 1 (previously active, but now failed) and EV server 2 (previously standby, but now active and serving production), have had some setup changes since the issue.  Since EV server 2 is now active and owns the disk subsystem, could this cause SFW on EV server 1 to fail to get the disk subsystem and therefore take a long time to boot?

We thought the long boot time might be related to the services not starting and waiting for timeouts, but it seems these may be two different issues altogether.

The problem has so far been observed only on EV server 1, the previously active node.  EV server 2 can start, but SFW failed to recognise the disk (later fixed by Symantec support).

As for the login speed, I have not tried completing a login with the fiber unplugged.  We did try to boot EV server 1 and log in with all fiber and network cables unplugged, but after it was stuck at "Applying User Setting" for more than 10 minutes, we did not wait for it to complete, so I cannot compare the login times.

Wally_Heim
Level 6
Employee

Hi Rayson,

From an SFW perspective, Server 2 owning the disk group(s) would not affect the boot/login time for Server 1.

SFW does perform a SCSI bus scan during the "Applying Computer Settings" phase.  However, with the fiber unplugged there is very little to scan (no shared disks are visible), so it doesn't sound like the delays are related to SFW activities.

Have there been any other changes to the servers recently? 

Thanks,

Wally