Application resource faulting instead of Diskgroup/Mount
Environment
SFHA = 6.2
Cluster nodes = 2 (local cluster)
OS = Solaris 10 SPARC
SAN storage connectivity = Dual Fiber cables (DMP installed)
Application = Custom-made application with Start/Stop and Monitor programs in place (attributes configured via the Java Console). The complete application resides under the mount point (Mount resource)
We are assuming that the cluster senses the application failure first, instead of the DiskGroup failure, and acts on that fault.
Query
We are performing a test case in which we pull out both fiber cables for SAN connectivity. Instead of the DiskGroup or Mount resource faulting, we see our Application resource fault.
Hi Zahid,
If you take IMF out of the picture, then it comes down to a timing issue as to which resource hits its next monitor cycle after the cables are pulled. It would stand to reason that the application would complain or fault if the disks it needs are removed.
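For example, you can compare how often each agent polls; these are standard VCS CLI checks, and MonitorInterval defaults to 60 seconds unless it has been tuned:

    hatype -value DiskGroup MonitorInterval
    hatype -value Mount MonitorInterval
    hatype -value Application MonitorInterval

Whichever agent's monitor cycle lands first after the cable pull is the one that gets to report the problem.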
Once the first resource faults, VCS will fault the group if that resource is critical or affects a critical resource. This corrective process should bring the group down and not worry about the other resources that might fault.
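As a rough check (the resource and group names below are placeholders, not from your config), you can confirm which resources are marked critical and what state the group landed in:

    hares -value app1 Critical      # 1 = critical, 0 = non-critical
    hagrp -state app_sg             # per-system ONLINE/OFFLINE/FAULTED state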
The second system is not down in the screenshot. It is up but marked as faulted for this service group. The server icon would be grey (not running) instead of yellow (running) if the server were down. The background color is what shows the state: red is faulted, yellow/blue is partially online, blue is online, and grey is offline.
Thank you,
Wally
This is down to timeouts, what I/O is happening, and, as Wally says, timing.
If there are no writes to the storage, then reads may come from cache, so it can take a while for Volume Manager and the OS to detect that the storage is gone. When writes are performed, there are timeouts before the I/O is marked as failed, so it is also feasible that your application times out before the DiskGroup or Mount resources notice the failure.
Or it may be that the DiskGroup and Mount resource monitor entry points hang as opposed to failing, and the default is 4 monitor timeouts before the resource faults, whereas your Application resource monitor entry point may return "offline" rather than hang. That would be my guess as to the main reason, but it could also simply be timing, as each monitor runs independently for the different resource types.
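If you want to verify those values on your types (the defaults quoted are the standard ones, so adjust if you have tuned them), something like this should do:

    hatype -value DiskGroup FaultOnMonitorTimeouts     # default 4
    hatype -value Mount FaultOnMonitorTimeouts         # default 4
    hatype -value Application FaultOnMonitorTimeouts   # default 4
    hatype -value Application MonitorTimeout           # default 60 seconds per monitor cycle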
As Marianne says, check your engine log (and also agent logs) to see what VCS detects at what time.
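The logs are under /var/VRTSvcs/log on each node; lining up the timestamps of the first fault messages usually answers the question. A quick, purely illustrative way to pull them out:

    grep -i fault /var/VRTSvcs/log/engine_A.log
    grep -i fault /var/VRTSvcs/log/Application_A.log /var/VRTSvcs/log/DiskGroup_A.log /var/VRTSvcs/log/Mount_A.log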
Mike
IMF is enabled by default in 6.2. If the application is such that its processes die immediately after storage loss, then IMF triggers the monitor for the Application resource immediately (since you configured the MonitorProcesses attribute) and the resource detects the fault due to process death. With IMF in the picture, the outcome depends on how the application behaves on storage loss. The storage loss itself would be detected a little later even with IMF enabled for the DiskGroup. Without IMF, fault detection happens based on the timing of the monitor entry points of the individual resources.
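To see whether IMF is in effect for the Application type and which processes it is watching (the resource name below is a placeholder), you could check:

    hatype -value Application IMF       # Mode 3 = monitor both online and offline states
    hares -value app1 MonitorProcesses  # processes registered for intelligent monitoring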
Thanks,
Venkat