Solved: Hi all, The Trusted Advisors

tanislavm · ‎07-14-2014

Hi,

Based on the llthosts information, had knows on which node to start a service or where to fail it over (the failover list), right?

If llthosts is missing, then no service will be brought online, right?

If a service group has a restart limit of 1 and a critical resource fails, does had first try to restart the group on this node and, if it cannot, fail the group over?

Here is a link on converting a VCS cluster to CVM: http://unix-essentials.blogspot.ro/2009/03/vcs-implementing-cluster-file-system.html. In that example the shared disk group is initialized. My question is whether a shared disk group can import the data that I had in a non-shared disk group.

How can I safely shut down a node within CVM? I found this link: http://sfdoccentral.symantec.com/sf/5.1SP1/solaris/html/vxvm_admin/ch13s04s03.htm.

"shutdown procedure of the node's cluster monitor": what is the node's cluster monitor, and how do I invoke this procedure? How do I invoke a clean node shutdown?

thanks so much.

mikebounds · ‎07-14-2014

llthosts is required by LLT and gives each node an ID. If llthosts is missing, LLT will not start, which means GAB will not start, which means VCS will not start. The exception is a one-node cluster, where LLT and GAB are not required, in which case you can start VCS with "hastart -onenode" without GAB and LLT.
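
To illustrate the node-ID mapping, here is a minimal Python sketch that parses llthosts-style text into a dictionary. It assumes the usual layout of one "node-ID hostname" pair per line. The function name parse_llthosts and the host names in the sample are made up for illustration and are not part of any Veritas tooling.

```python
def parse_llthosts(text):
    """Return {node_id: hostname} from llthosts-style text.

    Assumes one "<node-id> <hostname>" pair per line; blank lines are skipped.
    """
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        node_id, hostname = fields[0], fields[1]
        nodes[int(node_id)] = hostname
    return nodes


# Hypothetical two-node cluster; on a real node you would read /etc/llthosts.
sample = "0 node1\n1 node2\n"
for node_id, hostname in sorted(parse_llthosts(sample).items()):
    print(node_id, hostname)
```

If the file cannot be read at all, there is nothing to map, which matches the point above that LLT cannot start without it.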

The main.cf defines on which nodes a service can start and where it can fail over (SystemList).

A service group does not have a restart limit attribute. It has an OnlineRetryLimit, and if this is set to 1 and a critical resource fails, VCS first tries to restart the group on this node and, if it cannot, the group fails over. You can have a RestartLimit on a resource (this is the norm, rather than at group level). Whether the resource is critical or not, if the resource fails then VCS will try to restart the resource. If the restart fails AND the resource is critical, then the group will fail over.

You can import a non-shared disk group as shared simply by using the "-s" flag. Similarly, you can deport a shared disk group and then import it without "-s" to make it a non-shared disk group.
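
As a rough sketch of that deport/import sequence, the Python helper below only builds and prints the vxdg command lines; it does not run them. It assumes the disk group is currently imported and so needs a deport first, and the disk group name "appdg" and the function name reimport_commands are hypothetical. A shared import is normally issued on the CVM master.

```python
def reimport_commands(diskgroup, shared):
    """Return the vxdg command lines to re-import a disk group.

    shared=True  -> import with -s (shared disk group)
    shared=False -> plain import (non-shared disk group)
    """
    deport = ["vxdg", "deport", diskgroup]
    if shared:
        import_cmd = ["vxdg", "-s", "import", diskgroup]
    else:
        import_cmd = ["vxdg", "import", diskgroup]
    return [deport, import_cmd]


# Print the plan for review instead of executing it.
for cmd in reimport_commands("appdg", shared=True):
    print(" ".join(cmd))
```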

To shut down a node on any cluster, CVM or failover cluster:

1. Stop any clients accessing the application and complete any work in progress on the app, if applicable
2. Offline all VCS application service groups (using hagrp -offline or the GUI)
3. Shut down VCS (hastop -local)
4. Shut down the server
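
The VCS part of this sequence can be sketched as a small Python helper that builds the commands in order and prints them for review rather than executing them. The node name, the group names and the function name clean_shutdown_plan are hypothetical. Stopping clients is left as a manual step, and the final OS shutdown is omitted because the command depends on the platform.

```python
def clean_shutdown_plan(node, groups):
    """Return the ordered VCS commands for taking one node down cleanly."""
    # Offline each application service group on this node first.
    plan = [["hagrp", "-offline", group, "-sys", node] for group in groups]
    # Then stop VCS on the local node only.
    plan.append(["hastop", "-local"])
    return plan


# Hypothetical node and groups; check each group is offline before moving on.
for cmd in clean_shutdown_plan("node1", ["app_sg", "db_sg"]):
    print(" ".join(cmd))
```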

If you just shut down using the O/S shutdown command, this may work, as the shutdown script calls hastop with flags to evacuate service groups, but possible issues are:

The application may be shut down forcefully if it doesn't go offline quickly enough
Sometimes service groups get stuck, so hastop does not complete and the node does not shut down
A service group may have issues starting on the failover node, so you should make sure your service is up on the failover node before shutting down the node (unless you are shutting down all nodes)

Mike

stinsong · ‎07-14-2014

Hi tanislavm,

1. /etc/llthosts is the file listing the VCS LLT nodes. The local node uses it to learn all nodes in the cluster and their node IDs. It is needed for LLT startup. But LLT is not a service that fails over, so the /etc/llthosts file has nothing to do with failover.

If the file is missing, LLT will not be able to start, which means VCS cannot start. So no service will be able to start or be available.

2. About RestartLimit, your understanding is basically right, and Mike gives the detailed explanation.

3. Importing a non-shared DG as a shared DG is possible, using the -s option of vxdg import.

But a shared DG has stricter requirements than a non-shared DG. In CVM, you need to make sure all nodes can access all disks of the shared DG, which means every node must see all disks of the DG even if you do not plan to use them on that node. You also need to make sure the CVM master can communicate with all other nodes when importing the shared DG. For this, the node must already have joined the CVM cluster, which means the cvm service is online.

If these conditions are not met, the shared import will not succeed.

4. "The node's cluster monitor" means the VCS daemon "had" and its companion processes.

The clean shutdown is generally what Mike described. The idea is to make sure all application service groups and VCS services are stopped properly, without errors or force. So follow the sequence: stop all service groups, then VCS, then the OS, and resolve any errors that come up at each step.

tanislavm · ‎07-14-2014

Hi,

So if vxconfigd is missing on a CVM node, this node will not see the shared disk group, right?

In this case the application running on this node will hang, right?

If I shut down and power off all the nodes in CVM, should I power on the last node I shut down first?

If all the nodes crash, how do I handle this situation?

Gaurav_S · ‎07-14-2014

So if vxconfigd is missing on a CVM node, this node will not see the shared disk group, right?

>> Right, CVM nodes run vxconfigd in master/slave mode. Every node must have vxconfigd running for VxVM to operate successfully.

In this case the application running on this node will hang, right?

>> If the shared disk groups go missing, of course the file systems on that node will not be accessible, and hence the application may fault or hang depending on application behaviour.

If I shut down and power off all the nodes in CVM, should I power on the last node I shut down first?

>> If you want to keep the current CVM master as master, you should power on that node first. If it doesn't matter for your configuration, whichever node you start first will become CVM master. You can check which node is master by running the "vxdctl -c mode" command.
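
If you want to script that check, a minimal Python sketch could parse the command's output as below. The sample text is an assumption about what "vxdctl -c mode" typically prints on a cluster node (a "mode:" line ending in MASTER or SLAVE, followed by a "master:" line); verify it against your own version before relying on it. The function name parse_vxdctl_mode is hypothetical.

```python
def parse_vxdctl_mode(output):
    """Extract this node's CVM role and the master's name from vxdctl output.

    Assumes lines of the form "mode: ... - MASTER" (or SLAVE) and "master: <name>".
    """
    info = {"role": None, "master": None}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("mode:"):
            upper = line.upper()
            if upper.endswith("MASTER"):
                info["role"] = "master"
            elif upper.endswith("SLAVE"):
                info["role"] = "slave"
        elif line.startswith("master:"):
            info["master"] = line.split(":", 1)[1].strip()
    return info


# Assumed sample output; node1 is a made-up host name.
sample = "mode: enabled: cluster active - SLAVE\nmaster: node1\n"
print(parse_vxdctl_mode(sample))
```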

If all the nodes crash, how do I handle this situation?

>> Above answer should explain this.

G

stinsong · ‎07-15-2014

Hi tanislavm,

So if vxconfigd is missing on a CVM node, this node will not see the shared disk group, right?

>>> Yes.

In this case the application running on this node will hang, right?

>>> Yes.

If I shut down and power off all the nodes in CVM, should I power on the last node I shut down first?

>>> The power-on sequence does not matter. The first node to start vxconfigd and join the cluster will become CVM master, that's all.

If all the nodes crash, how do I handle this situation?

>>> When the first node starts up and joins the cluster, it becomes CVM master and will check and replay the file system log to recover the data operations that were in flight when the nodes crashed. This takes time, depending on how much dirty log needs to be replayed. So a shared DG or CFS mount resource may time out and fail while VCS brings it online.

This can be complicated and may carry a risk of data loss, because the outcome depends on how and in what sequence the nodes crashed. Not all nodes crash at exactly the same moment, right? So the CVM master at the moment any node crashes will start to replay the file system log to recover data consistency. But if the CVM master crashes, it takes time to elect a new CVM master. And if all nodes crash one by one, there may be no chance to complete any file system check, and data structures could be damaged.

But under most circumstances this will not happen. How bad would things have to be for all your nodes to crash, right?

tanislavm · ‎07-15-2014

hi,

So if all nodes crash, should I start the last node that crashed first?

If a slave node crashes, then as you wrote the CVM master will replay the file system log to recover data, meaning the uncommitted transactions, right? But there could also be cases where that data cannot be recovered, right?

Anyway, in this case no plex will go into stale or detached status.

If the CVM master crashes, then there can be cases where plexes go into stale or detached status, right?

So if all the nodes crash, I start the nodes, and only when the number of running nodes reaches the number of nodes set in gabconfig will the cluster be formed, and the first node to acquire the DG will be elected CVM master, right?

thanks so much.

mikebounds · ‎07-15-2014

Regarding vxconfigd:

vxconfigd is involved in volume manager operations, NOT I/O. If vxconfigd is missing, you will not be able to import shared or non-shared disk groups, so your application won't hang: it will not be able to start, because you won't be able to import the disk group. If vxconfigd were to die, hang or go missing after the disk group was imported and the application was running, the application would still run, as I/O would continue. However, you would not be able to deport the disk group, create or extend volumes, or get information about volumes, which means the CVM/VxVM agents would hang or fault. The clean would then fail, so your application would actually continue to run.

Mike

stinsong · ‎07-15-2014

Basically yes. I think you understand it, which is why you suggested powering on the last crashed node first. But the file system log is actually located on disk, so whichever node becomes CVM master will check and replay the log from disk.

stinsong · ‎07-15-2014

Actually, I had the same idea as you. But I'm not sure whether this is the same for CFS, because vxconfigd is in charge of communication with the CVM master and the other nodes. Once vxconfigd is down, I don't know how the operation lock mechanism could work properly.

tanislavm · ‎07-15-2014

hi,

The DG has the key of the last node that crashed written on it, so if I start any other node the DG cannot be imported, right?

I/O is performed by vxiod, and the lock manager works with vxiod, right?

The lock manager also works with vxconfigd, but if vxconfigd is down on all nodes then no node will see the DG, so there is no point in the lock manager working, right?

vxiod will not know where to write, right?

Gaurav_S · ‎07-15-2014

First, the question is extending in a different direction. I suggest you open new threads with precise questions, so that the issue is better understood and the solution can be precise as well. In these lengthy discussions the context of the problem keeps changing, which drives us away from the actual solution.

Now coming to the answers:

The DG has the key of the last node that crashed written on it, so if I start any other node the DG cannot be imported, right?

>> Which key are we talking about? I/O fencing? If all nodes shut down, yes, VCS may have reservation keys from the last node, but once that node shuts down as well, no keys should exist. So whichever node you start first should be able to import the disk group.

I/O is performed by vxiod, and the lock manager works with vxiod, right?

>> Which lock manager are we talking about here? Node lock, GLM? Both have different use cases. Yes, I/O is performed by vxiod, but locking happens at the CFS layer; these are two different things.

The lock manager also works with vxconfigd, but if vxconfigd is down on all nodes then no node will see the DG, so there is no point in the lock manager working, right?

>> Conceptually right.

vxiod will not know where to write, right?

>> Again the context changed. Normally vxiod would know where to write; if vxconfigd is not working, yes, vxiod will not find the raw/block device to perform I/O.

G

tanislavm · ‎07-15-2014

Hi Gaurav,

The context was all nodes crashing. So the last node will write its key on the DG before it crashes; the rest of the nodes have crashed already.

We are talking about CVM here, so about the lock manager. Each I/O is executed using vxiod, and the lock manager is also involved, right?

The context did not change; everything was in the context of CVM. Why do you keep writing that and pointing it out to me if you do not read the whole discussion?

tanislavm · ‎07-15-2014

hi,

Could somebody please answer my questions nicely?

Gaurav_S · ‎07-15-2014

Hi Tanislavm,

I did read the discussion, and only after that did I post my comment.

I have been answering your posts for quite some time, and I noticed a pattern, which is why I wrote to you. We all answer posts so that the answers are reusable and someone else can get help from them. I can keep answering you; however, as TAs of the community it is our responsibility to follow the right practices and create reusable content.

If you look at your original post

1st part -

Based on the llthosts information, had knows on which node to start a service or where to fail it over (the failover list), right?

If llthosts is missing, then no service will be brought online, right?

>>> This is a question related to LLT, nothing to do with CVM.

If a service group has a restart limit of 1 and a critical resource fails, does had first try to restart the group on this node and, if it cannot, fail the group over?

>>> This is related to a group attribute, a separate topic, nothing to do with CVM.

Here is a link on converting a VCS cluster to CVM: http://unix-essentials.blogspot.ro/2009/03/vcs-implementing-cluster-file-system.html. In that example the shared disk group is initialized. My question is whether a shared disk group can import the data that I had in a non-shared disk group.

>> Related to CVM; the question was answered in the very first response by Mike.

How can I safely shut down a node within CVM? I found this link: http://sfdoccentral.symantec.com/sf/5.1SP1/solaris/html/vxvm_admin/ch13s04s03.htm.

"shutdown procedure of the node's cluster monitor": what is the node's cluster monitor, and how do I invoke this procedure? How do I invoke a clean node shutdown?

>> Related to CVM; again Mike provided the answer in the first response.

In such a case, all I am saying is to close the discussion there and open a new thread with a fresh question, so that the knowledge base stays clean with one solution per question.

I hope you would agree to above ?

G

tanislavm · ‎07-15-2014

Hi,

Thanks so much. I prefer to ask my questions all together rather than split them by classification, and to clarify the things I want to find out more about even if they are miscellaneous. But if you feel that my questions should be grouped into classes, fine, I will do it.

So I will try to group them into classes, although that is not my goal.

thanks so much.

Marianne · ‎07-15-2014

Could somebody please answer my questions nicely?

You seem to have LOTS of unrelated questions all thrown into a single forum discussion.

These totally unrelated questions should even be asked in different forums: llthosts, VCS and CVM belong in the Cluster forum.

vxconfigd is related to Storage Foundation and has nothing to do with VCS and the rest of your questions.

Everything just proves that you do not understand the very basics of Storage Foundation (VxVM and VxFS), VCS and CFS.

I can see that you are reading up, but not in any sort of order. All of these bits and pieces seem to confuse you even more.

I have previously suggested that you find a Symantec Training location in your area.

Start with the 5-day Storage Foundation training course.
This will lay the necessary foundation to understand the underlying concepts of VxVM (Volume Manager) and VxFS (Veritas File System).

Next will be the Veritas Cluster training course, where you will learn about the building blocks of a cluster, about parallel and failover service groups, and about how VCS monitors resources and deals with critical and non-critical resource faults.

Only when you have a proper understanding of all of the above should you book yourself on a CVM/CFS training course.

I feel that in the past couple of weeks, with all these unrelated questions, we have not managed to teach you anything...


mikebounds · ‎07-15-2014

Hi Tanislavm,

Just to add, you should try to make the title the question, and the question should not be too generic, like "how does cvm work". It would be OK to ask how a particular feature works, but not a whole product, so you could for example have a discussion with the title "How do you shutdown CVM cleanly".

As Gaurav has said, this forum is also meant to be used as a lookup for information. If someone wants to know how to shut down CVM cleanly, and everyone titled their questions about CVM just "cvm" and each discussion contained several other questions, then the information would be very hard to find.

Thanks for your understanding, as I see you have already said "if you feel that my questions should be grouped on classes fine i will do it".

Mike

tanislavm · ‎07-15-2014

Hi Mike,

Thanks for the kind way you put things. I am aware this is a forum, so I will try.

Kimberley · ‎07-16-2014

Hi all,

The Trusted Advisors bring up good points here, in the spirit of keeping the forums to one question/solution, which benefits the whole community. @tanislavm, I certainly understand you following your train of thought on the forum, even if it jumps questions, as it's a very natural way of thinking and processing information. The issue is that, in a forum environment, it becomes difficult for others who may be having the same issue to 1. Know that the issue has already been discussed 2. That there may be a resolution that the group has come to, and a solution marked.

So there are things that we ask people do in the forums, in the way that they post and answer, that is for the good of the whole community, including those coming after us looking for a solution to their issue. It's more of all of us working together on a best practice, rather than a right/wrong way of doing things. By structuring your content in that way, it benefits all, as the Advisors were mentioning.

We will be working on a Connect best practices document that describes many things that people can be doing in their forum posts to benefit the larger community going forward. I'll share it in the forums here so all can review.

Thanks to all of you for your assistance and cooperation with this effort.

Best,

Kimberley
