How to check that VCS is working properly with NBU?

16ris10
Level 6

I have a new two-node master server on which I installed NBU 7.5. We use VCS 5.1 for clustering, and the OS is RHEL 6. There are also four media servers running the same OS.

cluster name is: nbu

node1: master01

node2: master02

[root@master02]# cat bp.conf
SERVER = nbu.domain.com
SERVER = media01.domain.com
SERVER = media02.domain.com
SERVER = media03.domain.com
SERVER = media04.domain.com
SERVER = master01.domain.com
SERVER = master02.domain.com
CLIENT_NAME = master02.domaincom
CLUSTER_NAME = nbu.domain.com
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = nbu.domain.com
HOST_CACHE_TTL = 3600
VXDBMS_NB_DATA = /opt/VRTSnbu/db/data
KMS_DIR = /opt/VRTSnbu/kms
TELEMETRY_UPLOAD = NO

The problem is that I do not see node 2 of the master listed in the nbemmcmd output.

[root@master02]# ./nbemmcmd -listhosts
NBEMMCMD, Version: 7.5
The following hosts were found:
server           nbu.domain.com
cluster          nbu.domain.com
master           master01.domain.com
Command completed successfully.

The other thing is that I installed the media servers first. I thought adding them as media servers via nbemmcmd would be enough, or do I need to reinstall them now that the new master server is ready? As of now I do not see any media servers in any server's nbemmcmd output.

[root@master02 bin]# ./hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen

A  master01           RUNNING              0
A  master02           RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  ClusterService  master01           Y          N               OFFLINE
B  ClusterService  master02           Y          N               ONLINE
B  nbu_group       master01           Y          N               ONLINE
B  nbu_group       master02           Y          N               OFFLINE

Moreover, how do I verify that this clustered environment is all set and ready to work? I was thinking of checking the failover: I can initiate a backup, shut down the services on one node, and then see whether the backups keep running. Is this the only check?

Do you need more information about anything? This system hasn't gone into production yet; it is all being set up new, and we're going to migrate the policies after we are done.

1 ACCEPTED SOLUTION

joseph_dangelo
Level 6
Employee Accredited
Jobs currently running to your media servers should restart once the Master server is back online. Here is an excerpt from the NBU Cluster guide for 7.5:

"When a failover occurs, the backup jobs that were running are rescheduled with the normal NetBackup retry logic for a failed backup. The NetBackup services are started on another node and the backup processing resumes."

Please ensure that the following file is identical on both hosts:

/usr/openv/netbackup/bin/cluster/NBU_RSP

A simple /usr/openv/netbackup/bin/bpps -a on the active node will report all the online services.

To test the failover, simply run the following commands:

#> hagrp -switch nbu_group -to master02
#> hastatus

Once the Service Group "nbu_group" is online on master02, you can check the status of your pending jobs.

Can you post the output from

#> nbemmcmd -listhosts -verbose

Keep in mind that the NBU Master services are only active on one node at a time, so my suspicion is that you will see master02 in the list once you fail over.

Hope this helps.

Joe D
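
The check that NBU_RSP is identical on both hosts can be scripted. What follows is only a minimal sketch, not something taken from the NBU cluster guide: `compare_copies` is a hypothetical helper name, and it assumes the other node's copy of the file has already been fetched to a local path (for example with scp, if that is allowed between the nodes).

```shell
# Compare two copies of a file byte for byte.
# Prints "identical" and returns 0, or prints "different" and returns 1.
compare_copies() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

On master01 this could be used as `compare_copies /usr/openv/netbackup/bin/cluster/NBU_RSP /tmp/NBU_RSP.master02`, where /tmp/NBU_RSP.master02 is an assumed name for the copy fetched from master02.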

16ris10
Level 6

I did that, and the failover failed. What does this output mean now?

[root@master01 bin]# ./hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen

A  master01           RUNNING              0
A  master02           RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  ClusterService  master01           Y          N               OFFLINE
B  ClusterService  master02           Y          N               ONLINE
B  nbu_group       master01           Y          N               ONLINE
B  nbu_group       master02           Y          N               OFFLINE|FAULTED

-- RESOURCES FAILED
-- Group           Type                 Resource             System

D  nbu_group       Mount                nbu_mount            master02
[root@master01 bin]#
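
For anyone reading a summary like the one above, the faulted entries can be pulled out with a small filter. This is a sketch that assumes the column layout shown in this thread (group lines prefixed with "B", failed-resource lines prefixed with "D"); the function names are made up for illustration.

```shell
# Read `hastatus -sum` output on stdin.
# Group lines start with "B"; the last column holds the state.
# Prints: group system state
faulted_groups() {
    awk '$1 == "B" && $NF ~ /FAULTED/ { print $2, $3, $NF }'
}

# Failed-resource lines start with "D": group, type, resource, system.
# Prints: group resource system
failed_resources() {
    awk '$1 == "D" { print $2, $4, $5 }'
}
```

Typical use would be `hastatus -sum | faulted_groups` followed by `hastatus -sum | failed_resources`. Here the second filter would point at nbu_mount on master02, which is the resource to investigate.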

16ris10
Level 6

I checked both cluster files and they are identical. Here is the verbose output.

[root@master02 admincmd]# ./nbemmcmd -listhosts -verbose
NBEMMCMD, Version: 7.5
The following hosts were found:
nbu.domain.com
MachineName = "nbu.domain.com"
FQName = "nbu.domain.com"
MachineDescription = ""
MachineNbuType = server (6)
nbu.domain.com
MachineName = "nbu.domain.com"
FQName = "nbu.domain.com"
MachineDescription = ""
MachineNbuType = cluster (5)
NetBackupVersion = 7.5.0.0 (750000)
Active Node Name = "master01.domain.com"
master01.domain.com
ClusterName = "nbu.domain.com"
MachineName = "master01.domain.com"
FQName = "master01.domain.com"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x77
MachineNbuType = master (3)
MachineState = active for tape and disk jobs (14)
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
media01.domain.com
ClusterName = ""
MachineName = "media01.domain.com"
FQName = "media01.domain.com"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x10
MachineNbuType = media (1)
MachineState = active for disk jobs (12)
MasterServerName = "nbu.domain.com"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
media02.domain.com
ClusterName = ""
MachineName = "media02.domain.com"
FQName = "media02.domain.com"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x10
MachineNbuType = media (1)
MachineState = active for disk jobs (12)
MasterServerName = "nbu.domain.com"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
media03.domain.com
ClusterName = ""
MachineName = "media03.domain.com"
FQName = "media03.domain.com"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x10
MachineNbuType = media (1)
MachineState = active for disk jobs (12)
MasterServerName = "nbu.domain.com"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
media04.domain.com
ClusterName = ""
MachineName = "media04.domain.com"
FQName = "media04.domain.com"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0
MachineNbuType = media (1)
MachineState = active for disk jobs (12)
MasterServerName = "nbu.domain.com"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
master02.domain.com
ClusterName = "nbu.domain.com"
MachineName = "master02.domain.com"
FQName = "master02.domain.com"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0
MachineNbuType = master (3)
MachineState = active for disk jobs (12)
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
Command completed successfully.
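
The verbose listing is long, so a one-line-per-host summary makes it easier to confirm that both master nodes and all four media servers are registered. A minimal sketch, assuming the `Key = value` layout shown above; `list_emm_hosts` is a hypothetical name.

```shell
# Read `nbemmcmd -listhosts -verbose` output on stdin and print
# "<MachineName> <MachineNbuType>" for every host record.
list_emm_hosts() {
    awk -F' = ' '
        { sub(/^[ \t]+/, "") }                 # tolerate indented output
        $1 == "MachineName"    { gsub(/"/, "", $2); name = $2 }
        $1 == "MachineNbuType" { print name, $2 }
    '
}
```

Usage would be `./nbemmcmd -listhosts -verbose | list_emm_hosts`. This relies on MachineName appearing before MachineNbuType within each record, as it does in the output posted here.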

 

joseph_dangelo
Level 6
Employee Accredited

master02 now appears in the output from the nbemmcmd command. If you look at the output of hastatus, the nbu_mount resource failed to come online. Dollars to donuts the mount point directory doesn't exist on master02, thus the file system cannot be mounted.

Please run the following commands:

#> hamsg Mount_A
#> tail -20 /var/VRTSvcs/log/engine_A.log

These should tell us/confirm why the Mount failed.

Check the MountPoint attribute:
#> hares -display nbu_mount -attribute MountPoint

Verify that this path exists on both nodes. Create the directory if it doesn't. You will then need to clear the resource fault.

#> hares -clear nbu_mount -sys master02
#> hares -online nbu_mount -sys master02
#> hastatus

If everything looks good then you can bring the rest of the service group online.
#> hagrp -online nbu_group -sys master02


Joe D
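
The directory check described above could be scripted along these lines. This is a sketch only: the helper names are hypothetical, and it parses the `hares -display ... -attribute MountPoint` output on the assumption that the value sits in the fourth column of the first non-comment line.

```shell
# Read `hares -display <resource> -attribute MountPoint` output on stdin
# and print the Value column of the first line that is not a "#" header.
mountpoint_from_display() {
    awk '!/^#/ && NF >= 4 { print $4; exit }'
}

# Report whether a directory exists. With "create" as the second
# argument, create it when it is missing.
check_mount_dir() {
    dir=$1
    if [ -d "$dir" ]; then
        echo "ok: $dir exists"
    elif [ "${2:-}" = "create" ]; then
        mkdir -p "$dir" && echo "created: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

On each node, something like `check_mount_dir "$(hares -display nbu_mount -attribute MountPoint | mountpoint_from_display)"` would report whether the path is there. Clearing the fault and bringing the resource online are left as the manual steps listed above.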

16ris10
Level 6
[root@master01 bin]# ./hamsg Mount_A
Wed 23 Jan 2013 12:17:36 AM UTC VCS INFO V-16-10031-20507 Mount:Mount:imf_init:successfully initialized the VxAMF Mount Module
Wed 23 Jan 2013 12:17:36 AM UTC VCS INFO V-16-2-13805 (imf_init) entry point completed with return status (0)
Thu 24 Jan 2013 03:10:05 AM UTC VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group nbu_mount

[root@master01 bin]# tail -20 /var/VRTSvcs/log/engine_A.log
2013/01/24 03:12:08 VCS ERROR V-16-2-13066 (master02) Agent is calling clean for resource(nbu_mount) because the resource is not up even after online completed.
2013/01/24 03:12:09 VCS INFO V-16-2-13068 (master02) Resource(nbu_mount) - clean completed successfully.
2013/01/24 03:12:09 VCS INFO V-16-2-13071 (master02) Resource(nbu_mount): reached OnlineRetryLimit(0).
2013/01/24 03:12:09 VCS ERROR V-16-1-54031 Resource nbu_mount (Owner: Unspecified, Group: nbu_group) is FAULTED on sys master02
2013/01/24 03:12:09 VCS NOTICE V-16-1-10300 Initiating Offline of Resource nbu_ip (Owner: Unspecified, Group: nbu_group) on System master02
2013/01/24 03:12:09 VCS INFO V-16-6-15015 (master02) hatrigger:/opt/VRTSvcs/bin/triggers/resfault is not a trigger scripts directory or cannot be executed
2013/01/24 03:12:10 VCS INFO V-16-1-10305 Resource nbu_ip (Owner: Unspecified, Group: nbu_group) is offline on master02 (VCS initiated)
2013/01/24 03:12:10 VCS ERROR V-16-1-10205 Group nbu_group is faulted on system master02
2013/01/24 03:12:10 VCS NOTICE V-16-1-10446 Group nbu_group is offline on system master02
2013/01/24 03:12:10 VCS INFO V-16-1-10493 Evaluating master01 as potential target node for group nbu_group
2013/01/24 03:12:10 VCS INFO V-16-1-10493 Evaluating master02 as potential target node for group nbu_group
2013/01/24 03:12:10 VCS INFO V-16-1-50010 Group nbu_group is online or faulted on system master02
2013/01/24 03:12:10 VCS NOTICE V-16-1-10301 Initiating Online of Resource nbu_ip (Owner: Unspecified, Group: nbu_group) on System master01
2013/01/24 03:12:10 VCS NOTICE V-16-1-10301 Initiating Online of Resource nbu_mount (Owner: Unspecified, Group: nbu_group) on System master01
2013/01/24 03:12:13 VCS INFO V-16-1-10298 Resource nbu_mount (Owner: Unspecified, Group: nbu_group) is online on master01 (VCS initiated)
2013/01/24 03:12:22 VCS INFO V-16-1-10298 Resource nbu_ip (Owner: Unspecified, Group: nbu_group) is online on master01 (VCS initiated)
2013/01/24 03:12:22 VCS NOTICE V-16-1-10301 Initiating Online of Resource nbu_server (Owner: unknown, Group: nbu_group) on System master01
2013/01/24 03:12:42 VCS INFO V-16-1-10298 Resource nbu_server (Owner: unknown, Group: nbu_group) is online on master01 (VCS initiated)
2013/01/24 03:12:42 VCS NOTICE V-16-1-10447 Group nbu_group is online on system master01
2013/01/24 03:12:42 VCS NOTICE V-16-1-10448 Group nbu_group failed over to system master01
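
Most of the lines in this tail are INFO and NOTICE messages about the group failing back to master01. A filter that keeps only the ERROR lines makes the actual failure easier to spot. A minimal sketch, assuming the `date time VCS LEVEL ...` layout seen above; `vcs_errors` is a hypothetical name.

```shell
# Print only VCS ERROR lines from engine log text on stdin.
# An optional argument restricts the output to lines that mention
# a given resource or group name.
vcs_errors() {
    name=${1:-}
    awk -v name="$name" \
        '$3 == "VCS" && $4 == "ERROR" && (name == "" || index($0, name))'
}
```

For example, `vcs_errors nbu_mount < /var/VRTSvcs/log/engine_A.log` would keep the V-16-2-13066 and V-16-1-54031 lines from the excerpt above, both of which say the mount did not come up on master02.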

16ris10
Level 6
[root@master01 bin]# ./hares -display nbu_mount -attribute MountPoint
#Resource    Attribute             System     Value
nbu_mount    MountPoint            global     /opt/VRTSnbu

16ris10
Level 6

Which path? /opt/VRTSnbu/?

Let me post here what exists on both nodes.
 

[root@master02 VRTSnbu]# pwd
/opt/VRTSnbu
[root@master02 VRTSnbu]# ls -l
total 8
drwxr-xr-x 3 root bin 4096 Jan 23 00:45 db

 
[root@master01 VRTSnbu]# pwd
/opt/VRTSnbu
[root@master01 VRTSnbu]# ls -l
total 0
drwxr-xr-x 4 root bin  96 Jan 23 00:21 db
drwxr-xr-x 2 root root 96 Jan 23 00:19 kms
drwxr-xr-x 2 root root 96 Jan 21 23:32 lost+found
drwxr-xr-x 4 root root 96 Jan 23 00:19 netbackup
drwxr-xr-x 3 root root 96 Jan 23 00:19 var
drwxr-xr-x 3 root root 96 Jan 23 00:19 volmgr
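
The two listings differ because master01 currently has the shared file system mounted on /opt/VRTSnbu, while master02 appears to be showing only the local directory underneath the mount point. A quick way to tell which one you are looking at is to check the mount table. This is a sketch for Linux; `is_mounted` is a hypothetical helper, and the path must be given without a trailing slash.

```shell
# Return 0 if the given directory is listed as a mount point.
# The second argument (mount table file) defaults to /proc/mounts.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

Running `is_mounted /opt/VRTSnbu && echo mounted || echo "not mounted"` on each node would be expected to print "mounted" only on the node where nbu_mount is ONLINE.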

16ris10
Level 6

I am waiting for you to see the tail of the log, and then I will run the last two commands you mentioned (clear, then bring it online). In the meantime I ran hastatus, and this is the output.

[root@master01 bin]# ./hastatus
attempting to connect....
attempting to connect....connected

group           resource             system               message
--------------- -------------------- -------------------- --------------------
                                     master01           RUNNING
                                     master02           RUNNING
ClusterService                       master01           OFFLINE
ClusterService                       master02           ONLINE
-------------------------------------------------------------------------
nbu_group                            master01           ONLINE
nbu_group                            master02           *FAULTED* OFFLINE
                webip                master01           OFFLINE
                webip                master02           ONLINE
                csgnic               master01           ONLINE
-------------------------------------------------------------------------
                csgnic               master02           ONLINE
                nbu_nic              master01           ONLINE
                nbu_nic              master02           ONLINE
                nbu_ip               master01           ONLINE
                nbu_ip               master02           OFFLINE
-------------------------------------------------------------------------
                nbu_mount            master01           ONLINE
                nbu_mount            master02           *FAULTED*
                nbu_server           master01           ONLINE
                nbu_server           master02           OFFLINE

16ris10
Level 6

After running this command, I had to press Ctrl+C to get out of it; it is either taking too long for my patience or it has hung.

16ris10
Level 6

posting the complete engine log for you..

Marianne
Moderator
Partner    VIP    Accredited Certified

Please tell us more about the mount resource. What type of volume management? VxVM or LVM?
Which filesystem type?

Have you ever tested if volume can be mounted (outside of VCS) at OS level on node 2?

*** EDIT ****

OK -  I can see in your NBU post that Yasuhisa managed to help you create all resources from scratch and that the Service Group is now configured correctly.
A correctly completed worksheet and resources verified at OS level are key to a successful NBU clustered install.

16ris10
Level 6

Thanks a lot. This post was spot on to fix my issue. I marked another post of yours as the solution since that was relevant to the topic. :)

16ris10
Level 6

It's VxVM I suppose, and not LVM for sure. No, I haven't tested that, nor do I know how to do it. :S
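
For reference, a manual mount test outside of VCS might look roughly like the sequence below. This is a sketch under several assumptions that the thread does not confirm: that the volume is a VxVM volume carrying a VxFS file system, and that the storage is not imported or mounted on the other node at the time (otherwise the import should be expected to fail). The disk group and volume names used in the usage note are placeholders. The helper only prints the commands so they can be reviewed before anything is run.

```shell
# Print (do not run) the commands for a manual mount test on one node.
# Arguments: disk group, volume, mount point. All three are site-specific.
print_manual_mount_test() {
    dg=$1 vol=$2 mnt=$3
    cat <<EOF
vxdg import $dg
vxvol -g $dg startall
mount -t vxfs /dev/vx/dsk/$dg/$vol $mnt
df -h $mnt
umount $mnt
vxvol -g $dg stopall
vxdg deport $dg
EOF
}
```

For example, `print_manual_mount_test nbudg nbuvol /opt/VRTSnbu` prints the sequence for a disk group called nbudg and a volume called nbuvol (both hypothetical names). If the mount step fails by hand on node 2, it will also fail under VCS, and the OS-level error message is usually more informative than the agent log.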