10-03-2020 04:54 AM
Hello,
I have a robot (TLD0) that shows "No" under Enabled. In the Activity Monitor's Drives tab, the drives are in Scan mode. I put 2 media that I need to restore into this library; however, after 2 hours the job keeps saying "mount xxxx media".
I tried:
- restarting the daemons for the robot
- using the wizard to add it again (I didn't delete the robot, just overwrote the config)
- rebooting the library
[root@lprpmedia01 bin]# ./robtest
Configured robots with local control supporting test utilities:
TLD(0) robotic path = /dev/sg133
Robot Selection
---------------
1) TLD 0
2) none/quit
Enter choice: 1
Robot selected: TLD(0) robotic path = /dev/sg133
Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -rn 0 -r /dev/sg133
Opening /dev/sg133
MODE_SENSE complete
Enter tld commands (? returns help information)
Can you help me?
Regards
10-08-2020 08:09 AM
SCAN host is normal in a shared environment where one media server controls the drive assignment.
Although the drives are not shared with any other media server(s), the drives were configured as SHARED.
I would do the following:
Delete ALL devices configured for lprpmedia01. If you do it from the master server GUI, it will prompt you to restart Device Management services. Answer Yes.
Else, run nbemmcmd -deletealldevices -machinename <media-server-name> -machinetype media
Run the device config wizard, select lprpmedia01 (or lprpmedia01.derco.cl) and complete the wizard.
When done, you will need to Inventory the robot as all tapes are moved to standalone when you delete the robot.
Next, check the Storage Unit config for this media server. Double-check hostname and attributes.
When done, reset the Resource Broker for this media server:
nbrbutil -resetMediaServer <media-server-name>
Test a backup or restore for this media server.
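For convenience, the sequence above can be sketched as an echo-only script. The server name and install paths below are the ones from this thread (the admincmd path is an assumption based on a default Linux install); the wizard and Storage Unit steps are GUI-only, so they stay comments, and nothing is executed until you remove the run wrapper:

```shell
#!/bin/sh
# Echo-only sketch of the rebuild sequence above; remove the run()
# wrapper to execute the commands for real.
MEDIA="lprpmedia01.derco.cl"               # media server name from this thread
ADMIN="/usr/openv/netbackup/bin/admincmd"  # default admincmd path (assumption)

run() { echo "+ $*"; }   # print each command instead of invoking it

# 1. Delete all devices configured for the media server
run "$ADMIN/nbemmcmd" -deletealldevices -machinename "$MEDIA" -machinetype media
# 2. Run the device configuration wizard from the GUI and complete it
# 3. Re-inventory the robot - deleting it moved all tapes to standalone
run /usr/openv/volmgr/bin/vmupdate -rt tld -rn 0
# 4. Check the Storage Unit config in the GUI, then reset the Resource Broker
run "$ADMIN/nbrbutil" -resetMediaServer "$MEDIA"
```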
10-04-2020 10:02 AM
Hmm
Did you run a robot inventory to tell NBU that new tapes appeared in the library? To do so, run this command:
vmupdate -rt tld -rn 0 -empty_map
Or you can use the equivalent from the GUI - right-click on the robot and select Inventory...
Once done, your restore should proceed.
10-05-2020 02:32 AM
What was the purpose of running robtest?
Did you perhaps check drive and slot status?
Please remember to quit out of robtest before you do anything in NBU - robtest holds control of the robot and will prevent NBU from managing the robot.
Please tell us more about the robot and drives on media server lprpmedia01.
Is it in every day use for backups?
Are the drives shared with other media servers?
Is comms fine between the master and this media server?
If you run 'vmoprcmd -d' on the media server, does the output correspond with Device Monitor?
Have you checked Device Monitor (not the Devices tab in Activity Monitor) for a pending mount request?
10-05-2020 07:47 AM
Hello,
Yes, I ran an inventory and updated it, and it ran fine.
--------------------------------------------------------------------
05-10-2020 11:47:20
Robot: TLD(0) on lprpmedia01.derco.cl
Operation: Inventory
--------------------------------------------------------------------
Robot Contents
Slot Tape Barcode
==== ==== ============
1 Yes DE0036L6
2 Yes DER375L6
3 Yes DE0067L6
4 Yes DER227L6
5 Yes DER342L6
6 Yes DER152L6
7 No
8 Yes DE0053L6
9 Yes DER052L6
10 No
11 Yes DER230L6
12 Yes DER164L6
13 Yes DER305L6
14 No
15 Yes DER527L6
16 Yes DE0031L6
17 No
18 No
19 Yes DE0028L6
20 Yes DE0030L6
21 Yes DER196L6
22 No
23 No
24 No
25 No
26 No
27 No
28 Yes DER113L6
29 Yes DE0063L6
30 Yes DE00556
31 Yes DER314L6
32 Yes DER204L6
33 Yes DE0027L6
34 Yes DER438L6
35 Yes DER017L6
36 No
37 Yes CLN069CU
38 No
39 No
40 No
41 No
42 No
43 No
44 No
45 No
--------------------------------------------------------------------
Regards
10-05-2020 08:05 AM
What was the purpose of running robtest? To test and confirm that the robot is OK.
Did you perhaps check drive and slot status? The web interface shows the robot is OK; image attached.
Please remember to quit out of robtest before you do anything in NBU - robtest holds control of the robot and will prevent NBU from managing the robot. Oh OK, thanks for this.
Please tell us more about the robot and drives on media server lprpmedia01.
Is it in every day use for backups? Yes
Are the drives shared with other media servers? No
Is comms fine between the master and this media server? Yes
If you run 'vmoprcmd -d' on the media server, does the output correspond with Device Monitor?
[root@lprpmedia01 bin]# ./vmoprcmd -d
Could not connect to vmd on host lprpmedia01.derco.cl (70)
But the GUI shows they are running.
Have you checked Device Monitor (not the Devices tab in Activity Monitor) for a pending mount request?
I don't see any pending requests.
10-05-2020 09:34 AM
[root@lprpmedia01 bin]# ./vmoprcmd -d
Could not connect to vmd on host lprpmedia01.derco.cl (70)
That's why robtest works but NBU is having issues. Run bpps -x to see what NBU processes are running on the media server.
In your initial post, you said you rebooted the library, but didn't say anything about restarting NBU services. Run bp.kill_all on the media server, wait for everything to be terminated, then restart them with bp.start_all. Make sure that the vmd and ltid processes start properly, then check if the drive status has changed.
If you're still experiencing issues, stop NBU services on the media server (bp.kill_all or netbackup stop), run bpps -x to make sure everything was terminated, reboot the library, then restart NBU services on the media server.
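After bp.start_all, a quick way to confirm that vmd and ltid came up is to grep the bpps -x output. A minimal sketch, parsing a saved capture from this thread rather than live output so it is self-contained:

```shell
#!/bin/sh
# Check that the two daemons NBU device management needs appear in a
# bpps -x capture; BPPS_OUT holds sample lines taken from this thread.
BPPS_OUT='root 122739 1 0 14:58 pts/2 00:00:00 /usr/openv/volmgr/bin/ltid
root 122747 1 0 14:58 pts/2 00:00:00 vmd'

for d in ltid vmd; do
  if printf '%s\n' "$BPPS_OUT" | grep -qw "$d"; then
    echo "$d: running"
  else
    echo "$d: NOT running"
  fi
done
```

Against a live server you would pipe /usr/openv/netbackup/bin/bpps -x into the same grep instead of using the saved capture.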
10-05-2020 11:04 AM
[root@lprpmedia01 bin]# ./bp.kill_all
No NB/MM daemons appear to be running.
[root@lprpmedia01 bin]# ./bpps -x
NB Processes
------------
MM Processes
------------
Shared Veritas Processes
-------------------------
root 20884 1 0 Feb14 ? 00:26:25 /opt/VRTSpbx/bin/pbx_exchange
[root@lprpmedia01 bin]# ./bp.start_all
Starting vnetd...
Starting NB_dbsrv...
Starting nbatd...
Starting nbazd...
Starting nbaudit...
Starting nbwmc...
Starting bpcd...
Starting nbftclnt...
Starting nbdisco...
Starting nbevtmgr...
Starting spad...
Starting spoold...
Starting nbemm...
Starting nbrb...
Starting ltid...
Starting bprd...
Starting bpcompatd...
Starting nbjm...
Starting nbpem...
Starting nbstserv...
Starting nbrmms...
Starting nbkms...
Starting nbsl...
Starting nbim...
Starting nbars...
Starting bmrd...
Starting nbvault...
Starting nbcssc...
Starting nbsvcmon...
Starting bmrbd...
[root@lprpmedia01 bin]# ./bpps -x
NB Processes
------------
root 122180 1 0 14:58 ? 00:00:00 /usr/openv/netbackup/bin/vnetd -proxy inbound_proxy -number 0
root 122181 1 0 14:58 ? 00:00:00 /usr/openv/netbackup/bin/vnetd -proxy http_tunnel -number 0
root 122182 1 0 14:58 ? 00:00:00 /usr/openv/netbackup/bin/vnetd -proxy outbound_proxy -number 0
root 122288 1 0 14:58 ? 00:00:00 /usr/openv/netbackup/bin/vnetd -standalone
root 122417 1 0 14:58 ? 00:00:00 /usr/openv/netbackup/bin/bpcd -standalone
root 122503 1 0 14:58 ? 00:00:00 /usr/openv/netbackup/bin/nbdisco
root 122800 1 0 14:58 ? 00:00:00 /usr/openv/netbackup/bin/nbrmms
root 122887 1 0 14:58 ? 00:00:00 /usr/openv/netbackup/bin/nbsl
root 123119 1 0 14:58 ? 00:00:01 /usr/openv/netbackup/bin/nbsvcmon
root 123309 122800 0 14:59 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/bpst sinfo -DPSPROXY
MM Processes
------------
root 122739 1 0 14:58 pts/2 00:00:00 /usr/openv/volmgr/bin/ltid
root 122747 1 0 14:58 pts/2 00:00:00 vmd
root 123244 122747 0 14:58 pts/2 00:00:00 rdevmi -sockfd 13 -p 50 -r
root 123245 122739 0 14:58 pts/2 00:00:00 tldd
root 123259 122739 0 14:58 pts/2 00:00:00 avrd
root 123275 1 0 14:58 pts/2 00:00:00 tldcd
Shared Veritas Processes
-------------------------
root 20884 1 0 Feb14 ? 00:26:25 /opt/VRTSpbx/bin/pbx_exchange
[root@lprpmedia01 bin]# cd /usr/openv/volmgr/bin/
[root@lprpmedia01 bin]# ./vmoprcmd -d
Could not connect to vmd on host lprpmedia01.derco.cl (70)
It keeps having the same problem.
Regards
10-05-2020 11:18 AM
10-06-2020 12:03 AM
I agree with @jnardello - it looks like the media server cannot resolve its own FQDN.
Which hostname are you using for this media server in bp.conf on the media server as well as on the master server?
Shortname or FQDN?
If you run 'nbemmcmd -listhosts -verbose' on the master server, what do you see for this media server?
10-06-2020 04:57 AM - edited 10-06-2020 04:58 AM
Thanks for your reply.
./nbemmcmd -listhosts -verbose
lprpmedia01.derco.cl
ClusterName = ""
MachineName = "lprpmedia01.derco.cl"
FQName = "lprpmedia01.derco.cl"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x67
MachineNbuType = media (1)
MachineState = active for tape and disk jobs (14)
MasterServerName = "lprvnetbkp01.resp.derco.cl"
NetBackupVersion = 8.1.1.0 (811000)
OperatingSystem = linux (16)
ScanAbility = 5
lprpmedia01.resp.derco.cl
MachineName = "lprpmedia01.resp.derco.cl"
FQName = "lprpmedia01.resp.derco.cl"
MachineDescription = "PureDisk"
MachineFlags = 0x2
MachineNbuType = ndmp (2) (storage_server)
[root@lprpmedia01 ~]# hostname --fqdn
lprpmedia01.resp.derco.cl
bp.conf master server
SERVER = lprpmedia01.resp.derco.cl
SERVER = lprpmedia01.derco.cl
MEDIA_SERVER = lprpmedia01.resp.derco.cl
MEDIA_SERVER = lprpmedia01.derco.cl
Regards
10-06-2020 06:14 AM
We still don't know why the media server cannot connect to vmd running locally:
[root@lprpmedia01 bin]# ./vmoprcmd -d
Could not connect to vmd on host lprpmedia01.derco.cl (70)
Please check bp.conf on the media server as well as /etc/hosts.
Also, I'm not sure why the master server has 2 different configs with 2 different FQDNs for the same media server:
lprpmedia01.derco.cl
lprpmedia01.resp.derco.cl
10-06-2020 06:27 AM
Can you provide the bp.conf for the media and master server as well as the /etc/hosts files? From the output you've provided, it looks like there are definitely some hostname mismatches that are causing your issues.
If you have two interfaces on the server (only reason I can see to have .resp.derco.cl and .derco.cl), you need to make sure that your master and media server both resolve to the correct interface.
On your bp.conf files (master and media), the first SERVER = entry should be the master server's name.
On your media server bp.conf file, make sure that the CLIENT_NAME = field is correct (.resp.derco.cl or .derco.cl). To check this, run bpclntcmd -pn from the media server. This will show you what name the master server identifies the media server as. Then, run bpclntcmd -self from the media server to see what it identifies itself as. Odds are you will immediately see a discrepancy.
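The kind of mismatch this check surfaces can be sketched as follows. The two names are hard-coded samples from this thread, standing in for the live outputs of bpclntcmd -pn and hostname --fqdn:

```shell
#!/bin/sh
# Compare the name the master uses for this host against the local FQDN.
# Both values are hard-coded samples from this thread; on a live server
# they would come from `bpclntcmd -pn` and `hostname --fqdn`.
MASTER_SEES="lprpmedia01.derco.cl"
LOCAL_FQDN="lprpmedia01.resp.derco.cl"

if [ "$MASTER_SEES" = "$LOCAL_FQDN" ]; then
  echo "names match: $LOCAL_FQDN"
else
  echo "MISMATCH: master sees $MASTER_SEES, host says $LOCAL_FQDN"
fi
```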
10-06-2020 07:50 AM
Bp.conf media01
[root@lprpmedia01 ~]# cat /usr/openv/netbackup/bp.conf
SERVER = lprvnetbkp01.resp.derco.cl
SERVER = lprpmedia01.resp.derco.cl
SERVER = lprpmedia01.derco.cl
SERVER = WCLQASQL12KTEST.derco.cl
CLIENT_NAME = lprpmedia01.resp.derco.cl
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
EMMSERVER = lprvnetbkp01.resp.derco.cl
HOST_CACHE_TTL = 3600
CLI_GA_RET_LOGS_DURATION = 0
TELEMETRY_UPLOAD = YES
/etc/hosts media01
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01
10.0.0.59 lprvnetbkp01.resp.derco.cl lprvnetbkp01
10.0.0.3 lprpaplnetbk.resp.derco.cl lprpaplnetbk
Bp.conf master server lprvnetbkp01
[root@lprvnetbkp01 ~]# cat /usr/openv/netbackup/bp.conf
SERVER = lprvnetbkp01.resp.derco.cl
SERVER = lprpaplnetbk.resp.derco.cl
SERVER = lprpmedia01.resp.derco.cl
SERVER = lprpmedia01.derco.cl
SERVER = lprpnetbkpbo01.imcruz.org
SERVER = lprvnetbkppe01.derco.local
SERVER = lprpmedialevel01.resp.derco.cl
SERVER = WCLQASQL12KTEST.derco.cl
#SERVER = lprpaplnetbk.resp.derco.cl
CLIENT_NAME = lprvnetbkp01.resp.derco.cl
CONNECT_OPTIONS = localhost 1 0 2
CONNECT_OPTIONS = lprvsappibd01.resp.derco.cl 0 1 2
USE_VXSS = PROHIBITED
EMMSERVER = lprvnetbkp01.resp.derco.cl
HOST_CACHE_TTL = 3600
VXDBMS_NB_DATA = /usr/openv/db/data
MEDIA_SERVER = lprpaplnetbk.resp.derco.cl
MEDIA_SERVER = lprpmedia01.resp.derco.cl
MEDIA_SERVER = lprpmedialevel01.resp.derco.cl
MEDIA_SERVER = 10.0.0.25
MEDIA_SERVER = lprpmedia01.derco.cl
JOB_PRIORITY = 0 0 90000 90000 90000 90000 85000 85000 80000 80000 80000 80000 75000 75000 70000 70000 50000 50000 45000 0 0 0 0 0
GUI_ACCOUNT_LOCKOUT_DURATION = 1
VERBOSE = -2
ALLOW_MEDIA_OVERWRITE = ANSI
CLIENT_CONNECT_TIMEOUT = 600
CLIENT_READ_TIMEOUT = 600
KEEP_VAULT_SESSIONS_DAYS = 5
OPS_CENTER_SERVER_NAME = 12kprvops01.derco.cl
OPS_CENTER_SERVER_NAME = 10.0.0.49
USEMAIL
TRUSTED_MASTER = lprpnetbkpbo01.imcruz.org
TRUSTED_MASTER = lprvnetbkppe01.derco.local
KEEP_LOGS_SIZE_GB = 5
ENABLE_CRITICAL_PROCESS_LOGGING = YES
VM_PROXY_SERVER = 10.56.0.22
VM_PROXY_SERVER = 10.55.134.27
VM_PROXY_SERVER = 10.56.0.45
VM_PROXY_SERVER = 10.56.0.8
SPS_REDIRECT_ALLOWED = dcqlc02exc02.derco.cl dcqlc02exc06.derco.cl
GRANULAR_DUP_RECURSION = 0
INCOMPLETE_JOB_CLEAN_INTERVAL = 1
INCOMPLETE_BKUP_JOB_CLEAN_INTERVAL = 2
SERVER_SENDS_MAIL = YES
CLI_GA_RET_LOGS_DURATION = 0
TELEMETRY_UPLOAD = YES
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
WEBSVC_GROUP = nbwebgrp
WEBSVC_USER = nbwebsvc
/etc/hosts master server
10.55.0.54 lprpmedia01.derco.cl
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01
10.0.0.59 lprvnetbkp01.resp.derco.cl lprvnetbkp01
10.0.0.3 lprpaplnetbk.resp.derco.cl lprpaplnetbk
I don't know why the master sees the same server with 2 different names; this configuration is old and was working until recently.
Regards
10-06-2020 08:22 AM
CLIENT_NAME = lprpmedia01.resp.derco.cl
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01
/etc/hosts media01
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01
10.0.0.59 lprvnetbkp01.resp.derco.cl lprvnetbkp01
10.0.0.3 lprpaplnetbk.resp.derco.cl lprpaplnetbk
/etc/hosts master server
10.55.0.54 lprpmedia01.derco.cl
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01
These are your problems. The simplest thing to do would be to add the entry: 10.55.0.54 lprpmedia01.derco.cl to your media server's /etc/hosts file, restart services (including PBX, just to be sure), and see if that fixes it. That's why the media server can't resolve itself locally.
If everything was working until recently, I feel like something must have been changed to cause these discrepancies. If you have multiple interfaces, you need to make sure that you're only telling NBU to use the one you want (10.55.0.54 vs 10.0.0.50).
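A quick way to confirm whether a hosts file can answer for both names is one grep per FQDN. A self-contained sketch using a copy of the media server's current entries from this thread (a live check would point HOSTS at /etc/hosts instead of the temp file):

```shell
#!/bin/sh
# Check which of the two FQDNs a hosts file can resolve locally.
# The heredoc reproduces the media server's /etc/hosts from this thread;
# set HOSTS=/etc/hosts to run the check for real.
HOSTS=$(mktemp)
cat > "$HOSTS" <<'EOF'
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01
10.0.0.59 lprvnetbkp01.resp.derco.cl lprvnetbkp01
EOF

for name in lprpmedia01.resp.derco.cl lprpmedia01.derco.cl; do
  if grep -qwF "$name" "$HOSTS"; then
    echo "$name: present"
  else
    echo "$name: MISSING"
  fi
done
rm -f "$HOSTS"
```

With the entries above, lprpmedia01.derco.cl comes back MISSING, which matches the vmoprcmd (70) error in this thread.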
10-07-2020 12:24 AM
The error on the media server is about hostname lprpmedia01.derco.cl (70).
Look at /etc/hosts on the media server:
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01
There is no entry for lprpmedia01.derco.cl.
Then, there are 2 entries on the master server for this media server:
10.55.0.54 lprpmedia01.derco.cl
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01
Why is that?
You need to find out why these different hostnames and IP addresses exist.
NBU works with hostnames - if resolution at OS-level is not configured correctly, then you will experience issues in NBU.
10-07-2020 07:59 AM
I don't know why there are 2 entries for the same server; it's an old config. I have now registered:
10.0.0.50 lprpmedia01.resp.derco.cl lprpmedia01.derco.cl lprpmedia01
And I restarted the NetBackup services on the media server. Do I need to do it on the master as well?
Any other commands?
[root@lprpmedia01 bin]# ./vmoprcmd -d
PENDING REQUESTS
ReqId User RecMID ExtMID Density Mode Time Barcode VolGroup
DRIVE STATUS
Drv Type Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0 hcart3 TLD - No - 0
1 hcart3 TLD - No - 0
2 hcart3 TLD - No - 0
3 hcart3 TLD - No - 0
ADDITIONAL DRIVE STATUS
Drv DriveName Shared Assigned Comment
0 lprpmedia01hcart3 Yes -
1 lprpmedia01hcart3000 Yes -
2 lprpmedia01hcart3001 Yes -
3 lprpmedia01hcart3002 Yes -
Regards
10-08-2020 02:09 AM
When you 'fix' /etc/hosts entries, it should be fine to only clear the host cache on the master and media server (as mentioned by @jnardello on Monday).
Now that vmoprcmd shows correct output on the media server - what is happening with the restore?
Do the media server drives show correct status (corresponding with vmoprcmd) in Device Monitor?
10-08-2020 07:10 AM
The job still failed; here's an example for an SLP that uses media01.
08-10-2020 10:55:28 - requesting resource LCM_lprpmedia01-hcart3-robot-tld-0
08-10-2020 10:55:29 - begin Duplicate
08-10-2020 10:55:29 - granted resource LCM_lprpmedia01-hcart3-robot-tld-0
08-10-2020 10:55:29 - started process RUNCMD (pid=17659)
08-10-2020 10:55:29 - requesting resource lprpmedia01-hcart3-robot-tld-0
08-10-2020 10:55:29 - requesting resource @aaaaj
08-10-2020 10:55:29 - reserving resource @aaaaj
08-10-2020 10:55:30 - ended process 0 (pid=17659)
08-10-2020 10:55:30 - end Duplicate; elapsed time 0:00:01
08-10-2020 10:55:30 - Error nbjm (pid=16350) NBU status: 830, EMM status: No drives are available
08-10-2020 10:55:30 - Error nbjm (pid=16350) NBU status: 830, EMM status: No drives are available
08-10-2020 10:55:30 - Error nbjm (pid=16350) NBU status: 830, EMM status: No drives are available
No drives are available for this job (2001)
The drives remain in Scan mode; image attached.
Regards
10-08-2020 07:34 AM
@robertoaxity you forgot to attach the image.
Another thing -
You said in a previous post that the drives are NOT shared with other media servers, but your previous vmoprcmd shows 'Yes' under the Shared heading?
Please run these commands on the master server (without any options) and post the output.
vmdareq
vmoprcmd
10-08-2020 07:40 AM
Oh, I don't know if the robot is shared, but that should not be the case, because each media server should manage its own library.
[root@lprvnetbkp01 bin]# ./vmdareq
lprpmedia01hcart3 - AVAILABLE
lprpmedia01.derco.cl SCAN_HOST UP
lprpmedia01hcart3000 - AVAILABLE
lprpmedia01.derco.cl SCAN_HOST UP
lprpmedia01hcart3001 - AVAILABLE
lprpmedia01.derco.cl SCAN_HOST UP
lprpmedia01hcart3002 - AVAILABLE
lprpmedia01.derco.cl SCAN_HOST UP
[root@lprvnetbkp01 bin]# ./vmoprcmd
HOST STATUS
Host Name Version Host Status
========================================= ======= ===========
lprvnetbkp01.resp.derco.cl 811000 ACTIVE-DISK
lprpaplnetbk.resp.derco.cl 810000 ACTIVE
lprpmedia01.resp.derco.cl 811000 ACTIVE-DISK
lprpmedialevel01.resp.derco.cl 811000 ACTIVE-DISK
lprpmedia01.derco.cl 811000 ACTIVE
lprpmediapaine01.resp.derco.cl 810000 OFFLINE
PENDING REQUESTS
<NONE>
DRIVE STATUS
Drive Name Label Ready RecMID ExtMID Wr.Enbl. Type
Host DrivePath Status
=============================================================================
IBM.ULT3580-HH7.000 Yes Yes 0026L7 0026L7 Yes hcart
lprpaplnetbk.resp.derco.cl /dev/nst5 ACTIVE
IBM.ULT3580-HH7.001 No No No hcart
lprpaplnetbk.resp.derco.cl /dev/nst11 DOWN-TLD
IBM.ULT3580-HH7.002 Yes Yes 000210 000210 Yes hcart
lprpaplnetbk.resp.derco.cl /dev/nst10 ACTIVE
IBM.ULT3580-HH7.003 No No No hcart
lprpaplnetbk.resp.derco.cl /dev/nst9 DOWN-TLD
IBM.ULT3580-HH7.004 Yes Yes 000082 000082 Yes hcart
lprpaplnetbk.resp.derco.cl /dev/nst7 ACTIVE
IBM.ULT3580-HH7.005 No No No hcart
lprpaplnetbk.resp.derco.cl /dev/nst6 DOWN-TLD
IBM.ULT3580-TD6.001 Yes Yes DER331 DER331 Yes hcart3-Clean
lprpaplnetbk.resp.derco.cl /dev/nst2 ACTIVE
IBM.ULT3580-TD6.002 No No No hcart3-Clean
lprpaplnetbk.resp.derco.cl /dev/nst1 TLD
IBM.ULT3580-TD6.003 No No No hcart3-Clean
lprpaplnetbk.resp.derco.cl /dev/nst3 TLD
IBM.ULT3580-TD6.004 No No No hcart3
lprpaplnetbk.resp.derco.cl MISSING_PATH:5:0:0:0:10WT019901 DOWN-TLD
IBM.ULT3580-TD6.005 No No No hcart3-Clean
lprpaplnetbk.resp.derco.cl /dev/nst0 TLD
lprpmedia01hcart3 No No No hcart3
lprpmedia01.derco.cl /dev/nst3 SCAN-TLD
lprpmedia01hcart3000 No No No hcart3
lprpmedia01.derco.cl /dev/nst2 SCAN-TLD
lprpmedia01hcart3001 No No No hcart3
lprpmedia01.derco.cl /dev/nst1 SCAN-TLD
lprpmedia01hcart3002 No No No hcart3
lprpmedia01.derco.cl /dev/nst0 SCAN-TLD
Regards