03-07-2013 08:36 PM
hi,
I've run into an EMC NDMP backup issue, and this is the first time I have configured EMC NDMP backup.
NBU version on master: 7.5.0.3 running on Linux (CentOS release 5.7 (Final))
NBU version on media server: 7.5.0.3 running on Linux (CentOS release 5.7 (Final))
We have multiple IPs on both the master and media servers; please see below. The IP of the EMC data mover we use is 10.119.35.13Q.
[root@lx0034nbumast ~]# ifconfig -a |grep 10 |grep addr
inet addr:10.119.9.CC Bcast:10.119.9.255 Mask:255.255.255.0
inet addr:10.119.10.AB Bcast:10.119.10.255 Mask:255.255.255.0
[root@lx0034nbumed01 lx0034nbumast_phd_bkups]# ifconfig -a |grep 10 |grep addr
inet addr:10.119.9.DD Bcast:10.119.9.255 Mask:255.255.255.0
inet addr:10.119.10.XY Bcast:10.119.10.255 Mask:255.255.255.0
We want the EMC storage to talk to our backup servers via the 10.119.10.XX IPs, so the network team has already opened port 10000 in both directions.
The NDMP config verification now succeeds on both the master and media servers:
[root@lx0034nbumed01 ~]# /usr/openv/volmgr/bin/tpautoconf -verify 10.119.35.13Q
Connecting to host "10.119.35.139" as user "ndmp"...
Waiting for connect notification message...
Opening session--attempting with NDMP protocol version 4...
Opening session--successful with NDMP protocol version 4
host supports TEXT authentication
host supports MD5 authentication
Getting MD5 challenge from host...
Logging in using MD5 method...
Host info is:
host name "server_2"
os type "DartOS"
os version "EMC File Server.T.7.1.55.3"
host id "abc1997"
Login was successful
Host supports LOCAL backup/restore
Host supports 3-way backup/restore
But the NDMP backup always fails, and I don't know what I am missing.
Could you help me?
I want the EMC storage to talk to our backup servers via the 10.119.10.XX IPs, but the IPs bound to the FQDNs of the backup servers are 10.119.9.XX.
So how do I make sure that the EMC storage really communicates with the backup servers via 10.119.10.XX, not 10.119.9.XX?
Did I miss any step on the EMC storage?
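For the host side of that question, one rough way to see which local IP the backup server itself would use when it talks to the data mover is to ask the kernel through a connected UDP socket. This is only a sketch: the function name `local_source_ip` is made up for illustration, port 10000 is the NDMP port mentioned above, and the check says nothing about which address the data mover uses for connections it opens back to the backup servers.

```python
import socket

def local_source_ip(dest_ip, dest_port=10000):
    """Return the local IP the kernel would pick as source address for dest_ip.

    Connecting a UDP socket sends no packets; it only makes the kernel
    consult its routing table and bind the socket to a local address.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]
    finally:
        s.close()

# Hypothetical usage on the master or media server:
#   local_source_ip("10.119.35.139")
# If routing is set up as intended, this should return the server's
# 10.119.10.XX address rather than its 10.119.9.XX address.
```

If the result is the 10.119.9.XX address, the host routing is not steering traffic for the data mover through the 10.119.10.XX interface.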
==================================================
2013-3-5 23:33:43 - Info nbjm (pid=26743) starting backup job (jobid=1222996) for client 10.119.35.139, policy lascx4_phd_a_stg01, schedule Full
2013-3-5 23:33:43 - Info nbjm (pid=26743) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1222996, request id:{2C994AA0-8630-11E2-AED1-FBB1D04F8B49})
2013-3-5 23:33:43 - requesting resource lx0034nbumed01_phd_dd670_rdsu01
2013-3-5 23:33:43 - requesting resource lx0034nbumast.NBU_CLIENT.MAXJOBS.10.119.35.139
2013-3-5 23:33:43 - requesting resource lx0034nbumast.NBU_POLICY.MAXJOBS.lascx4_phd_a_stg01
2013-3-5 23:33:44 - Info bpbrm (pid=8162) 10.119.35.139 is the host to backup data from
2013-3-5 23:33:44 - Info bpbrm (pid=8162) reading file list from client
2013-3-5 23:33:44 - Info bpbrm (pid=8162) starting ndmpagent on client
2013-3-5 23:33:44 - Info ndmpagent (pid=8164) Backup started
2013-3-5 23:33:44 - Info bpbrm (pid=8162) bptm pid: 8165
2013-3-5 23:33:44 - Info bptm (pid=8165) start
2013-3-5 23:33:44 - granted resource lx0034nbumast.NBU_CLIENT.MAXJOBS.10.119.35.139
2013-3-5 23:33:44 - granted resource lx0034nbumast.NBU_POLICY.MAXJOBS.lascx4_phd_a_stg01
2013-3-5 23:33:44 - granted resource MediaID=@aaaam;Path=/dd670-uswc02/backup/non_pci/repl_wcdc/lx0034nbumast_phd_bkups;MediaServer=lx003...
2013-3-5 23:33:44 - granted resource lx0034nbumed01_phd_dd670_rdsu01
2013-3-5 23:33:44 - estimated 0 kbytes needed
2013-3-5 23:33:44 - Info nbjm (pid=26743) started backup (backupid=10.119.35.139_1362555224) job for client 10.119.35.139, policy lascx4_phd_a_stg01, schedule Full on storage unit lx0034nbumed01_phd_dd670_rdsu01
2013-3-5 23:33:44 - started process bpbrm (pid=8162)
2013-3-5 23:33:44 - connecting
2013-3-5 23:33:44 - connected; connect time: 0:00:00
2013-3-5 23:33:45 - Info bptm (pid=8165) using 30 data buffers
2013-3-5 23:33:45 - Info bptm (pid=8165) using 262144 data buffer size
2013-3-5 23:33:46 - Info bptm (pid=8165) start backup
2013-3-5 23:33:46 - begin writing
2013-3-6 0:33:47 - Error bpbrm (pid=8162) socket read failed: errno = 62 - Timer expired
2013-3-6 1:33:47 - Error bpbrm (pid=8162) socket read failed: errno = 62 - Timer expired
2013-3-6 1:33:47 - Error bptm (pid=8165) media manager exiting because bpbrm is no longer active
termination requested by administrator (150)
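As a reading aid for the job details above, the small sketch below measures the time between "begin writing" and the two "socket read failed" errors. The helper name `parse_ts` is invented for illustration, and the three lines are copied from the log, with the second error shortened.

```python
from datetime import datetime

def parse_ts(line):
    """Parse the leading 'YYYY-M-D H:MM:SS' timestamp of a job-detail line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S")

lines = [
    "2013-3-5 23:33:46 - begin writing",
    "2013-3-6 0:33:47 - Error bpbrm (pid=8162) socket read failed: errno = 62 - Timer expired",
    "2013-3-6 1:33:47 - Error bpbrm (pid=8162) socket read failed",
]

stamps = [parse_ts(line) for line in lines]
# Seconds between each consecutive pair of events.
gaps = [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
print(gaps)
```

Both gaps come out at roughly one hour. That would be consistent with a one-hour read timeout expiring twice while no data arrives, but this is an inference from the timestamps, not something the log states.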
03-07-2013 09:22 PM
hi,
To enable backups over the 10.119.10.XX IPs:
1) you should set up the proper route
2) name resolution should point to the right IP.
Why do you want to use 10.119.10.XX to communicate with NDMP? Is it the backup LAN?
What are the FQDNs associated with the 10.119.10.XX IPs? (Don't they refer to the backup names of the servers?)
03-07-2013 09:27 PM
hi, Nagalla
thanks for your reply
1) you should set up the proper route
Host side: I added one route as below
Destination Gateway Genmask Flags MSS Window irtt Iface
10.119.35.139 10.119.10.1 255.255.255.255 UGH 0 0 0 eth4
EMC side:[nasadmin@lascxcss01 ~]$ server_route server_2 -list
.............
host 10.119.10.99 10.119.35.139 255.255.255.255 int_nfs
host 10.119.10.98 10.119.35.139 255.255.255.255 int_nfs
2) name resolution should point to the right IP.
Why do you want to use 10.119.10.XX to communicate with NDMP? Is it the backup LAN?
Because the default IP 10.119.9.XX (which is bound to the FQDN) is the management IP.
3) What are the FQDNs associated with the 10.119.10.XX IPs? (Don't they refer to the backup names of the servers?)
No, they do not.
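To illustrate what the host-side route quoted above is meant to achieve, here is a minimal sketch of most-specific-match route selection. The first table entry is taken from the route listing in this post; the default route via 10.119.9.1 on eth0 is purely hypothetical and only there for contrast, and `pick_route` is an illustrative name, not a real tool.

```python
import ipaddress

def pick_route(dest, routes):
    """Return the matching route with the longest prefix (most specific)."""
    dest = ipaddress.ip_address(dest)

    def network(route):
        # route = (destination, netmask, gateway, interface)
        return ipaddress.ip_network("%s/%s" % (route[0], route[1]))

    matches = [r for r in routes if dest in network(r)]
    return max(matches, key=lambda r: network(r).prefixlen)

routes = [
    ("10.119.35.139", "255.255.255.255", "10.119.10.1", "eth4"),  # from the post
    ("0.0.0.0", "0.0.0.0", "10.119.9.1", "eth0"),                 # hypothetical default
]

print(pick_route("10.119.35.139", routes)[3])
```

Under these assumptions the host route wins for the data mover address, so traffic to 10.119.35.139 would leave through eth4, while any other destination would fall back to the default route.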
03-07-2013 09:43 PM
If 10.119.9.XX is the management IP, what is the name associated with the IP 10.119.10.XX? That is the name you should use for the backup configuration.
Okay.
Check whether you are able to ping the IP 10.119.10.XX from the EMC box.
If yes, use hosts entries on the EMC box to map the FQDNs to 10.119.10.XX.
For example, the hosts file on the EMC box should have the lines below:
10.119.10.XX FQDN of master
10.119.10.XX FQDN of media
03-07-2013 09:49 PM
yes, I can ping 10.119.10.XX from EMC box.
I added the 2 entries below to the /etc/hosts of EMC control station.
10.119.10.XX lx0034nbumed01.active.tan lx0034nbumed01
10.119.10.YY lx0034nbumast.active.tan lx0034nbumast
But I am not sure whether this works,
since /etc/hosts resides on the EMC control station, and we want to talk to the IP of the data mover, not the control station.
03-10-2013 10:18 PM
Since the verification "/usr/openv/volmgr/bin/tpautoconf -verify 10.119.35.139" succeeds on both the master and media servers, I think the failure may not be caused by the network.
But I am still not sure what the possible reason is.
03-11-2013 09:00 PM
2013-3-5 23:33:44 - granted resource MediaID=@aaaam;Path=/dd670-uswc02/backup/non_pci/repl_wcdc/lx0034nbumast_phd_bkups;MediaServer=lx003...
2013-3-5 23:33:44 - granted resource lx0034nbumed01_phd_dd670_rdsu01
2013-3-5 23:33:44 - estimated 0 kbytes needed