10-21-2014 06:32 AM
Environment:
Master Server Prod = VM ESXi 5.1, Red Hat Enterprise Linux Server release 5.9 (Tikanga), 64 bit, 32 GB, 8 vCPU, Netbackup v7.6.0.2, Capacity License
Media Servers Prod = (5) Windows Standard 2008 R2 64 bit, 32 GB, 8 and 16 Logical Processors, Netbackup 7.6.0.2 (Physical)
Backup Target = Data Domain dd670, 51 useable TB, Data is encrypted on the Data Domain
***
Master Server DR = VM ESXi 5.5, Red Hat Enterprise Linux Server release 6.5 (Santiago), 64 bit, 32 GB, 8 vCPU, Netbackup v7.6.0.2
Media Servers DR = (3) Windows Standard 2008 R2 64 bit, 32 GB, (6) Logical Processors, Netbackup v7.6.0.2 (Physical)
Backup Target = Data Domain dd670, 51 useable TB, data is encrypted on the Data Domain
***
Problematic operation = A.I.R. Imports in the DR environment are taking a long time.
Description: Our Production site initiates a replication from the Prod Data Domain to the DR Data Domain. This works as expected; however, the DR site is taking a very long time to import the replicated images into the DR catalog. Our consultant told us that only the "header" of the image is read, so each import should take roughly the same amount of time. Our research, combined with our consultant's, found articles about other environments using Data Domains and A.I.R. where image imports were taking about 6 to 7 minutes each. We could live with that. However, most of our imports are taking much longer.
We have replications with a Prod retention of 1 month and a DR retention of 1 month; these are essentially differentials and DB log offloads. Our full backups have a Prod retention of 1 month and a DR retention of 6 months, and include some large file servers and most of our file (non-DB) backups. The 1-month-to-1-month imports take around 16 to 40 minutes, and the 1-month-to-6-month imports can take 8 hours or more for a single image of a large file server. Clearly this is not consistent with the expectation that reading only the header of the image would yield similar import times, and it is completely unacceptable performance.
Our SLP parameters for imports are set as follows:
Min import images = 50
Max import images = 200
Force interval for small job = 20
***
Disk pool max i/o stream = 30
Storage Unit Max Concurrent = 30
We are using the latest OST plugin available from EMC: 2.6.2.0-410681.
Please help. We are falling way behind, with queued jobs in excess of 1200 and rising, so we are also not meeting our SLA to the business units.
Thank You.
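To put the backlog in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each of the 1200 queued items is a single import of the stated duration, that all 30 streams (the disk pool and storage unit limits above) stay busy, and that every import takes the same time. It ignores how the SLP batches images into jobs, so treat the result as an order-of-magnitude estimate only. The function name `drain_hours` is made up for this example.

```python
import math


def drain_hours(queued_imports: int, concurrent_imports: int, minutes_per_import: float) -> float:
    """Estimate hours to clear a queue, assuming every slot stays busy
    and every import takes the same number of minutes (a simplification)."""
    # Imports run in "waves" of at most `concurrent_imports` at a time.
    waves = math.ceil(queued_imports / concurrent_imports)
    return waves * minutes_per_import / 60


if __name__ == "__main__":
    # 6.5 min is the midpoint of the 6 to 7 minutes reported by other sites;
    # 16 and 40 min are the range seen for the 1-month-to-1-month imports.
    for minutes in (6.5, 16, 40):
        hours = drain_hours(1200, 30, minutes)
        print(f"{minutes:>5} min/import -> {hours:.1f} h to drain 1200 imports")
```

Under these assumptions, even the short imports need somewhere between roughly 11 and 27 hours to clear the current queue, before counting any new replications or the multi-hour imports of the large file servers.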
10-21-2014 07:38 AM
Oh dear, what a mess!
Consider opening support tickets with Symantec and EMC.
You did not mention which DDOS you are on. DDOS 5.3 was quite buggy, so upgrade to DDOS 5.4.x if possible.
10-21-2014 08:24 AM
The OST plugin version is at the bottom of the above post and is 2.6.2.0-410681
The DD OS is 5.4.0.8-404909.
I already have cases open with Symantec, our Symantec consulting firm and EMC.
Terry
10-21-2014 10:26 AM
You got the DDOS covered then.
Please do get back to us with your findings!
10-21-2014 08:02 PM
Check out this note https://www-secure.symantec.com/connect/forums/netbackup-imports-running-longer-when-using-airdata-domain.
10-22-2014 07:33 AM
Thanks. Our Symantec consultant discovered references similar to the article you suggested as well as from Symantec support. We are trying to get a newer OST plugin from EMC and Detailed Status and bpdm debug logs have been sent to Symantec support.
10-22-2014 09:45 PM
Can you perhaps log on to your DR Data Domain unit and run the following (I just want to see if the unit is coping with the load):
Kind Regards
William
10-23-2014 09:58 AM
Welcome to Data Domain OS 5.4.0.8-404909
----------------------------------------
sysadmin@DD670DR# net show hardware
Port Speed Duplex Supp Speeds Hardware Address Physical Link Status
----- -------- ------- ----------- ----------------- -------- -----------
eth0a 1000Mb/s full 10/100/1000 00:8c:fa:02:57:b1 Copper yes
eth0b 1000Mb/s full 10/100/1000 00:8c:fa:02:57:b1 Copper yes
eth4a unknown unknown 1000/10000 90:e2:ba:06:8d:c0 Fiber no
eth4b unknown unknown 1000/10000 90:e2:ba:06:8d:c1 Fiber no
----- -------- ------- ----------- ----------------- -------- -----------
sysadmin@DD670DR# iostat 1
10/23 11:57:24
INTERVAL: 1 secs
"-" indicates that system is busy and unable to get recent data.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CPU |State |NFS |CIFS |Net |Disk |NVRAM |Repl
aggr aggr| | | |eth0a eth0a eth0b eth0b eth4a eth4a eth4b eth4b| | aggr aggr|
busy max| | ops/s load data in data out|ops/s in out| in out in out in out in out| read write busy| read write| in out
% %|CDBVMSFIR| % MB/s MB/s| MB/s MB/s| MB/s MB/s MB/s MB/s MB/s MB/s MB/s MB/s| KiB/s KiB/s %| KiB/s KiB/s| KB/s KB/s
---- ---- --------- ------- ----- ------- -------- ----- ------- ------- ----- ----- ----- ----- ----- ----- ----- ----- ------- ------- ---- ------- ------- ------- -------
93 93 - 87 0 0.01 1.85 0 0 0 0.12 5.71 0 0 0 0 0 0 198026 7148 33 0 0 0 0
26 26 154 - 0.02 3.14 0 0 0 0.12 5.71 0 0 0 0 0 0 63731 14152 12 0 21599 0 0
26 26 154 - 0.02 3.14 0 0 0 0.12 5.71 0 0 0 0 0 0 63731 14152 12 0 21599 0 0
3 3 0 0 0 0 0 0 0 0.14 0.76 0 0 0 0 0 0 4 103 0 0 21599 0 0
3 3 0 0 0 0 0 0 0 0.14 0.76 0 0 0 0 0 0 4 103 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - - 0 33 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 40 0 0 33 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 40 0 0
sysadmin@DD670DR# ddboost show connections
Active Clients: 0
Clients:
Client                    Idle  CPUs  Memory(MiB)  Plugin Version  OS Version                                                            Application Version
------------------------  ----  ----  -----------  --------------  --------------------------------------------------------------------  -------------------
UTLR5349.ad.ama-assn.org  YES   6     32,765       2.6.2.0-410681  Microsoft Windows Server 2008 R2 Service Pack 1 (build 7601), 64-bit  NetBackup: 7.6.0.2
UTLR5347.ad.ama-assn.org  YES   6     32,765       2.6.2.0-410681  Microsoft Windows Server 2008 R2 Service Pack 1 (build 7601), 64-bit  NetBackup: 7.6.0.2
UTLR5348.ad.ama-assn.org  YES   6     32,765       2.6.2.0-410681  Microsoft Windows Server 2008 R2 Service Pack 1 (build 7601), 64-bit  NetBackup: 7.6.0.2
------------------------  ----  ----  -----------  --------------  --------------------------------------------------------------------  -------------------
Client Connections:
Max Client Connections: 140
---------------------- ifgroup ------------------- ----------------------- Connections -----------------------------
Group Name Status Interface Backup Restore Src-repl Dst-repl Synthetic Total
------------------------ -------- ---------------- --------- --------- --------- --------- --------- ---------
none 10.194.97.103 0 0 0 0 0 0
------------------------ -------- ---------------- -------- -------- --------- --------- --------- --------
Total Connections: 0 0 0 0 0 0
10-23-2014 10:15 PM
Hi
My recommendations would be:
There seems to be something fishy going on with the Data Domain. The first performance iteration indicates that the CPU is at 93% for about 5 MB of reading:
Can you perhaps provide us with the following:
10-23-2014 10:26 PM
It might be worthwhile to configure an ifgroup and add your interfaces to it, to ensure you get more network throughput on the Data Domain.
ddboost ifgroup create NBU
ddboost ifgroup add NBU interface <ipaddr of eth0a>
ddboost ifgroup add NBU interface <ipaddr of eth0b>
ddboost ifgroup add NBU client UTLR5349.ad.ama-assn.org
ddboost ifgroup add NBU client UTLR5347.ad.ama-assn.org
ddboost ifgroup add NBU client UTLR5348.ad.ama-assn.org
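If you have more interfaces or media servers to add later, the same command list can be generated rather than typed by hand. This is only a convenience sketch in Python: it prints the command strings in the form shown above and does not talk to the Data Domain. The helper name `ifgroup_commands` is made up for this example, and the interface values are left as the same placeholders as above; check the output against your DD OS documentation before pasting it into a session.

```python
def ifgroup_commands(group, interface_ips, clients):
    """Build the 'ddboost ifgroup' command lines for one group.

    Produces strings only, in the same form as the commands listed above;
    nothing is executed against the Data Domain.
    """
    commands = [f"ddboost ifgroup create {group}"]
    # One 'add ... interface' line per Data Domain interface address.
    commands += [f"ddboost ifgroup add {group} interface {ip}" for ip in interface_ips]
    # One 'add ... client' line per media server host name.
    commands += [f"ddboost ifgroup add {group} client {host}" for host in clients]
    return commands


if __name__ == "__main__":
    media_servers = [
        "UTLR5349.ad.ama-assn.org",
        "UTLR5347.ad.ama-assn.org",
        "UTLR5348.ad.ama-assn.org",
    ]
    # Placeholders: substitute the real addresses of eth0a and eth0b.
    interfaces = ["<ipaddr of eth0a>", "<ipaddr of eth0b>"]
    for line in ifgroup_commands("NBU", interfaces, media_servers):
        print(line)
```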
10-24-2014 12:56 AM
Please attach output like this as a file. Debug text usually gets caught by the Connect anti-spam feature, and it also clutters the thread and hurts readability.
10-24-2014 04:15 AM
It would be worth checking the replication throttle setting on the DR DD, as there are issues with replication throttling in the DDOS 5.4 family of releases:
11-21-2014 03:36 PM
Hope I am not too late.
The stable DDOS version is 5.4.4.2. This version is only 20 days old. I have it on my 990, 890, 160, and 620. It fixed issues related to Data Domain and AIR. I still have issues with slow AIR imports and am working with support.
Cheers