Slowness of NetBackup Java console, high I/O only on Saturday

turguns
Level 5

Hi ALL,

Version 7.1

Redhat 6.1

Master server

We have an issue with the NetBackup master server every Saturday: the Java console is very slow and there is high I/O wait on the server.

Backups that go over FC run at normal speed. On Saturday there are no big backups, just small archive backups, which also run on other days.

Is it possible that NetBackup runs background processes that are not visible from the GUI?

P.S. We're not using any media server or client deduplication.

After restarting the server three times, it returned to a normal state.

Any idea?

 

BR,

Turgun


6 REPLIES

Nicolai
Moderator
Partner VIP

Does the high I/O originate from internal disks or from external SAN-attached disks?

The following commands may help identify the issue (a quick sketch follows the list):

  • top (shows CPU/memory usage per process)
  • free (shows free memory)
  • As root, run "crontab -l" to show scheduled tasks
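
A minimal first-pass check along those lines, run while the Saturday slowdown is happening, might look like this (iostat is my addition here; it comes with the sysstat package and is not mentioned above):

  top               # CPU and memory usage per process; watch the %wa (I/O wait) figure in the header
  free -m           # free/used memory in megabytes
  crontab -l        # as root: scheduled tasks (repeat for other users if needed)
  iostat -x 5       # per-device utilisation and await times, every 5 seconds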

 

revarooo
Level 6
Employee

Indeed, check the available memory and use top as recommended.

Out of interest, how much memory is in your master server?

Also consider upgrading from 7.1 to 7.7 soon, as NBU 7.0 - 7.6.1.2 goes out of support in February 2017.

 

tunix2k
Level 5
Partner Accredited

Yes, use the sysstat tools to monitor CPU/memory/disk.

The answer to Nicolai's first question would be very interesting.

Is it possible that some tasks are scheduled on your SAN storage every Saturday?

Otherwise you have to find out which processes are responsible for the high I/O (one way is sketched below). Maybe you can stop NetBackup while the I/O waits are occurring and monitor the system without NBU.
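
One possible way to pin the blame on specific processes, assuming the iotop and sysstat packages are available (neither is mentioned in this thread, so treat this purely as a suggestion):

  iotop -o          # live per-process disk I/O, showing only processes actually doing I/O
  pidstat -d 5 12   # per-process read/write rates, sampled every 5 seconds, 12 samples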

Do you have client software from some monitoring or inventory system installed? If so, check those processes.

ciao

Martin

sdo
Moderator
Partner VIP Certified

I worked somewhere where the sys admins swore blind that the SAN infrastructure had an issue at around 7am every few days or so... we proved there were no huge volumes of SAN traffic etc... it turned out to be McAfee EPO sending out AV updates to 1500 guest development and test VMs at the same time, which all then tried to scan themselves.

The sys admins were adamant that it was a storage array issue. I only found the true root cause by manually correlating Windows application event log IDs for McAfee (from quite a lot of guest VMs) with the time of the apparent slowness. I had previously checked all the usual storage array and SAN switch characteristics (IOPS, queue depth, I/O payload size, throughput, latency, port saturation, LUN traffic, parity group traffic, back-end SAS bus traffic, internal storage array performance, etc.) - and it also showed up as quite long (but not horrendous) I/O response times within some of the VMs - but oddly not all of them - it was difficult to spot amongst all the SAN and storage traffic generated by the normal workload of the thousands of other physical and virtual production servers.

The issue wasn't really attributable to McAfee either; it was due to the way the sys admins had configured it, or more correctly, it could be described as being caused because McAfee EPO workload scheduling hadn't been configured for a large enterprise. Anyway, once they changed EPO to spread its updates across a wider time frame, and only do about 50 VMs every 10 minutes or so, all was OK again.

I didn't get to prove it, but the one thing I hadn't been able to check was the I/O characteristics from the ESX hosts at the physical FC HBA layer, so I suspect the queue depth of the ESX host HBAs was saturated - think of it as I/O getting stuck within ESX between the vHBA and pHBA. Some I/O got through OK, and the I/O that did get through was serviced absolutely fine by the SAN switching and storage arrays... but the ESX hosts were being swamped internally, though not by any huge degree at the individual guest VM layer.

My point is... you have to be able to check and see everything in order to find odd, intermittent, wide-scale storage performance issues.

turguns
Level 5

Hi All,

There is nothing in crontab for Saturday. When there was a big load, I checked top -a and free -m. Everything looked okay; I couldn't catch anything.

I've found this command: (while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done). It prints, once per second, the processes that are in uninterruptible sleep (state "D"), i.e. waiting on I/O.

I've activated sar, and I'm also capturing dstat output. Tomorrow I will monitor and get back to you. Thank you for the valuable feedback.
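
For reference, a minimal collection setup along those lines could be (intervals and dstat flags here are just an example, not something stated in this thread):

  sar -u 5 12       # CPU utilisation including %iowait, every 5 seconds, 12 samples
  sar -d -p 5 12    # per-device I/O statistics over the same window
  dstat -tcdngy 5   # combined time/cpu/disk/net/paging/system view, every 5 seconds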

 

turguns
Level 5

I've made two changes; which one fixed it, I don't know ))

1. Replaced a failed internal disk (it was mirrored; a quick way to check for this is sketched below)

2. Fixed an NBU-Catalog error (https://www.veritas.com/community/forums/nbcatalog-error)
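
Since the failed disk was part of a mirror, a quick health check like the following could catch that kind of degradation earlier (this assumes a Linux software RAID mirror; the post does not say how the mirror was built):

  cat /proc/mdstat            # lists md arrays; a degraded mirror shows up as e.g. [U_]
  mdadm --detail /dev/md0     # detailed state of one array (replace /dev/md0 with the real device)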

 

Br,

Turgun