Any antivirus or Firewall is ON

sreedhar2u · ‎10-06-2010

Hi,

Our NetBackup services go down quite frequently whenever a Windows client job takes two or three days to complete. Please let me know how to fix this.

OS: Windows 2003 Std Edition

NB Version: 6.5.4

Marianne · ‎10-06-2010

Services going down on Master?

Please look in Event Viewer Application and System log for evidence and possible reason for services being terminated.

NetBackup will never go down by itself, unless a 'disk full' condition is experienced and services are shut down to prevent EMM database corruption.


RiaanBadenhorst · ‎10-06-2010

And let us know which services are affected when you look at the Event Viewer, as Marianne pointed out. (Who should be having fun in Barcelona for that matter and not "connecting" :P)

sreedhar2u · ‎10-06-2010

I found the below System event log error

Event Type:    Error
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7034
Date:        10/5/2010
Time:        10:23:56 PM
User:        N/A
Computer:    Server02
Description:
The NetBackup Policy Execution Manager service terminated unexpectedly. It has done this 1 time(s).

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Below is the detailed job status window output

10/2/2010 8:36:39 PM - requesting resource Any
10/2/2010 8:36:39 PM - requesting resource Server02.NBU_CLIENT.MAXJOBS.172.20.20.20
10/2/2010 8:36:39 PM - requesting resource Server02.NBU_POLICY.MAXJOBS.Windows_ProjShares_hbadfs01
10/2/2010 8:36:39 PM - awaiting resource Any Reason: Tape media server is not active, Media Server: Server02,
     Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
     Volume Pool: SL500, Storage Unit: Server02-hcart2-robot-tld-0, Drive Scan Host: N/A

10/2/2010 8:41:36 PM - granted resource Server02.NBU_CLIENT.MAXJOBS.172.20.20.20
10/2/2010 8:41:36 PM - granted resource Server02.NBU_POLICY.MAXJOBS.Windows_ProjShares_hbadfs01
10/2/2010 8:41:36 PM - granted resource SRH948
10/2/2010 8:41:36 PM - granted resource HP.ULTRIUM3-SCSI.001
10/2/2010 8:41:36 PM - granted resource Server02-hcart3-robot-tld-1
10/2/2010 8:41:36 PM - estimated 0 kbytes needed
10/2/2010 8:41:36 PM - started
10/2/2010 8:41:37 PM - started process bpbrm (2716)
10/2/2010 8:41:45 PM - connecting
10/2/2010 8:41:46 PM - connected; connect time: 00:00:01
10/2/2010 8:41:50 PM - mounting SRH948
10/2/2010 8:42:30 PM - mounted; mount time: 00:00:40
10/2/2010 8:42:32 PM - positioning SRH948 to file 23
10/2/2010 8:43:48 PM - positioned SRH948; position time: 00:01:16
10/2/2010 8:43:48 PM - begin writing
10/3/2010 6:39:50 AM - current media SRH948 complete, requesting next resource Any
10/3/2010 6:39:50 AM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: Server02,
     Robot Number: 1, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
     Volume Pool: SL500, Storage Unit: Server02-hcart3-robot-tld-1, Drive Scan Host: N/A

10/3/2010 6:40:56 AM - granted resource SRK464
10/3/2010 6:40:56 AM - granted resource HP.ULTRIUM3-SCSI.001
10/3/2010 6:40:56 AM - granted resource Server02-hcart3-robot-tld-1
10/3/2010 6:40:56 AM - mounting SRK464
10/3/2010 6:41:34 AM - mounted; mount time: 00:00:38
10/3/2010 6:41:36 AM - positioning SRK464 to file 1
10/3/2010 6:41:39 AM - positioned SRK464; position time: 00:00:03
10/3/2010 6:41:39 AM - begin writing
10/4/2010 9:51:23 PM - current media SRK464 complete, requesting next resource Any
10/4/2010 9:51:23 PM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: Server02,
     Robot Number: 1, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
     Volume Pool: SL500, Storage Unit: Server02-hcart3-robot-tld-1, Drive Scan Host: N/A

10/4/2010 9:52:09 PM - granted resource SRK463
10/4/2010 9:52:09 PM - granted resource HP.ULTRIUM3-SCSI.001
10/4/2010 9:52:09 PM - granted resource Server02-hcart3-robot-tld-1
10/4/2010 9:52:11 PM - mounting SRK463
10/4/2010 9:52:54 PM - mounted; mount time: 00:00:43
10/4/2010 9:52:56 PM - positioning SRK463 to file 1
10/4/2010 9:52:59 PM - positioned SRK463; position time: 00:00:03
10/4/2010 9:52:59 PM - begin writing
10/5/2010 1:35:51 PM - current media SRK463 complete, requesting next resource Any
10/5/2010 1:35:51 PM - granted resource SRK478
10/5/2010 1:35:51 PM - granted resource HP.ULTRIUM3-SCSI.000
10/5/2010 1:35:51 PM - granted resource Server02-hcart3-robot-tld-1
10/5/2010 1:35:53 PM - mounting SRK478
10/5/2010 1:36:36 PM - mounted; mount time: 00:00:43
10/5/2010 1:36:39 PM - positioning SRK478 to file 1
10/5/2010 1:36:42 PM - positioned SRK478; position time: 00:00:03
10/5/2010 1:36:42 PM - begin writing
10/5/2010 10:57:13 PM - current media SRK478 complete, requesting next resource Any
10/5/2010 10:57:19 PM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: Server02,
     Robot Number: 1, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
     Volume Pool: SL500, Storage Unit: Server02-hcart3-robot-tld-1, Drive Scan Host: N/A

10/5/2010 10:58:20 PM - granted resource SRK452
10/5/2010 10:58:20 PM - granted resource HP.ULTRIUM3-SCSI.000
10/5/2010 10:58:20 PM - granted resource Server02-hcart3-robot-tld-1
10/5/2010 10:58:30 PM - mounting SRK452
10/5/2010 10:59:14 PM - mounted; mount time: 00:00:44
10/5/2010 10:59:16 PM - positioning SRK452 to file 1
10/5/2010 10:59:19 PM - positioned SRK452; position time: 00:00:03
10/5/2010 10:59:19 PM - begin writing
10/6/2010 12:42:21 AM - Error bpbrm(pid=2716) from client 172.20.20.20: ERR - failure reading file: G:\Departments\HR\SG&HR\Strat Growth\Referral Candidates Correspondence\Referral - 2005 - 06\Referral - 2004 - 05\Profiles - Sept\Testing\ajay.doc (WIN32 5: Access is denied. )
10/6/2010 12:42:21 AM - Error bpbrm(pid=2716) from client 172.20.20.20: ERR - Snapshot Error while reading file: Volume{2ed31c6e-915d-4364-b2f7-5ba3f42ccdc4}\Departments\HR\SG&HR\Strat Growth\Referral Candidates Correspondence\Referral - 2005 - 06\Referral - 2004 - 05\Profiles - Sept\Testing\ajay.doc
10/6/2010 12:42:21 AM - Critical bpbrm(pid=2716) from client 172.20.20.20: FTL - Backup operation aborted!
10/6/2010 12:43:16 AM - end writing; write time: 01:43:57
snapshot error encountered(156)

Will_Restore · ‎10-07-2010

I am curious: why does it take so long to back up one client?
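
To see where the time goes, the write time per tape can be worked out from the job details posted above. This is only a rough sketch in Python, assuming the timestamp format shown in that output and an English locale for the AM/PM marker; the function name write_intervals is made up for illustration and is not part of NetBackup.

```python
from datetime import datetime

# Timestamp format as it appears in the job details, e.g. "10/2/2010 8:43:48 PM".
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def write_intervals(lines):
    """Return (media_id, duration) pairs for each 'begin writing' .. 'complete' span."""
    intervals = []
    start = None
    for line in lines:
        stamp, sep, message = line.partition(" - ")
        if not sep:
            continue  # continuation lines (Robot Number, Volume Pool, ...) have no timestamp
        try:
            when = datetime.strptime(stamp.strip(), TS_FORMAT)
        except ValueError:
            continue  # not a timestamped job detail line
        if message.startswith("begin writing"):
            start = when
        elif (
            start is not None
            and message.startswith("current media ")
            and "complete, requesting" in message
        ):
            media_id = message.split()[2]  # third word is the media ID, e.g. SRH948
            intervals.append((media_id, when - start))
            start = None
    return intervals


# Two of the spans from the job details, copied verbatim.
sample = [
    "10/2/2010 8:43:48 PM - begin writing",
    "10/3/2010 6:39:50 AM - current media SRH948 complete, requesting next resource Any",
    "10/3/2010 6:41:39 AM - begin writing",
    "10/4/2010 9:51:23 PM - current media SRK464 complete, requesting next resource Any",
]

for media_id, duration in write_intervals(sample):
    print(media_id, duration)
```

Comparing these durations against the tape capacity gives a rough idea of the throughput the drive is actually getting from this client.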

RiaanBadenhorst · ‎10-07-2010

Hi,

I don't think the two are related. The PEM service schedules jobs, so it shouldn't be affected by a job running for a long period. And vice versa, it shouldn't affect the job either.

How big is the client that you are backing up? Have you thought of splitting the work into multiple jobs? That way you can rerun whichever job did not succeed.

The PEM issue might just be something that can be fixed by patching your master server.

Please share more info about the client/job.

Mahesh_Roja · ‎10-09-2010

Is any antivirus or firewall turned on?
