Facing issue with Image cleanup Jobs

wannawin · ‎08-02-2013

Hello Team.

Around 400 image cleanup jobs are in progress and appear to be hung. Because of this the master server hung twice and we rebooted it, but after the reboot a large number of image cleanup jobs ran again, and due to a capacity issue the server now hangs every 8-10 hours.

We have Data Domain (DD) in our environment, and this is what I found in the logs:

23:00:19.920 [26303] <16> delete_by_backupid: OVsystem(/usr/openv/netbackup/bin/admincmd/nbdelete -backup_id usnbst8017-brn_1375198910) failed (174)

23:00:19.944 [26303] <2> put_string: cannot write data to network: Broken pipe (32)

23:00:19.944 [26303] <2> db_senddata: put_string(): connection dropped or not connected, Broken pipe, (32)

23:00:19.944 [26303] <2> db_sendSTATUS: db_senddata() failed: network connection broken, (40)

Can you please clarify whether DD cleanup jobs are shown in the NBU Activity Monitor or not?

Regards

Davinder

RonCaplinger · ‎08-02-2013

Data Domain's internal scheduled cleanup jobs do NOT appear in the NetBackup Activity Monitor.

The cleanup jobs scheduled on the Data Domain will only delete an image once the NetBackup Image Cleanup job has marked a block as expired. The NetBackup Image Cleanup job must run first to mark the images as expired; then the Data Domain scheduled cleanup job removes the unneeded blocks of data from disk.

Your messages above indicate that you have connectivity issues between your media server and your Data Domain.
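
To gauge how widespread the failures are, the log lines quoted in the first post could be tallied with a small script. This is only a rough sketch, not a NetBackup tool: the function names are made up for illustration, and the regular expression assumes every failed delete is logged in exactly the format shown in the excerpt.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... delete_by_backupid: OVsystem(.../nbdelete -backup_id <id>) failed (<status>)
# Assumption: the line layout is the same as in the excerpt quoted in this thread.
NBDELETE_FAILED = re.compile(r"nbdelete -backup_id (\S+)\) failed \((\d+)\)")


def failed_deletes(log_lines):
    """Return (backup_id, status) pairs for every failed nbdelete call."""
    results = []
    for line in log_lines:
        match = NBDELETE_FAILED.search(line)
        if match:
            results.append((match.group(1), int(match.group(2))))
    return results


def status_counts(log_lines):
    """Count failed nbdelete calls per status code."""
    return Counter(status for _, status in failed_deletes(log_lines))


if __name__ == "__main__":
    # Two lines copied from the log excerpt in the first post.
    sample = [
        "23:00:19.920 [26303] <16> delete_by_backupid: OVsystem(/usr/openv/netbackup/bin/admincmd/nbdelete -backup_id usnbst8017-brn_1375198910) failed (174)",
        "23:00:19.944 [26303] <2> put_string: cannot write data to network: Broken pipe (32)",
    ]
    for backup_id, status in failed_deletes(sample):
        print(backup_id, status)
```

Feeding it the whole log file (for example `failed_deletes(open(path))`, where `path` is wherever the log lives on your master server) would show whether the failures are limited to a few backup IDs or affect most deletions.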

wannawin · ‎08-02-2013

Hi .

The connectivity issue is now resolved, so that is one issue out of the way.

But the image cleanup issue persists: more than 400 image cleanup jobs have been in progress since yesterday, and the server was rebooted twice because it was in a hung state.

When I double-click an image cleanup job, the details are empty; it does not show any information.

wannawin · ‎08-02-2013

Hi.

Lots of bpdbm processes are running in the bpps -a output.
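
For tracking whether the number of bpdbm processes keeps growing, a small script could count them from saved process listings. This is a minimal sketch under assumptions: the helper name is hypothetical, the sample lines are invented for illustration (they are not from this server), and it assumes the listing is ps-style text with the command path as one whitespace-separated field.

```python
def count_processes(ps_output, name):
    """Count lines of ps-style output that contain a field whose basename is `name`."""
    count = 0
    for line in ps_output.splitlines():
        # Compare only the last path component, so /usr/openv/.../bpdbm matches "bpdbm".
        if any(field.rsplit("/", 1)[-1] == name for field in line.split()):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical listing, made up for illustration only.
    sample = """\
root  2551     1  0 Jul31 ?  00:00:12 /usr/openv/netbackup/bin/bpdbm
root  2560  2551  0 Jul31 ?  00:00:03 /usr/openv/netbackup/bin/bpdbm
root  2601     1  0 Jul31 ?  00:00:40 /usr/openv/netbackup/bin/nbemm
"""
    print(count_processes(sample, "bpdbm"))
```

Saving the listing every few minutes and comparing the counts would show whether the processes pile up steadily before the server hangs.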

wannawin · ‎08-02-2013

Hi.

Please find attached the output of the bpdbm consistency check.

RonCaplinger · ‎08-02-2013

You have quite a few corrupted files; there are a lot of messages like these:

>>PRIMARY_COPY is set to an invalid copy
>>EXPIRATION is not set to the next valid copy to expire

I would suggest calling in a support ticket with Symantec to have them go through correcting all the issues found in the output you attached.

And in the meantime, watch the first few Image Cleanup jobs to see if they are, in fact, doing anything. I've had more than one in the queue at once, but the first one is usually the only one doing any actual modification of the catalog. The rest just seem to sit in a hung state, then run one at a time until they complete.

wannawin · ‎08-02-2013

Hi Ron.

Can we stop the image cleanup jobs for the time being, until this issue is resolved?

I already created a touch file:

-rwxrwxrwx 1 root root 0 Aug 1 08:25 NOexpire

But it is of no use; image cleanup jobs are still running.

Please suggest.

wannawin · ‎08-02-2013

Hi Team.

Is there any method to stop these image cleanup jobs?

mph999 · ‎08-02-2013

Create an empty file:

/usr/openv/netbackup/bin/NOexpire

Do not forget about this file, or you will eventually run out of space/tapes. (Note: it does not prevent manual expirations.)

Martin

wannawin · ‎08-02-2013

Hello mph999

I already created this file, with 777 permissions, as shown below.

I stopped the NBU services, created the touch file NOexpire with 777 permissions, and then started the NBU services again, but image cleanup jobs are still running in bulk and this hangs the server.

-rwxrwxrwx 1 root root 0 Aug 1 08:25 NOexpire

Omar_Villa · ‎08-03-2013

NOexpire only prevents images from expiring; the image cleanup job will continue to run after each backup session completes.

Marianne · ‎08-03-2013

As per previous advice - log a support call!!

wannawin · ‎08-04-2013

Hello Team.

We moved the corrupted entries using the command bpdbm -consistency 2 -move, and after the command completed we rebooted the server, but a large number of image cleanup jobs (around 450) are still in a hung state.

Now another issue has come up:

[80101937@usnbub5500 admincmd]$ sudo ./nbemmcmd -listhosts
Password:
NBEMMCMD, Version:7.1.0.4
Failed to initialize EMM connection. Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.

[80101937@usnbub5500 bin]$ sudo ./vmoprcmd -d
Password:
network protocol error (39)

Lots of backups are failing with error code 800. Below are sample job details:

08/03/2013 11:20:11 - Info nbjm (pid=6751) starting backup job (jobid=2387529) for client @@@@, policy @@@@, schedule Full
08/03/2013 11:20:11 - Info nbjm (pid=6751) requesting MEDIA_SERVER_WITH_ATTRIBUTES resources from RB for backup job (jobid=2387529, request id:{91ECC5E6-FC58-11E2-B113-871F552E92D9})
08/03/2013 11:20:11 - requesting resource usnbub5500-omaha-local-disk-su
08/03/2013 11:20:11 - requesting resource usnbub5500.NBU_CLIENT.MAXJOBS.trackwt1ora-t
08/03/2013 11:20:11 - requesting resource usnbub5500.NBU_POLICY.MAXJOBS.DB-trackwt1-dblogs-no_altsite
08/03/2013 11:20:11 - Error nbjm (pid=6751) NBU status: 800, EMM status: CORBA communication failure
08/03/2013 11:30:11 - Info nbjm (pid=6751) starting backup job (jobid=2387529) for client trackwt1ora-t, policy DB-trackwt1-dblogs-no_altsite, schedule Full
resource request failed (800)

mph999 · ‎08-06-2013

Thank you Omar, you are quite correct, my error: NOexpire only stops things expiring, not the cleanup job itself.

Back to the problem(s): what happened on the system prior to these issues?

M

wannawin · ‎08-07-2013

Hello mph999

No activity took place prior to this issue (neither on the network side nor on the DD side); suddenly lots of image cleanup jobs were in a hung state.

We increased the Sybase database memory: it was 500 MB earlier and we have raised it to 1 GB.

[80101937@usnbub5500 global]$ cat server.conf
-n NB_usnbub5500
-x tcpip(LocalOnly=YES;ServerPort=13785) -gp 4096 -gd DBA -gk DBA -gl DBA -ti 0 -c 25M -ch 1G -cl 25M -zl -os 1M -o /usr/openv/db//log/server.log
-ud
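
To double-check which cache sizes the database server will actually start with, the options in server.conf can be read back programmatically. The sketch below is illustrative only: the function names are hypothetical, and it assumes (as I understand the SQL Anywhere server options) that -c is the initial cache size, -ch the maximum, and -cl the minimum, with K/M/G suffixes. Percentage values are not handled.

```python
import re

# Assumption: meaning of the SQL Anywhere cache options used in server.conf.
CACHE_OPTIONS = {"-c": "initial", "-ch": "maximum", "-cl": "minimum"}
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '25M' or '1G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if match is None:
        raise ValueError(f"unsupported size value: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def cache_settings(conf_text):
    """Return the cache sizes (in bytes) found in server.conf-style text."""
    tokens = conf_text.split()
    settings = {}
    # Each cache option is followed by its value as the next token.
    for option, value in zip(tokens, tokens[1:]):
        if option in CACHE_OPTIONS:
            settings[CACHE_OPTIONS[option]] = parse_size(value)
    return settings


if __name__ == "__main__":
    # Contents of server.conf as posted above.
    conf = """-n NB_usnbub5500
-x tcpip(LocalOnly=YES;ServerPort=13785) -gp 4096 -gd DBA -gk DBA -gl DBA -ti 0 -c 25M -ch 1G -cl 25M -zl -os 1M -o /usr/openv/db//log/server.log
-ud"""
    for name, size in cache_settings(conf).items():
        print(f"{name}: {size // 1024 ** 2} MB")
```

With the file as posted, only the maximum (-ch) reflects the 1 GB change; the initial and minimum cache sizes are still 25M.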

We also created a touch file, shown below:

[80101937@usnbub5500 global]$ cat emm.conf
USE_HASH=1

But still no luck; around 500 image cleanup jobs are still in a hung state.

Marianne · ‎08-07-2013

You were advised on more than one occasion to log a Support call...

wannawin · ‎08-07-2013

We logged a support call with Symantec three days ago but have not received a solution yet :(

mph999 · ‎08-07-2013

What is the case number?

wannawin · ‎08-07-2013

It's 04866945; please take a look and help.
