08-02-2013 08:11 AM
Hello Team.
Around 400 image cleanup jobs are in progress, and all of them are in a hung state. Because of this the master server hung twice and we rebooted it, but after the reboot a large number of image cleanup jobs ran again, and due to a capacity issue the server now hangs every 8-10 hours.
We have Data Domain (DD) in our environment, and this is what I found in the logs:
23:00:19.920 [26303] <16> delete_by_backupid: OVsystem(/usr/openv/netbackup/bin/admincmd/nbdelete -backup_id usnbst8017-brn_1375198910) failed (174)
23:00:19.944 [26303] <2> put_string: cannot write data to network: Broken pipe (32)
23:00:19.944 [26303] <2> db_senddata: put_string(): connection dropped or not connected, Broken pipe, (32)
23:00:19.944 [26303] <2> db_sendSTATUS: db_senddata() failed: network connection broken, (40)
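With hundreds of cleanup jobs running, it can help to pull the failed nbdelete calls out of the log rather than read it by eye. The following is a minimal sketch, assuming every line follows the layout seen in the excerpt above (time, pid in square brackets, severity in angle brackets, function name, message). The names `parse_line`, `failed_deletes` and `SAMPLE` are made up for illustration and are not part of NetBackup.

```python
import re

# Log layout assumed from the excerpt above:
#   HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] "
    r"<(?P<sev>\d+)> (?P<func>\w+): (?P<msg>.*)$"
)

# Pulls the backup ID and exit status out of a failed nbdelete call.
NBDELETE_RE = re.compile(
    r"nbdelete -backup_id (?P<backup_id>[^\s)]+)\) failed \((?P<status>\d+)\)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def failed_deletes(lines):
    """Return (backup_id, status) for every failed nbdelete call in the lines."""
    results = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        hit = NBDELETE_RE.search(record["msg"])
        if hit:
            results.append((hit.group("backup_id"), int(hit.group("status"))))
    return results


# The four lines quoted in the post, used here as sample input.
SAMPLE = [
    "23:00:19.920 [26303] <16> delete_by_backupid: OVsystem(/usr/openv/netbackup/bin/admincmd/nbdelete -backup_id usnbst8017-brn_1375198910) failed (174)",
    "23:00:19.944 [26303] <2> put_string: cannot write data to network: Broken pipe (32)",
    "23:00:19.944 [26303] <2> db_senddata: put_string(): connection dropped or not connected, Broken pipe, (32)",
    "23:00:19.944 [26303] <2> db_sendSTATUS: db_senddata() failed: network connection broken, (40)",
]

if __name__ == "__main__":
    for backup_id, status in failed_deletes(SAMPLE):
        print(backup_id, status)
```

Run against a full log file (for example by passing `open(path)` instead of `SAMPLE`), this would show whether the failures are limited to a few backup IDs or spread across many.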
Can you please clarify whether DD cleanup jobs show up in the NBU Activity Monitor or not?
Regards
Davinder
08-02-2013 09:54 AM
Data Domain's internal scheduled cleanup jobs do NOT appear in the NetBackup Activity Monitor.
The cleanup jobs scheduled on the Data Domain will only delete an image once the NetBackup Image Cleanup job has marked a block as expired. The NetBackup Image Cleanup job must run first to mark the images as expired; then the Data Domain scheduled cleanup job will remove the unneeded blocks of data from disk.
Your messages above indicate that you have connectivity issues between your media server and your Data Domain.
08-02-2013 10:49 AM
Hi.
The connectivity issue is now resolved, so that is one issue fixed.
But the image cleanup issue still persists: more than 400 image cleanup jobs have been in progress since yesterday, and the server was rebooted twice because it was in a hung state.
When I double-click an image cleanup job, it is empty; it does not show any information.
08-02-2013 11:07 AM
Hi.
Lots of bpdbm processes are running in the bpps -a output.
08-02-2013 11:31 AM
Hi.
Please find attached the bpdbm consistency check output.
08-02-2013 11:39 AM
You have quite a few corrupted files, there are a lot of messages:
>>PRIMARY_COPY is set to an invalid copy
>>EXPIRATION is not set to the next valid copy to expire
I would suggest calling in a support ticket with Symantec to have them go through correcting all the issues found in the output you attached.
And in the meantime, watch the first few Image Cleanup jobs to see if they are, in fact, doing anything. I've had more than one in the queue at once, but the first one is usually the only one doing any actual modification of the catalog. The rest just seem to sit in a hung state, then run one at a time until they complete.
08-02-2013 11:44 AM
Hi Ron.
Can we stop these image cleanup jobs for the time being, until this issue is resolved?
I already created a touch file:
-rwxrwxrwx 1 root root 0 Aug 1 08:25 NOexpire
but it is of no use; the image cleanup jobs are still running.
Please suggest..
08-02-2013 12:14 PM
Hi Team.
Is there any way to stop these image cleanup jobs?
08-02-2013 12:25 PM
08-02-2013 12:28 PM
Hello mph999
I already created this file, with 777 permissions, as shown below.
I stopped the NBU services, created the touch file NOexpire with 777 permissions, and then started the NBU services again, but image cleanup jobs are still running in bulk, and this hangs the server.
-rwxrwxrwx 1 root root 0 Aug 1 08:25 NOexpire
08-03-2013 08:40 AM
NOexpire will only prevent images from expiring; image cleanup will continue to run after each backup session completes.
08-03-2013 11:27 AM
As per previous advice - log a support call!!
08-04-2013 06:21 PM
Hello Team.
We moved the corrupted files with the command bpdbm -consistency 2 -move, and after the command completed we rebooted, but a large number of image cleanup jobs (around 450) are still in a hung state.
Now another issue has come up:
[80101937@usnbub5500 admincmd]$ sudo ./nbemmcmd -listhosts
Password:
NBEMMCMD, Version:7.1.0.4
Failed to initialize EMM connection. Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.
[80101937@usnbub5500 bin]$ sudo ./vmoprcmd -d
Password:
network protocol error (39)
Lots of backups are failing with error code 800; below are the job details from one of them:
08/03/2013 11:20:11 - Info nbjm (pid=6751) starting backup job (jobid=2387529) for client @@@@, policy @@@@, schedule Full
08/03/2013 11:20:11 - Info nbjm (pid=6751) requesting MEDIA_SERVER_WITH_ATTRIBUTES resources from RB for backup job (jobid=2387529, request id:{91ECC5E6-FC58-11E2-B113-871F552E92D9})
08/03/2013 11:20:11 - requesting resource usnbub5500-omaha-local-disk-su
08/03/2013 11:20:11 - requesting resource usnbub5500.NBU_CLIENT.MAXJOBS.trackwt1ora-t
08/03/2013 11:20:11 - requesting resource usnbub5500.NBU_POLICY.MAXJOBS.DB-trackwt1-dblogs-no_altsite
08/03/2013 11:20:11 - Error nbjm (pid=6751) NBU status: 800, EMM status: CORBA communication failure
08/03/2013 11:30:11 - Info nbjm (pid=6751) starting backup job (jobid=2387529) for client trackwt1ora-t, policy DB-trackwt1-dblogs-no_altsite, schedule Full
resource request failed (800)
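To see how widespread the status 800 failures are, the error lines can be extracted from saved job details. This is a rough sketch, assuming the error lines always look like the one quoted above and that the date is in MM/DD/YYYY order (the thread dates suggest so). `extract_errors` and `SAMPLE_JOB` are hypothetical names used only for this example.

```python
import re
from datetime import datetime

# Matches job-detail error lines of the form seen above:
#   MM/DD/YYYY HH:MM:SS - Error <process> (pid=N) NBU status: N, EMM status: <text>
ERROR_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - Error "
    r"(?P<proc>\w+) \(pid=(?P<pid>\d+)\) "
    r"NBU status: (?P<nbu>\d+), EMM status: (?P<emm>.+)$"
)


def extract_errors(lines):
    """Return one dict per error line that carries an NBU and EMM status."""
    errors = []
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            errors.append({
                # Assumption: month comes first in the timestamp.
                "when": datetime.strptime(match.group("stamp"), "%m/%d/%Y %H:%M:%S"),
                "process": match.group("proc"),
                "nbu_status": int(match.group("nbu")),
                "emm_status": match.group("emm"),
            })
    return errors


# Three lines taken from the job details quoted in the post.
SAMPLE_JOB = [
    "08/03/2013 11:20:11 - requesting resource usnbub5500-omaha-local-disk-su",
    "08/03/2013 11:20:11 - Error nbjm (pid=6751) NBU status: 800, EMM status: CORBA communication failure",
    "resource request failed (800)",
]

if __name__ == "__main__":
    for err in extract_errors(SAMPLE_JOB):
        print(err["when"], err["process"], err["nbu_status"], err["emm_status"])
```

If every failing job reports the same EMM status, that points at the EMM connection problem shown by nbemmcmd rather than at the individual policies.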
08-06-2013 02:42 PM
08-07-2013 03:40 AM
Hello mph999
No activity took place prior to this issue (neither on the network side nor on the DD side); suddenly lots of image cleanup jobs were in a hung state.
We increased the Sybase database memory; earlier it was 500 MB, and it is now 1 GB.
[80101937@usnbub5500 global]$ cat server.conf
-n NB_usnbub5500
-x tcpip(LocalOnly=YES;ServerPort=13785) -gp 4096 -gd DBA -gk DBA -gl DBA -ti 0 -c 25M -ch 1G -cl 25M -zl -os 1M -o /usr/openv/db//log/server.log
-ud
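To double-check which cache values the database server will actually pick up, the switch line from server.conf can be parsed. This is only a sketch: it assumes the switches are whitespace-separated as in the output above, that `-c`, `-ch` and `-cl` are the initial, maximum and minimum cache sizes (as I understand the SQL Anywhere server options), and that the K/M/G suffixes are multiples of 1024. `parse_switches` and `size_to_bytes` are hypothetical helper names.

```python
def parse_switches(line):
    """Split a server.conf switch line into {switch: value-or-None}."""
    tokens = line.split()
    switches = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            # A switch takes the next token as its value unless that token
            # is itself another switch (for example -zl followed by -os).
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            switches[token] = tokens[i + 1] if has_value else None
            i += 2 if has_value else 1
        else:
            i += 1
    return switches


# Assumption: suffixes are binary multiples.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(text):
    """Convert a size such as '25M' or '1G' to bytes; a bare number is bytes."""
    suffix = text[-1].upper()
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    return int(text)


# The switch line from the server.conf output quoted in the post.
SERVER_CONF_LINE = (
    "-x tcpip(LocalOnly=YES;ServerPort=13785) -gp 4096 -gd DBA -gk DBA -gl DBA "
    "-ti 0 -c 25M -ch 1G -cl 25M -zl -os 1M -o /usr/openv/db//log/server.log"
)

if __name__ == "__main__":
    opts = parse_switches(SERVER_CONF_LINE)
    for name in ("-c", "-ch", "-cl"):
        print(name, opts[name], size_to_bytes(opts[name]))
```

On the line above this reports `-ch` as 1G, which matches the increase to 1 GB described in the post, while `-c` and `-cl` remain at 25M.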
We also created one file; please find it below.
[80101937@usnbub5500 global]$ cat emm.conf
USE_HASH=1
But still no luck; around 500 image cleanup jobs are still in a hung state.
08-07-2013 03:54 AM
You were advised on more than one occasion to log a Support call...
08-07-2013 06:44 AM
We logged a support call with Symantec three days ago but have not received a solution yet :(
08-07-2013 06:46 AM
What is the case number?
08-07-2013 06:54 AM
It's 04866945; please take a look and help.