Image cleanup keeps failing with 158 error failed accessing daemon lock file (158)

Johncu
Level 3
Hi

For the last few days, image cleanup has been failing with status 158, 'failed accessing daemon lock file (158)':
3/26/2009 14:31:39 - Info bpdbm (pid=24116) processing client wellingtonbk
03/26/2009 14:31:40 - Info bpdbm (pid=24116) deleted 40 expired records, compressed 35, tir removed 0, deleted 0 expired copies
failed accessing daemon lock file (158)

I have turned the verbose level of the bpdbm log up to 5, and this is the output:
14:31:40.164 [24116] <2> IsCatalogCleanupTerminated: Terminated = 0
14:31:40.165 [24116] <2> delete_expired_backups: /usr/openv/netbackup/db/images/wellingtonbk/1238000000/ALL-LOCAL-DRIVES-UNIX-LIVE-B_1238007680_INCR expires Wed Apr  8 20:01:20 2009 (1239217280) - retaining
14:31:40.165 [24116] <2> IsCatalogCleanupTerminated: Terminated = 0
14:31:40.166 [24116] <2> delete_unneeded_timestamp_dirs: Found 9, deleted 0
14:31:40.166 [24116] <2> index_dir: ?
14:31:40.166 [24116] <2> index_dir: ?
14:31:40.166 [24116] <2> clean_stream: wellingtonbk
14:31:40.169 [24116] <2> IsCatalogCleanupTerminated: Terminated = 0
14:31:40.173 [24116] <2> db_error_add_to_file: dberrorq.c:midnite = 1238025600
14:31:40.220 [24116] <4> delete_expired_backups: image delete 7650 ms, image verify 0 ms, tir_info_remove 0 ms, copies delete 0 ms
14:31:40.220 [24116] <2> db_error_add_to_file: dberrorq.c:midnite = 1238025600
14:31:40.232 [24116] <4> delete_expired_backups: compression 162710 ms, # of bptm calls 0
14:31:40.232 [24116] <4> delete_expired_backups: deleted 40 expired records, compressed 35, tir removed 0, deleted 0 expired copies
14:31:40.233 [24116] <2> db_error_add_to_file: dberrorq.c:midnite = 1238025600
14:31:40.256 [24116] <2> IsCatalogCleanupTerminated: Terminated = 0
14:31:40.675 [24116] <16> delete_expired_backups: OVsystem(/usr/openv/netbackup/bin/admincmd/nbdelete -allvolumes -jobid 206735) failed (158)
14:31:40.675 [24116] <2> IsCatalogCleanupTerminated: Terminated = 0
14:31:40.676 [24116] <2> job_end_try: Done
14:31:40.679 [24116] <2> job_monitoring_exex: ACK disconnect
14:31:40.679 [24116] <2> job_disconnect: Disconnected
14:31:40.681 [24116] <4> delete_expired_backups: Exiting
The Symantec support site suggests this may be due to a permission problem, but I have not managed to find any more information. The output does suggest that images are being deleted, so I am guessing one rogue image is causing the issue.
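
At verbose level 5 the failing call is easy to miss among the routine lines. The following is a minimal sketch for pulling out only the higher-severity entries. It assumes the "HH:MM:SS.mmm [pid] <severity> function: message" layout seen in the log above, and it assumes that a larger number in the angle brackets means a more severe message. The function name filter_severity is made up for illustration and is not a NetBackup tool.

```shell
#!/bin/sh
# Sketch: print log lines whose severity tag (<2>, <4>, <16>, ...) is at or
# above a threshold. Reads the log on stdin. Not a NetBackup utility.
# On Solaris, the default /usr/bin/awk may not accept -v; use nawk or
# /usr/xpg4/bin/awk in its place there.
filter_severity() {
    min="${1:-16}"   # default threshold of 16 is an assumption
    awk -v min="$min" '
        # Field 3 is the severity tag, for example "<16>".
        $3 ~ /^<[0-9]+>$/ {
            sev = substr($3, 2, length($3) - 2)   # strip the angle brackets
            if (sev + 0 >= min + 0) print
        }'
}
```

For example, `filter_severity 16 < your_bpdbm_log` would leave only lines like the `<16>` nbdelete failure shown above.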

I am running a Solaris 10 master server with 3 Solaris 10 media servers and 2 Red Hat media servers.
The NetBackup version is 6.5.2a.

Many thanks for any input. John
1 ACCEPTED SOLUTION

Johncu
Level 3
Hi

I have solved this with a reboot of one of my Linux media servers.

The media server used for one of my Data Domain units had a hung mount between the media server and the DD. At the time, the mount was cleared and remounted, but the media server was not rebooted. I have since done a scheduled reboot of this media server as a follow-up, and image cleanup now seems to be working OK.

Many thanks for your help.

7 REPLIES

Stumpr2
Level 6
Are there any core dumps from bpdbm?

Johncu
Level 3
I'm not getting any core dumps.

I will raise a call today and see what Symantec come back with.

Johncu
Level 3

Thanks for the reply. I did see this bug report, but it is for Windows and v6, whereas I am on Solaris with 6.5.2a. Also, my problem does not affect the running backups themselves.

Thanks

Stumpr2
Level 6
This is from an older version of NetBackup, but it may prove useful:

A lock file, /usr/openv/netbackup/bin/cleanup.lock, is also created/updated with a time stamp and a bpdbm process ID (pid). The lock file always exists and is in the format: <ctime stamp> <bpdbm pid #>. For time stamp conversion, use: /usr/openv/netbackup/bin/bpdbm -ctime xxxxxxxxxx.

If a second cleanup is submitted, it will exit with the message "Cleanup already active" and will not perform the cleanup requested by the subsequent bpimage -cleanup command. Only the bpdbm log will show this effect. A return code of zero will be returned to the bpimage command.

If the bpsched process is continuously busy and cannot find a window to start the database cleanup process, it may be necessary to script the cleanup. To script the cleanup, the first step in the script needs to check for the existence of the lock file and if it does not exist, start a single cleanup.  If the lock file exists (as it should), collect the bpdbm pid # listed and then check if that process is active either using a bpps command or a ps command.  If that pid is still active, it would be best to wait before submitting the next bpimage -cleanup command.  Do not remove the cleanup.lock file unless it exists when NetBackup is not running, otherwise there is a risk of data corruption. It is updated the next time a database cleanup is run.
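
The check described above can be sketched roughly as follows. This is an illustration, not a tested NetBackup script: the function names lock_pid and cleanup_is_active are made up, and `kill -0` stands in for the bpps or ps check mentioned in the note. It also only tells you that some process with that pid is alive, not that the process is bpdbm, and it should run as root so the signal test is permitted.

```shell
#!/bin/sh
# Sketch: decide whether a catalog cleanup still appears to be running.
# The path and the "<ctime stamp> <bpdbm pid #>" layout come from the note above.
LOCK_FILE="${LOCK_FILE:-/usr/openv/netbackup/bin/cleanup.lock}"

# Print the pid recorded in the lock file (second field).
# Fails if the file is unreadable or the field is not a number.
lock_pid() {
    [ -r "$1" ] || return 1
    read -r _stamp pid _rest < "$1" || true
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$pid"
}

# Return 0 if the pid in the lock file belongs to a live process.
cleanup_is_active() {
    pid=$(lock_pid "$1") || return 1
    # kill -0 sends no signal; it only tests whether the process exists.
    kill -0 "$pid" 2>/dev/null
}
```

A wrapper would call `cleanup_is_active "$LOCK_FILE"` and submit bpimage -cleanup (with whatever arguments your site uses) only when it returns non-zero, otherwise wait and retry later. Note that the sketch never removes the lock file.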


This is from my master server (NB 6.5.3.1) with no cleanup running:
# ls -l /usr/openv/netbackup/bin/cleanup.lock
-rw-------   1 root     root          17 Mar 27 03:59 /usr/openv/netbackup/bin/cleanup.lock

# cat /usr/openv/netbackup/bin/cleanup.lock
1238144391 22837

# ps -ef | grep 22837
#


If I were having problems, I would stop NetBackup and see if the lock file is removed.
If it still existed with NetBackup shut down, I would then rm the file.

Johncu
Level 3

Thanks

I will stop/start later today and report back.
