
wellssh · ‎04-02-2014

For the last week or so we've noticed that our Image Cleanup jobs are not completing successfully. The status is 1 with the following job details:

4/2/2014 12:53:19 AM - Info bpdbm(pid=4876) image catalog cleanup
4/2/2014 12:53:19 AM - Info bpdbm(pid=4876) Cleaning up tables in the relational database
4/2/2014 12:53:19 AM - Info bpdbm(pid=4876) deleting images which expire before Wed Apr 02 00:53:19 2014 (1396414399)
4/2/2014 12:53:43 AM - Info nbdelete(pid=7664) deleting expired images. Media Server: wilbs003.wilm.ppdi.com Media: @aaaag
4/2/2014 12:54:13 AM - Error nbdelete(pid=7664) Cannot obtain resources for this job : error [25]
4/2/2014 12:54:15 AM - Info nbdelete(pid=7664) deleting expired images. Media Server: wilbs001 Media: @aaaai
4/2/2014 12:54:45 AM - Error nbdelete(pid=7664) Cannot obtain resources for this job : error [25]
4/2/2014 12:54:47 AM - Info nbdelete(pid=7664) deleting expired images. Media Server: wildisk Media: D:
4/2/2014 12:54:54 AM - Error nbdelete(pid=7664) Fragments were not removed. Server wildisk is invalid (37 )
4/2/2014 12:54:54 AM - Warning bpdbm(pid=4876) nbdelete failed with status (37)
4/2/2014 12:54:54 AM - Info bpdbm(pid=4876) deleted 20 expired records, compressed 0, tir removed 0, deleted 38 expired copies
the requested operation was partially successful(1)

The job was successfully completed, but some files may have been
busy or unaccessible. See the problems report or the client's logs for more details

Any ideas on where to start/look, and can I perform the same processes manually on the Master server?

Thanks,

Sven

Deb_Wilmot · ‎04-03-2014

Interesting issue. You might want to create the bpdm log on the media servers to see if we ever connect. A 25 is typically a socket connection error (normally caused by reverse hostname lookup).
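As a rough way to test the reverse-lookup theory from the master, here is a minimal Python sketch that resolves a media server name to an address and then resolves that address back to a name. It uses only the standard `socket` module; the function names `check_name_resolution` and `names_match` are made up for this illustration, and the check says nothing about how NetBackup itself resolves names (hosts files and NetBackup's own host cache are not considered).

```python
import socket


def names_match(expected, actual):
    """Compare host names case-insensitively.

    A short name and its fully qualified form are treated as the same host.
    """
    a = expected.lower().rstrip(".")
    b = actual.lower().rstrip(".")
    return a == b or a.startswith(b + ".") or b.startswith(a + ".")


def check_name_resolution(hostname):
    """Return (forward_ip, reverse_name, consistent) for one host name.

    forward_ip is None if the name does not resolve; reverse_name is None
    if the address has no reverse (PTR) entry.
    """
    try:
        ip = socket.gethostbyname(hostname)
    except OSError:
        return None, None, False
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip, None, False
    return ip, reverse_name, names_match(hostname, reverse_name)
```

Calling `check_name_resolution("wilbs003.wilm.ppdi.com")` and `check_name_resolution("wilbs001")` on the master (and the master's name on each media server) would show whether forward and reverse lookups agree. A `False` in the last position is only a hint worth following up, not proof of the cause.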

Another thing that might be helpful is to check the database tables to see if you still have an entity named wildisk in the tables.

To do that, from the master run: (Windows) <install_path>\Veritas\NetBackup\bin\nbdb_unload <path>

(Unix) /usr/openv/db/bin/nbdb_unload <path>

** Create a directory somewhere to dump the data to - insert that into 'path' in the command.

Run findstr or grep to see if wildisk is in any of the tables. If that entity is in the tables, try to use nbemmcmd -deletehost (in the NetBackup\bin\admincmd directory).

Fields required are the machine name (-machinename) and the machine type (-machinetype).

*** If you don't know the type of machine - try running nbemmcmd -listhosts to see if the type of machine is listed in that output.

Also note that if there are images still assigned to that host, the deletion will fail (which I suspect is the case, since we're trying to clean it up).
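For the findstr/grep step, a cross-platform alternative is a short Python sketch like the one below. It walks the directory that nbdb_unload wrote to and reports every line that mentions the host name, ignoring case. The function name `find_host_references` is hypothetical, and the sketch makes no assumption about the names or layout of the unloaded files; it simply reads whatever is in the directory as text.

```python
import os


def find_host_references(dump_dir, hostname):
    """Yield (file_path, line_number, line) for each line mentioning hostname.

    The match is a case-insensitive substring search, similar to
    'grep -i' or 'findstr /i'.
    """
    needle = hostname.lower()
    for root, _dirs, files in os.walk(dump_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going if a file is not clean text.
            with open(path, "r", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if needle in line.lower():
                        yield path, line_number, line.rstrip("\n")
```

Looping over `find_host_references(<path>, "wildisk")` and printing each result shows which unloaded files still reference the old host, which helps decide whether nbemmcmd -deletehost or a support case is the next step.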

If this information doesn't help, I'd suggest you open a support case and have someone help you fix the invalid references in the database.

Deb


Mark_Solutions · ‎04-02-2014

Try:

nbdelete -allvolumes -force

Then run:

bpimage -cleanup -allclients

to make sure it worked

#edit#

worth checking if anything else has been left behind from that media server too

wellssh · ‎04-02-2014

thanks for the advice, but I'm seeing the following error show up on the activity monitor for the image cleanup job:

4/2/2014 11:43:01 AM - begin
4/2/2014 11:43:01 AM - Info nbdelete(pid=7888) deleting expired images. Media Server: wilbs003.wilm.ppdi.com Media: @aaaag
4/2/2014 11:43:30 AM - Error nbdelete(pid=7888) Cannot obtain resources for this job : error [25]

Ideas?

Thanks,

Sven

Mark_Solutions · ‎04-02-2014

So what is the state of that media server and what type of storage does / did it have?

wellssh · ‎04-02-2014

The media server is up and running, as is the Master server (which is also a media server). The images we are trying to clean up are from backup-to-disk, so Disk storage (Data Domain).

I also received these errors:

4/2/2014 1:05:50 PM - begin
4/2/2014 1:05:50 PM - Info nbdelete(pid=7320) deleting expired images. Media Server: wildisk Media: D:
4/2/2014 1:05:50 PM - Error nbdelete(pid=7320) Fragments were not removed. Server wildisk is invalid (37 )
4/2/2014 1:05:50 PM - end ; elapsed time: 00:00:00
operation requested by an invalid server(37)

However, we've never had a media server called 'WILDISK', so this is odd.

Thanks,

Sven

SymTerry · ‎04-02-2014

Very strange indeed. Have you ever had a server named 'WILDISK'? It had to have come from somewhere.

While looking, I found that in some cases a Status Code 37 is caused by corrupt lock files within the master server's bprd.d directory (/usr/openv/netbackup/bin on a UNIX master server, or <install_path>\veritas\netbackup\bin on a Windows master server). I wonder if that is somehow preventing the image cleanup?

To clear up the corrupt lock files, you could try the following steps:
1. Stop all NetBackup daemons/services on the master server
2. While all NetBackup daemons/services on the master server are stopped, rename the bprd.d directory to bprd.d_old
3. Recreate a new, empty bprd.d directory on the master server
4. Start the master server NetBackup daemons/services
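Steps 2 and 3 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `reset_lock_directory` is made up, `bin_dir` must be the directory that actually contains bprd.d on your master server, and the sketch does not stop or start any NetBackup services, so steps 1 and 4 still have to be done by hand before and after running it.

```python
import os


def reset_lock_directory(bin_dir, name="bprd.d", backup_suffix="_old"):
    """Rename <bin_dir>/<name> to <name><backup_suffix> and recreate it empty.

    Run this only while all NetBackup daemons/services on the master
    server are stopped. Returns the path of the renamed directory.
    """
    current = os.path.join(bin_dir, name)
    backup = current + backup_suffix
    if os.path.exists(backup):
        # Refuse to overwrite an earlier backup copy.
        raise FileExistsError(backup + " already exists; move it aside first")
    os.rename(current, backup)  # step 2: keep the old lock files for reference
    os.mkdir(current)           # step 3: new, empty directory
    return backup
```

Keeping the renamed directory rather than deleting it means the old lock files are still available if support asks for them.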

Marianne · ‎04-02-2014

You NEVER had a media server called wildisk?

So, where do you think NBU is getting the name from?

Do you have bpdbm log on the master server? If not, create the folder and restart NBU.
Next image cleanup attempt should log additional info to bpdbm log. Hopefully image id as well.

About the rest of the image cleanup failures (Cannot obtain resources for this job : error [25] ) :

Is this a new issue?
Or a regular issue on a Tuesday morning when DD maintenance is running?

Ensure bpdm log folders exist on all media servers. This process will log image cleanup on each of the media servers.

Check DD for logs as well during this period.
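If it helps, creating the log folders mentioned above can be scripted. The sketch below is a minimal Python example with a hypothetical function name, `ensure_legacy_log_dirs`. It assumes you pass in the NetBackup logs directory for that server yourself (typically /usr/openv/netbackup/logs on UNIX or <install_path>\NetBackup\logs on Windows, but verify this on your installation), and it only creates folders; it does not restart anything.

```python
import os


def ensure_legacy_log_dirs(logs_root, process_names=("bpdm",)):
    """Create one log folder per process name under logs_root if missing.

    Returns the list of folders that were newly created, so an empty
    list means every folder already existed.
    """
    created = []
    for process in process_names:
        path = os.path.join(logs_root, process)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created
```

On the master you would pass `("bpdbm",)` and on each media server `("bpdm",)`. An empty return value confirms the folder was already in place before the next image cleanup runs.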


mnolan · ‎04-02-2014

I never recommend running nbdelete -allvolumes -force, as the -force option causes NetBackup to forget about an image, which can create a storage leak, especially on Data Domains.

wellssh · ‎04-03-2014

Thanks for this suggestion. I've just completed it and will let things "sit" overnight, when the image cleanup occurs, and see what happens in the morning.

Regards,

Sven

wellssh · ‎04-03-2014

There may have been an instance of WILDISK at one point years and years (i.e. 5-8 years) before my time, back when we were running NBU 5X and NBU 6X, but there hasn't been an instance of this since then.

The bpdbm log folders exist, and have existed for quite some time.

The Error [25] issue appears to be new, at least in the last 2 weeks. I don't see any Image Cleanup jobs in the Activity Monitor older than 11 days; it seems these jobs roll over in the Activity Monitor.

Last night's Image Cleanup resulted in the same issue, so we'll see how tonight's Image Cleanup job 'acts' and how the logs look in the morning.

Thanks,

Sven



NBU 7.5.04 - Image Cleanup Status 1