07-27-2018 01:00 AM
07-27-2018 06:43 AM
Please compare the permissions on /usr/openv/netbackup/logs/user_ops/ with those on a client that is working fine.
With 8.1 you will also have to check whether the client has been issued a certificate for secure communication.
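One way to make the permissions comparison concrete is to dump mode, owner and group for user_ops and everything below it on each client, then diff the two listings. The following is only a sketch: the function name list_permissions is made up for illustration, and it uses nothing NetBackup-specific, just the Python standard library.

```python
import os
import stat
import sys


def list_permissions(root):
    """Return (relative_path, mode_string, uid, gid) for root and everything below it."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    rows = []
    for path in sorted(paths):
        st = os.lstat(path)  # lstat: report on symlinks themselves, do not follow them
        rows.append((os.path.relpath(path, root), stat.filemode(st.st_mode), st.st_uid, st.st_gid))
    return rows


if __name__ == "__main__" and len(sys.argv) == 2 and os.path.isdir(sys.argv[1]):
    # Example: python3 list_perms.py /usr/openv/netbackup/logs/user_ops
    for rel, mode, uid, gid in list_permissions(sys.argv[1]):
        print(f"{mode} {uid:>6} {gid:>6} {rel}")
```

Run it on the working and the non-working client (or as each DB2 instance user), save the output to two files and compare them with diff. Differences in owner or group can matter even when the mode looks identical.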
07-29-2018 11:49 PM
The issue seems to be with the 'progress log':
Error bprd (pid=6411) Unable to write progress log on client
Is there something different about the progress logs for these 2 DBs?
Have you tried to compare scripts with those of the working DBs?
Have you checked the bprd log on the master to see if there is more info w.r.t. the log path?
07-30-2018 02:33 AM - edited 07-30-2018 02:35 AM
Actually, as I said, 2 DBs on the same client are working fine and 2 aren't. Also, the permissions are 777 on user_ops and its subdirectories.
The DBs that back up successfully write the progress log to the folder, but the other 2 DBs don't.
Since 2 DBs are writing fine, this doesn't look like a secure connection issue.
Since it started happening after upgrading from 7.6.0.2 to 8.1, I checked what has changed but couldn't find anything except the WHITELIST that I mentioned.
07-30-2018 02:39 AM
Have you tried to check bprd log on the master server?
The name and path of the log file that is having the issue are normally recorded in the bprd log.
About the scripts - I was wondering if there was something different about the 2 non-working DBs.
Maybe some other path is specified for progress logs. That is why I asked if the scripts could be compared.
07-30-2018 07:28 AM - edited 07-30-2018 09:00 AM
@Marianne Thanks for taking the time to look into it.
I don't find anything wrong with the script. Moreover, the backups were fine before the upgrade. I am attaching the scripts and the bpdb2 logs for a working DB and a non-working DB for your reference.
PS: bprd logs were not being written on the master, and I guess I have to restart the services on the master, which I cannot do now (unless you can suggest another way to get bprd to start logging).
07-31-2018 01:18 AM
Yes - bprd needs to be restarted to enable logging.
About the scripts - I can see that there are no output files specified in the backup scripts.
The only difference seems to be the Database name and User.
And the fact that the non-working one loads a profile before the backup command is started:
. /db2/BID/sqllib/db2profile;db2 BACKUP DATABASE $MY_DB2 $MY_SCHED online LOAD $MY_LIB BUFFER 1024
Any idea why? (I realize that this was working on 7.6, but we need to look at this as if it is a new issue.)
What is listed in this db2profile that the other user for working DB does not need?
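To see what the profile actually changes, one option is to capture the output of env as each DB2 user (for the non-working one, after sourcing /db2/BID/sqllib/db2profile) and compare the two. The sketch below is illustrative only: parse_env and diff_env are hypothetical helper names, and the sample values are invented, not taken from the attached scripts.

```python
def parse_env(text):
    """Parse `env`-style output (KEY=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            env[key] = value
    return env


def diff_env(working_text, failing_text):
    """Return {name: (working_value, failing_value)} for every variable that differs."""
    working, failing = parse_env(working_text), parse_env(failing_text)
    names = sorted(set(working) | set(failing))
    return {n: (working.get(n), failing.get(n)) for n in names if working.get(n) != failing.get(n)}


# Invented sample data, for illustration only.
working_env = "HOME=/home/a\nPATH=/usr/bin\n"
failing_env = "HOME=/home/a\nPATH=/usr/bin:/db2/BID/sqllib/bin\nDB2INSTANCE=db2bid\n"

for name, (ok_value, bad_value) in diff_env(working_env, failing_env).items():
    print(f"{name}: working={ok_value!r} failing={bad_value!r}")
```

Variables that appear only on the non-working side are the first candidates to look at.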
My suggestion:
Find a time to restart NBU so that we can have bprd logs.
Log on to the client as user db2bid.
Manually run the commands in CMD_LINE and watch the output.
I will have a look at the logs a bit later in the day (if time permits...)
07-31-2018 06:27 AM
Hi @N_J
I had a look at the bpdb2 logs - there are 3 -
db2bid.073018_00001.txt
db2d20.073018_00001.txt
log.073018.txt
None of these logs contains complete info for a backup from beginning to end.
There is no reference whatsoever to a backup in db2d20.073018_00001.txt.
The other 2 logs contain errors and the name of user_ops log files, but the beginning of the backup and progress of the backup do not appear in these logs.
If the backups started the previous day, we need that day's logs as well.
log.073018.txt starts with an error shortly after midnight - we can only assume that the backup started the previous evening?
Error:
00:21:25.230 [28002] <16> readCommFile: ERR - timed out after 3600 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/28002.0.1532899283
00:21:25.231 [28002] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/28002.0.1532899283>
00:21:25.231 [28002] <16> CreateNewImage: ERR - serverResponse() failed
00:21:25.231 [28002] <16> VxBSACreateObject: ERR - Could not create new image with file /DB2/AUDBOE14/LOGFILE/node0000/db2bid/C0000000_S0020131.LOG.
It seems as if the backup failed on an archive log file:
/DB2/AUDBOE14/LOGFILE/node0000/db2bid/C0000000_S0020131.LOG
What is worrying is the timeout after 1 hour (3600 sec).
Can you see what is logged in /usr/openv/netbackup/logs/user_ops/dbext/logs/28002.0.1532899283 ?
And find the bpdb2 logs for the previous day to see what happened since the beginning of the backup?
Especially in the hour before 00:21.
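A quick bit of arithmetic helps narrow down where to look. The sketch below assumes that the trailing number in the comm file name (1532899283) is a Unix epoch timestamp; that is a guess based on its size, not something confirmed in the logs. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def comm_file_epoch(path):
    """Interpret the last dot-separated field of the comm file name as a Unix epoch (assumption)."""
    suffix = path.rsplit(".", 1)[-1]
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


def wait_start(error_time, timeout_seconds=3600):
    """Subtract the timeout from the HH:MM:SS.mmm stamp of the error line."""
    stamp = datetime.strptime(error_time, "%H:%M:%S.%f")
    return (stamp - timedelta(seconds=timeout_seconds)).time()


path = "/usr/openv/netbackup/logs/user_ops/dbext/logs/28002.0.1532899283"
print(comm_file_epoch(path).isoformat())  # 2018-07-29T21:21:23+00:00
print(wait_start("00:21:25.230"))         # 23:21:25.230000
```

If the assumption holds, the comm file was created at 21:21:23 UTC on 29 July, and the read started at about 23:21:25 by the client's log clock. Those two would line up if the client runs at UTC+2 (the time zone is not stated in the thread), which would put the start of the failed request in the previous day's bpdb2 log at around 23:21.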
Another timeout error for another log a while later:
00:54:29.610 [6764] <16> readCommFile: ERR - timed out after 3600 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/6764.0.1532901267
00:54:29.610 [6764] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/6764.0.1532901267>
00:54:29.610 [6764] <16> CreateNewImage: ERR - serverResponse() failed
00:54:29.610 [6764] <16> VxBSACreateObject: ERR - Could not create new image with file /DB2/BOE14/LOGFILE/node0000/db2bid/C0000000_S0033833.LOG.
There are more user_ops logs listed here along with backup failure for DB2 archive logs.
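To list every timeout and the archive log it belongs to without reading the whole file by eye, the bpdb2 log can be summarised per PID. This is a rough sketch that assumes all lines follow the same "time [pid] <severity> function: message" layout as the excerpts above; the regular expressions and the summarize function are written for this post, not taken from any NetBackup tool.

```python
import re

LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<sev>\d+)> (?P<func>\w+): (?P<msg>.*)$"
)
TIMEOUT_RE = re.compile(r"timed out after (\d+) seconds while reading from (\S+)")
IMAGE_RE = re.compile(r"Could not create new image with file (\S+?)\.?$")


def summarize(lines):
    """Group timeout and failed-image messages by PID."""
    result = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed layout
        timeout = TIMEOUT_RE.search(m["msg"])
        if timeout:
            entry = result.setdefault(m["pid"], {})
            entry["timeout_at"] = m["time"]
            entry["timeout_seconds"] = int(timeout.group(1))
            entry["comm_file"] = timeout.group(2)
        image = IMAGE_RE.search(m["msg"])
        if image:
            result.setdefault(m["pid"], {})["failed_file"] = image.group(1)
    return result


# Lines copied from log.073018.txt as quoted above.
sample = [
    "00:21:25.230 [28002] <16> readCommFile: ERR - timed out after 3600 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/28002.0.1532899283",
    "00:21:25.231 [28002] <16> CreateNewImage: ERR - serverResponse() failed",
    "00:21:25.231 [28002] <16> VxBSACreateObject: ERR - Could not create new image with file /DB2/AUDBOE14/LOGFILE/node0000/db2bid/C0000000_S0020131.LOG.",
]

for pid, info in summarize(sample).items():
    print(pid, info)
```

To run it against a real log, pass an open file instead of the sample list, for example summarize(open("log.073018.txt")).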
The fact that the backup seems to fail on archive logs raises another question:
Is there a difference between the archive log methods for the working and non-working backups?
Does the profile that is loaded in the non-working script (/db2/BID/sqllib/db2profile) perhaps reference a different db2.conf file with a different archive log method?
PS:
It will probably be best to log a Support call with Veritas and work with a knowledgeable support engineer to track down the issue.
You will need all of the logs listed in the NBU for DB2 manual (including bprd on the master server, and bpbrm and bptm on the media server).
Support will ask for high level logs - they have the time and the tools to sift through these massive logs.