Catalog backup fails with STATUS CODE: 35

susindran_surul · ‎06-24-2014

Hi Team,

My catalog backup getting failed after upgrade from 7.5.0.4 to 7.6.0.2.All other backup running fine but this catalog backup getting failed.I could see below logs on bpdbm before fails.

Please help on here.

10:29:53.951 [6809] <2> DbmOdbcConnect::setIsolationLevel: (-7) Isolation Level=<32> (SNAPSHOT)
10:29:53.952 [6809] <2> DbmOdbcConnect::beginTransaction: (-7) transaction started
10:29:53.952 [6809] <16> ImageCatalogBackup::executeQuery: failed to execute /usr/openv/netbackup/bin/bpbackup -i -p Test-Catalog -h master -t 35 -pop 5436071 -s Full -cbustart 0 -cbuend 1403578455 -w (13)
10:29:53.952 [6809] <2> DbmOdbcView::DbmOdbcView: (-7.17) Executing "SELECT @@SPID" Bindings <>
10:29:53.953 [6809] <2> DbmOdbcView::DbmOdbcView: SQL_SUCCESS
10:29:53.954 [6809] <2> DbmOdbcView::DbmOdbcView: (-7.17) ConnectionId = '10953'
10:29:53.954 [6809] <2> DbmOdbcConnect::set_connection_id: -7 -> 10953
10:29:53.954 [6809] <2> DbmOdbcView::DbmOdbcView: (10953.17) Created
10:29:53.954 [6809] <2> update_DBM_Image: Entering
10:29:53.954 [6809] <2> update_DBM_Image: (10953.17) Executing "UPDATE DBM_Main.DBM_Image SET Expiration = 1403605793 WHERE ImageKey = ?" Bindings <1624269>
10:29:53.965 [6809] <2> update_DBM_Image: SQL_SUCCESS
10:29:53.965 [6809] <2> update_DBM_Image: Success executing (ImageKey = 1624269) (0)
10:29:53.966 [6809] <2> exec_catalog_DR_protection: Entering
10:29:53.966 [6809] <2> exec_catalog_DR_protection: classname=Test-Catalog, fqDRImagePath=<NULL>, status=13
10:29:53.966 [6809] <2> read_catdrinfo: ?
10:29:53.967 [6809] <2> read_catdrinfo_file: ?
10:29:53.981 [6809] <2> read_catdrinfo_file: EMAIL xx@xx.com
10:29:53.981 [6809] <2> read_catdrinfo_file: PATH /opt/openv/DR
10:29:53.981 [6809] <2> read_catdrinfo_file: FLAGS 0
10:29:53.981 [6809] <2> exec_catalog_DR_protection: Sending DR image info via email
10:29:53.981 [6809] <2> generate_DR_file_and_send_mail: Entering
10:29:53.986 [6809] <2> generate_DR_report: Entering, copynum:0, subject:, fqfilePath:, updateExistingDR:0, reportFP:1
10:29:53.987 [6809] <2> send_DR_report: Entering
10:29:53.988 [6809] <2> OpenSendmailPipe: /usr/lib/sendmail -t >/dev/null 2>/dev/null
10:29:53.990 [6809] <16> send_mail2: fopen() failed (2)
10:29:54.121 [6809] <4> db_error_add_to_file: Could not send DR report
10:29:54.121 [6809] <16> generate_DR_file_and_send_mail: Could not send DR report
10:29:54.122 [6809] <16> exec_catalog_DR_protection: Failed to send DR image to xx@xx.com (0)
10:29:54.122 [6809] <2> ImageCatalogBackup::executeQuery: Deleting /usr/openv/netbackup/db/images/master/1403000000/Test-Catalog_1403578455_FULL
10:29:54.124 [6809] <2> ImageCatalogBackup::executeQuery: Deleting /usr/openv/netbackup/db/images/master/1403000000/Test-Catalog_1403578132_FULL
10:29:54.125 [6809] <2> ~DbmLockFile: Successfully release lockfile /usr/openv/netbackup/bin/catbackup.lock file descriptor 7
10:29:54.125 [6809] <2> ImageCatalogBackup::~ImageCatalogBackup: Exiting

Thanks,

Susi.S

Marianne · ‎06-24-2014

So, incorrect symbolic link, pointing to itself instead of a folder. Seems someone wanted to ensure that all possible names for RAC nodes will save image files in the same folder, but created the link incorrectly. From what I see, the link should be directly under images folder, gbsRAC -> gbsdb-bkp. I cannot tell you why this is only causing an issue now. Simply delete this link.

Handy NetBackup Links

View solution in original post

Marianne · ‎06-24-2014

Noy enough info in bpdbm log extract...

What is the status code of the failed job?

How many streams are started by Catalog backup?
Which stream is failing?
Please show us all text in Details tab of failed job
as well as Policy config:

bppllist Test-Catalog -U

Can you confirm that the DR path is accessible and can be written to?
Can you see if DR-file was created?

Is sendmail working on the master?

Handy NetBackup Links

inn_kam · ‎06-24-2014

ANSWER the marriane question

also chk this tech note

Catalog backup fails with STATUS CODE: 35

Article:TECH191339

|

Created: 2012-06-20

|

Updated: 2013-04-25

|

Article URL http://www.symantec.com/docs/TECH191339

susindran_surul · ‎06-24-2014

Hi Marianne,

Please find the below for details log.

Backup getting failed on 3rd stream only after writing 170+GB..Already media server have 7200 client read timeout.

Send mail working fine.I can able to create touch file under DR directory.

Details Log :-
===========

06/24/2014 05:11:24 - Info nbjm (pid=6265) starting backup job (jobid=5436278) for client master, policy Test-Catalog, schedule Full
06/24/2014 05:11:24 - Info nbjm (pid=6265) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=5436278, request id:{9B251C74-FB55-11E3-A96B-00144F3C15DC})
06/24/2014 05:11:24 - requesting resource media1-hcart3-robot-tld-0
06/24/2014 05:11:24 - requesting resource master.NBU_CLIENT.MAXJOBS.master
06/24/2014 05:11:24 - requesting resource master.NBU_POLICY.MAXJOBS.Test-Catalog
06/24/2014 05:11:25 - granted resource master.NBU_CLIENT.MAXJOBS.master
06/24/2014 05:11:25 - granted resource master.NBU_POLICY.MAXJOBS.Test-Catalog
06/24/2014 05:11:25 - granted resource 001061
06/24/2014 05:11:25 - granted resource media1-000-hcart3-11
06/24/2014 05:11:25 - granted resource media1-hcart3-robot-tld-0
06/24/2014 05:11:25 - estimated 0 kbytes needed
06/24/2014 05:11:25 - Info nbjm (pid=6265) started backup (backupid=master_1403583085) job for client master, policy Test-Catalog, schedule Full on storage unit media1-hcart3-robot-tld-0
06/24/2014 05:11:29 - Info bpbrm (pid=14561) master is the host to backup data from
06/24/2014 05:11:29 - Info bpbrm (pid=14561) reading file list for client
06/24/2014 05:11:29 - started process bpbrm (pid=14561)
06/24/2014 05:11:29 - connecting
06/24/2014 05:11:30 - Info bpbrm (pid=14561) starting bpbkar on client
06/24/2014 05:11:30 - connected; connect time: 0:00:00
06/24/2014 05:11:31 - Info bpbkar (pid=23995) Backup started
06/24/2014 05:11:31 - Info bpbrm (pid=14561) bptm pid: 14562
06/24/2014 05:11:33 - Info bptm (pid=14562) start
06/24/2014 05:11:33 - Info bptm (pid=14562) using 524288 data buffer size
06/24/2014 05:11:33 - Info bptm (pid=14562) using 32 data buffers
06/24/2014 05:11:34 - Info bptm (pid=14562) start backup
06/24/2014 05:11:34 - Info bptm (pid=14562) backup child process is pid 14567
06/24/2014 05:11:34 - Info bptm (pid=14562) Waiting for mount of media id 001061 (copy 1) on server media1.
06/24/2014 05:11:34 - mounting 001061
06/24/2014 05:12:31 - Info bptm (pid=14562) media id 001061 mounted on drive index 10, drivepath /dev/rmt/8cbn, drivename media1-000-hcart3-11, copy 1
06/24/2014 05:12:31 - mounted 001061; mount time: 0:00:57
06/24/2014 05:12:32 - positioning 001061 to file 3
06/24/2014 05:12:58 - positioned 001061; position time: 0:00:26
06/24/2014 05:12:58 - begin writing
06/24/2014 11:29:00 - Error bpbrm (pid=14561) socket read failed: errno = 62 - Timer expired
06/24/2014 11:29:02 - Error bptm (pid=14562) media manager terminated by parent process
06/24/2014 11:29:53 - Info bpbkar (pid=23995) done. status: 13: file read failed
06/24/2014 11:29:53 - end writing; write time: 6:16:55
file read failed (13)

$ sudo ./bppllist Test-Catalog -U
------------------------------------------------------------

Policy Name: Test-Catalog

Policy Type: NBU-Catalog
Active: yes
Effective date: 06/23/2014 13:33:56
Mult. Data Streams: no
Client Encrypt: no
Checkpoint: no
Policy Priority: 0
Max Jobs/Policy: 1
Disaster Recovery: 0
Collect BMR info: no
Residence: media1-hcart3-robot-tld-0
Volume Pool: CatalogBackup
Server Group: *ANY*
Keyword: (none specified)
Data Classification: -
Residence is Storage Lifecycle Policy: no
Application Discovery: no
Discovery Lifetime: 0 seconds
ASC Application and attributes: (none defined)

Granular Restore Info: no
Ignore Client Direct: no
Enable Metadata Indexing: no
Index server name: NULL
Use Accelerator: no
HW/OS/Client: Solaris SunOS master

Include: CATALOG_DRIVEN_BACKUP

Schedule: Full
Type: Full Backup
Frequency: every 7 days
Excluded Dates----------
No specific exclude dates entered
No exclude days of week entered
PFI Recovery: 0
Maximum MPX: 1
Retention Level: 1 (2 weeks)
Number Copies: 1
Fail on Error: 0
Residence: (specific storage unit not required)
Volume Pool: (same as policy volume pool)
Server Group: (same as specified for policy)
Residence is Storage Lifecycle Policy: 0
Schedule indexing: 0
Daily Windows:
Sunday 00:00:00 --> Sunday 24:00:00
Monday 00:00:00 --> Monday 24:00:00
Tuesday 00:00:00 --> Tuesday 24:00:00
Wednesday 00:00:00 --> Wednesday 24:00:00
Thursday 00:00:00 --> Thursday 24:00:00
Friday 00:00:00 --> Friday 24:00:00
Saturday 00:00:00 --> Saturday 24:00:00

Catalog Disaster Recovery Configuration:
Email Address: xxx@xx.com
Disk Path: /opt/openv/DR
User Name: (none specified)
Pass Word: (none specified)
Critical policy: (none specified)

$

susindran_surul · ‎06-24-2014

HI ,

Above Tech not matching to our case.I could see DR path name is correct and i can able to create test file under that path.

Answered above all questions.

Thanks,

Susi.S

Marianne · ‎06-24-2014

Hopefully /opt/openv/DR is a separate mountpoint?
Dr-file should not be saved to the same filesystem as NBU catalogs.

To get back to the failure:

Yes, it looks like a Client Read timeout.

06/24/2014 05:12:58 - begin writing
06/24/2014 11:29:00 - Error bpbrm (pid=14561) socket read failed: errno = 62 - Timer expired
06/24/2014 11:29:02 - Error bptm (pid=14562) media manager terminated by parent process
06/24/2014 11:29:53 - Info bpbkar (pid=23995) done. status: 13: file read failed

To troubleshoot, we need the following logs:

On media server: bptm and bpbrm

On master: bpbkar

Please copy log files to .txt files that match process name (e.g. bptm.txt) and upload as File attachments.

Handy NetBackup Links

susindran_surul · ‎06-24-2014

No its on same mount point.We used to take backup on 7.5.0.4 on same path.Any issue for using same mount point?

I m pulling all logs and will update soon

Marianne · ‎06-24-2014

I do not believe the DR-path is causing the issue.
I was merely concerned because it looked like the same filesystem as the catalogs...
No issue there.

Another concern is Catalog backup that seems to be scheduled for only once per week.
When this issue is fixed, please add another schedule for daily backups.
It can be Diff or Inc.

As a matter of interest - how big is your catalog?

Hopefully logs will tell us why catalog backups seem to be going into a 'hang state', resulting in the timeout.

It may be necessary to increase logging levels - level 3 should be fine.
If you decide to open Support call with Symantec, they will need level 5 logs.

Handy NetBackup Links

mph999 · ‎06-24-2014

Need bpbkar log as well ....

Marianne · ‎06-24-2014

bpbkar is where we hope to see evidence of process 'hanging' on the master server.

bptm will just tell us up to when data was received, bpbrm will confirm the timeout.

So, really important that get master's bpbkar log that corresponds with media server logs.

We need pid=23995 in bpbkar log - if the media server logs correspond with job details above.

**** Update ****

We see in bptm that data was received at a steady space until 08:27:

08:27:03.160 [14562] <2> write_data: received first buffer (524288 bytes), begin writing data

Then nothing.... Just the termination from bpbrm at 10:29 because of the timeout.

bpbrm shows last catalog update as 07:54:

07:54:50.082 [14561] <2> bpbrm main: end write catalog
10:29:00.101 [14561] <16> bpbrm readline: socket read failed: errno = 62 - Timer expired

So, we need to see master server's bpbkar log from more or less 07:50 onwards (assuming that the system time on master and media server correspond).

Handy NetBackup Links

susindran_surul · ‎06-24-2014

Hi Marianne,

I could see on bpbkar below messages repeated till the end from starting of 8:29.After backup failed also this message repeated and i have killed manually bpbkar.

I could see below server oracle RAC server.I could see its having some sympolic link under images of particular directory but not able to read it.

What is that sympolic link?

$ ls -lart
total 154
lrwxrwxrwx 1 root root 6 Mar 23 2010 gbsRAC -> gbsRAC

$ more gbsRAC
gbsRAC: Number of symbolic links encountered during path name traversal exceeds MAXSYMLINKS
$

8:29:00.092 [23995] <2> bpbkar SelectFile: cwd=/opt/openv/netbackup/db/images/gbsdb-bkp path=gbsRAC
08:29:00.092 [23995] <2> check_symlink_touchfile: Ignore symlinks? ANS=0
08:29:00.093 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.093 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.094 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.094 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.094 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.095 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.095 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.095 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.096 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.096 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC

Marianne · ‎06-24-2014

So, incorrect symbolic link, pointing to itself instead of a folder. Seems someone wanted to ensure that all possible names for RAC nodes will save image files in the same folder, but created the link incorrectly. From what I see, the link should be directly under images folder, gbsRAC -> gbsdb-bkp. I cannot tell you why this is only causing an issue now. Simply delete this link.

Handy NetBackup Links

susindran_surul · ‎06-25-2014

Please find the logs from media server.I have around 8GB logs on master server.Is there any particular way to grep only this failed job related PID?

susindran_surul · ‎06-25-2014

Hi Marianne,

Yes..Its problem with that culprit link file.I have tested same senario on test server,I could see same error came.

Yesterday i have removed link fine and backup completed successfully.

For me to trace this link file took 3 days to find out.Hope this will help to otheres who and all going from 7.5 to 7.6.x.

Thanks to all.

Thanks,

Susi.S

VOX

Catalog Backup getting failed after upgrade from 7.5.0.4 to 7.6.0.2

Catalog backup fails with STATUS CODE: 35