cancel
Showing results for 
Search instead for 
Did you mean: 

Catalog Backup getting failed after upgrade from 7.5.0.4 to 7.6.0.2

susindran_surul
Level 6

Hi Team,

My catalog backup getting failed after upgrade from 7.5.0.4 to 7.6.0.2.All other backup running fine but this catalog backup getting failed.I could see below logs on bpdbm before fails.

Please help on here.

10:29:53.951 [6809] <2> DbmOdbcConnect::setIsolationLevel: (-7) Isolation Level=<32> (SNAPSHOT)
10:29:53.952 [6809] <2> DbmOdbcConnect::beginTransaction: (-7) transaction started
10:29:53.952 [6809] <16> ImageCatalogBackup::executeQuery: failed to execute /usr/openv/netbackup/bin/bpbackup -i -p Test-Catalog -h master -t 35 -pop 5436071 -s Full -cbustart 0 -cbuend 1403578455 -w (13)
10:29:53.952 [6809] <2> DbmOdbcView::DbmOdbcView: (-7.17) Executing "SELECT @@SPID" Bindings <>
10:29:53.953 [6809] <2> DbmOdbcView::DbmOdbcView: SQL_SUCCESS
10:29:53.954 [6809] <2> DbmOdbcView::DbmOdbcView: (-7.17) ConnectionId = '10953'
10:29:53.954 [6809] <2> DbmOdbcConnect::set_connection_id: -7 -> 10953
10:29:53.954 [6809] <2> DbmOdbcView::DbmOdbcView: (10953.17) Created
10:29:53.954 [6809] <2> update_DBM_Image: Entering
10:29:53.954 [6809] <2> update_DBM_Image: (10953.17) Executing "UPDATE DBM_Main.DBM_Image SET Expiration = 1403605793 WHERE ImageKey = ?" Bindings <1624269>
10:29:53.965 [6809] <2> update_DBM_Image: SQL_SUCCESS
10:29:53.965 [6809] <2> update_DBM_Image: Success executing (ImageKey = 1624269) (0)
10:29:53.966 [6809] <2> exec_catalog_DR_protection: Entering
10:29:53.966 [6809] <2> exec_catalog_DR_protection: classname=Test-Catalog, fqDRImagePath=<NULL>, status=13
10:29:53.966 [6809] <2> read_catdrinfo: ?
10:29:53.967 [6809] <2> read_catdrinfo_file: ?
10:29:53.981 [6809] <2> read_catdrinfo_file: EMAIL xx@xx.com
10:29:53.981 [6809] <2> read_catdrinfo_file: PATH /opt/openv/DR
10:29:53.981 [6809] <2> read_catdrinfo_file: FLAGS 0
10:29:53.981 [6809] <2> exec_catalog_DR_protection: Sending DR image info via email
10:29:53.981 [6809] <2> generate_DR_file_and_send_mail: Entering
10:29:53.986 [6809] <2> generate_DR_report: Entering, copynum:0, subject:, fqfilePath:, updateExistingDR:0, reportFP:1
10:29:53.987 [6809] <2> send_DR_report: Entering
10:29:53.988 [6809] <2> OpenSendmailPipe: /usr/lib/sendmail -t >/dev/null 2>/dev/null
10:29:53.990 [6809] <16> send_mail2: fopen() failed (2)
10:29:54.121 [6809] <4> db_error_add_to_file: Could not send DR report
10:29:54.121 [6809] <16> generate_DR_file_and_send_mail: Could not send DR report
10:29:54.122 [6809] <16> exec_catalog_DR_protection: Failed to send DR image to xx@xx.com (0)

10:29:54.122 [6809] <2> ImageCatalogBackup::executeQuery: Deleting /usr/openv/netbackup/db/images/master/1403000000/Test-Catalog_1403578455_FULL
10:29:54.124 [6809] <2> ImageCatalogBackup::executeQuery: Deleting /usr/openv/netbackup/db/images/master/1403000000/Test-Catalog_1403578132_FULL
10:29:54.125 [6809] <2> ~DbmLockFile: Successfully release lockfile /usr/openv/netbackup/bin/catbackup.lock file descriptor 7
10:29:54.125 [6809] <2> ImageCatalogBackup::~ImageCatalogBackup: Exiting

 

Thanks,

Susi.S

 

1 ACCEPTED SOLUTION

Accepted Solutions

Marianne
Level 6
Partner    VIP    Accredited Certified
So, incorrect symbolic link, pointing to itself instead of a folder. Seems someone wanted to ensure that all possible names for RAC nodes will save image files in the same folder, but created the link incorrectly. From what I see, the link should be directly under images folder, gbsRAC -> gbsdb-bkp. I cannot tell you why this is only causing an issue now. Simply delete this link.

View solution in original post

13 REPLIES 13

Marianne
Level 6
Partner    VIP    Accredited Certified

Noy enough info in bpdbm log extract...

What is the status code of the failed job?

How many streams are started by Catalog backup?
Which stream is failing? 
Please show us all text in Details tab of failed job
as well as Policy config:

bppllist Test-Catalog -U

Can you confirm that the DR path is accessible and can be written to?
Can you see if DR-file was created?

Is sendmail working on the master?
 

 

inn_kam
Level 6
Partner Accredited

ANSWER the marriane question

also chk this tech note

Catalog backup fails with STATUS CODE: 35

Article:TECH191339  |  Created: 2012-06-20  |  Updated: 2013-04-25  |  Article URL http://www.symantec.com/docs/TECH191339

susindran_surul
Level 6

Hi Marianne,

Please find the below for details log.

Backup getting failed on 3rd stream only after writing 170+GB..Already media server have 7200 client read timeout.

Send mail working fine.I can able to create touch file under DR directory.

Details Log :-
===========

06/24/2014 05:11:24 - Info nbjm (pid=6265) starting backup job (jobid=5436278) for client master, policy Test-Catalog, schedule Full
06/24/2014 05:11:24 - Info nbjm (pid=6265) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=5436278, request id:{9B251C74-FB55-11E3-A96B-00144F3C15DC})
06/24/2014 05:11:24 - requesting resource media1-hcart3-robot-tld-0
06/24/2014 05:11:24 - requesting resource master.NBU_CLIENT.MAXJOBS.master
06/24/2014 05:11:24 - requesting resource master.NBU_POLICY.MAXJOBS.Test-Catalog
06/24/2014 05:11:25 - granted resource  master.NBU_CLIENT.MAXJOBS.master
06/24/2014 05:11:25 - granted resource  master.NBU_POLICY.MAXJOBS.Test-Catalog
06/24/2014 05:11:25 - granted resource  001061
06/24/2014 05:11:25 - granted resource  media1-000-hcart3-11
06/24/2014 05:11:25 - granted resource  media1-hcart3-robot-tld-0
06/24/2014 05:11:25 - estimated 0 kbytes needed
06/24/2014 05:11:25 - Info nbjm (pid=6265) started backup (backupid=master_1403583085) job for client master, policy Test-Catalog, schedule Full on storage unit media1-hcart3-robot-tld-0
06/24/2014 05:11:29 - Info bpbrm (pid=14561) master is the host to backup data from
06/24/2014 05:11:29 - Info bpbrm (pid=14561) reading file list for client
06/24/2014 05:11:29 - started process bpbrm (pid=14561)
06/24/2014 05:11:29 - connecting
06/24/2014 05:11:30 - Info bpbrm (pid=14561) starting bpbkar on client
06/24/2014 05:11:30 - connected; connect time: 0:00:00
06/24/2014 05:11:31 - Info bpbkar (pid=23995) Backup started
06/24/2014 05:11:31 - Info bpbrm (pid=14561) bptm pid: 14562
06/24/2014 05:11:33 - Info bptm (pid=14562) start
06/24/2014 05:11:33 - Info bptm (pid=14562) using 524288 data buffer size
06/24/2014 05:11:33 - Info bptm (pid=14562) using 32 data buffers
06/24/2014 05:11:34 - Info bptm (pid=14562) start backup
06/24/2014 05:11:34 - Info bptm (pid=14562) backup child process is pid 14567
06/24/2014 05:11:34 - Info bptm (pid=14562) Waiting for mount of media id 001061 (copy 1) on server media1.
06/24/2014 05:11:34 - mounting 001061
06/24/2014 05:12:31 - Info bptm (pid=14562) media id 001061 mounted on drive index 10, drivepath /dev/rmt/8cbn, drivename media1-000-hcart3-11, copy 1
06/24/2014 05:12:31 - mounted 001061; mount time: 0:00:57
06/24/2014 05:12:32 - positioning 001061 to file 3
06/24/2014 05:12:58 - positioned 001061; position time: 0:00:26
06/24/2014 05:12:58 - begin writing
06/24/2014 11:29:00 - Error bpbrm (pid=14561) socket read failed: errno = 62 - Timer expired
06/24/2014 11:29:02 - Error bptm (pid=14562) media manager terminated by parent process
06/24/2014 11:29:53 - Info bpbkar (pid=23995) done. status: 13: file read failed
06/24/2014 11:29:53 - end writing; write time: 6:16:55
file read failed  (13)

 

$ sudo ./bppllist Test-Catalog -U
------------------------------------------------------------

Policy Name:       Test-Catalog

  Policy Type:         NBU-Catalog
  Active:              yes
  Effective date:      06/23/2014 13:33:56
  Mult. Data Streams:  no
  Client Encrypt:      no
  Checkpoint:          no
  Policy Priority:     0
  Max Jobs/Policy:     1
  Disaster Recovery:   0
  Collect BMR info:    no
  Residence:           media1-hcart3-robot-tld-0
  Volume Pool:         CatalogBackup
  Server Group:        *ANY*
  Keyword:             (none specified)
  Data Classification:       -
  Residence is Storage Lifecycle Policy:    no
  Application Discovery:      no
  Discovery Lifetime:      0 seconds
  ASC Application and attributes: (none defined)

  Granular Restore Info:  no
  Ignore Client Direct:  no
  Enable Metadata Indexing:  no
  Index server name:  NULL
  Use Accelerator:  no
  HW/OS/Client:  Solaris       SunOS         master

  Include:  CATALOG_DRIVEN_BACKUP

  Schedule:              Full
    Type:                Full Backup
    Frequency:           every 7 days
     Excluded Dates----------
        No specific exclude dates entered
        No exclude days of week entered
    PFI Recovery:        0
    Maximum MPX:         1
    Retention Level:     1 (2 weeks)
    Number Copies:       1
    Fail on Error:       0
    Residence:           (specific storage unit not required)
    Volume Pool:         (same as policy volume pool)
    Server Group:        (same as specified for policy)
    Residence is Storage Lifecycle Policy:         0
    Schedule indexing:   0
    Daily Windows:
          Sunday     00:00:00  -->  Sunday     24:00:00
          Monday     00:00:00  -->  Monday     24:00:00
          Tuesday    00:00:00  -->  Tuesday    24:00:00
          Wednesday  00:00:00  -->  Wednesday  24:00:00
          Thursday   00:00:00  -->  Thursday   24:00:00
          Friday     00:00:00  -->  Friday     24:00:00
          Saturday   00:00:00  -->  Saturday   24:00:00

Catalog Disaster Recovery Configuration:
  Email Address:   xxx@xx.com
  Disk Path:       /opt/openv/DR
  User Name:       (none specified)
  Pass Word:       (none specified)
  Critical policy: (none specified)

$

susindran_surul
Level 6

HI ,

Above Tech not matching to our case.I could see DR path name is correct and i can able to create test file under that path.

Answered above all questions.

Thanks,

Susi.S

Marianne
Level 6
Partner    VIP    Accredited Certified

Hopefully /opt/openv/DR is a separate mountpoint?
Dr-file should not be saved to the same filesystem as NBU catalogs.

To get back to the failure:

Yes, it looks like a Client Read timeout.

06/24/2014 05:12:58 - begin writing
06/24/2014 11:29:00 - Error bpbrm (pid=14561) socket read failed: errno = 62 - Timer expired
06/24/2014 11:29:02 - Error bptm (pid=14562) media manager terminated by parent process
06/24/2014 11:29:53 - Info bpbkar (pid=23995) done. status: 13: file read failed

 

To troubleshoot, we need the following logs:

On media server: bptm and bpbrm

On master: bpbkar

Please copy log files to .txt files that match process name (e.g. bptm.txt) and upload as File attachments.

 

susindran_surul
Level 6

No its on same mount point.We used to take backup on 7.5.0.4 on same path.Any issue for using same mount point?

I m pulling all logs and will update soon

Marianne
Level 6
Partner    VIP    Accredited Certified

I do not believe the DR-path is causing the issue. 
I was merely concerned because it looked like the same filesystem as the catalogs...
No issue there.

Another concern is Catalog backup that seems to be scheduled for only once per week.
When this issue is fixed, please add another schedule for daily backups.
It can be Diff or Inc.

As a matter of interest - how big is your catalog?

Hopefully logs will tell us why catalog backups seem to be going into a 'hang state', resulting in the timeout.

It may be necessary to increase logging levels - level 3 should be fine.
If you decide to open Support call with Symantec, they will need level 5 logs.

mph999
Level 6
Employee Accredited

Need bpbkar log as well ....

Marianne
Level 6
Partner    VIP    Accredited Certified

bpbkar is where we hope to see evidence of process 'hanging' on the master server.

bptm will just tell us up to when data was received, bpbrm will confirm the timeout.

So, really important that get master's bpbkar log that corresponds with media server logs.

We need  pid=23995 in bpbkar log  - if the media server logs correspond with job details above.

 

**** Update ****

We see in bptm that data was received at a steady space until 08:27:

08:27:03.160 [14562] <2> write_data: received first buffer (524288 bytes), begin writing data

 

Then nothing.... Just the termination from bpbrm at 10:29 because of the timeout.

bpbrm shows last catalog update as  07:54:

07:54:50.082 [14561] <2> bpbrm main: end write catalog
10:29:00.101 [14561] <16> bpbrm readline: socket read failed: errno = 62 - Timer expired

So, we need to see master server's bpbkar log from more or less 07:50 onwards (assuming that the system time on master and media server correspond).

susindran_surul
Level 6

Hi Marianne,

 

I could see on bpbkar below messages repeated till the end from starting of 8:29.After backup failed also this  message repeated and i have killed manually bpbkar.

 

I could see below server oracle RAC server.I could see its having some sympolic link under images of particular directory but not able to read it.

 

What is that sympolic link?

$ ls -lart
total 154
lrwxrwxrwx   1 root     root           6 Mar 23  2010 gbsRAC -> gbsRAC

 

$ more gbsRAC
gbsRAC: Number of symbolic links encountered during path name traversal exceeds MAXSYMLINKS
$

 

 

 

8:29:00.092 [23995] <2> bpbkar SelectFile: cwd=/opt/openv/netbackup/db/images/gbsdb-bkp path=gbsRAC
08:29:00.092 [23995] <2> check_symlink_touchfile: Ignore symlinks? ANS=0
08:29:00.093 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.093 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.094 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.094 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.094 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.095 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.095 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.095 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.096 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC
08:29:00.096 [23995] <2> bpbkar resolve_path: INF - /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC resolves to /opt/openv/netbackup/db/images/gbsdb-bkp/gbsRAC

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified
So, incorrect symbolic link, pointing to itself instead of a folder. Seems someone wanted to ensure that all possible names for RAC nodes will save image files in the same folder, but created the link incorrectly. From what I see, the link should be directly under images folder, gbsRAC -> gbsdb-bkp. I cannot tell you why this is only causing an issue now. Simply delete this link.

susindran_surul
Level 6

Please find the logs from media server.I have around 8GB logs on master server.Is there any particular way to grep only this failed job related PID?

 

susindran_surul
Level 6

Hi Marianne,

Yes..Its problem with that culprit link file.I have tested same senario on test server,I could see same error came.

Yesterday i have removed link fine and backup completed successfully.

For me to trace this link file took 3 days to find out.Hope this will help to otheres who and all going from 7.5 to 7.6.x.

Thanks to all.

Thanks,

Susi.S