10-13-2014 08:50 PM
Hi All,
My Oracle backups are failing with the error below. Please help me with a workaround.
Starting Control File and SPFILE Autobackup at 14.10.2014 11:37:22
released channel: ch00
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of Control File and SPFILE Autobackup command on ch00 channel at 10/14/2014 11:38:36
ORA-19506: failed to create sequential file, name="c-535381093-20141014-05", parms=""
ORA-27027: sbtremove2 returned error
ORA-19511: Error received from media manager layer, error text:
Failed to remove, c-535381093-20141014-05, from image catalog.
I have already checked the permissions on the bp.conf file; everything seems to be in place.
Thanks,
Nayab
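For anyone triaging a similar stack, here is a minimal Python sketch (not part of the original post) that pulls the RMAN/ORA codes out of pasted output and skips the banner lines. The function name `extract_errors` and the `SAMPLE` variable are illustrative assumptions; the sample lines are copied from the error stack above.

```python
import re

# Matches message codes such as "RMAN-03009" or "ORA-19506" at the start
# of a line, followed by a colon and the message text.
CODE_RE = re.compile(r"^(RMAN|ORA)-(\d{5}):\s*(.*)$")

# The RMAN-00569/RMAN-00571 lines only frame the stack; they carry no detail.
BANNER_CODES = {"RMAN-00569", "RMAN-00571"}


def extract_errors(log_text):
    """Return (code, message) pairs in the order they appear in the stack."""
    found = []
    for line in log_text.splitlines():
        match = CODE_RE.match(line.strip())
        if not match:
            continue
        code = f"{match.group(1)}-{match.group(2)}"
        if code in BANNER_CODES:
            continue
        found.append((code, match.group(3)))
    return found


SAMPLE = "\n".join([
    'RMAN-00571: ===========================================================',
    'RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============',
    'RMAN-00571: ===========================================================',
    'RMAN-03009: failure of Control File and SPFILE Autobackup command on ch00 channel at 10/14/2014 11:38:36',
    'ORA-19506: failed to create sequential file, name="c-535381093-20141014-05", parms=""',
    'ORA-27027: sbtremove2 returned error',
    'ORA-19511: Error received from media manager layer, error text:',
    'Failed to remove, c-535381093-20141014-05, from image catalog.',
])

for code, message in extract_errors(SAMPLE):
    print(code, "|", message)
```

The last code in the list (ORA-19511 here) is the one that hands over to the media manager, so the text after it is what to search for in the NetBackup logs.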
10-21-2014 06:23 AM
IMHO, the issue seems to be this error in the dbclient log:
10:36:35.174 [28727] <2> bprd_connect: errno = 98 - Address already in use
10:36:35.174 [28727] <16> dbc_GetMediaListByName: Can't connect to host zsswmasb: cannot connect on socket (25)
I found this TN, which attributes the error to a lack of available sockets (an OS issue):
http://www.symantec.com/docs/TECH52587
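One way to get a rough picture of socket usage on a Linux client is to count TCP sockets by state from /proc/net/tcp. The sketch below is an illustration only, not taken from the TN: the function name `count_tcp_states` and the sample text are assumptions, and only a few state codes are named (any other code is reported as its raw hex value).

```python
from collections import Counter

# A few of the hex state codes used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts


# Hypothetical, shortened sample in the /proc/net/tcp layout.
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 00000000:3572 00000000:0000 0A 00000000:00000000\n"
    "   1: 36301DAC:E911 53301DAC:3598 06 00000000:00000000\n"
    "   2: 36301DAC:E912 53301DAC:3598 06 00000000:00000000\n"
    "   3: 36301DAC:E913 53301DAC:3598 01 00000000:00000000\n"
)

for state, number in sorted(count_tcp_states(SAMPLE).items()):
    print(state, number)
```

On the client itself, the same function can be fed the contents of /proc/net/tcp read with `open("/proc/net/tcp").read()`. A large TIME_WAIT count at the moment the backups fail would be consistent with the socket shortage described in the TN.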
10-14-2014 01:20 AM
It could be a timeout, but it is impossible to say without more information. Please post the job details.
As always when troubleshooting database backups, create the bphdb and dbclient log folders to get more information.
10-14-2014 02:08 AM
Hey Michael,
Please find the dbclient log folder attached. Please let me know where I can find bphdb.
Thanks,
Nayab
10-14-2014 06:23 AM
This looks like a network problem (configuration, not hardware).
Verify hostname lookup. Make sure your client and server can communicate bi-directionally.
13:58:06.055 [14563] <16> bsa_bplist: Can't connect to host zsswmasb: cannot connect on socket (25)
13:46:27.382 [14563] <8> dbc_GetServerClientConfig: WARNING - NBU's client name= <bsswmppp1> differs from gethostname()= <zsswmppp1>
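A forward-plus-reverse lookup check can be sketched in Python as follows. This is only an illustration of the idea: `check_name` is a hypothetical helper, and the fake resolver tables exist so the example runs without the real network (the name/IP pairs are the ones reported elsewhere in this thread). By default the function uses the operating system resolver.

```python
import socket


def check_name(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> name and report whether the round trip agrees."""
    result = {"name": name, "ip": None, "reverse": None, "ok": False, "error": None}
    try:
        result["ip"] = forward(name)
        rname, aliases, _addresses = reverse(result["ip"])
    except OSError as exc:  # gaierror and herror are OSError subclasses
        result["error"] = str(exc)
        return result
    result["reverse"] = rname
    # Compare short host names so "host" and "host.domain" count as equal.
    wanted = name.split(".")[0].lower()
    candidates = [rname] + list(aliases)
    result["ok"] = any(c.split(".")[0].lower() == wanted for c in candidates)
    return result


# Fake resolver data for the demonstration only.
FORWARD = {"bsswmppp1": "172.29.48.54", "zsswmasb": "172.29.48.83"}
REVERSE = {"172.29.48.54": "bsswmppp1", "172.29.48.83": "zsswmasb"}


def fake_forward(name):
    if name not in FORWARD:
        raise socket.gaierror(f"unknown host {name}")
    return FORWARD[name]


def fake_reverse(ip):
    if ip not in REVERSE:
        raise socket.herror(f"no reverse entry for {ip}")
    return REVERSE[ip], [], [ip]


for host in ("bsswmppp1", "zsswmasb", "nosuchhost"):
    print(check_name(host, fake_forward, fake_reverse))
```

Calling `check_name("zsswmasb")` with no extra arguments on the client, and `check_name("bsswmppp1")` on the master, would exercise the real resolver in both directions.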
10-14-2014 07:06 AM
If the bphdb folder does not exist, you have to create it under netbackup/logs on the client.
I can see from the dbclient log that there seems to be a connection issue:
10:36:35.174 [28727] <2> bprd_connect: Cannot connect to server zsswmasb
10:36:35.174 [28727] <2> bprd_connect: errno = 98 - Address already in use
10:36:35.174 [28727] <16> dbc_GetMediaListByName: Can't connect to host zsswmasb: cannot connect on socket (25)
What does bptestbpcd -client bsswmppp1 -debug show when run from the master and media server?
Also, what do bpclntcmd -pn and bpclntcmd -self show on the client?
10-14-2014 07:14 AM
@nayabsk: I think this is the root of the problem.
12:43:27.748 [5558] <2> bprd_connect: Cannot connect to server zsswmasb automatically: 21
12:43:27.748 [5558] <2> bprd_connect: Cannot connect to server zsswmasb
12:43:27.748 [5558] <2> bprd_connect: errno = 98 - Address already in use
12:43:27.748 [5558] <16> bsa_bplist: Can't connect to host zsswmasb: cannot connect on socket (25)
12:43:27.748 [5558] <16> VxBSAQueryObject: ERR - bsa_bplist() failed 25
12:43:27.748 [5558] <16> xbsa_QueryObject: ERR - VxBSAQueryObject: Failed with error:
Server Status: cannot connect on socket
12:43:27.748 [5558] <16> int_RemoveImage: ERR - Failed to remove, c-535381093-20141014-06, from image catalog.
12:43:34.656 [5558] <4> sbtend: INF - --- END of SESSION ---
13:46:27.382 [14563] <8> dbc_GetServerClientConfig: WARNING - NBU's client name= <bsswmppp1> differs from gethostname()= <zsswmppp1>
System name: Linux
Node name: zsswmppp1 (I think it's a cluster)
Client Host: bsswmppp1 (virtual server on the cluster)
Please confirm the above info, and please share your script as well, along with the bpclntcmd output.
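To pick the relevant lines out of a long dbclient log, a small filter along these lines may help. It is a sketch based only on the line layout visible in the excerpts quoted here (time, [pid], <level>, function, message); `parse_line` and `error_lines` are hypothetical names, and treating level 16 and above as errors is an assumption drawn from these excerpts.

```python
import re

# Layout seen in the quoted excerpts: time [pid] <level> function: message
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> ([^:]+): (.*)$"
)


def parse_line(line):
    """Return a dict for a log line in the expected layout, else None."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    time, pid, level, function, message = match.groups()
    return {
        "time": time,
        "pid": int(pid),
        "level": int(level),
        "function": function,
        "message": message,
    }


def error_lines(log_text, min_level=16):
    """Keep only parsed entries whose level is at least min_level."""
    entries = (parse_line(line) for line in log_text.splitlines())
    return [e for e in entries if e and e["level"] >= min_level]


SAMPLE = "\n".join([
    "12:43:27.748 [5558] <2> bprd_connect: Cannot connect to server zsswmasb automatically: 21",
    "12:43:27.748 [5558] <2> bprd_connect: errno = 98 - Address already in use",
    "12:43:27.748 [5558] <16> bsa_bplist: Can't connect to host zsswmasb: cannot connect on socket (25)",
    "12:43:27.748 [5558] <16> VxBSAQueryObject: ERR - bsa_bplist() failed 25",
    "Server Status: cannot connect on socket",
    "12:43:34.656 [5558] <4> sbtend: INF - --- END of SESSION ---",
])

for entry in error_lines(SAMPLE):
    print(entry["time"], entry["function"], "->", entry["message"])
```

Lowering `min_level` to 2 shows the surrounding bprd_connect lines, which is where the errno 98 detail appears.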
10-14-2014 06:36 PM
@rookie, please find the bpclntcmd output and backup script attached. In the meantime, I see that a few archivelog jobs are successful but a few are failing, which is strange.
ZSSWMPPP1 is clustered with ZSSWMPPP2; bsswmppp1 and bsswmppp2 are the hostnames through which NBU communicates with these hosts (the backup LAN).
zsswmppp1:/usr/openv/netbackup/bin # ./bpclntcmd -pn
expecting response from server zsswmasb
bsswmppp1 bsswmppp1 172.29.48.54 59665
zsswmppp1:/usr/openv/netbackup/bin # ./bpclntcmd -self
current domain =
NIS does not seem to be running: (1) Request arguments bad
gethostname() returned: zsswmppp1
host zsswmppp1: zsswmppp1.ssw.corp at 172.29.26.25 (0x191a1dac)
aliases: zsswmppp1
zsswmppp1:/usr/openv/netbackup/bin # ./bpclntcmd -hn zsswmasb2
host zsswmasb2: zsswmasb2 at 172.29.48.82 (0x52301dac)
aliases:
zsswmppp1:/usr/openv/netbackup/bin # ./bpclntcmd -hn zsswmasb
host zsswmasb: zsswmasb at 172.29.48.83 (0x53301dac)
aliases:
zsswmppp1:/usr/openv/netbackup/bin # ./bpclntcmd -ip 172.29.48.54
checkhaddr: host : bsswmppp1: bsswmppp1 at 172.29.48.54 (0x36301dac)
checkhaddr: aliases:
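A side note on reading this output: the hex value in parentheses appears to be the same IPv4 address stored as a 32-bit integer in little-endian byte order. That is an observation from the four pairs above, not something stated in the thread. A short Python sketch (the function name `decode_hex_address` is an assumption) makes it easy to cross-check the pairs:

```python
import socket
import struct


def decode_hex_address(value):
    """Turn the integer printed in parentheses back into dotted-quad form.

    Assumes the integer holds the address bytes in little-endian order,
    which matches every name/address pair printed in the output above.
    """
    return socket.inet_ntoa(struct.pack("<I", value))


# Pairs copied from the bpclntcmd output above.
PAIRS = {
    "zsswmppp1": (0x191A1DAC, "172.29.26.25"),
    "zsswmasb2": (0x52301DAC, "172.29.48.82"),
    "zsswmasb": (0x53301DAC, "172.29.48.83"),
    "bsswmppp1": (0x36301DAC, "172.29.48.54"),
}

for host, (hex_value, printed_ip) in PAIRS.items():
    decoded = decode_hex_address(hex_value)
    print(host, decoded, "matches" if decoded == printed_ip else "DIFFERS")
```

If the decoded value ever differed from the dotted address on the same line, that would point at a resolver inconsistency worth chasing.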
10-14-2014 07:27 PM
@Michael, please note that my client itself acts as a media server and backs itself up.
Please find the requested outputs attached; for the bpclntcmd output, refer to my reply to rookie :)
10-14-2014 10:48 PM
Hi All,
Now my backups are failing with error 11 (system call failed):
11 system call failed bsswmppp_sp25_oracle_daily bsswmppp1
11 system call failed bsswmppp_sp27_oracle_daily bsswmppp2
Thanks,
Nayab
10-15-2014 12:43 AM
Have you changed anything?
Which process is failing?
10-15-2014 01:54 AM
I added an entry in this location on my MASTER server for my DB servers MPPP1 and MPPP2, as below:
zsswmasb2:/usr/openv/netbackup/db/altnames
echo echo "bsswmppp1" >> bsswmppp1.aaa.corp
Apart from this, I haven't changed anything.
10-15-2014 03:59 AM
Error code 11 can also occur if disk space is low.
Have you updated the hosts file entries on the master and on the media server/client? They should look like this (IP address first, as the hosts file format requires):
<ipaddress> ZSSWMPPP1 bsswmppp1
<ipaddress> ZSSWMPPP2 bsswmppp2
10-15-2014 04:03 AM
I would suggest making changes to the script as well.
SEND 'NB_ORA_CLIENT=$CLIENTHOST, NB_ORA_POLICY=$NB_ORA_POLICY';
Set the client name to bsswmppp1 and the policy name to the policy shown in the NBU GUI.
10-15-2014 10:38 PM
Hey Rookie,
Let me clarify my servers and the instances running on them.
ZSSWMPPP1 (instances 20 and 27) is clustered with server ZSSWMPPP2 (instance 25).
The current status of my backups is as follows:
Instance 20 - Both archivelog and DB backups are successful.
Instance 27 - Only archivelog backups are successful; DB backups have been failing continuously for 2 days.
Instance 25 - Archivelog backups fail sometimes, and DB backups have been failing continuously for 2 days.
Thanks,
Nayab
10-16-2014 01:41 AM
It seems to me that something changed 2 days ago, and I am pretty sure something in the network setup/NetBackup configuration does not quite match.
Also, as you are using a backup network, are you sure all the backup-related traffic goes through it and none through the production network?
What are the logical names/VIPs on the cluster, and what are the names/IPs on the physical nodes?
Sorry, but I am now very confused about what your setup actually is.
10-16-2014 04:23 AM
@Nayabsk,
I have some questions for you before going further in troubleshooting:
Are you an Oracle DBA, or do you have experience with RDBMS?
On which platform are you running Oracle: Linux? Unix? Windows?
Was the RMAN backup working before? If so, has anything changed since the last backup?
=> If the backup was running fine before, you can run the following command from the RMAN prompt to list all previous backups:
list backup summary ;
Can you run the show all command from the RMAN prompt to see your RMAN configuration and make sure CONTROLFILE AUTOBACKUP is properly configured?
Can you paste the complete RMAN backup script or the command that you run for the backup?
Are you backing up to DISK or TAPE (SBT_TAPE param)?
10-20-2014 12:58 AM
Hey Nathanmike,
Answers to your questions:
Are you Oracle DBA or have you experience in RDBMS?
No, I am a storage and backup admin.
On which platform are you running Oracle: Linux? Unix? Windows?
Linux
RMAN backup was working before? if so, Is there something that has changed since the last backup?
According to my sysadmin and DBA nothing changed, and I also see that backups of other instances on the same server are working fine.
=> If backup was running fine before, you can run the following command from RMAN prompt to see the list of all previous backup:
list backup summary ;
Can you run show all command from RMAN prompt to see your RMAN configuration and make sure CONTROLFILE AUTOBACKUP is properly configured?
Yes, it is properly configured; I have checked with my DBA.
Can you paste the complete RMAN backup script or the command that you run for backup?
Are you backing up to DISK or TAPE (SBT_TAPE param)?
The backup goes to SBT_TAPE. Please find the script attached.
Nathan Mike
10-20-2014 01:33 AM
Cannot connect to server zsswmasb
I remember from one of your other posts (Understanding the Connectivity of Tape Library with my Media Servers) that your master server is clustered, right?
CLUSTER NAME IS ZSSWMASB
On the client, check /etc/hosts to confirm that the correct IP address is entered for the virtual name.
Maybe 172.29.48.83 is the IP address for zsswmasb1?
If so, backups will work while zsswmasb1 is the active node but fail when zsswmasb2 is the active node.
When the backup fails, check the bprd log on the active node to see whether the connection request from the client was received.
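The /etc/hosts check can be scripted. The Python sketch below is an illustration, not something from the thread: `lookup_in_hosts` is a hypothetical helper and the sample hosts text is made up, reusing only the two name/IP pairs that the client's bpclntcmd -hn output reported.

```python
def lookup_in_hosts(hosts_text, name):
    """Return every IP address that hosts-file text maps to the given name."""
    wanted = name.lower()
    addresses = []
    for raw_line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if wanted in (n.lower() for n in names):
            addresses.append(ip)
    return addresses


# Hypothetical hosts file content for the demonstration.
SAMPLE_HOSTS = """
127.0.0.1     localhost
172.29.48.83  zsswmasb     # name used as the server in the failing logs
172.29.48.82  zsswmasb2
"""

for host in ("zsswmasb", "zsswmasb1", "zsswmasb2"):
    print(host, lookup_in_hosts(SAMPLE_HOSTS, host))
```

On the client, pass the real file contents with `open("/etc/hosts").read()`. An empty list or more than one address for the virtual name would both be worth investigating, as would an address that actually belongs to one physical node.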
10-21-2014 05:41 AM
@Nayabsk,
I already faced this issue a few weeks ago:
Starting Control File and SPFILE Autobackup at 23-MAY-14
released channel: ch00
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of Control File and SPFILE Autobackup command on ch00 channel at 05/23/2014 12:29:24
ORA-19506: failed to create sequential file, name="c-4185468684-20140523-03", parms=""
ORA-27027: sbtremove2 returned error
ORA-19511: Error received from media manager layer, error text:
Failed to remove, c-4185468684-20140523-03, from image catalog.
I solved this issue as follows:
1) If it is a clustered environment, make sure all clients (nodes and virtual host) are properly created in the NetBackup Admin Console.
2) Make sure the client attributes on the master server are properly created for each node and for the virtual host.
3) Make sure you have created the file "/usr/openv/netbackup/db/altnames/No.Restrictions" on the master server to avoid client access conflicts, especially if you have a multihomed system:
[root@nbumas01 ~]# touch /usr/openv/netbackup/db/altnames/No.Restrictions
[root@nbumas01 ~]# ls -lrt /usr/openv/netbackup/db/altnames/No.Restrictions
-rw-r--r-- 1 root root 0 May 23 11:12 /usr/openv/netbackup/db/altnames/No.Restrictions
http://www.symantec.com/business/support//index?page=content&pmv=print&impressions=&viewlocale=&id=HOWTO86696
4) Make sure forward and reverse lookups work for all of your clients (nodes and virtual host) from both the master server and the client:
http://www.symantec.com/business/support/index?page=content&id=TECH135349
5) Make sure RMAN is properly configured for CONTROLFILE AUTOBACKUP, ARCHIVELOG, DATAFILE and DEVICE TYPE:
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE SBT_TAPE TO '%F';
new RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE 'SBT_TAPE' TO '%F';
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete
RMAN> CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE SBT_TAPE TO 1
2> ;
new RMAN configuration parameters:
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE 'SBT_TAPE' TO 1;
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete
RMAN> CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE SBT_TAPE TO 1;
new RMAN configuration parameters:
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE 'SBT_TAPE' TO 1;
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete
RMAN> CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT 'ora_df%t_s%s_s%p';
new RMAN configuration parameters:
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT 'ora_df%t_s%s_s%p';
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete
RMAN> show all;
RMAN configuration parameters for database with db_unique_name NBUORA are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 3;
CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/backup/rman/ora_cf%F';
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE 'SBT_TAPE' TO '%F';
CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO BACKUPSET;
CONFIGURE DEVICE TYPE SBT_TAPE PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE 'SBT_TAPE' TO 1;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE 'SBT_TAPE' TO 1;
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '/backup/rman/ora_df%t_s%s_s%p';
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.1.0/dbs/snapcf_nbuora.f'; # default
RMAN> exit
6) Make sure your RMAN script has the correct entries for the backup. Here is an example:
[oracle@rhel6 scripts]$ cat fullrmancmd
set echo on;
run {
allocate channel ch00
device type sbt_tape
PARMS='ENV=( NB_ORA_CLIENT=nbuora.nbulabs.be,
NB_ORA_POLICY=nbu-oracle-pol,
NB_ORA_SCHED=weekly_full_oracle
)'
;
backup
format='%U'
incremental level 0
database
include current controlfile
plus archivelog
;
release channel ch00;
}
list backup summary device type sbt;
exit;
7) Run your script and check that it runs fine:
[oracle@rhel6 scripts]$ ./nbuorafull.sh
Recovery Manager: Release 11.2.0.1.0 - Production on Fri May 23 16:27:16 2014
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: NBUORA (DBID=4185468684)
connected to recovery catalog database
RMAN> set echo on;
2> run {
3> allocate channel ch00
4> device type sbt_tape
5> PARMS='ENV=( NB_ORA_CLIENT=nbuora.nbulabs.be,
6> NB_ORA_POLICY=nbu-oracle-pol,
7> NB_ORA_SCHED=weekly_full_oracle
8> )'
9> ;
10>
11> backup
12> format='%U'
13> incremental level 0
14> database
15> include current controlfile
16> plus archivelog
17> ;
18>
19> release channel ch00;
20> }
21> list backup summary device type sbt;
22> exit;
echo set on
allocated channel: ch00
channel ch00: SID=64 device type=SBT_TAPE
channel ch00: Veritas NetBackup for Oracle - Release 7.5 (2013061020)
Starting backup at 23-MAY-14
current log archived
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_22/o1_mf_1_21_9qwb9d94_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_22/o1_mf_1_22_9qwbc8nv_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_23_9qwwzzww_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_24_9qwx0nb1_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_25_9qxzcmgq_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_26_9qxzyc65_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_27_9qy0gbqt_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_28_9qy3l3h3_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_29_9qy48llt_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_30_9qy82dg3_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_31_9qy8o2jh_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_32_9qybofpw_.arc; already backed up 1 time(s)
skipping archived log file /u01/app/oracle/fast_recovery_area/NBUORA/archivelog/2014_05_23/o1_mf_1_33_9qyc6v58_.arc; already backed up 1 time(s)
channel ch00: starting archived log backup set
channel ch00: specifying archived log(s) in backup set
input archived log thread=1 sequence=34 RECID=28 STAMP=848334441
channel ch00: starting piece 1 at 23-MAY-14
channel ch00: finished piece 1 at 23-MAY-14
piece handle=26p913je_1_1 tag=TAG20140523T162726 comment=API Version 2.0,MMS Version 5.0.0.0
channel ch00: backup set complete, elapsed time: 00:03:35
Finished backup at 23-MAY-14
Starting backup at 23-MAY-14
channel ch00: starting incremental level 0 datafile backup set
channel ch00: specifying datafile(s) in backup set
input datafile file number=00004 name=/oradata/users/nbuora/users01.dbf
input datafile file number=00005 name=/oradata/data01/nbuora/data01.dbf
input datafile file number=00006 name=/oradata/data02/nbuora/data02.dbf
input datafile file number=00007 name=/oradata/data03/nbuora/data03.dbf
input datafile file number=00008 name=/oradata/index01/nbuora/index01.dbf
input datafile file number=00009 name=/oradata/index02/nbuora/index02.dbf
input datafile file number=00010 name=/oradata/index03/nbuora/index03.dbf
input datafile file number=00001 name=/oradata/system/nbuora/system01.dbf
input datafile file number=00002 name=/oradata/system/nbuora/sysaux01.dbf
input datafile file number=00003 name=/oradata/undo01/nbuora/undo01.dbf
input datafile file number=00011 name=/oradata/rman/rman01.dbf
channel ch00: starting piece 1 at 23-MAY-14
channel ch00: finished piece 1 at 23-MAY-14
piece handle=27p913q5_1_1 tag=TAG20140523T163101 comment=API Version 2.0,MMS Version 5.0.0.0
channel ch00: backup set complete, elapsed time: 00:08:27
channel ch00: starting incremental level 0 datafile backup set
channel ch00: specifying datafile(s) in backup set
including current control file in backup set
channel ch00: starting piece 1 at 23-MAY-14
channel ch00: finished piece 1 at 23-MAY-14
piece handle=28p914a0_1_1 tag=TAG20140523T163101 comment=API Version 2.0,MMS Version 5.0.0.0
channel ch00: backup set complete, elapsed time: 00:03:06
Finished backup at 23-MAY-14
Starting backup at 23-MAY-14
current log archived
channel ch00: starting archived log backup set
channel ch00: specifying archived log(s) in backup set
input archived log thread=1 sequence=35 RECID=29 STAMP=848335355
channel ch00: starting piece 1 at 23-MAY-14
channel ch00: finished piece 1 at 23-MAY-14
piece handle=29p914ft_1_1 tag=TAG20140523T164237 comment=API Version 2.0,MMS Version 5.0.0.0
channel ch00: backup set complete, elapsed time: 00:03:35
Finished backup at 23-MAY-14
Starting Control File and SPFILE Autobackup at 23-MAY-14
piece handle=ora_cfc-4185468684-20140523-05 comment=API Version 2.0,MMS Version 5.0.0.0
Finished Control File and SPFILE Autobackup at 23-MAY-14
released channel: ch00
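To confirm quickly that a run like the one above ended cleanly, the RMAN output can be summarized with a few lines of Python. This is only a sketch: `summarize_rman_log` is a hypothetical helper, and the sample text is a shortened extract of the output above.

```python
import re
from datetime import timedelta

ELAPSED_RE = re.compile(r"elapsed time: (\d+):(\d{2}):(\d{2})")


def summarize_rman_log(log_text):
    """Report backup set count, total elapsed time, and pass/fail markers."""
    total = timedelta()
    backup_sets = 0
    for hours, minutes, seconds in ELAPSED_RE.findall(log_text):
        total += timedelta(
            hours=int(hours), minutes=int(minutes), seconds=int(seconds)
        )
        backup_sets += 1
    return {
        "backup_sets": backup_sets,
        "total_elapsed": total,
        # RMAN-00569 is the banner line that introduces an error stack.
        "error_stack": "RMAN-00569" in log_text,
        "autobackup_finished": (
            "Finished Control File and SPFILE Autobackup" in log_text
        ),
    }


SAMPLE = "\n".join([
    "channel ch00: backup set complete, elapsed time: 00:03:35",
    "channel ch00: backup set complete, elapsed time: 00:08:27",
    "channel ch00: backup set complete, elapsed time: 00:03:06",
    "channel ch00: backup set complete, elapsed time: 00:03:35",
    "Starting Control File and SPFILE Autobackup at 23-MAY-14",
    "Finished Control File and SPFILE Autobackup at 23-MAY-14",
    "released channel: ch00",
])

summary = summarize_rman_log(SAMPLE)
for key, value in summary.items():
    print(key, "=", value)
```

In the failing runs from this thread, `error_stack` would be true and `autobackup_finished` false, because the autobackup step is where the sbtremove2 error occurs.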