Problem backup with Oracle 11 on Oracle Linux

MMoret
Level 3
Partner Accredited

Hi,

I have a new Oracle Linux 6.2 client running Oracle 11.
I cannot get the backup to work. A normal filesystem backup works just fine.

Near the end of the log there is an error about a file that I do not understand; the permissions on that file look fine.


# ls -l /usr/openv/netbackup/logs/user_ops/dbext/logs/8231.0.1369146890
-rw-r--r-- 1 oracle oinstall 42 May 21 16:34 /usr/openv/netbackup/logs/user_ops/dbext/logs/8231.0.1369146890

Here is the log:

 

16:49:18.775 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:18.776 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:18.777 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:18.777 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:18.778 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:18.778 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:18.779 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:18.779 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:18.780 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:18.780 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:18.781 [11803] <8> xbsa_GetEnv: WRN - NBBSA_SCHEDULE not found in environment block
 
16:49:18.781 [11803] <2> int_LogSystemInfo: INF -
Veritas NetBackup for Oracle - Release 7.5 (2012060523)
        System name:    Linux
        Node name:      sv-ora15-p.domain.local
        Release:        2.6.39-200.32.1.el6uek.x86_64
        Version:        #1 SMP Wed Sep 26 23:11:38 PDT 2012
        Machine:        x86_64
        User name:      oracle
        Client Host:    sv-ora15-p.domain.local
 
16:49:18.781 [11803] <2> int_GetMMInfo: INF - Initialized Signal
16:49:18.781 [11803] <2> int_GetMMInfo: INF - support for Proxy Copy enabled
16:49:19.549 [11803] <4> int_ProcessCommandString: INF - cmd_key=<NB_ORA_CLIENT> cmd_val=<sv-ora15-p.domain.local>
16:49:19.549 [11803] <2> int_ProcessCommand: INF - Client <sv-ora15-p.domain.local> will be used for this API session.
16:49:19.549 [11803] <4> int_ProcessCommandString: INF - cmd_key=<NB_ORA_SID> cmd_val=<tmeld>
16:49:19.549 [11803] <2> int_ProcessCommand: INF - SID <tmeld> will be used for metadata collection.
16:49:19.549 [11803] <4> int_ProcessCommandString: INF - cmd_key=<NB_ORA_SERV> cmd_val=<sv-bu01-p.domain.local>
16:49:19.549 [11803] <2> int_ProcessCommand: INF - Server <sv-bu01-p.domain.local> will be used for this API session.
16:49:19.549 [11803] <4> int_ProcessCommandString: INF - cmd_key=<NB_ORA_POLICY> cmd_val=<P_Oracle_DBs>
16:49:19.549 [11803] <2> int_ProcessCommand: INF - Policy <P_Oracle_DBs> will be used for this API session.
16:49:19.550 [11803] <4> int_ProcessCommandString: INF - cmd_key=<NB_ORA_PARENT_JOBID> cmd_val=<39461>
16:49:19.550 [11803] <2> int_ProcessCommand: INF - Parent Job ID <39461> will be used for this API session.
16:49:19.638 [11803] <4> int_ProcessCommandString: INF - cmd_key=<NB_ORA_PLOG> cmd_val=</usr/openv/netbackup/logs/user_ops/dbext/oracle/progress.1369147749.11697.log>
16:49:19.638 [11803] <2> int_ProcessCommand: INF - Progress Log </usr/openv/netbackup/logs/user_ops/dbext/oracle/progress.1369147749.11697.log> will be used for this API session.
16:49:20.552 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:20.552 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:20.558 [11803] <2> vnet_pbxConnect: pbxConnectEx Succeeded
16:49:20.561 [11803] <2> logconnections: BPRD CONNECT FROM 10.100.100.77.23558 TO 10.100.100.120.1556 fd = 11
16:49:20.653 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:20.653 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:20.654 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:20.654 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:20.660 [11803] <2> vnet_pbxConnect: pbxConnectEx Succeeded
16:49:20.663 [11803] <2> logconnections: BPRD CONNECT FROM 10.100.100.77.37045 TO 10.100.100.120.1556 fd = 11
16:49:21.369 [11803] <16> writeToServer: ERR - send() to server on socket failed: Bad file descriptor (9)
16:49:21.369 [11803] <16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket
16:49:21.369 [11803] <16> VxBSABeginProgressLogging: ERR - Unable to write to progress file.
16:49:21.369 [11803] <8> xbsa_ProgressLogSetup: WRN - VxBSABeginProgressLogging: Failed with error:
   Server Status:  system error occurred
16:49:21.369 [11803] <8> xbsa_ProgressLogSetup: WRN - Job will proceed without progress logging
16:49:21.370 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:21.370 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:21.371 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:21.371 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:21.372 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:21.372 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:21.373 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:21.373 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:21.375 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:21.376 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:21.376 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:21.376 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:21.380 [11803] <2> vnet_pbxConnect: pbxConnectEx Succeeded
16:49:21.382 [11803] <2> logconnections: BPRD CONNECT FROM 10.100.100.77.50826 TO 10.100.100.120.1556 fd = 12
16:49:21.384 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:21.384 [11803] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:51.391 [8231] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/8231.0.1369146890
16:49:51.391 [8231] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/8231.0.1369146890>
16:49:51.391 [8231] <16> CreateNewImage: ERR - serverResponse() failed
16:49:51.391 [8231] <16> VxBSACreateObject: ERR - Could not create new image with file /04oa6vpa_1_1.
16:49:51.391 [8231] <16> xbsa_CreateObject: ERR - VxBSACreateObject: Failed with error:
   Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
16:49:51.395 [8231] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
16:49:51.395 [8231] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
16:49:51.396 [8231] <4> sbtend: INF - --- END of SESSION ---
16:49:51.396 [8231] <8> close_image: Session being terminated abnormally, cleaning up
16:49:51.396 [8231] <4> close_image: INF - backup FAILED
16:49:51.396 [8231] <4> close_image: INF ---- end of Backup ---
 
16:49:51.396 [8231] <16> VxBSAEndTxn: ERR - Transaction ended with active Backup/Restore.
16:49:51.396 [8231] <16> xbsa_EndTransaction: ERR - VxBSAEndTxn: Failed with error:
   The transaction was aborted.
 
Any help??
 
Thanks in advance!
Martijn
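
The dbclient log above is mostly level <2> debug output. The repeated errno 2 lines are probably harmless: errno 2 is ENOENT on Linux, so they most likely just record that an optional /home/oracle/bp.conf does not exist. A small filter makes the real errors easier to spot. This is a minimal sketch, assuming the log layout shown above (time, [pid], <severity>, message); nbu_log_errors is a name made up for this example, not a NetBackup tool.

```shell
# Print only the log lines whose <N> severity tag is at or above a threshold.
# $1 = log file, $2 = minimum severity (defaults to 16, the ERR lines above).
# Continuation lines without a timestamp (e.g. "Server Status: ...") are dropped.
nbu_log_errors() {
    min_level="${2:-16}"
    awk -v min="$min_level" '
        match($0, /^[0-9:.]+ \[[0-9]+\] <[0-9]+>/) {
            tag = substr($0, RSTART, RLENGTH)   # "HH:MM:SS.mmm [pid] <N>"
            sub(/^.*</, "", tag)                # strip everything up to "<"
            sub(/>$/, "", tag)                  # strip the trailing ">"
            if (tag + 0 >= min) print
        }
    ' "$1"
}
```

Run against the log in this post, it would keep only the <16> and <32> entries, beginning with the send() failure at 16:49:21.369, which is a reasonable place to start reading.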

huanglao2002
Level 6

1. Have you set up the Oracle backup policy correctly?

2. From the error log:

/home/oracle/bp.conf

NetBackup is trying to read settings from a bp.conf file in the oracle user's home directory.

3. From the logs:

16:49:21.369 [11803] <16> writeToServer: ERR - send() to server on socket failed: Bad file descriptor (9)
16:49:21.369 [11803] <16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket
16:49:21.369 [11803] <16> VxBSABeginProgressLogging: ERR - Unable to write to progress file.
16:49:21.369 [11803] <8> xbsa_ProgressLogSetup: WRN - VxBSABeginProgressLogging: Failed with error:
Server Status: system error occurred

 

You can check the permissions on the /usr/openv/netbackup/logs directory.

 

4. For reference, these are the permissions in my test environment:

[root@ logs]# ls -ld user_ops
drwxrwxrwx 5 root bin 4096 05-18 22:31 user_ops

[root@ user_ops]# ls -l
total 12
drwxrwxrwx 4 root root 4096 04-22 10:22 dbext
drwxrwxrwx 2 root bin  4096 05-22 00:04 nbjlogs
drwxr-x--x 4 root root 4096 01-28 09:08 root
[root@redhatovocs user_ops]# ls -lR dbext/
dbext/:
total 8
drwxrwxrwx 2 root root 4096 05-22 00:02 jobs
drwxrwxrwx 2 root root 4096 05-22 00:02 logs

dbext/jobs:
total 12
-rw-r--r-- 1 root root 177 05-20 00:00 vxbsa.1368979208.13111.prog.pcb_std.j
-rw-r--r-- 1 root root 177 05-21 00:00 vxbsa.1369065610.7885.prog.pcb_std.j
-rw-r--r-- 1 root root 177 05-22 00:04 vxbsa.1369152167.2886.prog.pcb_std.j

dbext/logs:
total 36
-rw-r--r-- 1 root root 2726 05-20 00:00 vxbsa.1368979208.13111.files.1
-rw-r--r-- 1 root root 4994 05-20 00:00 vxbsa.1368979208.13111.prog.pcb_std
-rw-r--r-- 1 root root 2726 05-21 00:00 vxbsa.1369065610.7885.files.1
-rw-r--r-- 1 root root 4994 05-21 00:00 vxbsa.1369065610.7885.prog.pcb_std
-rw-r--r-- 1 root root 2726 05-22 00:02 vxbsa.1369152167.2886.files.1
-rw-r--r-- 1 root root 4994 05-22 00:04 vxbsa.1369152167.2886.prog.pcb_std
[root@redhatovocs user_ops]#

 

 

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified

 

16:49:20.663 [11803] <2> logconnections: BPRD CONNECT FROM 10.100.100.77.37045 TO 10.100.100.120.1556 fd = 11
16:49:21.369 [11803] <16> writeToServer: ERR - send() to server on socket failed: Bad file descriptor (9)
Oracle client seems to be unable to connect to bprd on the master server via port 1556.
Is that IP address (10.100.100.120) correct for master server?
 
Firstly, check that port 1556 is open in both directions between master and client.
 
Next, verify correct forward and reverse name lookup between master and client.
Use 'bpclntcmd -hn <name>' and 'bpclntcmd -ip <ip-address>' in both directions to check.
If you need to change DNS or hosts entries to fix lookup issues, remember to clear host cache:
bpclntcmd -clear_host_cache
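
The checks above can be strung together on the client side. This is a minimal sketch, assuming bash and the coreutils timeout command are available and that bpclntcmd is installed in /usr/openv/netbackup/bin; port_open and check_peer are hypothetical helper names. It only tests the client-to-master direction, so the equivalent checks still need to be run from the master.

```shell
# Return 0 if a TCP connection to host $1, port $2 opens within 5 seconds.
# Relies on bash's /dev/tcp redirection, so bash is invoked explicitly.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Port check plus forward and reverse lookup for one peer.
# $1 = peer host name, $2 = peer IP address.
check_peer() {
    bpclntcmd=/usr/openv/netbackup/bin/bpclntcmd
    if port_open "$1" 1556; then
        echo "port 1556 on $1: reachable"
    else
        echo "port 1556 on $1: NOT reachable"
    fi
    "$bpclntcmd" -hn "$1"    # forward lookup
    "$bpclntcmd" -ip "$2"    # reverse lookup
}
```

On the client in this thread the call would be check_peer sv-bu01-p.domain.local 10.100.100.120.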
 
Hope this helps.
 

MMoret
Level 3
Partner Accredited

Thanks for your replies!

Communication is fine; normal file-level backups run OK.

10.100.100.120 is the master server.

 

[root@sv-ora15-p bin]# ./bpclntcmd -hn sv-bu01-p
host sv-bu01-p: sv-bu01-p.domain.local at 10.100.100.120
aliases:     sv-bu01-p.domain.local     sv-bu01-p     10.100.100.120
[root@sv-ora15-p bin]# ./bpclntcmd -ip 10.100.100.120
host 10.100.100.120: sv-bu01-p.domain.local at 10.100.100.120
aliases:     sv-bu01-p.domain.local     10.100.100.120
 
C:\Users\adm>bpclntcmd -hn sv-ora15-p
host sv-ora15-p: sv-ora15-p.domain.local at 10.100.100.77
aliases:     sv-ora15-p.domain.local     sv-ora15-p     10.100.100.77
 
C:\Users\adm>bpclntcmd -ip 10.100.100.77
host 10.100.100.77: sv-ora15-p.domain.local at 10.100.100.77
aliases:     sv-ora15-p.domain.local     10.100.100.77

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Comms for file system backups work differently - there is no connection initiated from the client directly to the master.

Your master is resolving client's IP address as FQDN: sv-ora15-p.domain.local.
How is hostname defined in the Oracle policy? Shortname or FQDN?

To know what is happening on the master server, we need to see bprd log.
If the log folder does not exist, please create it and restart NBU.

After next failure, copy bprd log to bprd.txt and post as file attachment.
 

 

MMoret
Level 3
Partner Accredited

The client is FQDN in the policy.

Will generate the log file.

Thanks!
Martijn

MMoret
Level 3
Partner Accredited

Hi,

Here is the bprd log file from the master server.

Regards,
Martijn

Marianne
Level 6
Partner    VIP    Accredited Certified

OK - no comms errors.

This is what I see in bprd log:

bkarfiles: User backup failed (client = sv-ora15-p.domain.local user = oracle group = oinstall): system error occurred

add_msg_to_progress_file: Can't become user oracle and group oinstall on sv-ora15-p.domain.local 

 

Can Oracle dba login as user oracle (group oinstall) and run rman backup commands?

MMoret
Level 3
Partner Accredited

Yes, we only have two accounts (root and oracle).

 

[root@sv-ora15-p ~]# rman target backup/*****@db nocatalog
 
Recovery Manager: Release 11.2.0.3.0 - Production on Wed May 22 10:59:35 2013
 
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
 
connected to target database: DB (DBID=2014933852)
using target database control file instead of recovery catalog
 
RMAN>
 
[oracle@sv-ora15-p ~]$ rman target backup/*****@db nocatalog
 
Recovery Manager: Release 11.2.0.3.0 - Production on Wed May 22 11:00:25 2013
 
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
 
connected to target database: DB (DBID=2014933852)
using target database control file instead of recovery catalog
 
RMAN>

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Do you have bphdb and dbclient logs on the client?

If not, create folders and chmod 777 on both of them.

This will hopefully tell us why NBU believes that it cannot su to oracle user.

Please also post the rman script and/or template.
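
On a UNIX or Linux client the two log folders can be created in one step. This is a minimal sketch, assuming the default /usr/openv/netbackup/logs location seen earlier in the thread; make_oracle_log_dirs and the NBU_LOG_ROOT override are made up for this example.

```shell
# Create the bphdb and dbclient debug log directories and open them up
# so that both root and the oracle user can write to them.
# NBU_LOG_ROOT is a hypothetical override; by default the standard path is used.
make_oracle_log_dirs() {
    root="${NBU_LOG_ROOT:-/usr/openv/netbackup/logs}"
    for d in bphdb dbclient; do
        mkdir -p "$root/$d" && chmod 777 "$root/$d" || return 1
    done
}
```

Run make_oracle_log_dirs as root, then repeat the failing backup so that both directories collect output from the same attempt.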

MMoret
Level 3
Partner Accredited

Hi,

I really appreciate your effort!

Here are the logs and template.

Thanks again!
Regards, Martijn

MMoret
Level 3
Partner Accredited

I also tried a scripted backup, with the same result: RMAN starts and then times out with no data transferred.

Any more ideas?

Regards,
Martijn

Marianne
Level 6
Partner    VIP    Accredited Certified

Apologies - I have not had a chance to look at the logs yet.

What are the Client Connect and Client Read timeouts on the media server?

The default of 300 (5 min) is normally not enough for large databases.
We have good experience with timeout of 1800 for both settings.
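
If the media server runs UNIX or Linux, these two settings are normally stored as CLIENT_CONNECT_TIMEOUT and CLIENT_READ_TIMEOUT in /usr/openv/netbackup/bp.conf; on a Windows media server they are set through Host Properties instead. This is a minimal sketch for checking the current values; show_timeouts is a hypothetical helper name.

```shell
# Print the explicit timeout entries from a bp.conf file, if any.
# $1 = path to bp.conf (defaults to the standard UNIX location).
show_timeouts() {
    conf="${1:-/usr/openv/netbackup/bp.conf}"
    grep -E '^(CLIENT_CONNECT_TIMEOUT|CLIENT_READ_TIMEOUT)[[:space:]]*=' "$conf" \
        || echo "no explicit timeout entries in $conf (defaults apply)"
}
```

A line such as CLIENT_READ_TIMEOUT = 1800 in the output would match the value suggested above.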

MMoret
Level 3
Partner Accredited

Hi,

I found something odd. This customer has two other servers where Oracle backups run fine.
While checking for differences, I found that on this server I cannot log in to the Java interface with the oracle account.

On the servers that work, this is not a problem.

Error message: 

ServerInterface:setDebugLevel:262144

BpjavaLoginModule:Setting ServerRequest debug:262144

BpjavaLoginModule:connectServer:[oracle][<users_pw>][sv-ora15-p.domain.local][0]

Connecting to vnetd service over PBX port = 1556

Acknowledgement from PBX1

 

        Protocol Code: 101

        Status: 31

        Time Taken: 194ms

        Error Msg: could not set user id for process

        Server Locale: en_US.UTF-8

        TO[0]: oracle

        TO[1]: sv-ora15-p.domain.local

        TO[2]: en_US.UTF-8

        TO[3]: XXXX

        TO[4]: auth.conf

        TO[5]: 750000 IPC

        FROM[0]: could not set user id for process

        Aux data: null

huanglao2002
Level 6

Explanation of status code 31:

NetBackup status code: 31

Message: could not set user id for process

Explanation: Could not set the user ID of a process to the user ID of the requesting user. NetBackup runs client processes as the requesting user.

Recommended Action: Check the NetBackup All Log Entries report for clues on where and why the failure occurred. For detailed troubleshooting information, create a debug log directory for the process that you think may have returned this status code. Then, retry the operation and check the resulting debug log.

 

Can you check the oracle user ID on both hosts and compare the user and group settings?

#id oracle
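
To make that comparison easier, the same summary can be captured on a working host and on the failing host and then diffed. This is a minimal sketch, assuming getent is available; account_summary is a hypothetical helper name.

```shell
# Print the uid/gid/groups of a user plus its home directory and login shell.
# $1 = user name.
account_summary() {
    user="$1"
    id "$user" || return 1
    getent passwd "$user" | cut -d: -f6,7   # home directory and login shell
}
```

For example, run account_summary oracle on each host, save the output to a file, and compare the two files with diff. The login shell field also shows whether the account uses a restricted shell.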

 

Marianne
Level 6
Partner    VIP    Accredited Certified

The NBU status code 31 'Recommended Action' does not help much.... 

I have in all honesty never seen this error and have no idea where to go from here.

Please compare user profiles - maybe restricted shell? 

MMoret
Level 3
Partner Accredited

Hi,

Thanks for your comments.
The problem was solved by the DBA: the oracle account did not have enough resources assigned.

Regards,
Martijn

Marianne
Level 6
Partner    VIP    Accredited Certified

Oracle user account did not have enough resources???

Please explain??

MMoret
Level 3
Partner Accredited

Hi,

This is what I got back from the DBA:

Because there are many databases and instances, the following files were modified:

/etc/security/limits.conf
/etc/security/limits.d/90-nproc.conf

The values were raised from 65k to 125k (approx.).

I have asked for the details of the adjustments made on the server.

Regards,

Martijn

Will_Restore
Level 6

bad bug!

 

Description of problem:

/etc/security/limits.d/90-nproc.conf was introduced in pam package in RH6. It contains this line:

# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.

*          soft    nproc     1024


This line overrides the conventionally set /etc/security/limits.conf value of the same name. Years of expected behavior are thrown out the window and honest system administrators are exposed to outages on Redhat 6!

https://bugzilla.redhat.com/show_bug.cgi?id=919793
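
To see which nproc entries are in play on a host like this one, the active lines from both locations can be listed side by side. This is a minimal sketch; list_nproc_limits is a name made up for this example, and it only prints the entries without trying to reproduce how pam_limits resolves them. One possible link to the status 31 error, offered as an assumption rather than something confirmed in this thread: on older Linux kernels, setuid() can fail when the target user has already reached its nproc limit.

```shell
# List every active (non-comment) nproc entry, prefixed with its file name.
# Arguments are the limits files to scan, in the order they should be read.
list_nproc_limits() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        awk -v file="$f" \
            '!/^[ \t]*#/ && $3 == "nproc" { print file ": " $1, $2, $3, $4 }' "$f"
    done
}
```

A typical call would be list_nproc_limits /etc/security/limits.conf /etc/security/limits.d/*.conf. Running su - oracle -c 'ulimit -u' as root then shows the soft limit that a fresh oracle login actually receives.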