Resource connection errors when trying to backup Oracle

Dear Sirs,
  on a new install of Backup Exec, we are trying to configure a DBA-initiated backup of an Oracle instance. So far, we haven't had any success.
If the job is triggered by RMAN, BE queues it ok, but it immediately fails with this error:
<<
Final error: 0xe000846b - The resource could not be backed up because an error occurred while connecting to the Backup Exec for Windows Servers Remote Agent.
Make sure that the correct version of the Remote Agent is installed and running on the target computer.
Final error category: Resource Errors
Remote Agent not detected on QMS.
>>
If the job is configured in BE (which, by the way, can see the instance's tablespaces, etc.), when started it hangs indefinitely in the "Pre-processing"
state.
You'll find here attached two sets of log (the second one being more verbose) related to the DBA-initiated test.
Other various info:
- The same machine acts as BE Media Server and Oracle DB Server. It's _not_ in a Microsoft A/D domain.
- The backup targes a Backup-to-Disk folder.
- There's no NIC teaming involved.
- "Simple" file backups work fine.
- The Backup Exec Agent Browser can't be gracefully stopped, we have to manually kill "benetns.exe".
- We're running BE 11, rev. 7170 with Service Pack 1
- Example of an RMAN run
C:\Documents and Settings\Administrator>rman target / nocatalog
Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 13 16:38:59 2007
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: QMS (DBID=1494649833)
using target database control file instead of recovery catalog
RMAN> RUN {
2> ALLOCATE CHANNEL ch0 TYPE 'SBT_TAPE';
3> SEND 'BSA_SERVICE_HOST=QMS,NBBSA_TOTAL_STREAMS=1';
4> BACKUP TABLESPACE 'PRESET_TBS';
5> }
allocated channel: ch0
channel ch0: sid=87 devtype=SBT_TAPE
channel ch0: Symantec/BackupExec/1.1.0
sent command to channel: ch0
Starting backup at 13-DEC-07
channel ch0: starting full datafile backupset
channel ch0: specifying datafile(s) in backupset
input datafile fno=00043 name=Y:\ORACLE\ORADATA\QMS\PRESET_TBS.DBF
channel ch0: starting piece 1 at 13-DEC-07
released channel: ch0
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch0 channel at 12/13/2007 16:39:39
ORA-19506: failed to create sequential file, name="1mj3gtd8_1_1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
   BEError(0xe000fe30) A communications failure has occurred.
- The only things communication related that I can find in the logs are:
[6024] 12/13/07 16:57:36 ndmp_writeit: NDMP Socket Error when trying to write message. Socket 0x588 len 0x20
[6024] 12/13/07 16:57:36 ndmp_writeit: ErrorCode :: 10054 : An existing connection was
forcibly closed by the remote host.
[1924] 12/13/07 16:56:52 BETCPConnection::LoopThroughListAndConnect: Could not connect to remote address "10.2.208.61" Error:10061.

thanks for your help,
--
Maurizio
 
QMS-bengine00.log
[6024] 12/13/07 16:57:36 ndmpCreateConnection
[6024] 12/13/07 16:57:36 ndmpConnectEx: 'QMS' resolves to the local machine itself. So, we will try and connect to a loopback address ...
[6024] 12/13/07 16:57:36 ndmpConnectEx : Control Connection information: A connection was established between end-points 0.0.0.0:2455 and 127.0.0.1:1026.
[6024] 12/13/07 16:57:36 ndmpSendRequest: message:0x900
[6024] 12/13/07 16:57:36 ndmp_writeit: len:32
[6024] 12/13/07 16:57:36 ndmp_writeit: NDMP Socket Error when trying to write message. Socket 0x588 len 0x20
[6024] 12/13/07 16:57:36 ndmp_writeit: ErrorCode :: 10054 : An existing connection was forcibly closed by the remote host.
[6024] 12/13/07 16:57:36 sent          seq#: 1 msg: 0x900
[6024] 12/13/07 16:57:36 ndmp_process_messages: reply_expected:TRUE
[6024] 12/13/07 16:57:36 ndmp_recv_msg
[6024] 12/13/07 16:57:36 ndmp_readit: len:4000
[6024] 12/13/07 16:57:36 ndmp_readit: Caught message on closed connection. Socket 0x588 len 0xffffffff
[6024] 12/13/07 16:57:36 ndmp_readit: ErrorCode :: 10054 : An existing connection was forcibly closed by the remote host.
[6024] 12/13/07 16:57:36 ndmp_process_messages: detected eof
[6024] 12/13/07 16:57:36 ERROR: ndmpcSendRequest->connection error
[6024] 12/13/07 16:57:36 ERROR: ndmpSendRequest failed:
[6024] 12/13/07 16:57:36 ndmpFreeMessage: message:0x0
[6024] 12/13/07 16:57:36 NDMPAgentConnector:Smiley FrustratedetupConnection: connectOpen failed on server QMS.
[6024] 12/13/07 16:57:36 ndmpClose
[6024] 12/13/07 16:57:36 @@@@@@@MyCloseSocket called with sockfd = 1416(0x588) retval = 0
[6024] 12/13/07 16:57:36 ndmpDestroyConnection
[6024] 12/13/07 16:57:36 @@@@@@@MyCloseSocket called with sockfd = -1(0xffffffff) retval = -1
[6024] 12/13/07 16:57:36 NDMPVxBSAHostBackupEngine:ConnectToDataServer() failed for machine QMS.
QMS-beremote00.log
[5316] 12/13/07 16:56:41 NDMP Daemon Running..
[5316] 12/13/07 16:56:41 Could not create an IPv6 socket at this time, reason:An error occurred during a socket creation operation: Error Code: 10047, System Error Message: An address incompatible with the requested protocol was used.
[5316] 12/13/07 16:56:41 @@@@@@@MyCloseSocket called with sockfd = 708(0x2c4) retval = 0
[5316] 12/13/07 16:56:41 SocketService::IsSystemDualIP: No support found for IPv6 currently
[5316] 12/13/07 16:56:41 Started NDMP Listener on port 10000
[1108] 12/13/07 16:56:42 Running... 'Q' to stop
[1924] 12/13/07 16:56:51 NrdsAdvertiserThread: advertisement cycle started.
[1924] 12/13/07 16:56:51 Informational: Restrict Anonymous Support is enabled
[1924] 12/13/07 16:56:51 creating DLE for local machine
[1924] 12/13/07 16:56:51 BENetConfigEx: Successfully refreshed adapter information.
[1924] 12/13/07 16:56:51 EnumClusterDLEs: Could not opening cluster : The system could not find the environment option that was entered.
[1924] 12/13/07 16:56:51 NrdsAdvertiserThread: EnumSelfDLE for file system 7 returned 0(0x0) and 1 DLEs
[1924] 12/13/07 16:56:51 NrdsAdvertiserThread: Nrds Message Len : 107.
[1924] 12/13/07 16:56:51 RMAN_EnumSelfDLE: Checking if DBName QMS is part of RAC.
[1924] 12/13/07 16:56:51 NrdsAdvertiserThread: EnumSelfDLE for file system 14 returned 0(0x0) and 0 DLEs
[1924] 12/13/07 16:56:51 ConnectToServerEndPoint: dest=QMS, service=6101
[1924] 12/13/07 16:56:52 BETCPConnection::LoopThroughListAndConnect: Could not connect to remote address "10.2.208.61" Error:10061.
[1924] 12/13/07 16:56:52 @@@@@@@MyCloseSocket called with sockfd = 856(0x358) retval = 0
[1924] 12/13/07 16:56:52 BETCPConnection::CreateConnectionFromHostAndPort: Remote Host: "QMS": There were no addresses returned, belonging to family: IPv6
[1924] 12/13/07 16:56:52 BETCPConnection::CreateConnectionFromHostAndPort: Could not create a connection to "QMS" because attempts with both IPv4 and IPv6 protocols failed
[1924] 12/13/07 16:56:52 Could not create a BETCPConnection object from address: QMS error=An error occurred during a socket connect operation: Error Code: 10061, System Error Message: No connection could be made because the target machine actively refused it.
[1924] 12/13/07 16:56:52 NrdsAdvertiserThread: connect to target=QMS port=6101 failed
[1924] 12/13/07 16:56:52 NrdsAdvertiserThread: Retrying in 60 seconds
[4456] 12/13/07 16:56:56 NrdsAdvertiserThread: negative (purge) advertisement cycle started.
[4456] 12/13/07 16:56:56 NrdsAdvertiserThread: no purge is pending.
[4456] 12/13/07 16:56:56 NrdsAdvertiserThread: negative (purge) advertisement cycle complete.  Waiting 240 minutes before advertising again.
[1924] 12/13/07 16:57:52 ConnectToServerEndPoint: dest=QMS, service=6101
[1924] 12/13/07 16:57:52 CreateConnection type=0 on socket 860 via BESocket OK
[1924] 12/13/07 16:57:52 NrdsAdvertiserThread: send of qms type 7 subtype 2 to target=QMS port=6101 succeeded
[1924] 12/13/07 16:57:52 @@@@@@@MyCloseSocket called with sockfd = 860(0x35c) retval = 0
[1924] 12/13/07 16:57:52 NrdsAdvertiserThread: advertisement cycle complete.  Waiting 240 minutes before advertising again.
[5316] 12/13/07 17:00:56 @@@@@@@MyCloseSocket called with sockfd = 688(0x2b0) retval = 0

 
4 Replies

Re: Resource connection errors when trying to backup Oracle

Maurizio,

I talked with some of our Oracle techs about what you are seeing and with all you have done already.  The commen suggestion was to call in and open a support ticket since it would be a bit difficult to troubleshoot this issue via the forums.

Re: Resource connection errors when trying to backup Oracle

 
 
Oracle Recovery Manager (RMAN) backups fail intermittently with error: ORA-19506: failed to create sequential file
 
Any relation?

Re: Resource connection errors when trying to backup Oracle

The issue turned out to be a port conflict. I copied the whole discussion at the end of this post, should someone be interested.
The workaround was either to restart Oracle Listener or (better) Tools -> Options -> Network and Security, check "Enable remote agent TCP dynamic port range", set the range from 60000 to 65535 .
 
We are thus able to backup the local Oracle instance, but still fail with a remote one and another error... Smiley Sad
 
RMAN-03002: failure of backup plus archivelog command at 01/15/2008 18:11:36
ORA-19506: failed to create sequential file, name="BE_1ej6836m_1_1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
 
Note that the error text received from MML is blank... It seems that the issues never end...
 
ciao,
--
Giuliano
 
Hi Ivan,
 
Starting from your suggestions, I did the following steps.

C:\TEMP>netstat -abno | findstr 1025
  TCP    127.0.0.1:1025         0.0.0.0:0              LISTENING       1576
C:\TEMP>netstat -abno | findstr 1576
  TCP    0.0.0.0:1521           0.0.0.0:0              LISTENING       1576
  TCP    127.0.0.1:1025         0.0.0.0:0              LISTENING       1576
  [..]
PID 1576 os "TNSLSNR.exe"
 
Here I run the job:
C:\TEMP>netstat -abno | findstr 1025
  TCP    0.0.0.0:1025           0.0.0.0:0              LISTENING       1620
  TCP    127.0.0.1:1025         0.0.0.0:0              LISTENING       1576
  TCP    127.0.0.1:1265         127.0.0.1:1025         FIN_WAIT_2      4584
  TCP    127.0.0.1:1025         127.0.0.1:1265         CLOSE_WAIT      1576
PID 1620 is "oracle.exe"
PID 4584 is "bengine.exe"!
Hence "bengine.exe" tries to connect directly to Oracle...

Here I restarted Oracle Listener:
C:\TEMP>netstat -abno | findstr 1025
Nobody is listening on port 1025.
New PID of "TNSLSNR.exe" is 3904. It's now bound port 1321 (supposedly one of the free TCP ports above 1024) and, of course, 1521 (standard SQLNet port).
C:\TEMP>netstat -abno | findstr 3904
  TCP    0.0.0.0:1521           0.0.0.0:0              LISTENING       3904
  TCP    127.0.0.1:1321         0.0.0.0:0              LISTENING       3904
  [..]

I re-submit the job:
C:\TEMP>netstat -abno | findstr 1025
  TCP    0.0.0.0:1025           0.0.0.0:0              LISTENING       1620
  TCP    127.0.0.1:1025         127.0.0.1:1409         ESTABLISHED     1620
  TCP    127.0.0.1:1409         127.0.0.1:1025         ESTABLISHED     4584
And it works!
 
In short:
- A job is submitted
- BE talks to Oracle. For whatever reason, oracle.exe binds to TCP port 1025.
  If there's another process listening on that port, things go wrong.
  In our scenario, that other process is TNSLSNR.exe!
Given this behaviour, it's not clear if the issue is BE or Oracle related.
A possible workaround is, as I did, restarting the listener.
Both kinds of backup (BE or DBA initiated) now work.
 
I appreciate any further comments.
 

Re: Resource connection errors when trying to backup Oracle

You may need to run a shell script called changePerm.sh in ORACLE_HOME/install, if you've applied any of the patchsets (10.2.0.3-4).