ORB Timeouts

ADNW · ‎01-23-2015

Does anyone know where these timeout values are set? I think some of my jobs are running into the 5 minute limit...

17:58:34.536 [11510] <2> Orb::setOrbConnectTimeout: timeout seconds: 60(Orb.cpp:1498)
17:58:34.536 [11510] <2> Orb::setOrbRequestTimeout: timeout seconds: 1800(Orb.cpp:1507)
17:58:34.632 [11510] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)
17:58:34.632 [11510] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)
17:58:34.836 [11510] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)

RiaanBadenhorst · ‎01-23-2015

Please explain your issue. Those are informational messages and do not necessarily indicate a problem.

Marianne · ‎01-24-2015

This looks like an extract from a log file. Which log? File or DB backup? On master, media or client? Please give as much info as possible...

revarooo · ‎01-25-2015

These are ACE ORB settings being displayed; as mentioned, they are informational. Are you having any issues with jobs?

RiaanBadenhorst · ‎01-26-2015

You left out this critical bit:

"Could not open FT Client pipe for client"

Check the nbftsrv and nbftclt logs for more on why it can't open the pipe.

Restart the SAN client services on the client.

<< FT server on media server>>

vxlogview -a -o 199 -t 00:10

<< FT Client on the client>>

vxlogview -a -o 200 -t 00:10

ADNW · ‎01-26-2015

vxlogview on the server doesn't show any entries at all during the time of the backup. This is an intermittent failure. Usually the backup will complete on a subsequent attempt. A few clients per day, seemingly at random, will fail with this status 83 error. The coincidence of all of them failing at exactly 5 minutes is a little too much for me to ignore, but I don't have any timeouts on the client, media or master set to 300 seconds. That's what led me to the Orb timeout that keeps popping up in the bptm log.

CRZ · ‎01-26-2015

Stuff that defaults to 300:

CLIENT_CONNECT_TIMEOUT
CLIENT_READ_TIMEOUT
BPSTART_NOTIFY_TIMEOUT
BPEND_NOTIFY_TIMEOUT
Probably some others

My money's on you hitting one of the first two.

Anything in the bpbrm log?
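
For checking which of those settings is actually overridden on a given host, here is a minimal Python sketch. It assumes the settings live as `KEY = value` lines in bp.conf (on UNIX hosts typically /usr/openv/netbackup/bp.conf) and that any setting not listed falls back to the 300-second default mentioned above. The function name and the sample text are made up for illustration.

```python
# Settings named in the post above; the 300 s fallback is taken from that post.
WATCHED = (
    "CLIENT_CONNECT_TIMEOUT",
    "CLIENT_READ_TIMEOUT",
    "BPSTART_NOTIFY_TIMEOUT",
    "BPEND_NOTIFY_TIMEOUT",
)
ASSUMED_DEFAULT = 300


def effective_timeouts(conf_text):
    """Return {setting: seconds} for the watched settings in bp.conf-style text."""
    found = {}
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in WATCHED:
            found[key] = int(value.strip())
    # Anything not present in the file is assumed to use the default.
    return {name: found.get(name, ASSUMED_DEFAULT) for name in WATCHED}


# Hypothetical bp.conf content, for illustration only.
SAMPLE_CONF = "SERVER = master.example.com\nCLIENT_READ_TIMEOUT = 1800\n"

for name, seconds in effective_timeouts(SAMPLE_CONF).items():
    print(name, seconds)
```

To run it against a real host, read the file and pass its contents in, e.g. `effective_timeouts(open("/usr/openv/netbackup/bp.conf").read())`.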

ADNW · ‎01-27-2015

I've checked all the timeouts above and they're all > 300 seconds.

BPTM:

06:01:14.046 [26544] <2> Orb::init: initializing ORB bptm_FATClib with: Unknown -ORBSvcConfDirective "-ORBDottedDecimalAddresses 0" -ORBSvcConfDirective "static Resource_Factory '-ORBNativeCharCodeSet UTF-8'" -ORBSvcConfDirective "static PB XIOP_Factory '-enable_keepalive'" -ORBSvcConfDirective "static EndpointSelectorFactory ''" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory PBXIOP_Factory'" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory IIOP_Factory'" -ORBDefaultInitRef '' -ORBSvcConfDirective "static PBXIOP_Evaluator_Factory '-orb bptm_FATClib'" -ORBSvcConfDirective "static Resource_Factory'-ORBConnectionCacheMax 1024 '" -ORBSvcConf /dev/null -ORBSvcConfDirective "static Server_Strategy_Factory '-ORBMaxRecvGIOPPayloadSize 268435456'"(Orb.cpp:842)
06:01:14.047 [26544] <2> Orb::init: caching EndpointSelectorFactory(Orb.cpp:857)
06:01:14.047 [26544] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)
06:01:14.047 [26544] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)
06:01:14.617 [26544] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)
06:06:14.182 [26544] <2> FATClib::fatClientOpenPipe: CORBA exception caught system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'
06:06:14.183 [26544] <2> ConnectionCache::connectAndCache: Acquiring new connection for host xxx.xxx.com, query type 1
06:06:14.185 [26544] <2> vnet_pbxConnect: pbxConnectEx Succeeded
06:06:14.185 [26544] <2> logconnections: BPDBM CONNECT FROM 0.0.0.0.49495 TO 0.0.0.0.1556 fd = 25
06:06:14.185 [26544] <8> vnet_check_vxss_client_magic_with_info: [vnet_vxss_helper.c:871] Ignoring VxSS authentication 2 0x2
06:06:14.193 [26544] <2> db_end: Need to collect reply
06:06:14.194 [26544] <16> setup_fat_server_and_client: Could not open FT Client pipe for client xxx.xxx.com: pipe open failed (13369361)
06:06:14.194 [26544] <2> send_MDS_msg: OP_STATUS 0 15967675 xxx.xxx.com 20481 5 0 0 0 0 0 0 *NULL* 0
06:06:14.209 [26544] <2> send_ft_operation_error: Decoded status = 1 from 5
06:06:14.210 [26544] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)
06:06:15.668 [26544] <2> bptm: Calling tpunmount for media @aaaaw
06:06:15.668 [26544] <2> send_MDS_msg: MEDIA_DONE 0 -1421452857 0 @aaaaw 0 180 {DA034B7C-A489-11E4-97D5-3F794074E4A3}
06:06:15.668 [26544] <2> packageBptmResourceDoneMsg: msg (MEDIA_DONE 0 -1421452857 0 @aaaaw 0 180 {DA034B7C-A489-11E4-97D5-3F794074E4A3})
06:06:15.668 [26544] <2> packageBptmResourceDoneMsg: keyword MEDIA_DONE version 0 jobid -1421452857 copyNum 0 mediaId @aaaaw mediaKey 0 unloadDelay 180 allocId {DA034B7C-A489-11E4-97D5-3F794074E4A3}
06:06:15.668 [26544] <2> packageBptmResourceDoneMsg: returns 0
06:06:15.669 [26544] <2> JobInst::sendIrmMsg: returning
06:06:15.669 [26544] <2> bptm: EXITING with status 83 <----------
06:06:15.671 [26544] <2> bp_sts_close_server: STS session not initialized
06:06:15.671 [26544] <2> cleanup: Detached from BPBRM shared memory

BPBRM:

06:01:12.798 [26530] <2> bpbrm write_msg_to_progress_file: INF - Client read timeout = 1800
06:01:12.805 [26530] <2> bpbrm write_msg_to_progress_file: INF - Media mount timeout = 0
06:01:13.594 [26530] <2> bpbrm spawn_child: /usr/openv/netbackup/bin/bptm bptm -w -c xxx.xxx.com -dpath xx-lsu -stunit xx-su -cl policyname -bt 1422187270 -b xxx.xxx.com_1422187270 -st 2 -cj 1 -reqid -1421452857 -shmfat -jm -brm -hostname xxx.xxx.com -L /usr/openv/netbackup/logs/user_ops/dbext/logs/4776.0.1422187264 -ru oracle -rclnt xxx.xxx.com -rclnthostname xxx.xxx.com -rl 10 -rp 3024000 -sl rman -ct 4 -maxfrag 524288 -eari 0 -mediasvr xxx.xxx.com -no_callback -connect_options 0x01010100 -jobid 7871021 -jobgrpid 7871021 -masterversion 750000 -bpbrm_shm_id 1371504675 -blks_per_buffer 512
06:01:13.595 [26530] <2> bpbrm write_continue_backup: wrote CONTINUE BACKUP on COMM_SOCK <7>
06:01:13.595 [26530] <2> write_file_names: buffering file name '/arc.dI106PR.s16443.p1' for output
06:01:13.595 [26530] <2> write_file_names: successfully wrote buffer to COMM_SOCK
06:01:13.595 [26530] <2> bpbrm main: wrote CONTINUE on COMM_SOCK
06:01:13.595 [26530] <2> bpbrm main: closing DATA_SOCK
06:01:13.595 [26530] <2> bpbrm main: closing COMM_SOCK
06:06:15.685 [26530] <2> bpbrm check_for_signals: bpbrm received SIGCHLD
06:06:15.685 [26530] <2> bpbrm wait_for_child: start
06:06:15.686 [26530] <2> bpbrm wait_for_child: child exit_status = 83 signal_status = 0
06:06:15.686 [26530] <2> bpbrm check_for_signals: child exit abnormal - status = 83
06:06:15.686 [26530] <2> bpbrm kill_child_process: start
06:06:15.686 [26530] <2> bpbrm Exit: client backup EXIT STATUS 83: media open error
06:06:15.693 [26530] <2> job_monitoring_exex: ACK disconnect
06:06:15.693 [26530] <2> job_disconnect: Disconnected
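
To put a number on the "exactly 5 minutes" observation, here is a rough Python sketch that measures the gap between the last Orb::setOrbRequestTimeout line and the first CORBA exception in a bptm extract. The line layout is inferred only from the extract above, the function names are made up for illustration, and it ignores logs that cross midnight.

```python
import re
from datetime import datetime

# Layout inferred from the extract: "HH:MM:SS.mmm [pid] <severity> function: message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[\d+\] <\d+> (.+?): (.*)$")


def parse(line):
    """Split one log line into (timestamp, function, message), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp, func, msg = m.groups()
    return datetime.strptime(stamp, "%H:%M:%S.%f"), func, msg


def gap_before_exception(lines):
    """Seconds from the last setOrbRequestTimeout to the first CORBA exception."""
    last_set = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        when, func, msg = parsed
        if func == "Orb::setOrbRequestTimeout":
            last_set = when
        elif "CORBA exception" in msg and last_set is not None:
            return (when - last_set).total_seconds()
    return None


# Two lines copied from the bptm extract above.
SAMPLE = [
    "06:01:14.617 [26544] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)",
    "06:06:14.182 [26544] <2> FATClib::fatClientOpenPipe: CORBA exception caught system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'",
]

print(gap_before_exception(SAMPLE))
```

On these two lines the gap comes out just under 300 seconds, which matches the 300-second request timeout logged right before the pipe open attempt.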

revarooo · ‎01-27-2015

The 199 and 200 logs will be needed, as mentioned.

ADNW · ‎01-27-2015

There are no entries in the 199 log for these jobs.

ADNW · ‎01-27-2015

No entries at all between midnight and 1 PM...

$ ./vxlogview -o 199 -b "01/25/2015 12:00:00 AM" -e "1/25/2015 01:30:00 PM"
01/25/2015 00:15:11.721 [TestPipe] Sending VRTS_ASCQ_DATA_NOT_READY check condition to PN:IID:TID:LUN[1:6:0:0]
01/25/2015 00:15:11.721 [TestPipe] Sending VRTS_ASCQ_DATA_NOT_READY check condition to PN:IID:TID:LUN[1:6:0:1]
01/25/2015 13:00:30.497 [ProcessManage] node IOTID[3,10,0,65535] 0x7fbbe0033530 STMTMTYPE_PortDBChange type 0 STMTMLoggedIn
01/25/2015 13:00:36.506 [ProcessManage] node IOTID[3,10,0,65535] 0x7fbbe0033730 STMTMTYPE_PortDBChange type 0 STMTMLoggedIn

ADNW · ‎01-29-2015

I'm very curious about these timeouts. I've confirmed that they are not set by any of the standard timeout settings (CLIENT_CONNECT_TIMEOUT, SERVER_CONNECT_TIMEOUT, CLIENT_READ_TIMEOUT, BPSTART_NOTIFY_TIMEOUT, BPEND_NOTIFY_TIMEOUT).

17:58:34.536 [11510] <2> Orb::setOrbConnectTimeout: timeout seconds: 60(Orb.cpp:1498)
17:58:34.536 [11510] <2> Orb::setOrbRequestTimeout: timeout seconds: 1800(Orb.cpp:1507)
17:58:34.632 [11510] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)

Does anyone know where these timeout values are coming from? Is there no way to change them? If not, does anyone think lowering the keepalive would help?
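
For anyone wanting to see which Orb timeout values show up in a log and how often, here is a small Python sketch that tallies them. The pattern is based only on the log lines quoted above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)"
TIMEOUT_RE = re.compile(r"Orb::(setOrb\w+Timeout): timeout seconds: (\d+)")


def tally_orb_timeouts(lines):
    """Count occurrences of each (setter name, seconds) pair in the log lines."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


# The three lines quoted above.
SAMPLE = [
    "17:58:34.536 [11510] <2> Orb::setOrbConnectTimeout: timeout seconds: 60(Orb.cpp:1498)",
    "17:58:34.536 [11510] <2> Orb::setOrbRequestTimeout: timeout seconds: 1800(Orb.cpp:1507)",
    "17:58:34.632 [11510] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1507)",
]

for (setter, seconds), count in sorted(tally_orb_timeouts(SAMPLE).items()):
    print(setter, seconds, count)
```

Running the same tally over a full bptm log would show whether the 300-second request timeout appears in every failing job or only in some.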
