RMAN child jobs are not created

mahmoud_mohamed · ‎09-01-2017

Hi,

I have a Windows Server client hosting an Oracle DB.

The backup was running successfully, but suddenly the parent process stopped generating child jobs.

These are the job details of the parent job:

Job Details:

Aug 31, 2017 5:10:00 PM - Info nbjm (pid=5608) starting backup job (jobid=6852875) for client CAISVGIS05, policy SV_GIS_DB_Weekly_New, schedule Weekly_Full

Aug 31, 2017 5:10:00 PM - Info nbjm (pid=5608) requesting MEDIA_SERVER_WITH_ATTRIBUTES resources from RB for backup job (jobid=6852875, request id:{7545E76E-8E5E-11E7-9618-F0921C18FD94})

Aug 31, 2017 5:10:00 PM - requesting resource SV_DD4500_Boost_caisvvmb01

Aug 31, 2017 5:10:00 PM - requesting resource bkpsrv1.NBU_CLIENT.MAXJOBS.CAISVGIS05

Aug 31, 2017 5:10:00 PM - requesting resource bkpsrv1.NBU_POLICY.MAXJOBS.SV_GIS_DB_Weekly_New

Aug 31, 2017 5:10:00 PM - granted resource bkpsrv1.NBU_CLIENT.MAXJOBS.CAISVGIS05

Aug 31, 2017 5:10:00 PM - granted resource bkpsrv1.NBU_POLICY.MAXJOBS.SV_GIS_DB_Weekly_New

Aug 31, 2017 5:10:00 PM - granted resource SV_DD4500_Boost_caisvvmb01

Aug 31, 2017 5:10:02 PM - estimated 0 kbytes needed

Aug 31, 2017 5:10:02 PM - Info nbjm (pid=5608) started backup (backupid=CAISVGIS05_1504192202) job for client CAISVGIS05, policy SV_GIS_DB_Weekly_New, schedule Weekly_Full on storage unit SV_DD4500_Boost_caisvvmb01

Aug 31, 2017 5:10:03 PM - started process bpbrm (pid=8384)

Aug 31, 2017 5:10:59 PM - Info bpbrm (pid=8384) CAISVGIS05 is the host to backup data from

Aug 31, 2017 5:10:59 PM - Info bpbrm (pid=8384) reading file list for client

Aug 31, 2017 5:10:59 PM - connecting

Aug 31, 2017 5:11:07 PM - Info bpbrm (pid=8384) starting bphdb on client

Aug 31, 2017 5:11:07 PM - connected; connect time: 0:00:00

Aug 31, 2017 5:12:14 PM - Info bphdb (pid=2604) Backup started

Aug 31, 2017 5:40:25 PM - Error bpbrm (pid=8384) cannot send mail to root

Aug 31, 2017 5:40:26 PM - Info bphdb (pid=2604) done. status: 150: termination requested by administrator

Aug 31, 2017 5:40:26 PM - end writing

termination requested by administrator (150)

The backup selection is: G:\RMAN_Net_Backup\Scripts\Weekly_Inc0.cmd

The attached files are the logs for bpcd, bphdb, and dbclient, plus the RMAN script and the RMAN log files.

Thiago_Ribeiro · ‎09-01-2017

Hi @mahmoud_mohamed,

I'm assuming this status code 150 is because you cancelled the job, right?

In the RMAN log I looked at the SEND command in your script and didn't find NB_ORA_SCHED. Did you try adding it?

As far as I know, Oracle backup scripts use attributes like these:

Example:
NB_ORA_CLIENT=
NB_ORA_SERV=
NB_ORA_POLICY=
NB_ORA_SCHED=

Recovery Manager: Release 11.2.0.4.0 - Production on Thu Aug 31 15:28:38 2017

connected to target database: EMEGS (DBID=1140985314)
connected to recovery catalog database

RMAN> RUN {
2>
3> ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
4> ALLOCATE CHANNEL ch02 TYPE 'SBT_TAPE';
5> ALLOCATE CHANNEL ch03 TYPE 'SBT_TAPE';
6>

Your SEND

7> SEND 'NB_ORA_CLIENT=CAISVGIS05,
NB_ORA_SERV=bkpsrv1,
NB_ORA_POLICY=SV_GIS_DB_Weekly_New'
NB_ORA_SCHED=???
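
For anyone who wants to check a script quickly, here is a minimal sketch (not from the original posts) that lists which of the four variables above a SEND string leaves unset. The helper name `missing_nb_ora_vars` is made up for illustration, and whether a missing NB_ORA_SCHED actually matters depends on how the policy is configured.

```python
import re

# The four variables listed in the post above.
EXPECTED_VARS = ("NB_ORA_CLIENT", "NB_ORA_SERV", "NB_ORA_POLICY", "NB_ORA_SCHED")

# NAME=value pairs; the value must be non-empty and on the same line as the "=".
PAIR_RE = re.compile(r"(NB_ORA_[A-Z_]+)[ \t]*=[ \t]*([^,'\s]+)")


def missing_nb_ora_vars(send_text, expected=EXPECTED_VARS):
    """Return the expected variables that have no value assigned in send_text."""
    found = {name for name, _value in PAIR_RE.findall(send_text)}
    return [name for name in expected if name not in found]


# The SEND command as it appears in the RMAN log quoted above.
send_text = """SEND 'NB_ORA_CLIENT=CAISVGIS05,
NB_ORA_SERV=bkpsrv1,
NB_ORA_POLICY=SV_GIS_DB_Weekly_New'"""

print(missing_nb_ora_vars(send_text))
```

This only inspects the text of the script; it says nothing about whether the values are valid on the master server.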

Regards,

Thiago

mahmoud_mohamed · ‎09-01-2017

Hi Thiago,

Thanks for your reply. Yes, you are right, I cancelled it.

NB_ORA_SCHED wasn't provided before and the backup was working fine.

Do you think this is the problem?

Can you check the bphdb errors?

Thiago_Ribeiro · ‎09-01-2017

Hi,

Maybe not, since you said this backup was working well before. What changed in your environment?

I saw these errors in the bphdb log, and I would like to ask how the communication is between the master, media server, and client.

Can you test the communication using the bpclntcmd and bptestbpcd commands on the master, media server, and client?

bphdb

17:01:55.438 [8960.7560] <16> bphdb: ERR - send() to server failed: An existing connection was forcibly closed by the remote host.
17:01:55.438 [8960.7560] <16> bphdb: ERR - could not write keepalive to the NAME socket
17:01:56.454 [8960.7560] <16> bphdb: ERR - failed executing command <"G:\RMAN_Net_Backup\Scripts\Daily_Inc1.cmd">
17:01:56.454 [8960.7560] <16> bphdb: ERR - send() to server failed: An existing connection was forcibly closed by the remote host.
17:01:56.454 [8960.7560] <16> bphdb: ERR - could not write ERR - failed executing command <"G:\RMAN_Net_Backup\Scripts\Daily_Inc1.cmd">
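
If the bphdb log is long, a rough sketch like the one below can pull out the error lines and count repeats. It assumes the line layout shown in the excerpt above (time, `[pid.tid]`, `<severity>`, process name, message) and treats `<16>` and higher as errors; the function names are illustrative only, not part of any NetBackup tool.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: time [pid.tid] <severity> process: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<process>\w+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def error_summary(lines, min_severity=16):
    """Count messages whose severity is at least min_severity."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and int(entry["severity"]) >= min_severity:
            counts[entry["message"]] += 1
    return counts


# The five lines quoted in the post above.
sample = [
    r'17:01:55.438 [8960.7560] <16> bphdb: ERR - send() to server failed: An existing connection was forcibly closed by the remote host.',
    r'17:01:55.438 [8960.7560] <16> bphdb: ERR - could not write keepalive to the NAME socket',
    r'17:01:56.454 [8960.7560] <16> bphdb: ERR - failed executing command <"G:\RMAN_Net_Backup\Scripts\Daily_Inc1.cmd">',
    r'17:01:56.454 [8960.7560] <16> bphdb: ERR - send() to server failed: An existing connection was forcibly closed by the remote host.',
    r'17:01:56.454 [8960.7560] <16> bphdb: ERR - could not write ERR - failed executing command <"G:\RMAN_Net_Backup\Scripts\Daily_Inc1.cmd">',
]

for message, count in error_summary(sample).most_common():
    print(count, message)
```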

Regards,

Thiago

mahmoud_mohamed · ‎09-01-2017

What should I test using bpclntcmd?

I have tested telnet on port 13724 in all directions and it is working fine.

Thiago_Ribeiro · ‎09-01-2017

Hi,

You can follow this TN to test

Explanation of bpclntcmd options and recommended troubleshooting when the commands return errors - https://www.veritas.com/support/en_US/article.TECH50198

You can test these ports

Port	Description	Protocol	Direction of Connection
13724	VNETD – NetBackup Network Daemon	TCP	Bidirectional
1556	PBX – VxPBX Symantec Private Branch Exchange	TCP	Bidirectional
13782	BPCD – NetBackup Connection Daemon	TCP	Bidirectional
13720	BPRD – NetBackup Request Daemon	TCP	Bidirectional
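
As an alternative to testing each port by hand with telnet, here is a minimal sketch that tries a plain TCP connection to each port in the table. The function names are made up for illustration, and a successful connect only shows that something is listening on that port, not that NetBackup itself is healthy. A refused port is also not automatically a fault, since not every daemon runs on every host.

```python
import socket

# Ports taken from the table above.
NETBACKUP_PORTS = {13724: "vnetd", 1556: "pbx", 13782: "bpcd", 13720: "bprd"}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=NETBACKUP_PORTS, timeout=3.0):
    """Return {port: reachable} for every port in ports."""
    return {port: port_open(host, port, timeout) for port in ports}


# Example (run from the client, then again from the master and media server):
#   for port, ok in check_ports("bkpsrv1").items():
#       print(port, NETBACKUP_PORTS[port], "open" if ok else "closed")
```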

Regards,

Thiago

mahmoud_mohamed · ‎09-02-2017

I have tested all connections and they are working, except port 13720 on the client, as the bprd process runs only on the master server.

I checked the routing; everything is OK.

When the DBA runs the script manually, it should generate a job in the Activity Monitor, and that doesn't happen.

Marianne · ‎09-02-2017

Please show us output of this command on the client:
bpclntcmd -pn

Handy NetBackup Links

mahmoud_mohamed · ‎09-02-2017

Hi Marianne, nice to see you here, and thanks for the support.

"C:\Program Files\Veritas\NetBackup\bin\bpclntcmd.exe" -pn
expecting response from server bkpsrv1
caisvgis05 CAISVGIS05 10.195.4.225 56866

"C:\Program Files\Veritas\NetBackup\bin\bpclntcmd.exe" -ip 10.195.4.55
host 10.195.4.55: bkpsrv1 at 10.195.4.55

The DBA sent us this message:

Some RMAN sessions are stuck with event “Backup: MML create a backup piece” for more than 23 hours.

In the Activity Monitor, this was the parent job that lasted 23 hours without generating child jobs.

I think the problem is in the DB itself, as we didn't change anything. I also noticed that when he runs a manual job, it doesn't show up in the Activity Monitor. File system backups are running successfully.

Marianne · ‎09-02-2017

Seems you have identified the problem.

"RMAN sessions stuck with event “Backup: MML create a backup piece”" means that there is something wrong on the Oracle side.
There is nothing to do on the NBU side.


mahmoud_mohamed · ‎09-02-2017

Hi Marianne ,

But why does this error appear?

bphdb: ERR - send() to server failed: An existing connection was forcibly closed by the remote host.
17:01:55.438 [8960.7560] <16> bphdb: ERR - could not write keepalive to the NAME socket

And should I tell the DBA to run the RMAN backup without NetBackup?

For example, trying to back up the control file to local disk?

mahmoud_mohamed · ‎09-03-2017

The DBA was able to back up the control file to disk successfully.

Marianne · ‎09-03-2017

A normal RMAN backup script starts with the database, then the archive logs, and only then the control file.
So doing a control file backup to disk does not prove anything.

Ask the DBA to do a control file backup to sbt_tape (NBU), and a database backup to disk.

About the errors in bphdb: they seem to be caused by an OS-level keepalive timeout:

"could not write keepalive to the NAME socket"
There seems to be long timeouts on the master and media server, causing the parent job to sit and wait for jobs to be initiated from the client.

Your DBA should be able to increase the logging level within RMAN. This should help with troubleshooting.

PS:
I have not been able to look at your logs. I only look at forum posts on my phone over weekends. Can only open .txt files.


mahmoud_mohamed · ‎09-05-2017

Hi Marianne,

After checking with support, we found that the ALLOCATE command is not initiating a request to the master server, even though the client can reach the master server normally.

Support thinks it might be a problem with the database, or corrupted files in the NetBackup agent.

mahmoud_mohamed · ‎09-08-2017

The policy is running successfully now. We uninstalled the NetBackup agent and rebooted the server.

It seems there were some corrupted files, or problems with the Windows server.

So we took downtime and rebooted the server, and child jobs were created successfully.

Thanks all for your support.
