cancel
Showing results for 
Search instead for 
Did you mean: 

Backup SAP bcakup using rman fail with code 25

albatroz19
Level 5

Hi,

i have problem with a sap backup for db that fail at same time with this error:


02/10/2020 04:01:25 - Info nbjm (pid=27874) starting backup job (jobid=3416448) for client marcos, policy ORA-HOT_UNX_P1E_1M, schedule FULL_1M
02/10/2020 04:01:25 - Info nbjm (pid=27874) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=3416448, request id:{9FF1B7A0-4BB1-11EA-977F-2B0A0F647A79})
02/10/2020 04:01:25 - requesting resource CE_DOM1_5330_PDDO
02/10/2020 04:01:25 - requesting resource bcksrv.NBU_CLIENT.MAXJOBS.marcos
02/10/2020 04:01:25 - requesting resource bcksrv.NBU_POLICY.MAXJOBS.ORA-HOT_UNX_P1E_1M
02/10/2020 04:01:53 - awaiting resource CE_DOM1_5330_PDDO. Maximum job count has been reached for the storage unit.
02/10/2020 04:02:33 - Info bpbrm (pid=309154) marcos is the host to backup data from
02/10/2020 04:02:33 - Info bpbrm (pid=309154) reading file list for client
02/10/2020 04:02:33 - Info bpbrm (pid=309154) listening for client connection
02/10/2020 04:02:33 - Info bpbrm (pid=309154) connect failed STATUS (18) CONNECT_FAILED
02/10/2020 04:02:33 - Info bpbrm (pid=309154) status: FAILED, (41) ADDR_IN_USE; system: (98) Address already in use; FROM 0.0.0.0 TO marcos 10.201.24.140 bpcd
02/10/2020 04:02:33 - Error bpbrm (pid=309154) cannot connect to marcos, Address already in use (98)
02/10/2020 04:02:33 - Error bpbrm (pid=309154) listen for client protocol error - couldn't write necessary information on /usr/openv/netbackup/logs/user_ops/dbext/logs/47505.0.1581303684
02/10/2020 04:02:34 - Info dbclient (pid=0) done. status: 25: cannot connect on socket
02/10/2020 04:02:49 - granted resource bcksrv.NBU_CLIENT.MAXJOBS.marcos
02/10/2020 04:02:49 - granted resource bcksrv.NBU_POLICY.MAXJOBS.ORA-HOT_UNX_P1E_1M
02/10/2020 04:02:49 - granted resource MediaID=@aaaac;DiskVolume=PureDiskVolume;DiskPool=dp_disk_bcksrvms01;Path=PureDiskVolume;StorageServer=bcksrvms01;MediaServer=bcksrvms01
02/10/2020 04:02:49 - granted resource CE_DOM1_5330_PDDO
02/10/2020 04:02:49 - granted resource TRANSPORT
02/10/2020 04:03:00 - estimated 0 kbytes needed
02/10/2020 04:03:00 - Info nbjm (pid=27874) started backup (backupid=marcos_1581303780) job for client marcos, policy ORA-HOT_UNX_P1E_1M, schedule FULL_1M on storage unit CE_DOM1_5330_PDDO
02/10/2020 04:03:01 - started process bpbrm (pid=309154)
02/10/2020 04:03:01 - connecting
02/10/2020 04:03:01 - end writing
cannot connect on socket (25)

 

this error is present always at 04.00. please can you help me?

Thanks and Regards

1 ACCEPTED SOLUTION

Accepted Solutions

Michal_Mikulik1
Moderator
Moderator
Partner    VIP    Accredited Certified

@albatroz19 :

I saw on media server that "cannot connect on socket" appear at 04:00 also for other clients, and there are many backup FS that start window is at 04:00.

Then you  should approach this problem as a Media Server issue, not SAP related issue as indicated in the subject. Probably port exhaustion? See OS/network logs at the Media Server.

Is there a chance to spread backups accross more time? Or start SAP backup at least at 3:55?

Regards

Michal

View solution in original post

7 REPLIES 7

Krutons
Moderator
Moderator
   VIP   

What is going on with the client at 04:00 when this occurs? Is there a lot of activity on this client around the time of the failure?

The bpbrm and bptm logs would be helpful here.

 02/10/2020 04:02:33 - Info bpbrm (pid=309154) connect failed STATUS (18) CONNECT_FAILED
02/10/2020 04:02:33 - Info bpbrm (pid=309154) status: FAILED, (41) ADDR_IN_USE; system: (98) Address already in use; FROM 0.0.0.0 TO marcos 10.201.24.140 bpcd
02/10/2020 04:02:33 - Error bpbrm (pid=309154) cannot connect to marcos, Address already in use (98)
02/10/2020 04:02:33 - Error bpbrm (pid=309154) listen for client protocol error - couldn't write necessary information on /usr/openv/netbackup/logs/user_ops/dbext/logs/47505.0.1581303684
02/10/2020 04:02:34 - Info dbclient (pid=0) done. status: 25: cannot connect on socket

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

If this only happens at 04:00, you need to find out what exactly is happening on the client at that time.

Get your server admin to assist - if nobody can be onsite, create a cron to run commands such as 'netstat -a' shortly before and after 4am.

Ensure that all relevant log folders exist - bptm and bpbrm on the media server, bpcd and dbclient on the SAP client. 

Check in Host Properties if any Connect Options have been customized for this client that could limit ports. 
Default connection for NBU 7.7 is pbx (1556), not bpcd as shown in Job details. 

You can also verify Connect Options by running this command on the master server:

bptestbpcd -client <name> -debug -verbose

See Command Ref Guide for explanation of Connect Options in the output. 
https://www.veritas.com/support/en_US/doc/15263389-127350397-0/v14667804-127350397

 

Hi all, thanks for support everytime, so,

I spoked with sys/sapadmin and no action matter at that moment (04:00), directories logs are present but non special information for me...

I saw on media server that "cannot connect on socket" appear at 04:00 also for other clients, and there are many backup FS that start window is at 04:00.

Host properties > port configured as default, on dbclient log found this:

04:03:50.531 [64936] <2> int_WriteData: INF - writing buffer # 6696 of size 4194304
04:03:51.926 [64693] <2> int_WriteData: INF - writing buffer # 5642 of size 4194304
04:03:52.561 [64936] <2> int_WriteData: INF - writing buffer # 6758 of size 4194304
04:03:53.963 [64693] <2> int_WriteData: INF - writing buffer # 5704 of size 4194304
04:03:54.617 [64936] <2> int_WriteData: INF - writing buffer # 6820 of size 4194304
04:03:56.031 [64693] <2> int_WriteData: INF - writing buffer # 5766 of size 4194304
04:03:56.667 [64936] <2> int_WriteData: INF - writing buffer # 6882 of size 4194304
04:03:58.075 [64693] <2> int_WriteData: INF - writing buffer # 5828 of size 4194304
04:03:58.722 [64936] <2> int_WriteData: INF - writing buffer # 6944 of size 4194304
04:04:00.156 [64693] <2> int_WriteData: INF - writing buffer # 5890 of size 4194304
04:04:00.811 [64936] <2> int_WriteData: INF - writing buffer # 7006 of size 4194304
04:04:02.279 [64693] <2> int_WriteData: INF - writing buffer # 5952 of size 4194304
04:04:02.840 [64740] <16> serverResponse: ERR - server exited with status 25: cannot connect on socket
04:04:02.840 [64740] <16> CreateNewImage: ERR - serverResponse() failed
04:04:02.840 [64740] <16> VxBSACreateObject: ERR - Could not create new image with file /ON_P1E_1032580956_374203_1.
04:04:02.840 [64740] <16> xbsa_CreateObject: ERR - VxBSACreateObject: Failed with error:
Server Status: cannot connect on socket
04:04:02.900 [64936] <2> int_WriteData: INF - writing buffer # 7068 of size 4194304
04:04:02.906 [64891] <16> serverResponse: ERR - server exited with status 25: cannot connect on socket
04:04:02.906 [64891] <16> CreateNewImage: ERR - serverResponse() failed
04:04:02.906 [64891] <16> VxBSACreateObject: ERR - Could not create new image with file /ON_P1E_1032581001_374204_1.
04:04:02.906 [64891] <16> xbsa_CreateObject: ERR - VxBSACreateObject: Failed with error:
Server Status: cannot connect on socket
04:04:04.334 [64693] <2> int_WriteData: INF - writing buffer # 6014 of size 4194304
04:04:04.952 [64936] <2> int_WriteData: INF - writing buffer # 7130 of size 4194304
04:04:06.389 [64693] <2> int_WriteData: INF - writing buffer # 6076 of size 4194304
04:04:07.081 [64936] <2> int_WriteData: INF - writing buffer # 7192 of size 4194304
04:04:08.585 [64693] <2> int_WriteData: INF - writing buffer # 6138 of size 4194304
04:04:09.197 [64936] <2> int_WriteData: INF - writing buffer # 7254 of size 4194304
04:04:10.695 [64693] <2> int_WriteData: INF - writing buffer # 6200 of size 4194304
04:04:11.319 [64936] <2> int_WriteData: INF - writing buffer # 7316 of size 4194304
04:04:12.795 [64693] <2> int_WriteData: INF - writing buffer # 6262 of size 4194304

@Marianne: Whit netstat -a at 04:00, what i could find like information?

 

Thanks and Regards

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Each PID in dbclient log represents a different backup stream.

These 2 streams were still 'happily' writing:

04:04:00.811 [64936] <2> int_WriteData: INF - writing buffer # 7006 of size 4194304
04:04:02.279 [64693] <2> int_WriteData: INF - writing buffer # 5952 of size 4194304

This stream needed to connect to bprd on the master server, but was unable to do so.
04:04:02.840 [64740] <16> serverResponse: ERR - server exited with status 25: cannot connect on socket
04:04:02.840 [64740] <16> VxBSACreateObject: ERR - Could not create new image with file /ON_P1E_1032580956_374203_1.
You need to check dbclient for all PID [64740] entries and also bprd on the master server for connection attempt from the client at this time. What time did this stream start and what happened to it before 04:04?

Same for this failing stream: 
04:04:02.906 [64891] <16> serverResponse: ERR - server exited with status 25: cannot connect on socket
04:04:02.906 [64891] <16> CreateNewImage: ERR - serverResponse() failed
04:04:02.906 [64891] <16> VxBSACreateObject: ERR - Could not create new image with file /ON_P1E_1032581001_374204_1.

netstat -a will show port connections and port status on the client.

Michal_Mikulik1
Moderator
Moderator
Partner    VIP    Accredited Certified

@albatroz19 :

I saw on media server that "cannot connect on socket" appear at 04:00 also for other clients, and there are many backup FS that start window is at 04:00.

Then you  should approach this problem as a Media Server issue, not SAP related issue as indicated in the subject. Probably port exhaustion? See OS/network logs at the Media Server.

Is there a chance to spread backups accross more time? Or start SAP backup at least at 3:55?

Regards

Michal

Hi Michal,

i was started from sap backup because the issue present on it, but with these verifications i saw that on media server there are present these "cannot connect on socket" for others clients specifically on 04:00am. How may i verify if it's present a "port exhaustion" error on media server?

@Marianne: This, on dbclient log for "64740" pid:

00:01:27.132 [64740] <2> int_CloseImage: INF - Backup - closing <ON_P1E_1032566177_374027_1>
00:01:27.132 [64740] <4> CleanUpShM: INF - dbclient waited 13680 times for empty buffer, delayed 13718 times
00:01:36.134 [64740] <2> int_GetBfsDateRange: INF - Using default date range
00:01:36.134 [64740] <2> int_logDateRange: INF - Start Time = 12/26/95 01:00:00
00:01:36.134 [64740] <2> int_logDateRange: INF - End Time = 02/18/20 00:01:36
00:01:36.134 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.426: errno: 2 2 0x00000002
00:01:36.134 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.427: path: /oracle/ORACLE/bp.conf
00:01:36.140 [64740] <2> vnet_pbxConnect: pbxConnectEx Succeeded
00:01:36.140 [64740] <2> logconnections: BPRD CONNECT FROM 10.201.24.107.41399 TO 10.201.27.248.1556 fd = 41
00:01:39.643 [64740] <2> xbsa_ProgressLogSetup: INF - No progress log: Job will proceed without progress logging
00:01:39.646 [64740] <2> vnet_pbxConnect: pbxConnectEx Succeeded
00:01:39.646 [64740] <2> logconnections: BPRD CONNECT FROM 10.201.24.107.11474 TO 10.201.27.248.1556 fd = 42
00:05:19.664 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.426: errno: 2 2 0x00000002
00:05:19.664 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.427: path: /oracle/ORACLE/bp.conf
00:05:28.668 [64740] <2> int_StartJob: INF - copyID: 1 - 1581894318
00:05:29.031 [64740] <2> int_WriteData: INF - writing buffer # 1 of size 4194304
00:05:31.021 [64740] <2> int_WriteData: INF - writing buffer # 62 of size 4194304
00:05:33.125 [64740] <2> int_WriteData: INF - writing buffer # 124 of size 4194304
00:05:35.228 [64740] <2> int_WriteData: INF - writing buffer # 186 of size 4194304
00:05:37.301 [64740] <2> int_WriteData: INF - writing buffer # 248 of size 4194304
..... same row
00:10:52.819 [64740] <2> int_WriteData: INF - writing buffer # 7998 of size 4194304
00:10:55.362 [64740] <2> int_WriteData: INF - writing buffer # 8060 of size 4194304
00:10:57.843 [64740] <2> int_WriteData: INF - writing buffer # 8122 of size 4194304
00:11:00.002 [64740] <2> int_WriteData: INF - writing buffer # 8184 of size 4194304
00:11:00.104 [64740] <2> int_CloseImage: INF - Backup - closing <ON_P1E_1032566499_374031_1>
00:11:00.104 [64740] <4> CleanUpShM: INF - dbclient waited 15121 times for empty buffer, delayed 15144 times
00:11:07.106 [64740] <2> int_GetBfsDateRange: INF - Using default date range
00:11:07.106 [64740] <2> int_logDateRange: INF - Start Time = 12/26/95 01:00:00
00:11:07.106 [64740] <2> int_logDateRange: INF - End Time = 02/18/20 00:11:07
00:11:07.106 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.426: errno: 2 2 0x00000002
00:11:07.106 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.427: path: /oracle/ORACLE/bp.conf
00:11:07.109 [64740] <2> vnet_pbxConnect: pbxConnectEx Succeeded
00:11:07.109 [64740] <2> logconnections: BPRD CONNECT FROM 10.201.24.107.11776 TO 10.201.27.248.1556 fd = 41
00:11:10.810 [64740] <2> xbsa_ProgressLogSetup: INF - No progress log: Job will proceed without progress logging
00:11:10.813 [64740] <2> vnet_pbxConnect: pbxConnectEx Succeeded
00:11:10.814 [64740] <2> logconnections: BPRD CONNECT FROM 10.201.24.107.20346 TO 10.201.27.248.1556 fd = 42
00:11:22.817 [64740] <2> int_StartJob: INF - copyID: 1 - 1581894672
00:11:23.001 [64740] <2> int_WriteData: INF - writing buffer # 1 of size 4194304
...
00:16:39.697 [64740] <2> int_WriteData: INF - writing buffer # 7998 of size 4194304
00:16:42.222 [64740] <2> int_WriteData: INF - writing buffer # 8060 of size 4194304
00:16:44.547 [64740] <2> int_WriteData: INF - writing buffer # 8122 of size 4194304
00:16:46.479 [64740] <2> int_CloseImage: INF - Backup - closing <ON_P1E_1032567070_374034_1>
00:16:46.479 [64740] <4> CleanUpShM: INF - dbclient waited 14877 times for empty buffer, delayed 14901 times
00:16:52.481 [64740] <2> int_GetBfsDateRange: INF - Using default date range
00:16:52.481 [64740] <2> int_logDateRange: INF - Start Time = 12/26/95 01:00:00
00:16:52.481 [64740] <2> int_logDateRange: INF - End Time = 02/18/20 00:16:52
00:16:52.481 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.426: errno: 2 2 0x00000002
00:16:52.481 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.427: path: /oracle/ORACLE/bp.conf
00:16:52.484 [64740] <2> vnet_pbxConnect: pbxConnectEx Succeeded
00:16:52.484 [64740] <2> logconnections: BPRD CONNECT FROM 10.201.24.107.9999 TO 10.201.27.248.1556 fd = 41
00:16:54.861 [64740] <2> xbsa_ProgressLogSetup: INF - No progress log: Job will proceed without progress logging
00:16:54.863 [64740] <2> vnet_pbxConnect: pbxConnectEx Succeeded
00:16:54.863 [64740] <2> logconnections: BPRD CONNECT FROM 10.201.24.107.56472 TO 10.201.27.248.1556 fd = 42
00:17:08.867 [64740] <2> int_StartJob: INF - copyID: 1 - 1581895019
00:17:09.042 [64740] <2> int_WriteData: INF - writing buffer # 1 of size 4194304
....
it repeat until 04:00, then:

04:01:42.597 [64740] <2> int_WriteData: INF - writing buffer # 7998 of size 4194304
04:01:42.686 [64740] <2> int_CloseImage: INF - Backup - closing <ON_P1E_1032580581_374199_1>
04:01:42.686 [64740] <4> CleanUpShM: INF - dbclient waited 14139 times for empty buffer, delayed 14219 times
04:02:30.691 [64740] <2> int_GetBfsDateRange: INF - Using default date range
04:02:30.691 [64740] <2> int_logDateRange: INF - Start Time = 12/26/95 01:00:00
04:02:30.691 [64740] <2> int_logDateRange: INF - End Time = 02/18/20 04:02:30
04:02:30.691 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.426: errno: 2 2 0x00000002
04:02:30.691 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.427: path: /oracle/ORACLE/bp.conf
04:02:30.694 [64740] <2> vnet_pbxConnect: pbxConnectEx Succeeded
04:02:30.695 [64740] <2> logconnections: BPRD CONNECT FROM 10.201.24.107.15004 TO 10.201.27.248.1556 fd = 41
04:02:36.830 [64740] <2> xbsa_ProgressLogSetup: INF - No progress log: Job will proceed without progress logging
04:02:36.833 [64740] <2> vnet_pbxConnect: pbxConnectEx Succeeded
04:02:36.833 [64740] <2> logconnections: BPRD CONNECT FROM 10.201.24.107.46099 TO 10.201.27.248.1556 fd = 42
04:04:02.840 [64740] <16> serverResponse: ERR - server exited with status 25: cannot connect on socket
04:04:02.840 [64740] <16> CreateNewImage: ERR - serverResponse() failed
04:04:02.840 [64740] <16> VxBSACreateObject: ERR - Could not create new image with file /ON_P1E_1032580956_374203_1.
04:04:02.840 [64740] <16> xbsa_CreateObject: ERR - VxBSACreateObject: Failed with error:
10:05:50.099 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.426: errno: 2 2 0x00000002
10:05:50.099 [64740] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.427: path: /oracle/ORACLE/bp.conf
10:05:50.099 [64740] <2> int_PerformMetadataCataloging: INF - freeing node for backup piece <ON_P1E_1032580581_374199_1>
10:05:50.099 [64740] <2> int_PerformMetadataCataloging: INF - freeing node for backup piece <ON_P1E_1032580215_374195_1>
10:05:50.099 [64740] <2> int_PerformMetadataCataloging: INF - freeing node for backup piece <ON_P1E_1032579870_374191_1>
10:05:50.099 [64740] <2> int_PerformMetadataCataloging: INF - freeing node for backup piece <ON_P1E_1032579544_374187_1>
.......
10:05:50.101 [64740] <2> int_PerformMetadataCataloging: INF - freeing node for backup piece <ON_P1E_1032544831_373780_1>
10:05:50.101 [64740] <4> sbtend: INF - --- END of SESSION ---
10:05:50.101 [64740] <8> close_image: Session being terminated abnormally, cleaning up
10:05:50.101 [64740] <4> close_image: INF - backup FAILED
10:05:50.101 [64740] <4> close_image: INF ---- end of Backup ---
10:05:50.101 [64740] <16> VxBSAEndTxn: ERR - Transaction ended with active Backup/Restore.
10:05:50.101 [64740] <16> xbsa_EndTransaction: ERR - VxBSAEndTxn: Failed with error:

it's a backup sap started 06:00pm to 10:05am.

Thanks

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@albatroz19 

We are not going to win on this forum.

Best that you log a call with Veritas Support where you are prepared to share all relevant logs (full log files, not just snippets). They will ask for higher level logs as logging level seems to be very low.

They will ask at a minimum for all of these logs (for the day the backup started up to the day when the backup failed):
Master: bprd
Media: bptm and bpbrm
Client: bpcd and dbclient

Good luck!