
Job running on Client but not on Master Server

Mincom_NT
Level 4
I am having a problem with a particular drive on a Cluster resource

Master Server - Netbackup DC v 4.5
Client Server - Windows 2000 - Netbackup agent 4.5

The client policy for the cluster resource that is on this physical resource has 7 drives to back up, specified as D:, E:, etc.

When I run a backup of the client, every drive backs up with no problem apart from the J: drive. That job fails with error 150 (termination requested by administrator). When I open the bpbrm log, there is a socket connection failure for that job.

I logged onto the client and opened the Job Tracker. I noticed that there was a backup job from 1 week ago still running in a non-progressing state, but there is no active job in the NetBackup console. Restarting the client agent or the server does not stop the job, and uninstalling and re-installing the agent does not clear it either. I ran a NET SESSION \\%MASTERSERVER% command from the client and there was no network connection back to the master server.

So basically I need to find a way to cancel this supposedly active job in order for new jobs to run against the J: drive.
8 REPLIES 8

Mincom_NT
Level 4
This is the exact message that appears in the job window

16/01/2007 11:21:37 AM - mounting 0427L1
16/01/2007 11:21:37 AM - started process bpbrm (3596)
16/01/2007 11:21:37 AM - connecting
16/01/2007 11:21:37 AM - connected; connect time: 00:00:00
16/01/2007 11:22:23 AM - mounted; mount time: 00:00:46
16/01/2007 11:22:36 AM - positioning 0427L1 to file 195
16/01/2007 11:23:54 AM - positioned; position time: 00:01:18
16/01/2007 11:23:54 AM - begin writing
16/01/2007 12:05:45 PM - Error bpbrm(pid=3616) socket read failed, An existing connection was forcibly closed by the remote host. (10054)
16/01/2007 12:05:45 PM - Error bptm(pid=3184) socket operation failed - 10054 (at .\child.c.1099)
16/01/2007 12:05:45 PM - Error bptm(pid=3184) unable to perform read from client socket, connection may have been broken
16/01/2007 12:05:45 PM - Error bpbrm(pid=3616) could not send server status message
16/01/2007 12:06:01 PM - end writing; write time: 00:42:07

Dennis_Strom
Level 6
Verify that you can connect to the client by running telnet clientname bpcd. You can also try bpcoverage -c clientname from the server. If that does not work, verify that you can ping the client from the server and the server from the client. Run nslookup against both the hostname and the IP address from both the client and the master. You probably also want to make sure that the client name is listed correctly in the policy and that the master server is listed correctly in the client's bp.conf file. You should be able to kill that process on the client.

Mincom_NT
Level 4
All ports are listening on the client and name resolution is fine. This is of no help though, as I previously mentioned that other drives are getting backed up on the client, just not the J:\ drive. If there was a name resolution or port problem then the whole backup would be failing and not just 1 Drive.

I also mentioned that I cycled the services and restarted the client server (and master server as a test) as well as uninstalling and re-installing the client agent, but the job remains in the client job tracker as an active job. I know I "Should" be able to kill the job, but the job is not listed in the activity monitor.

What I need is a way to tell the client that there is no backup running so it will allow the master server to connect to the J:\ drive and back it up, as it thinks that it has a connection already


Results of port query on the client

Slow link delay enabled

portqry -n %MYCLIENTSERVER% -s -o 13782,13724,13783,13722

Querying target system called:
Slow link delay enabled
%MYCLIENTSERVER%
Attempting to resolve name to IP address...
Name resolved to %MYCLIENTIP%


TCP port 13782 (bpcd service): LISTENING
TCP port 13724 (vnetd service): LISTENING
TCP port 13783 (vopied service): LISTENING
TCP port 13722 (unknown service): LISTENING
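
For anyone who wants to repeat this port check without portqry, here is a minimal Python sketch. It only tests whether a TCP connection can be opened, which is roughly what the LISTENING result above tells you. The port numbers and service labels are copied from the output above; the function names and the example hostname are made up for illustration.

```python
import socket

# Ports and labels copied from the portqry output in this post.
NETBACKUP_PORTS = {
    13782: "bpcd",
    13724: "vnetd",
    13783: "vopied",
    13722: "unknown",
}


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_client(host, ports=NETBACKUP_PORTS, timeout=3.0):
    """Return {port: reachable} for every port in the mapping."""
    return {port: check_port(host, port, timeout) for port in ports}


# Example (hypothetical hostname):
#   for port, ok in check_client("myclient").items():
#       print(port, NETBACKUP_PORTS[port], "open" if ok else "closed")
```

As noted above, open ports do not explain a failure that affects only one drive, so this is only a quick way to rule the network out.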

Chia_Tan_Beng
Level 6
Hi Mincom,

If you have rebooted the client, all orphaned processes and connections should have been cleared. Is your J drive a mapped drive or network drive? If so, you need to check "Backup network drives" in the policy's attributes.

Can you try creating a separate test policy just to back up the J drive alone?

Gerald_W__Gitau
Level 6
Certified
If you still have problems try this:

Open %installdirectory%VERITAS\NetBackup\db\jobs

Open the ffilelogs folder and delete the corresponding job number (if you don't know the number, just wipe everything, provided no backup is running).

Do the same for the restart and trylogs folders. Restart the NetBackup services on the master server and voilà, restart the backup job.
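
If you would rather move the files aside than delete them, something like the Python sketch below could do it. This is only an illustration under assumptions: the folder names (ffilelogs, restart, trylogs) come from the steps above, while the function name and both directory arguments are hypothetical and need to be pointed at your own install and backup locations. Run it only when no backup is active.

```python
import shutil
from pathlib import Path


def stash_job_files(jobs_dir, backup_dir,
                    subdirs=("ffilelogs", "restart", "trylogs")):
    """Move everything in the named job subfolders into backup_dir.

    Returns the list of new paths. Subfolders that do not exist are skipped.
    """
    jobs_dir, backup_dir = Path(jobs_dir), Path(backup_dir)
    moved = []
    for name in subdirs:
        src = jobs_dir / name
        if not src.is_dir():
            continue
        dest = backup_dir / name
        dest.mkdir(parents=True, exist_ok=True)
        for entry in sorted(src.iterdir()):
            target = dest / entry.name
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved
```

Moving instead of deleting means the files can be put back if clearing them makes no difference.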

Mincom_NT
Level 4
In answer to the earlier post: I have made several attempts at backing up this drive, both using a new policy and by modifying the original policy to remove the other drives, with no success. I cleared out the jobs directory (moved the entire contents to a backup directory) and kicked off the test job again. Still no success. In the Job Tracker on the client, it states that it is processing files, but the job listed under the Activity Monitor on the master server shows nothing happening: the job status stays at "Writing", but it does not indicate that any data is actually being written. Then the job fails with the message below.



29/01/2007 9:52:19 AM - mounting 0428L1
29/01/2007 9:52:20 AM - started process bpbrm (4036)
29/01/2007 9:52:20 AM - connecting
29/01/2007 9:52:20 AM - connected; connect time: 00:00:00
29/01/2007 9:53:01 AM - mounted; mount time: 00:00:42
29/01/2007 9:53:14 AM - positioning 0428L1 to file 30
29/01/2007 9:55:09 AM - positioned; position time: 00:01:55
29/01/2007 9:55:09 AM - begin writing
29/01/2007 10:50:40 AM - Error bptm(pid=4356) socket operation failed - 10054 (at .\child.c.1099)
29/01/2007 10:50:40 AM - Error bpbrm(pid=3788) socket read failed, An existing connection was forcibly closed by the remote host. (10054)
29/01/2007 10:50:40 AM - Error bptm(pid=4356) unable to perform read from client socket, connection may have been broken
29/01/2007 10:50:40 AM - Error bpbrm(pid=3788) could not send server status message
29/01/2007 10:50:57 AM - end writing; write time: 00:55:48
termination requested by administrator(150)
29/01/2007 10:50:57 AM - Error bpsched(pid=4128) backup of client %SERVERNAME% exited with status 150 (termination requested by administrator)
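
To compare failed attempts, it can help to pull the error lines out of a job detail listing like the one above and measure how long the job sat in "begin writing" before the first error. The Python sketch below assumes the line format shown here (day/month/year, 12-hour clock, " - " separator, "Error process(pid=N) message") and an English locale for AM/PM; the function names are made up for illustration.

```python
import re
from datetime import datetime

# "29/01/2007 9:55:09 AM - begin writing"
LINE_RE = re.compile(r"^(\d{1,2}/\d{2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - (.*)$")
# "Error bptm(pid=4356) socket operation failed - 10054 ..."
ERROR_RE = re.compile(r"^Error (\w+)\(pid=(\d+)\) (.*)$")


def parse_line(line):
    """Return (timestamp, text), or None if the line has no timestamp."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    when = datetime.strptime(m.group(1), "%d/%m/%Y %I:%M:%S %p")
    return when, m.group(2)


def summarize(lines):
    """Collect error lines and the delay from 'begin writing' to the first one."""
    begin = None
    errors = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # e.g. the bare status line without a timestamp
        when, text = parsed
        if text == "begin writing":
            begin = when
        m = ERROR_RE.match(text)
        if m:
            errors.append((when, m.group(1), int(m.group(2)), m.group(3)))
    delay = None
    if begin is not None and errors:
        delay = errors[0][0] - begin  # first error line in listing order
    return {"begin_writing": begin, "errors": errors, "time_to_first_error": delay}
```

Feeding it the lines of a saved job listing returns the errors as (time, process, pid, message) tuples plus the delay as a timedelta.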

Mincom_NT
Level 4
Interface: 10.1.2.155 --- 0x10005
  Internet Address      Physical Address      Type
  10.1.2.154            00-10-18-03-8a-45     dynamic
09:52:20.049 <2> bpcd main:
09:52:20.049 <2> bpcd main: BPCD_GET_STDOUT_HOST_SOCKET_RQST
09:52:20.049 <2> bpcd main: socket port number = 13724
09:52:20.221 <2> get_vnetd_socket: connected to vnetd socket 1900
09:52:20.221 <2> bpcd main: Connected on output socket
09:52:20.221 <2> bpcd main: Skipping shutdown of send side of stdout.
09:52:20.221 <2> bpcd main: Duplicated socket on stdout
09:52:20.221 <2> bpcd main: BPCD_FORK_CMD_RQST
09:52:20.221 <2> bpcd main: fork cmd = /usr/openv/netbackup/bin/bpbkar bpbkar32 -r 4864904 -ru root -dt 0 -to 1800 -clnt %SERVERNAME% -class %SERVERNAME%_Catchup -sched Weekly -st FULL -bpstart_to 300 -bpend_to 300 -read_to 1800 -stream_count 1 -stream_number 1 -jobgrpid 1 -tir -tir_plus -use_otm -b %SERVERNAME%_1170028336 -kl 365 -ct 13
09:52:20.221 <2> bpcd main: Convert args to CreateProcess format
09:52:20.221 <2> bpcd main: Done converting args to CreateProcess format
09:52:20.221 <2> bpcd main: new fork cmd = C:\Program Files\VERITAS\NetBackup\bin\bpbkar32.exe -r 4864904 -ru root -dt 0 -to 1800 -clnt %SERVERNAME% -class %SERVERNAME%_Catchup -sched Weekly -st FULL -bpstart_to 300 -bpend_to 300 -read_to 1800 -stream_count 1 -stream_number 1 -jobgrpid 1 -tir -tir_plus -use_otm -b %SERVERNAME%_1170028336 -kl 365 -ct 13
09:52:20.221 <2> bpcd main: Before CreateProcess
09:52:20.221 <2> bpcd main: StdOutput assigned the value STDOUTSOCK
09:52:20.237 <2> bpcd main: After CreateProcess, pid = 2128
09:52:20.237 <2> bpcd exit_bpcd: exit status 0 ----------->exiting
10:50:56.836 <2> bpcd main: offset to GMT -36000
10:50:56.851 <2> bpcd main: Got socket for input 300
10:50:56.867 <2> logconnections: BPCD ACCEPT FROM 172.16.100.38.598 TO 172.16.100.45.13782
10:50:56.867 <2> bpcd main: setup_sockopts complete
10:50:56.914 <2> bpcd peer_hostname: Connection from host %BACKUPSERVERNAME%.%FULLDOMAINNAME% (172.16.100.38) port 598
10:50:56.914 <2> bpcd valid_server: comparing %BACKUPSERVERNAME% and %BACKUPSERVERNAME%.%FULLDOMAINNAME%
10:50:56.914 <4> bpcd valid_server: hostname comparison succeeded
10:50:56.914 <2> bpcd main: output socket port number = 13782
10:50:57.336 <2> get_vnetd_socket: connected to vnetd socket 1924
10:50:57.336 <2> bpcd main: Duplicated vnetd socket on stderr
10:50:57.336 <2> bpcd main: <---- NetBackup 4.5FP_3GA 0 ------------initiated

-------------

NOTES

I have replaced the name of the client and master server

172.16.100.38 is the IP for the master server
172.16.100.45 is the IP for the client

Mincom_NT
Level 4
Well, I have received a workaround from Symantec.

They told me to disable VSP for this client. Good for the moment, but not a permanent solution.