I'm having a similar problem. Left unattended, a job that stalls like this will run forever (despite the 23-hour time limit on the job). A stalled job also resists being stopped; in fact, the only way I've found to stop one is to terminate the job engine and restart all the services.
I happened to have debug logging enabled. Here are the relevant entries from one of my stalled jobs:
6c4 10/31/2005 22:29:06: TF_NDMPGetResult(): MediaServer thread done, returning TFLE 0
6c4 10/31/2005 22:29:08: WriteEndSet( 1 ) returning 0
6c4 10/31/2005 22:29:09: WriteEndSet( 1 ) returning 0
6c4 10/31/2005 22:29:09: WriteEndSet( 0 ) returning 0
6c4 10/31/2005 22:29:09: HARDWARE COMPRESSION ===> Setting compression off.
6c4 10/31/2005 22:29:09: TF_CloseSet
6c4 10/31/2005 22:29:09: NDMPEngine::Run(): ProcessBSD() returned 0
6c4 10/31/2005 22:29:10: SetupNDMPConnection(RANT)
6c4 10/31/2005 22:29:10: SetupNDMPConnection('CAPTAIN' Use Auth: True)
6c4 10/31/2005 22:29:11: Informational: Subnet Address 192.168.1.0 specified
6c4 10/31/2005 22:29:11: Informational: Subnet Mask 255.255.255.0 input
6c4 10/31/2005 22:29:11: Informational: Local network address 192.168.1.200 returned
6c4 10/31/2005 22:29:11: Informational: Local network address 192.168.1.200 obtained in the subnet specified
6c4 10/31/2005 22:29:11: NDMP version 3 connection CONNECTED
6c4 10/31/2005 22:29:12: SetupNDMPConnection(CAPTAIN) connection open Success
6c4 10/31/2005 22:29:12: SetupNDMPConnection(CAPTAIN) Find User Name & Password
6c4 10/31/2005 22:29:12: ndmpcGetUsernamePassword('\\CAPTAIN\C:')
6c4 10/31/2005 22:29:12: ndmpcGetUsernamePassword('\\CAPTAIN\C:') Not Found
6c4 10/31/2005 22:29:12: ndmpcGetUsernamePassword('\\CAPTAIN')
6c4 10/31/2005 22:29:12: ndmpcGetUsernamePassword('\\CAPTAIN') Found
6c4 10/31/2005 22:29:12: SetupNDMPConnection(CAPTAIN): connectClientAuth()
6c4 10/31/2005 22:29:12: ndmpcSnapshotPrepare: Warning. OSId: 35 is not snappable. Skipped.
6c4 10/31/2005 22:29:12: ndmpcSnapshotPrepare: Warning. No devices to snap. Returning with NDMP_SNAPSHOT_NO_DEVICES2SNAP
6c4 10/31/2005 22:29:12: NDMPEngine::Run(): calling ProcessBSD()
6c4 10/31/2005 22:29:12: TF_OpenSet( )
6c4 10/31/2005 22:29:12: Requested Set: ID = ffffffff Seq = -1 Set = -1
6c4 10/31/2005 22:29:12: Current VCB: ID = 335bb149 Seq = 4 Set = 31
6c4 10/31/2005 22:29:12: PositionAtSet( :( TF Msg = 2
6c4 10/31/2005 22:29:12: UI Msg = 8002
6c4 10/31/2005 22:29:12: HARDWARE COMPRESSION ===> Compression is configurable.
6c4 10/31/2005 22:29:12: GET_DRV_INF: bsize = 8192
6c4 10/31/2005 22:29:12: SetupFormatEnv( fmt=0 )
6c4 10/31/2005 22:29:12: End of TF_OpenSet: Ret_val = 0 Buffs = 2 HiWater = 0
6c4 10/31/2005 22:29:12: HARDWARE COMPRESSION ===> Setting compression on.
6c4 10/31/2005 22:29:12: Current Block is = 1a4ba4
6c4 10/31/2005 22:29:12: ndmpcGetUsernamePassword('\\CAPTAIN\C:')
6c4 10/31/2005 22:29:12: ndmpcGetUsernamePassword('\\CAPTAIN\C:') Not Found
6c4 10/31/2005 22:29:12: ndmpcGetUsernamePassword('\\CAPTAIN')
6c4 10/31/2005 22:29:12: ndmpcGetUsernamePassword('\\CAPTAIN') Found
6c4 10/31/2005 22:29:13: Informational: Subnet Address 192.168.1.0 specified
6c4 10/31/2005 22:29:13: Informational: Subnet Mask 255.255.255.0 input
6c4 10/31/2005 22:29:13: Informational: Local network address 192.168.1.200 returned
6c4 10/31/2005 22:29:13: Informational: Local network address 192.168.1.200 obtained in the subnet specified
6c4 10/31/2005 22:29:13: OpenListenSocketSpNic: Specific address sent: c801a8c0
6c4 10/31/2005 22:29:13: OpenListenSocket: Media server IP address: 0
6c4 10/31/2005 22:29:13: OpenListenSocket: Media server port: aaff
6c4 10/31/2005 22:29:15: dataStartBackup: ndmpSendRequest returned: 0x0, 0
fd4 10/31/2005 22:30:04: DeviceManager: timeout event fired
fd4 10/31/2005 22:30:04: DeviceManager: processing pending requests
fd4 10/31/2005 22:30:04: DeviceManager: going to sleep for 900000 msecs
fd4 10/31/2005 22:45:04: DeviceManager: timeout event fired
[the same timeout events repeat every 15 minutes (900000 msecs) until the 23-hour time limit]
fd4 11/1/2005 21:00:04: DeviceManager: timeout event fired
fd4 11/1/2005 21:00:04: DeviceManager: processing pending requests
fd4 11/1/2005 21:00:04: DeviceManager: going to sleep for 900000 msecs
580 11/1/2005 21:00:08: AbortJobWithName( {3F6FBCFF-2411-45B2-9AAE-ADE9F6A262CF}, , ) = 0
fd4 11/1/2005 21:15:04: DeviceManager: timeout event fired
fd4 11/1/2005 21:15:04: DeviceManager: processing pending requests
fd4 11/1/2005 21:15:04: DeviceManager: going to sleep for 900000 msecs
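For anyone who wants to check their own debug log for the same pattern (one thread goes quiet while the DeviceManager thread keeps ticking), here is a rough Python sketch. It assumes every entry looks like the ones above, i.e. `<thread id> <M/D/YYYY> <HH:MM:SS>: <message>`; that format is only inferred from this excerpt, and the function names and the 60-minute threshold are my own, not anything that ships with the product.

```python
import re
from datetime import datetime

# Assumed entry format, inferred from the excerpt above:
#   "<thread id> <M/D/YYYY> <HH:MM:SS>: <message>"
LINE_RE = re.compile(r"^(\w+) (\d{1,2}/\d{1,2}/\d{4}) (\d{2}:\d{2}:\d{2}):\s?(.*)$")


def last_entry_per_thread(lines):
    """Return {thread_id: (timestamp, message)} for each thread's last parsed entry."""
    last = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip continuation lines and notes that carry no timestamp
        thread, date, time, message = match.groups()
        stamp = datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S")
        last[thread] = (stamp, message)
    return last


def silent_threads(lines, min_gap_minutes=60):
    """List threads whose last entry is at least min_gap_minutes older than the newest entry."""
    last = last_entry_per_thread(lines)
    if not last:
        return []
    newest = max(stamp for stamp, _ in last.values())
    return sorted(
        (thread, stamp, message)
        for thread, (stamp, message) in last.items()
        if (newest - stamp).total_seconds() >= min_gap_minutes * 60
    )
```

Feeding it the lines of a debug log (for example `silent_threads(open(path).read().splitlines())`, with `path` pointing at your own log file) lists each thread that stopped logging well before the end of the log, along with its last timestamp and message.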
CAPTAIN, btw, is the 4th server in this backup job. The 3 before it were backed up fine, and the 3 after it never got touched.
This same job ran perfectly the Friday before, collecting 4 GB in just under an hour, but it stalled on Monday after 2 GB, some 81 hours into the job.
The BE server is a Dell PowerEdge 2850 with an attached Sony DDS4 11000.