Backup failing with error 84 for huge data, Plus status 24

fmk
Level 3

 

Backups fail with error 84 for multiple clients when multiplexing a large amount of data, but succeed when the amount of data is small.

 

Errors: Media write error(84), FTL - socket write failed, 24: socket write failed, INF - Server status = 24 & cannot write data to network:  An existing connection was forcibly closed by the remote host.

My environment:
NetBackup 7.1
All systems run on a single VMware ESXi 5.1 server:
S002-VM2: Master Server (Windows 2008 Ent x86)
S003: Media Server (Windows 2008 Stn x86) – Firestreamer Virtual Tape Library
S001: Client (Windows 2008 Ent x86)
S004: Client (Windows 2008 Ent x86)
C007: Client (Windows 7 x86)
Policy: Clients – S001, C00-VM5 & S004. Multi streaming and multiple data streams enabled. Backup selections “C:\”.
 
Issue details:
Activity Monitor Job Error: media write error (84)
Job details: Critical bpbrm(pid=2360) from client C002-VM5.flab.com: FTL - socket write failed
 
Bpbrm logs on Media Server:
07:04:12.648 [2064.3400] <32> bpbrm handle_backup: from client C002-VM5.flab.com: FTL - socket write failed
07:04:12.648 [2064.3400] <2> ConnectionCache::connectAndCache: Acquiring new connection for host s002-vm2.flab.com, query type 1
07:04:12.648 [2064.3400] <2> vnet_pbxConnect: pbxConnectEx Succeeded
07:04:12.648 [2064.3400] <2> logconnections: BPDBM CONNECT FROM 10.0.0.8.49534 TO 10.0.0.4.1556 fd = 604
07:04:12.710 [2532.2328] <32> bpbrm handle_backup: from client S001.flab.com: FTL - socket write failed
07:04:12.710 [2532.2328] <2> ConnectionCache::connectAndCache: Acquiring new connection for host s002-vm2.flab.com, query type 1
07:04:12.710 [2532.2328] <2> vnet_pbxConnect: pbxConnectEx Succeeded
07:04:12.710 [2532.2328] <2> logconnections: BPDBM CONNECT FROM 10.0.0.8.49535 TO 10.0.0.4.1556 fd = 604
07:04:12.851 [3852.2368] <2> bpcr_get_platform_rqst: Server client platform length = 7
07:04:12.851 [3852.2368] <2> bpcr_check_for_use_ofb_support: bpcd platform win_x86
07:04:12.851 [3852.2368] <2> MNG: backup_cmd = /usr/openv/netbackup/bin/bpbkar bpbkar32 -r 604800 -ru root -dt 0 -to 0 -clnt S004.flab.com -class S-1-2-3-C-Drive_Big_MPX-VTL-onS003 -sched Full -st FULL -bpstart_to 300 -bpend_to 300 -read_to 9600 -blks_per_buffer 127 -stream_count 1 -stream_number 1 -jobgrpid 477 -use_otm -use_ofb -b S004.flab.com_1364088814 -kl 28 -fso -WOFB_enabled -WOFB_fim 1 -WOFB_usage 0 -WOFB_error 0 -ct 13
07:04:12.851 [3852.2368] <2> bpbrm handle_backup: forking client backup
07:04:12.851 [3852.2368] <2> bpbrm send_bpsched_connected_msg: sending bpsched msg: CONNECTED TO CLIENT FOR S004.flab.com_1364088814
07:04:12.897 [2064.3400] <2> db_end: Need to collect reply
07:04:12.897 [2064.3400] <2> bpbrm wait_for_mm_events: unexpected terminate
07:04:12.897 [2064.3400] <2> inform_client_of_status: INF - Server status = 150
07:04:12.897 [2064.3400] <2> put_string: cannot write data to network:  An existing connection was forcibly closed by the remote host.
07:04:12.897 [2064.3400] <16> inform_client_of_status: could not send server status message
07:04:12.897 [2064.3400] <2> ConnectionCache::connectAndCache: Acquiring new connection for host s002-vm2.flab.com, query type 1
07:04:12.913 [2064.3400] <2> vnet_pbxConnect: pbxConnectEx Succeeded
07:04:12.913 [2064.3400] <2> logconnections: BPDBM CONNECT FROM 10.0.0.8.49536 TO 10.0.0.4.1556 fd = 604
07:04:12.913 [2532.2328] <2> db_end: Need to collect reply
07:04:12.913 [2532.2328] <2> bpbrm handle_backup: client S001.flab.com EXIT STATUS = 24: socket write failed
07:04:12.913 [2532.2328] <2> inform_client_of_status: INF - Server status = 24
07:04:12.913 [2532.2328] <2> put_string: cannot write data to network:  An existing connection was forcibly closed by the remote host.
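The bpbrm excerpt above interleaves several processes, so it can help to group the client name and server status by process. The following is only a minimal Python sketch, assuming the bracketed field is `[PID.TID]` and that the line layout matches the excerpt; `summarize` is a hypothetical helper and not part of NetBackup.

```python
import re
from collections import defaultdict

# Lines copied from the bpbrm excerpt above.
BPBRM_LOG = """\
07:04:12.648 [2064.3400] <32> bpbrm handle_backup: from client C002-VM5.flab.com: FTL - socket write failed
07:04:12.710 [2532.2328] <32> bpbrm handle_backup: from client S001.flab.com: FTL - socket write failed
07:04:12.897 [2064.3400] <2> inform_client_of_status: INF - Server status = 150
07:04:12.913 [2532.2328] <2> bpbrm handle_backup: client S001.flab.com EXIT STATUS = 24: socket write failed
07:04:12.913 [2532.2328] <2> inform_client_of_status: INF - Server status = 24
"""

# Assumed layout: time, [PID.TID], <severity>, message.
LINE_RE = re.compile(r"^(\S+) \[(\d+)\.(\d+)\] <(\d+)> (.*)$")
CLIENT_RE = re.compile(r"from client (\S+?): (.*)$")
STATUS_RE = re.compile(r"Server status = (\d+)")


def summarize(log_text):
    """Group the client name and reported server statuses by process ID."""
    by_pid = defaultdict(lambda: {"client": None, "statuses": []})
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that does not look like a log line
        pid, message = m.group(2), m.group(5)
        client = CLIENT_RE.search(message)
        if client:
            by_pid[pid]["client"] = client.group(1)
        status = STATUS_RE.search(message)
        if status:
            by_pid[pid]["statuses"].append(int(status.group(1)))
    return dict(by_pid)


if __name__ == "__main__":
    for pid, info in summarize(BPBRM_LOG).items():
        print(pid, info["client"], info["statuses"])
```

On this excerpt the two client processes end with different server statuses (150 and 24), which is easier to see once the lines are grouped.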
 
Bpbkar on client C002-VM5:
[100.1852] <16> tar_tfi::processException:
An Exception of type [SocketWriteException] has occured at:
  Module: @(#) $Source: src/ncf/tfi/lib/TransporterRemote.cpp,v $ $Revision: 1.54 $ , Function: TransporterRemote::write[2](), Line: 321
  Module: @(#) $Source: src/ncf/tfi/lib/Packer.cpp,v $ $Revision: 1.89 $ , Function: Packer::getBuffer(), Line: 656
  Module: tar_tfi::getBuffer, Function: H:\71\src\cl\clientpc\util\tar_tfi.cpp, Line: 312
  Local Address: [::]:0
  Remote Address: [::]:0
  OS Error: 10054 (An existing connection was forcibly closed by the remote host.
)
 
Attempted Solution:
I referred to Article “Status code 24 - Socket write failed” URL http://www.symantec.com/docs/TECH150369
 
Made the appropriate changes as advised in the document.
1) Changed client read timeout parameter from 300 to 9600.
2) Changed Communication buffer size from 32 KB to 128 KB. Go to Host Properties > Clients > Client Properties > Windows Client > Client Settings > Communication buffer size = 128
3) In case there is an antivirus running, turn it off for troubleshooting purposes. (All Systems)
4) Disabled autotuning and chimney features, from command prompt run: (On Master Server)
netsh int tcp set global autotuning=disabled
netsh int tcp set global chimney=disabled
5) Created a TcpTimedWaitDelay entry (as “REG_DWORD”) in HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters and set the value to 30 seconds. (On Master Server)
6) Rebooted the Master server.
7) Started the backup again, but got the same errors.
 
I have also attached my error log files for bpbkar, bpbrm & bptm.
 
Note: If I remove a client from the policy and run the backup, it completes successfully. The backup also completes if I use a smaller amount of data. Moreover, for some reason I cannot use the same MPX media for backups again and get error 86 on it, but that is not a priority right now.
18 REPLIES

fmk
Level 3

 

Not sure if my issue is because of MSEO, but I don't have that. 

How can I run pkginfo in windows to see the output?

Media write errors (Status 84) after upgrade to NetBackup 7.1 (http://www.symantec.com/docs/TECH176291)

My bptm logs show these errors: 

 

 

07:45:56.757 [3088.2880] <4> write_backup: waiting for client data or brm message
07:45:56.757 [3088.2880] <2> write_data: twin_index: 0 active: 1 dont_process: 0 wrote_backup_hdr: 0 finished_buff: 0 saved_cindex: -1 twin_is_disk 0 delay_brm: 0
07:45:56.757 [3088.2880] <2> write_data: Total Kbytes transferred 214656
07:45:56.757 [3088.2880] <2> write_data: absolute block position prior to writing backup header(s) is 3358, copy 1
07:45:56.757 [3088.2880] <2> write_data: block position check: actual 3358, expected 3359
07:45:56.757 [3088.2880] <2> ConnectionCache::connectAndCache: Acquiring new connection for host s002-vm2.flab.com, query type 1
07:45:56.757 [3996.3500] <2> bptm: INITIATING (VERBOSE = 0): -pid 1916 -den 20 -rt 8 -rn 1 -cj 1 -mpx 10 -reqid -1364089955 -jm -brm -p NetBackup -stunit S003-hcart3-robot-tld-1 -eari 0 -maxfrag 1048576 -masterversion 710000 -mediasvr S003.flab.com -bpbrm_shm_id Global\NetBackup_BPBRM_SHM_Path_5898335_1916_3108 -blks_per_buffer 128 -b S004.flab.com_1364091326 -cl S-1-2-3-C-Drive_Big_MPX-VTL-onS003 -c S004.flab.com -hostname S004.flab.com -bclnt S004.flab.com -bclnthostname S004.flab.com -ru root -rclnt S004.flab.com -rclnthostname S004.flab.com -sl Full -mmfill 3088 2 65536 12 10 5898631 0 1364091326 0 S-1-2-3-C-Drive_Big_MPX-VTL-onS003 S004.flab.com S004.flab.com_1364091326 
07:45:56.757 [3996.3500] <2> vnet_same_host: ../../libvlibs/vnet_addrinfo.c.2915: 0: name2 is empty: 0 0x00000000
07:45:56.757 [3996.3500] <4> bptm: emmserver_name = s002-vm2.flab.com
07:45:56.757 [3996.3500] <4> bptm: emmserver_port = 1556
07:45:56.773 [3088.2880] <2> vnet_pbxConnect: pbxConnectEx Succeeded
07:45:56.773 [3088.2880] <2> logconnections: BPDBM CONNECT FROM 10.0.0.8.49841 TO 10.0.0.4.1556 fd = 1124
07:45:56.773 [3996.3500] <2> Orb::init: initializing ORB EMMlib_Orb with: dbstunitq -ORBSvcConfDirective "-ORBDottedDecimalAddresses 0" -ORBSvcConfDirective "static PBXIOP_Factory '-enable_keepalive'" -ORBSvcConfDirective "static EndpointSelectorFactory ''" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory PBXIOP_Factory'" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory IIOP_Factory'" -ORBDefaultInitRef '' -ORBSvcConfDirective "static PBXIOP_Evaluator_Factory '-orb EMMlib_Orb'" -ORBSvcConfDirective "static Resource_Factory '-ORBConnectionCacheMax 1024 '" -ORBSvcConf nul -ORBSvcConfDirective "static Server_Strategy_Factory '-ORBMaxRecvGIOPPayloadSize 268435456'"(../Orb.cpp:824)
07:45:56.773 [3996.3500] <2> Orb::init: caching EndpointSelectorFactory(../Orb.cpp:839)
07:45:56.835 [3088.2880] <2> db_end: Need to collect reply
07:45:56.835 [3088.2880] <16> write_data: FREEZING media id 4_1910, External event caused rewind during write, all data on media is lost
07:45:56.835 [3088.2880] <2> send_MDS_msg: DEVICE_STATUS 1 203 s003.flab.com 4_1910 4000055 CRSTLINK.FIRESTRMRDRIVE.005 2000088 WRITE_ERROR 0 0
07:45:56.835 [3996.3500] <2> setup_mm_child: [3088] child using 12 data buffers
07:45:56.835 [3996.3500] <2> setup_mm_child: [3088] child buffer size is 65536
07:45:56.835 [3996.3500] <2> setup_mm_child: [3088] buf control for CINDEX 2 is 0x1940250
07:45:56.835 [3996.3500] <2> setup_mm_child: [3088] shared memory address for group 0 is 0x28f0000, handle is 924
07:45:56.835 [3996.3500] <2> setup_mm_child: [3088] shared memory address for CINDEX 2 is 0x2a70000, group 0
07:45:56.851 [3088.2880] <2> log_media_error: successfully wrote to error file - 03/24/13 07:45:56 4_1910 0 WRITE_ERROR CRSTLINK.FIRESTRMRDRIVE.005
07:45:56.851 [3088.2880] <2> send_MDS_msg: MEDIADB 1 203 4_1910 4000055 *NULL* 20 1364091319 1364091326 1364696126 0 214656 1 1 0 1 0 513 1024 0 0 0
07:46:09.955 [3996.3500] <2> io_set_recvbuf: setting receive network buffer to 263168 bytes
07:46:09.955 [3996.3500] <2> vnet_pbxConnect: pbxConnectEx Succeeded
07:46:09.955 [3996.3500] <2> job_connect: SO_KEEPALIVE set on socket 424 for client s002-vm2.flab.com
07:46:09.955 [3996.3500] <2> logconnections: BPJOBD CONNECT FROM 10.0.0.8.49862 TO 10.0.0.4.1556 fd = 424
07:46:09.955 [3996.3500] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
07:46:09.955 [3996.3500] <2> job_connect: Connected to the host s002-vm2.flab.com contype 10 jobid <486> socket <424>
07:46:09.955 [3996.3500] <2> job_connect: Connected on port 49862
07:46:09.955 [3088.2880] <2> check_error_history: just tpunmount: called from bptm line 19204, EXIT_Status = 84
07:46:09.955 [3088.2880] <2> io_close: closing C:\Program Files\Veritas\NetBackup\db\media\tpreq\drive_CRSTLINK.FIRESTRMRDRIVE.005, from bptm.c.16264
07:46:09.955 [3088.2880] <2> drivename_write: Called with mode 1
07:46:09.955 [3088.2880] <2> drivename_unlock: unlocked
07:46:09.955 [3088.2880] <2> drivename_checklock: Called
07:46:09.955 [3088.2880] <2> drivename_lock: lock established
07:46:09.955 [3088.2880] <2> drivename_unlock: unlocked
07:46:09.955 [3088.2880] <2> drivename_close: Called for file CRSTLINK.FIRESTRMRDRIVE.005
07:46:09.955 [3088.2880] <2> tpunmount: NOP: MEDIA_DONE 0 486 0 4_1910 4000055 0 {79E26ED3-7DED-464C-89D8-2FCFB0339BB9}
07:46:09.955 [3088.2880] <2> send_brm_msg: ERROR 84
07:46:09.955 [3088.2880] <2> mpx_terminate_exit: EXITING with status 84

 

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

pkginfo is a Solaris command and is not available on Windows.

Actually, the block position reported by the tape drive that the VTL provides is not the same as the one NetBackup expects. Does it work well without multiplexing? Is there any defect in Firestreamer?

fmk
Level 3

Backups work well without multiplexing, in fact they work fine with multiplexing if I run backup of C drive for 2 clients in same policy or if I take backup of 3 clients with less data size (for example about 6-8 GB).

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

Moreover, for some reason I cannot use the same MPX Media for backups again and get error 86 on it, but that is not a priority right now.

Status code 86 is "media positioning failure". A positioning command sent to the tape drive returned an error. This indicates something is wrong with the VTL and its virtual media. Are there any errors in Firestreamer's log?

BTW, I found a fix in NetBackup 7.5 for multiplexed backups that fail with status code 25.
http://www.symantec.com/docs/DOC5130

This guide does not go into any more detail about the defect, so I cannot determine whether it applies to your case.

If you are in serious difficulty, it is better to log a call with Firestreamer and Symantec.

Marianne
Moderator
Partner    VIP    Accredited Certified

Emulating SCSI for tape access has never been very successful in VMs.

These are hardware errors that need to be logged with your VTL vendor:

io_ioctl: WRITE FILEMARKS returned (1:write filemarks scsi command failed, status = 0x2, key = 0x5, asc = 0x24, ascq = 0x0)

tapealert_and_release: TapeAlert failed (log sense scsi command failed, status = 0x2, key = 0x5, asc = 0x20, ascq = 0x0)
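For readers who want to interpret the `status`, `key`, `asc` and `ascq` fields in the two lines above, here is a minimal Python sketch. The lookup tables are deliberately limited to the values that appear in this thread, with meanings taken from the standard SCSI tables; `decode_sense` is a hypothetical helper and not a NetBackup utility.

```python
import re

# Lookup tables limited to the values seen in this thread (standard SCSI meanings).
SCSI_STATUS = {0x02: "CHECK CONDITION"}
SENSE_KEYS = {0x05: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}

SENSE_RE = re.compile(
    r"status = (0x[0-9a-fA-F]+), key = (0x[0-9a-fA-F]+), "
    r"asc = (0x[0-9a-fA-F]+), ascq = (0x[0-9a-fA-F]+)"
)


def decode_sense(line):
    """Return (status, sense key, ASC/ASCQ) descriptions, or None if no match."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    status, key, asc, ascq = (int(value, 16) for value in m.groups())
    return (
        SCSI_STATUS.get(status, "unknown status"),
        SENSE_KEYS.get(key, "unknown sense key"),
        ASC_ASCQ.get((asc, ascq), "unknown ASC/ASCQ"),
    )


if __name__ == "__main__":
    lines = [
        "io_ioctl: WRITE FILEMARKS returned (1:write filemarks scsi command failed, "
        "status = 0x2, key = 0x5, asc = 0x24, ascq = 0x0)",
        "tapealert_and_release: TapeAlert failed (log sense scsi command failed, "
        "status = 0x2, key = 0x5, asc = 0x20, ascq = 0x0)",
    ]
    for line in lines:
        print(decode_sense(line))
```

Both lines decode to an ILLEGAL REQUEST sense key, meaning the virtual drive rejected the command or one of its fields.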

The following is a sign of a SAN firmware or configuration problem (as per the Status 84 In Depth Troubleshooting Guide):

write_data: block position check: actual 2370, expected 2371 

write_data: FREEZING media id 4_1810, External event caused rewind during write, all data on media is lost
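To find every occurrence of this mismatch in a long bptm log, a few lines of Python are enough. This is only a sketch, assuming the message format shown in this thread; `position_mismatches` is a hypothetical helper name.

```python
import re

POSITION_RE = re.compile(r"block position check: actual (\d+), expected (\d+)")


def position_mismatches(lines):
    """Return (actual, expected, expected - actual) for each mismatching check."""
    mismatches = []
    for line in lines:
        m = POSITION_RE.search(line)
        if m is None:
            continue
        actual, expected = int(m.group(1)), int(m.group(2))
        if actual != expected:
            mismatches.append((actual, expected, expected - actual))
    return mismatches


if __name__ == "__main__":
    # Both sample lines are taken from the bptm logs posted in this thread.
    sample = [
        "write_data: block position check: actual 2370, expected 2371",
        "07:45:56.757 [3088.2880] <2> write_data: block position check: actual 3358, expected 3359",
    ]
    for actual, expected, delta in position_mismatches(sample):
        print(f"actual={actual} expected={expected} short by {delta} block(s)")
```

In both samples posted here the drive reports a position exactly one block behind what NetBackup expects.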

 

Best practice is to use physical servers as media servers.

 

fmk
Level 3

 

Thanks a lot for the suggestions, Yasuhisa and Marianne...

See you around.

mph999
Level 6
Employee Accredited

 

07:45:56.757 [3088.2880] <2> write_data: block position check: actual 3358, expected 3359
 
Agree with all the above. Just to add that this error is often caused by firmware (usually of the tape drive), but it could also be the driver. HBA issues are another possibility.
 
Martin

Mark_Solutions
Level 6
Partner Accredited Certified
Also remember that Windows media servers are subject to issues with Windows itself interfering with the tape drives.

Make sure that Removable Storage has not been installed on your servers (it doesn't get installed by default on 2008, but if it is, stop and disable the service).

Next, make sure you have the AutoRun key for the tape drivers. See this tech note (it says Windows 2003 but applies to all Windows versions); its absence can cause the sudden rewind of tapes while in use: http://support.microsoft.com/kb/842411

Hope this helps

fmk
Level 3

Hi Mark, 

I tried to follow the suggested documentation, but my Media Server's registry already shows that AutoRun is disabled for all Test Unit Ready (TUR) requests for the tape service.

image006.png

image007.png

image008 - Copy.png

 

 

 

 

fmk
Level 3

My Tape Drives' properties show Block size as below, does it matter for multiplexing? 

Maximum Block Size = 131008

Minimum Block Size = 512

image009.png
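As a quick sanity check on these limits: the bptm log earlier in the thread reports "child buffer size is 65536". The sketch below only tests whether a given block size falls inside the drive's reported range, assuming the bptm data buffer size is what gets written to tape as one block. Whether this has any bearing on the multiplexing failure is not established in this thread, and `fits_drive_limits` is a hypothetical helper.

```python
# Values reported in this thread.
MIN_BLOCK_SIZE = 512       # drive properties: Minimum Block Size
MAX_BLOCK_SIZE = 131008    # drive properties: Maximum Block Size
BPTM_BUFFER_SIZE = 65536   # bptm log: "child buffer size is 65536"


def fits_drive_limits(block_size, minimum=MIN_BLOCK_SIZE, maximum=MAX_BLOCK_SIZE):
    """Return True if block_size lies within the drive's reported limits."""
    return minimum <= block_size <= maximum


if __name__ == "__main__":
    print(fits_drive_limits(BPTM_BUFFER_SIZE))  # 64 KiB buffer from the bptm log
    print(fits_drive_limits(128 * 1024))        # a full 128 KiB block, for comparison
```

The 65536-byte buffer is inside the range, while a full 128 KiB (131072 bytes) would be 64 bytes over the reported maximum.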

Mark_Solutions
Level 6
Partner Accredited Certified
Have you had a look through the Firestreamer logs? Something is causing a tape rewind, and if you don't have the Removable Storage service running and do have the AutoRun key in place, we need to find out what else is causing it.

Do you have any errors showing in the Firestreamer console or logs, or in the Windows event logs? Do you have any third-party drivers loaded? See the Firestreamer - Help - Troubleshooting - Incompatible Software section.

The only other external thing that I have found that could cause it is a SAN thing called "Speed Write": http://www.symantec.com/docs/TECH34341

Hope this helps - let us know what you find

fmk
Level 3

Well, here's the thing. Firestreamer said "Sorry" for NetBackup, and said "Symantec was uncooperative when we approached them in order to make Firestreamer compatible with their backup products." lol Good that we didn't buy it yet. 

 

Any other known good VTL which is fully compatible? 

 

@ Mark

Logging - firestreamer console or logs or the Windows event logs - NONE.

Incompatible Software - There is no option to check that within Firestreamer.

Moreover, I just found that the Firestreamer console shows data compression enabled for the virtual media files. I am trying to find out how to disable it.

Still trying to figure out how to disable "Speed Write".

Mark_Solutions
Level 6
Partner Accredited Certified
I got the incompatible software information from here: http://www.cristalink.com/fs/hh.aspx?id=ts-soft

The trouble with anything unsupported is that it is unsupported!! So maybe you need to use something else that is on the supported list, otherwise you won't get any help when you have issues - apart from on here of course!

fmk
Level 3

Here's the reply from Firestreamer:

There is a bug in Firestreamer 4.0 related to tape positioning. It does not affect Microsoft DPM, but may affect Netbackup. The bug is fixed in Firestreamer 4.1, which is not released yet; the release date is unknown at this stage.

So I can't do much about multiplexing on Firestreamer.

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

Microsoft DPM is the only software Firestreamer supports.

http://www.cristalink.com/fs/hh.aspx?id=overview

According to the HCL, it seems no software-based VTL is supported by NetBackup. NetBackup is a backup infrastructure for the enterprise, so only proven hardware-based VTLs are listed.

http://www.symantec.com/docs/TECH76495

BTW, why do you need to use Firestreamer? For use of removable media?

fmk
Level 3

I am trying to use VTL for training and testing purposes. For the live environment we have IBM Ultrium drives. We are trying to create a small lab for testing before we roll our updates in the live environment, and for training purposes. 

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

So, mhvtl is best for test purposes. But mhvtl is Linux software. If you need to test on Windows, consider exporting robots and drives via iSCSI using scst-iscsi.

It is not so easy, but worth trying. If you don't need to test encryption, Scientific Linux CERN 5 is nice because SLC5 has an scst SRPM and is easy to build. Installation of the mhvtl rpm is also easy on SLC5.

fmk
Level 3

Much thanks Yasuhisa... mhVTL rocks. 

I tested multiplexing of 5 clients with about 25 GB of data to begin with, and it all went smoothly. And mhVTL actually emulates a complete robot, so I got a MAP as well.

A preconfigured mhVTL - 

https://www-secure.symantec.com/connect/downloads/ubuntu-1104-vm-mhvtl-over-iscsi