01-29-2012 11:02 AM
Hi,
Since I upgraded to 7.1.0.3, all my AIX 5.3 / NBU 6.5.6 SAN Clients fail with status 40 (Network Connection Broken).
The backup starts fine and writes for a few hours, but it fails every time at the same point (exactly 536 855 296 KB written!).
After that, the SAN Client is no longer active and I have to rescan the devices.
Smaller SAN Clients are fine! And everything was fine with 7.1.0.2.
Here are the nbftclnt logs, where we can see some errors:
01/28/12 08:53:58.753 [ErrorDecoder] Thread 2336 Pipe ID 2 Cdb[0] 0x3b ASC = VRTS_ASC; ASCQ is not valid 0x7 at ftseq 0
01/28/12 08:53:58.757 [BackupThread] 2336: Pipe ID 2: DeviceIoControl: FATAL_ERROR; ftseq 2097152 length 0
01/28/12 08:53:58.757 [BackupThread] 2336: Pipe ID 2 stopped at ftseq 2097152 with 32 full buffers offlining device, Device name /dev/rmt9
01/28/12 08:53:58.757 [DiscoveryTaskThread] (1800) DTT_Offline
01/28/12 08:53:59.120 [~FATClientPipe] m_eDirection backup cnt 32 Pipe ID 2
01/28/12 11:13:09.087 [ErrorDecoder] Thread 2342 Pipe ID 1 Cdb[0] 0x3b ASC = VRTS_ASC; ASCQ is not valid 0x7 at ftseq 0
01/28/12 11:13:09.087 [BackupThread] 2342: Pipe ID 1: DeviceIoControl: FATAL_ERROR; ftseq 2097152 length 0
01/28/12 11:13:09.087 [BackupThread] 2342: Pipe ID 1 stopped at ftseq 2097152 with 32 full buffers offlining device, Device name /dev/rmt3
01/28/12 11:13:09.087 [DiscoveryTaskThread] (1800) DTT_Offline
01/28/12 11:13:09.112 [~FATClientPipe] m_eDirection backup cnt 32 Pipe ID 1
01/29/12 04:17:22.145 [ErrorDecoder] Thread 2365 Pipe ID 1 Cdb[0] 0x3b ASC = VRTS_ASC; ASCQ is not valid 0x7 at ftseq 0
01/29/12 04:17:22.188 [BackupThread] 2365: Pipe ID 1: DeviceIoControl: FATAL_ERROR; ftseq 2097152 length 0
01/29/12 04:17:22.220 [BackupThread] 2365: Pipe ID 1 stopped at ftseq 2097152 with 32 full buffers offlining device, Device name /dev/rmt8
01/29/12 04:17:22.279 [DiscoveryTaskThread] (1800) DTT_Offline
01/29/12 04:17:22.615 [~FATClientPipe] m_eDirection backup cnt 32 Pipe ID 1
01/29/12 06:38:36.768 [ErrorDecoder] Thread 2370 Pipe ID 1 Cdb[0] 0x3b ASC = VRTS_ASC; ASCQ is not valid 0x7 at ftseq 0
01/29/12 06:38:36.780 [BackupThread] 2370: Pipe ID 1: DeviceIoControl: FATAL_ERROR; ftseq 2097152 length 0
01/29/12 06:38:36.789 [BackupThread] 2370: Pipe ID 1 stopped at ftseq 2097152 with 32 full buffers offlining device, Device name /dev/rmt4
01/29/12 06:38:36.806 [DiscoveryTaskThread] (1800) DTT_Offline
01/29/12 06:38:36.932 [~FATClientPipe] m_eDirection backup cnt 32 Pipe ID 1
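For anyone who wants to check their own nbftclnt log for the same pattern, here is a minimal Python sketch that pulls out the "stopped at ftseq" lines and shows whether every failure lands on the same sequence number. The regular expression is only inferred from the four lines pasted above, not from any documented log format, and `parse_stops` is a made-up helper name.

```python
import re
from collections import Counter

# Sample lines copied verbatim from the nbftclnt log excerpt in this post.
LOG = """\
01/28/12 08:53:58.757 [BackupThread] 2336: Pipe ID 2 stopped at ftseq 2097152 with 32 full buffers offlining device, Device name /dev/rmt9
01/28/12 11:13:09.087 [BackupThread] 2342: Pipe ID 1 stopped at ftseq 2097152 with 32 full buffers offlining device, Device name /dev/rmt3
01/29/12 04:17:22.220 [BackupThread] 2365: Pipe ID 1 stopped at ftseq 2097152 with 32 full buffers offlining device, Device name /dev/rmt8
01/29/12 06:38:36.789 [BackupThread] 2370: Pipe ID 1 stopped at ftseq 2097152 with 32 full buffers offlining device, Device name /dev/rmt4
"""

# Assumption: this pattern is guessed from the sample lines only.
STOP_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[BackupThread\] "
    r"(?P<thread>\d+): Pipe ID (?P<pipe>\d+) stopped at ftseq (?P<ftseq>\d+) "
    r".*Device name (?P<device>\S+)$"
)


def parse_stops(text):
    """Return one dict per 'stopped at ftseq' line found in text."""
    stops = []
    for line in text.splitlines():
        match = STOP_RE.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["ftseq"] = int(entry["ftseq"])
            stops.append(entry)
    return stops


stops = parse_stops(LOG)
ftseq_counts = Counter(entry["ftseq"] for entry in stops)

for entry in stops:
    print(entry["ts"], entry["device"], "pipe", entry["pipe"], "ftseq", entry["ftseq"])
print("distinct ftseq values at failure:", dict(ftseq_counts))
```

If the counter ends up with a single key, every failure stopped at the same sequence number regardless of device or pipe, which points away from a bad tape drive or a single corrupt file.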
Does anyone have the same issue?
I'll open a case tomorrow.
Thanks
01-30-2012 03:29 AM
Did you patch using Live Update?
If so please note the section in the 7.1.0.3 release notes (just in case this has an effect):
When LiveUpdate is used to apply the patch to an FT (Fibre Transport) Media server, you must run the
/usr/openv/netbackup/bin/admincmd/nbftsrvr_config script manually, on the FT Media server, after the patch installation is complete.
The patch includes the newer device drivers which are not installed as part of the LiveUpdate process. The nbftsrvr_config script installs the correct drivers.
I can't see anything else that has changed that could cause such an error, but it may be a coincidence: backups that fail at exactly the same point usually do so because they have hit a corrupt file, so it is worth checking for that as well.
01-30-2012 05:38 AM
Maybe you have some stuck pipes at the nbrb level. Try releasing them, and check the logs on the FT Media Server to see how data is moving:
01-30-2012 07:52 AM
Looks like Symantec just released a tech note that references the issue you're having.
NetBackup 7.1.0.3 FT media servers getting Network Connection Broken (40) every 512Gbytes
http://www.symantec.com/business/support/index?page=content&id=TECH180175&key=15143&actp=LIST
Article: TECH180175
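The "every 512Gbytes" in the tech note title lines up with the numbers from the original post. A quick back-of-the-envelope check in Python, assuming the "kb written" figure uses 1024-byte kilobytes and that ftseq counts equal-sized transfers (neither is confirmed anywhere in this thread):

```python
# Figures taken from the original post and its nbftclnt log excerpt.
written_kb = 536_855_296      # "536 855 296 kb written"
ftseq_at_failure = 2_097_152  # "stopped at ftseq 2097152"

# Assumption: 1 KB = 1024 bytes in the reported figure.
gib_written = written_kb * 1024 / 2**30

# Is the failing sequence number an exact power of two?
is_power_of_two = ftseq_at_failure & (ftseq_at_failure - 1) == 0
exponent = ftseq_at_failure.bit_length() - 1

# Assumption: each sequence number corresponds to one equal-sized transfer.
kb_per_seq = written_kb / ftseq_at_failure

print(f"written: {gib_written:.3f} GiB")
print(f"ftseq is 2**{exponent}: {is_power_of_two}")
print(f"average KB per sequence number: {kb_per_seq:.2f}")
```

The amount written comes out just under 512 GiB, and the failing sequence number is exactly 2**21, at roughly 256 KB per sequence number. That looks like a fixed limit being hit rather than a problem with the data itself, but the thread does not say what the actual cause is.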
01-30-2012 08:11 AM
Good spot sfroeschke - very hot off the press - not even in the Late Breaking news yet!
Looks like it just doesn't like mixed environments then (7.1.0.2 and 7.1.0.3).
01-30-2012 09:33 AM
Thank you very much, sfroeschke! This is fresh news!
This afternoon I downgraded my FT media servers to 7.1.0.2 and the problem disappeared.
I'll wait for the 7.1.0.3 fix.
Thanks all
01-31-2012 07:35 AM
Symantec sent me the fix this morning and it works great :)