
ALERT - Isilon incrementals succeed but back up no data! Watch the attempt count!

Genericus
Moderator
   VIP   

This is NOT NB-specific and applies across all versions.

Please be aware: there is an issue between Isilon TCP/IP and NetBackup. Watch for incremental backups that take multiple tries; it happens very infrequently.

The backup completes successfully and updates the listing, then fails, reruns successfully, and backs up nothing.

What you see in NetBackup: incremental Isilon backups succeed with multiple attempts, BUT
1st attempt finds new data, backs up the data, then fails
2nd (or 3rd) attempt completes with NO error, but backs up NO data!

From Isilon support (after 9 months of Veritas/Isilon fighting over whose fault it is...):

Summary

After review, engineering has confirmed from the network trace investigation that this issue is caused by rare slowness on the DMA side when reading from the receive buffer (1-2% failure jobs). At that same moment, the Isilon node sends an RST because the NDMP daemon terminates the socket quickly, without confirming that the DMA client has received all the data.

To improve this situation from the Isilon NDMP side, engineering is looking at the following NDMP changes:

  1. Call shutdown() before close()

The Dev team has already started this improvement in the Isilon NDMP code. Engineering is also working on reproducing the problem so that there is a lab where the code improvement can be validated. The current estimate is that the fix will take up to 4 weeks, assuming no issues. (It has been 5 weeks now with no updated ETA...)
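To illustrate the "shutdown() before close()" change mentioned above, here is a generic socket sketch in Python. It is NOT Isilon's NDMP code; `graceful_close` and `demo` are made-up names, and a local socket pair stands in for the NDMP data connection. The idea is that the sender signals "no more data" first, waits for the peer to finish reading and close, and only then closes its own socket.

```python
import socket
import threading


def graceful_close(sock: socket.socket, timeout: float = 5.0) -> None:
    """Half-close the sending side, wait for the peer, then close.

    Hypothetical helper for illustration only.
    """
    sock.shutdown(socket.SHUT_WR)  # tell the peer no more data is coming
    sock.settimeout(timeout)
    try:
        # recv() returns b"" once the peer has closed its side.
        while sock.recv(4096):
            pass
    except OSError:
        pass  # peer did not finish in time (or errored); stop waiting
    finally:
        sock.close()


def demo() -> bytes:
    """Send a payload to a reader thread and return what the reader received."""
    sender, receiver = socket.socketpair()
    payload = b"x" * 100_000
    received = bytearray()

    def reader() -> None:
        # Reads in small chunks until the sender's half-close shows up as EOF.
        while True:
            chunk = receiver.recv(1024)
            if not chunk:
                break
            received.extend(chunk)
        receiver.close()

    thread = threading.Thread(target=reader)
    thread.start()
    sender.sendall(payload)
    graceful_close(sender)
    thread.join()
    return bytes(received)
```

Calling `demo()` should return the full payload, because the sender does not tear the connection down until the reader has seen end-of-stream.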

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

Genericus
Moderator
   VIP   

I set up a filter that selects my Isilon backups with attempts > 1 to find these.

You can see the difference in the bytes attempted when you compare the job details for attempt 1 vs. attempt 2.
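If you pull the job details out into a script, the same check can be sketched in Python. Everything here is an assumption for illustration: the `Attempt` and `Job` records, their field names, and the "isilon" policy-name marker are made up and are not a NetBackup output format. The logic is just the filter described above: more than one attempt, the final attempt succeeded, and it wrote less than an earlier attempt did.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    number: int
    kilobytes: int  # data written during this attempt (hypothetical field)
    status: int     # 0 means the attempt succeeded (hypothetical field)


@dataclass
class Job:
    job_id: int
    policy: str
    attempts: List[Attempt]


def suspicious_jobs(jobs: List[Job], policy_marker: str = "isilon") -> List[Job]:
    """Return jobs whose successful final attempt wrote less than an earlier one."""
    flagged = []
    for job in jobs:
        if policy_marker not in job.policy.lower():
            continue  # not one of the Isilon policies
        if len(job.attempts) < 2:
            continue  # single-attempt jobs are not the pattern we want
        final = job.attempts[-1]
        earlier_max = max(a.kilobytes for a in job.attempts[:-1])
        if final.status == 0 and final.kilobytes < earlier_max:
            flagged.append(job)
    return flagged
```

Any job this returns is worth checking by hand in the job details before trusting that incremental.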


Genericus
Moderator
   VIP   

This just in from Isilon support:

The fix is ready and approved for May GA-RUP – it should be PSP-1092. We are aiming to get this RUP by the end of May.

If you have an Isilon, make sure you apply this update!
