03-25-2021 12:56 PM
This is NOT NetBackup-specific, and it occurs across all versions.
Please be aware: there is an issue between Isilon TCP/IP and NetBackup. Watch out for incremental backups that, very infrequently, need multiple tries.
The backup completes successfully and updates the listing, then fails; the rerun succeeds but backs up nothing.
What you see in NetBackup: incremental Isilon backups succeed after multiple attempts, BUT
the 1st attempt finds new data, backs up the data, then fails;
the 2nd (or 3rd) attempt completes with NO error, but backs up NO data!
From Isilon support (after 9 months of Veritas/Isilon fighting over whose fault it is...):
Summary
After review, engineering has confirmed from the Network trace investigation that this issue is observed due to rare DMA slowness with reading from the received buffer (1-2% failure jobs). At the same time, the RST flag is sent by the Isilon node due to fast socket termination by the NDMP daemon without confirmation that all data was received by the DMA client exactly at the same slowness time on the DMA side.
To improve this situation from the Isilon NDMP side, engineering is looking at the following NDMP changes:
The Dev team has already started this improvement in the Isilon NDMP code. Also, Engineering is working on a repro of this problem to have a lab where the code improvement can be validated. The current estimation is that the fix will take up to 4 weeks, assuming no issues. ( It has been 5 weeks now and no updated ETA... )
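For anyone wondering what "fast socket termination without confirmation that all data was received" looks like in practice, here is a minimal sketch of the opposite pattern: the sender half-closes the connection and waits for the slow reader to finish before releasing the socket. This is a generic illustration only, not Isilon's actual NDMP fix (the list of planned changes was not shared). The names `graceful_close`, `slow_reader`, and `demo` are made up for this example, and a local `socketpair` stands in for the real TCP connection.

```python
import socket
import threading
import time


def graceful_close(sock, timeout=5.0):
    """Signal end-of-data, then wait for the peer to close before closing."""
    sock.shutdown(socket.SHUT_WR)      # no more data from us; reading side stays open
    sock.settimeout(timeout)
    try:
        while sock.recv(4096):         # recv returns b"" once the peer has closed
            pass
    except OSError:
        pass                           # timeout or reset: stop waiting
    finally:
        sock.close()


def slow_reader(sock, received, delay=0.01):
    """Hypothetical slow client: reads small chunks with a pause between reads."""
    while True:
        chunk = sock.recv(1024)
        if not chunk:                  # sender signalled end-of-data
            break
        received.append(chunk)
        time.sleep(delay)
    sock.close()


def demo(payload=b"x" * 20000):
    """Send payload to a slow reader and return what the reader received."""
    sender, receiver = socket.socketpair()
    received = []
    reader = threading.Thread(target=slow_reader, args=(receiver, received))
    reader.start()
    sender.sendall(payload)
    graceful_close(sender)             # waits for the reader instead of tearing down
    reader.join()
    return b"".join(received)
```

In this sketch the sender only closes after the reader has consumed everything and closed its own end, so the slow reader never sees the connection torn down underneath it.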
03-25-2021 12:57 PM
To search for these, I set up a filter that selects my Isilon backups with attempts > 1.
You can see the variance in the bytes when you compare the job details for attempt 1 vs. attempt 2.
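If you would rather script the check than click through job details, the filter logic can be sketched roughly as below. This assumes you have already exported your job data by whatever means you prefer; the `Job` and `Attempt` records, their field names, and the "isilon" substring match on the client name are all hypothetical and are not a NetBackup API or output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    number: int
    status: int    # 0 means the attempt completed without error
    kbytes: int    # data written by this attempt


@dataclass
class Job:
    job_id: int
    client: str
    attempts: List[Attempt]


def retried_isilon_jobs(jobs, client_match="isilon"):
    """Return jobs for matching clients that needed more than one attempt."""
    return [
        job for job in jobs
        if client_match in job.client.lower() and len(job.attempts) > 1
    ]


def attempt_variance(job):
    """Return (job_id, kbytes of first attempt, kbytes of last attempt)."""
    return (job.job_id, job.attempts[0].kbytes, job.attempts[-1].kbytes)
```

A job where the first attempt moved data and the last attempt finished clean with zero (or far fewer) kilobytes is the pattern described in this thread.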
03-29-2021 04:47 AM
This just in from Isilon support:
The fix is ready and approved for May GA-RUP – it should be PSP-1092. We are aiming to get this RUP by the end of May.
If you have an Isilon, make sure you apply this update!