Forum Discussion

Bessam
Level 4
2 months ago
Solved

Corrupted images

Hello,

This is the third time I have had corrupted backup images for the same server. A few weeks ago an image of server X was corrupted; I expired the corrupted image and reran the backup, which completed and was duplicated successfully. A few weeks later the same thing happened a second time, and now the same corruption has occurred again. An MSDP check has already been run, but it only reported the corrupted image, which I already knew about.

I need to find out what is causing the corruption of this server's images, given that all other backups are successful.

One option is a metadata analysis, but the media server would remain unavailable for the duration of the analysis. It takes about 1 hour per terabyte of data, and I have 120 TB, so that means 120 hours of unavailability.
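To put that estimate in perspective, here is a minimal sketch of the arithmetic. It assumes the rate really is linear at roughly 1 hour per TB, as quoted above; the function name and default are only illustrative.

```python
def analysis_downtime_hours(pool_size_tb: float, hours_per_tb: float = 1.0) -> float:
    """Estimate how long the pool is unavailable during a full metadata analysis.

    Assumes the analysis time scales linearly with the amount of data.
    """
    if pool_size_tb < 0 or hours_per_tb < 0:
        raise ValueError("size and rate must be non-negative")
    return pool_size_tb * hours_per_tb


hours = analysis_downtime_hours(120)
print(f"{hours:.0f} hours, about {hours / 24:.1f} days")
```

For 120 TB this comes to 120 hours, which is about 5 full days of unavailability.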

The server belongs to my customer, so I do not have access to the client server; however, I have control over the master and media servers.

Note that this is a VMware backup that uses a shared datastore and vCenter, and the other backups that use them are successful.

Master: Linux RHEL 8.9, NetBackup 10.0.0.1

Media: Linux RHEL 8.9, NetBackup 10.0.0.1

Client server: Windows 2016, NetBackup 10.0.0.1

  • Thanks to all for your collaboration. I'm very limited in the actions and investigation I can perform because I'm not the admin of the client server; it's my customer who has the issue, and he also has limited options.

    We simply proposed expiring the corrupted images and running another backup.

  • The most probable cause is a hardware issue with the underlying disk storage and/or storage subsystem. Meaning: disk issues, LUN issues, connectivity issues between server and disk hardware, disk controller issues, HBA issues.

  • My suggestions, based on my experience, are:

    1. Open a case with Veritas.
    2. Upgrade to a newer version of NetBackup if possible.
    3. Use NBD or HotAdd instead of SAN transport mode. I highly recommend this.
    4. Check for hardware errors between the media server and storage.
    5. If you are using antivirus, exclude the NetBackup and MSDP folders. Even though it is a Linux server, you might still have antivirus installed for Linux.
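As a rough illustration of suggestion 4, the sketch below filters system log lines for messages that often accompany disk, path, or HBA trouble. The patterns and the sample lines are assumptions chosen for illustration, not an exhaustive or authoritative list; real kernel and driver messages vary by version and hardware.

```python
import re
from typing import Iterable, List

# Illustrative patterns only (assumption): adjust them to your kernel and HBA driver.
STORAGE_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"medium error", re.IGNORECASE),
    re.compile(r"loop down", re.IGNORECASE),
]


def find_storage_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the storage error patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in STORAGE_ERROR_PATTERNS)
    ]


# Hypothetical sample lines; on a real media server you would pass an open
# log file (for example the system log) instead of this list.
sample_log = [
    "Jan 10 02:14:07 media01 kernel: blk_update_request: I/O error, dev sdb, sector 2048",
    "Jan 10 02:14:09 media01 kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode",
    "Jan 10 02:15:30 media01 kernel: qla2xxx [0000:05:00.0]-500b:1: LOOP DOWN detected",
]

for hit in find_storage_errors(sample_log):
    print(hit)
```

Matching lines that cluster around the time of the corrupted backup would support the hardware hypothesis; an empty result does not rule it out.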
  • A support case is the best way to handle such serious problems.

    I have some questions:

    1. How did you realize there was corruption in your backup?
    2. What method are you using to back up the system (SAN, NBD, or HotAdd)?
    3. Are there any other backups of the same system that are OK/restorable?
    4. You mentioned that you expired the corrupted backup. Before trying the new backup, did you run an MSDP cleanup to delete the corrupted blocks?

     

    • Bessam
      Level 4

      Hello StefanosM,

      Here are my answers:

      1. There is a message that clearly indicates image corruption: "corrupted xxxxx".
      2. The transport mode is SAN.
      3. Yes, all other images are successful and restorable.
      4. Yes, I ran a cleanup.
  • I would open a ticket with Veritas/Cohesity support. They have the knowledge to handle a possible MSDP corruption.

    Just expiring the backup images themselves does not correct a possible bad fingerprint in the MSDP database until all backup images that refer to it have been expired.

     

    • Bessam
      Level 4

      Hello Nicolai,

      I opened a case with Veritas when I faced this situation the second time. All they suggested was an msdpcheck, after which they sent me the corrupted backup ID and advised me to expire the corrupted image. That is what I already knew and had already done, but this is the third time and I want a deeper analysis.

      I also wanted other points of view from people like you.

      Unfortunately, I need to open another case with Veritas, and this time we will push them to explore other hypotheses.