Forum Discussion

ESCC_Servers
16 years ago
Solved

PDDO Duplications using an SLP failing with 'Error 84: media write error'

Hi,

Our PDDO duplications have stopped working since all of the related servers were rebooted over the weekend, and they now report error 84.

Setup is:

VCB snapshot backups taken from VMware ESX 3.5 to a staging area on the NBU Media Server (6.5.5)
Snapshots backed up to the primary PureDisk server
Backup duplicated both to tape and to a secondary PureDisk server using a Storage Lifecycle Policy

The snapshots, the backup and the duplication to tape all work fine, but the duplication to PureDisk fails throughout the night. Sometimes the jobs re-run and pass, but there are always a few still trying to duplicate in the morning that never complete. Job log below:

03/11/2010 10:17:08 - begin Duplicate
03/11/2010 10:17:13 - Info Duplicate (pid=7260) Initiating optimized duplication from @aaaak to @aaaam
03/11/2010 10:17:07 - requesting resource LCM_ST_DR_PureDisk_Pool
03/11/2010 10:17:07 - granted resource  LCM_ST_DR_PureDisk_Pool
03/11/2010 10:17:08 - started process RUNCMD (pid=7260)
03/11/2010 10:17:08 - ended process 0 (pid=7260)
03/11/2010 10:17:08 - requesting resource ST_DR_PureDisk_Pool
03/11/2010 10:17:08 - granted resource  MediaID=@aaaam;DiskVolume=PureDiskVolume;DiskPool=DR_PureDisk_Pool;Path=PureDiskVolume;StorageServer=espdisk02s;MediaServer=esnbupd01s
03/11/2010 10:17:08 - granted resource  ST_DR_PureDisk_Pool
03/11/2010 10:17:11 - requesting resource @aaaak
03/11/2010 10:17:11 - granted resource  MediaID=@aaaak;DiskVolume=PureDiskVolume;DiskPool=CH_PureDisk_Pool;Path=PureDiskVolume;StorageServer=espdisk01s;MediaServer=esnbupd01s
03/11/2010 10:17:14 - started process bpdm (pid=5588)
03/11/2010 10:17:29 - begin writing
03/11/2010 10:28:31 - Critical bpdm (pid=5588) sts_copy_extent failed: error 2060013 no more entries
03/11/2010 10:28:32 - Critical bpdm (pid=5588) image copy failed: error 2060013: no more entries
03/11/2010 10:28:32 - Error bpdm (pid=5588) cannot copy image from disk, bytesCopied = 18446744073709551615
03/11/2010 10:28:33 - Critical bpdm (pid=5588) sts_close_handle failed: 2060022 software error
03/11/2010 10:28:50 - Error bpduplicate (pid=7260) host esnbupd01s backup id essccm02v_1268258206 optimized duplication failed, media write error (84).
03/11/2010 10:28:50 - Error bpduplicate (pid=7260) Duplicate of backupid essccm02v_1268258206 failed, media write error (84).
03/11/2010 10:28:50 - Error bpduplicate (pid=7260) Status = no images were successfully processed.
03/11/2010 10:28:50 - end Duplicate; elapsed time 0:11:42
03/11/2010 10:28:51 - Info esnbupd01s (pid=5588) StorageServer=PureDisk:espdisk01s; Report=PDDO Stats for (espdisk01s): scanned: 6 KB, stream rate: 0.00 MB/sec, CR sent: 2097151 KB, dedup: 0.0%, cache hits: 0 (0.0%)
media write error (84)
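A note on reading the log above: `bytesCopied = 18446744073709551615` is not a real byte count. It equals 2^64 - 1, which is how -1 appears when a signed value is printed as an unsigned 64-bit integer, so it most likely marks a failed copy rather than data transferred (an inference from the number, not something the log states). The minimal Python sketch below decodes that value and collects the storage error codes (e.g. 2060013) and the trailing NetBackup status code (e.g. 84) from the Critical/Error lines, which makes it easier to compare job logs. The function names and regular expressions are illustrative assumptions, not part of NetBackup.

```python
import re

# Storage-layer error codes in these logs start with 20600 (2060013, 2060018, 2060022).
# Assumption: matching that prefix is enough for these job logs.
STS_CODE = re.compile(r"\b(20600\d\d)\b")
# NetBackup status code in parentheses at the very end of a line, e.g. "media write error (84)."
NBU_STATUS = re.compile(r"\((\d+)\)\.?\s*$")
BYTES_COPIED = re.compile(r"bytesCopied = (\d+)")


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed (two's complement) one."""
    return value - (1 << 64) if value >= (1 << 63) else value


def summarize(log_text: str) -> dict:
    """Summarize the Critical/Error lines of a NetBackup job log."""
    sts_codes, nbu_status, bytes_copied = set(), set(), []
    for line in log_text.splitlines():
        # Only look at lines flagged as Critical or Error.
        if " Critical " not in line and " Error " not in line:
            continue
        sts_codes.update(int(code) for code in STS_CODE.findall(line))
        status = NBU_STATUS.search(line)
        if status:
            nbu_status.add(int(status.group(1)))
        copied = BYTES_COPIED.search(line)
        if copied:
            # Decode the unsigned value back to its signed meaning (e.g. -1).
            bytes_copied.append(as_signed64(int(copied.group(1))))
    return {
        "sts_codes": sorted(sts_codes),
        "nbu_status": sorted(nbu_status),
        "bytes_copied": bytes_copied,
    }


if __name__ == "__main__":
    sample = "\n".join([
        "03/11/2010 10:28:31 - Critical bpdm (pid=5588) sts_copy_extent failed: error 2060013 no more entries",
        "03/11/2010 10:28:32 - Error bpdm (pid=5588) cannot copy image from disk, bytesCopied = 18446744073709551615",
        "03/11/2010 10:28:33 - Critical bpdm (pid=5588) sts_close_handle failed: 2060022 software error",
        "03/11/2010 10:28:50 - Error bpduplicate (pid=7260) Duplicate of backupid essccm02v_1268258206 failed, media write error (84).",
    ])
    print(summarize(sample))
    # prints {'sts_codes': [2060013, 2060022], 'nbu_status': [84], 'bytes_copied': [-1]}
```

Applied to the other logs in this thread, the 16 Sep 2010 log gives the same 2060013/2060022 pair with status 84, while the 7.0.1 log shows 2060018 (file not found) and 2060001 instead; all three report bytesCopied as -1.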

Any help/ideas would be awesome. I've been trawling through the bpdm log and have done a lot of Googling but found nothing.

 

Cheers

Rob

  • My problem was caused by deleting and then recreating a storage pool on two of our deduplication media servers.

    Apparently the legacy database entries for forwarding the image copies were pointing to the wrong volumes.

    With the help of a support technician, we deleted the erroneous indexes.

    Call Tech Support for a solution, as I am sure the following could be dangerous.

    This is not complete, but the indexes are located here:

    /usr/openv/pdde/pdcr/bin/spadb -d /dedupe/databases -c "select id,name from dataselection"

            id      name
            1       System DS for STP
            2       PDDO
    Corresponds to the following:

    ls /dedupe/databases/spa/database/dataselection/
    1  2
     

    /usr/openv/pdde/pdcr/bin/spadb -d /dedupe/databases -c "select * from forward"

    Corresponds to the following:

    ls /dedupe/databases/spa/database/forward/

    Essentially, you can remove any additional files from
    /dedupe/databases/spa/database/dataselection/ and
    /dedupe/databases/spa/database/forward/

    EXCEPT for these!!!

            id      name
            1       System DS for STP
            2       PDDO
    which are files 1 and 2.
     

    This has to be done on all storage servers.

    Once the services are started again, the indexes are re-created.

     

  • Hi Rob,

    I came across a tech note describing this condition. I believe it says that you have to run MB garbage collection before the optimized duplication happens.

    Tony

  • I don't think it's related to the 6.5.5 article.
    I had a similar issue, with the same error code, when I wanted to duplicate data from a Media Server Deduplication Pool to a PureDisk pool (NB7). Duplication between two MSDPs worked fine.
    I opened a support case but didn't follow it up because it was only a test environment.

  • 9/16/2010 11:51:54 AM - Critical bpdm(pid=9404) sts_copy_extent failed: error 2060013 no more entries     
    9/16/2010 11:51:54 AM - Critical bpdm(pid=9404) image copy failed: error 2060013: no more entries    
    9/16/2010 11:51:54 AM - Error bpdm(pid=9404) cannot copy image from disk, bytesCopied = 18446744073709551615    
    9/16/2010 11:51:55 AM - Critical bpdm(pid=9404) sts_close_handle failed: 2060022 software error       
    9/16/2010 11:51:55 AM - Info mediasrv (pid=9404) StorageServer=PureDisk:chvevbkp01; Report=PDDO Stats for (mediasrv): scanned: 0 KB, stream rate: 0.00 MB/sec, CR sent: 0 KB, dedup: 0.0%, cache hits: 0 (0.0%)
    9/16/2010 11:52:03 AM - Error bpduplicate(pid=10004) host mediasrv backup id  optimized duplication failed, media write error (84).
    9/16/2010 11:52:04 AM - Error bpduplicate(pid=10004) Duplicate of backupid  failed, media write error (84). 

     

    Hello,

    We are experiencing exactly the same issue with all our duplications.
    If you have any info or a fix, that would be great!

    Regards,

    Ludo

  • Hi there: I am paraphrasing Rob's problem, which seems to be the same as ours.

    Setup is:

    VCB snapshot backups taken from VMware ESX 4 and 4.1 to a staging area on the NBU Media Server (7.0.1), the "VCB Backup Host"
    Snapshots backed up to a PureDisk volume on a locally attached DAS volume
    Backup duplicated both to tape and to a secondary PureDisk server using a Storage Lifecycle Policy

    (*We have the exact same setup for Solaris Oracle backups that are duplicated to a Solaris media server with a deduplication volume (storage server), and that works without a hitch.)

    The snapshots, the backup and the duplication to tape all work fine, but the duplication to PureDisk fails; those jobs never complete. Job log below:

    Nov 3, 2010 7:47:43 AM - requesting resource LCM_bcfnbdr-disk-Deduplication
    Nov 3, 2010 7:47:50 AM - Info nbrb (pid=1033) Limit has been reached for the logical resource LCM_bcfnbdr-disk-Deduplication
    Nov 3, 2010 8:37:52 AM - begin Duplicate
    Nov 3, 2010 8:37:50 AM - granted resource  LCM_bcfnbdr-disk-Deduplication
    Nov 3, 2010 8:37:50 AM - started process RUNCMD (pid=27100)
    Nov 3, 2010 8:37:51 AM - ended process 0 (pid=27100)
    Nov 3, 2010 8:37:54 AM - requesting resource bcfnbdr-disk-Deduplication
    Nov 3, 2010 8:37:54 AM - reserving resource @aaaa_
    Nov 3, 2010 8:38:02 AM - Info Duplicate (pid=27100) Initiating optimized duplication from @aaaa_ to @aaaaZ
    Nov 3, 2010 8:38:02 AM - started process bpdm (pid=6492)
    Nov 3, 2010 8:38:01 AM - resource @aaaa_ reserved
    Nov 3, 2010 8:38:01 AM - granted resource  MediaID=@aaaaZ;DiskVolume=PureDiskVolume;DiskPool=bcfnbdr-deduplication-vol1;Path=PureDiskVolume;StorageServer=bcfnbdr;MediaServer=bcfnbdr
    Nov 3, 2010 8:38:01 AM - granted resource  bcfnbdr-disk-Deduplication
    Nov 3, 2010 8:38:01 AM - requesting resource @aaaa_
    Nov 3, 2010 8:38:02 AM - granted resource  MediaID=@aaaa_;DiskVolume=PureDiskVolume;DiskPool=bcfvcb-254-deduplication-vol1;Path=PureDiskVolume;StorageServer=bcfvcb-254;MediaServer=bcfnbdr
    Nov 3, 2010 1:19:10 PM - Critical bpdm (pid=6492) sts_copy_extent failed: error 2060018 file not found
    Nov 3, 2010 1:19:10 PM - Critical bpdm (pid=6492) image copy failed: error 2060018: file not found
    Nov 3, 2010 1:19:10 PM - Error bpdm (pid=6492) cannot copy image from disk, bytesCopied = 18446744073709551615
    Nov 3, 2010 1:19:12 PM - Critical bpdm (pid=6492) image copy failed: target image size = 8192, source image size = 1024
    Nov 3, 2010 1:19:12 PM - Critical bpdm (pid=6492) Invalid image copy: error 2060001
    Nov 3, 2010 1:19:12 PM - Critical bpdm (pid=6492) image copy failed: totalbytesCopied = 0, image size = 1024

    I spoke with support, who said that we are trying to duplicate images that have expired. This is not the case, because the storage pool retention is 10 days and the backups run daily. I opened another call today and will post back the findings if I get a resolution.

    Cheers

  • "Spoke with support who said that we are trying to duplicate images that have expired..."

    HUH?!! Does this person not know how SLPs work? The source image will not expire until it has been successfully duplicated.

  • OK Marianne, thanks for the info.

    I was not entirely happy with the explanation. =0

  • Extract from the Admin Guide:

    Ensuring successful copies using lifecycles
    The process to create copies as part of a lifecycle is different from the process to create copies as set up in a policy’s configuration. The policy’s Configure Multiple Copies dialog box includes the option to Fail all copies. That option means that if one copy fails, the remaining copies can be set to either continue or to fail.
    However, in a storage lifecycle policy, all copies must be completed. A lifecycle initially tries three times to create a copy. If no copy is created, NetBackup continues to try, but less frequently.
    The successful completion of copies is important because a lifecycle does not allow a copy to be expired before all copies are completed to each destination in the lifecycle. Expiration is necessary to free up space on the storage unit for new backups. NetBackup changes the retention period of an image to Infinite until all copies are created. After all copies are complete, the retention returns to the level as set in the policy that writes to the storage destination.

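Because the accepted solution deletes files from the SPA database by hand, a read-only preview of what would be removed may be a safer first step. A minimal Python sketch follows, assuming the /dedupe/databases root used in that solution (the real path depends on where the deduplication database was configured) and reading "keep files 1 and 2" as applying to both dataselection/ and forward/. It only lists entries and deletes nothing. The function and constant names are hypothetical, and, as the solution says, Tech Support should be involved before anything is removed.

```python
import sys
from pathlib import Path

# Entries the solution says must be kept: id 1 = "System DS for STP", id 2 = "PDDO".
KEEP = {"1", "2"}
# Subdirectories named in the solution, relative to <db_root>/spa/database/.
SUBDIRS = ("dataselection", "forward")


def extra_index_entries(db_root: str) -> dict:
    """Return, per subdirectory, the entry names not in KEEP (None if the directory is missing).

    Read-only: nothing is modified or deleted.
    """
    base = Path(db_root) / "spa" / "database"
    result = {}
    for sub in SUBDIRS:
        directory = base / sub
        if not directory.is_dir():
            result[sub] = None
            continue
        result[sub] = sorted(p.name for p in directory.iterdir() if p.name not in KEEP)
    return result


if __name__ == "__main__":
    # Assumed default root; pass the real database path as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/dedupe/databases"
    for sub, extras in extra_index_entries(root).items():
        if extras is None:
            print(f"{sub}: directory not found")
        else:
            print(f"{sub}: {', '.join(extras) if extras else 'no extra entries'}")
```

Saved under a hypothetical name such as preview_spa_indexes.py, it would be run on each storage server with the database root as its argument, and its output compared with the `select id,name from dataselection` and `select * from forward` queries shown in the solution.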