Forum Discussion

Posted by ericmagallanes, 12 years ago

MSDP backup getting error status 84 - image open failed:2060001 one or more invalid arguments

Hi All,

I need some help with NetBackup MSDP. I am getting status 84, and on going through the logs I found the following:

Spoold.log:

 

ERR [4627]: 25086: A bptm request for agent Store from PRUXBK11:56454 for dsid 2 was denied for the following reason: Store operations are currently not allowed.
 
I tried to issue:
root@PRUXBK11:/usr/openv/pdde/pdcr/bin #./crcontrol --getmode
Mode : GET=Yes PUT=No DEREF=No SYSTEM=Yes STORAGED=No REROUTE=No COMPACTD=Yes RECOVERCRDB=No
root@PRUXBK11:/usr/openv/pdde/pdcr/bin #
The last thing I did was restart the NetBackup services; since then, MSDP backups have been failing with this status.
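
For reference, the flags in that mode line can be picked apart with a few lines of shell. This is only a sketch: it assumes the output keeps the NAME=Yes/No layout shown above, and the function names mode_flag and disabled_flags are made up for illustration; they are not part of NetBackup.

```shell
# Hypothetical helpers (not NetBackup commands) for reading a
# "crcontrol --getmode" line such as:
#   Mode : GET=Yes PUT=No DEREF=No SYSTEM=Yes STORAGED=No ...

# Print the value (Yes/No) of one flag. $1 = mode line, $2 = flag name.
mode_flag() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        awk -F= -v name="$2" '$1 == name { print $2 }'
}

# Print the names of all flags currently set to No, space separated.
disabled_flags() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        awk -F= '$2 == "No" { printf "%s%s", sep, $1; sep = " " } END { print "" }'
}
```

With the output above, mode_flag "$(./crcontrol --getmode)" PUT would print No. Not every flag that reads No necessarily indicates a fault; the list is only a starting point.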

 

  • NetBackup is the right forum for MSDP queries - I have moved the post.

    Please show us all text in Activity Monitor details tab.

    Also let us know exact NBU version and patch level - there have been many MSDP fixes over the last 2 years...

    Please post bptm log as File attachment.

    It seems that you have found TN http://www.symantec.com/docs/TECH183707 , right?

    Extract:

     

    3. If "put=No," examine the <storage_path>/log/spoold/spoold.log file for the following message:

    Storage Cache Manager: load completed


    The message means that spoold finished loading the fingerprints. However, the new state probably has not been pushed to all processes yet.

    4. If the spoold.log has the message, wait 5 minutes then run "crcontrol --getmode" again.

    5. If "put=Yes," cache loading is complete and normal operations will resume.

       If "put=No," contact your Symantec support representative.
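
Steps 3 to 5 of the extract could be scripted roughly as follows. This is a sketch only: the log path is passed in because <storage_path> is site-specific, the function names are hypothetical, and the message text is copied verbatim from the extract.

```shell
# Hypothetical helper: succeed if spoold.log contains the cache-load message.
# $1 = path to spoold.log (under <storage_path>/log/spoold/ per the TN)
cache_load_completed() {
    grep -q 'Storage Cache Manager: load completed' "$1"
}

# Suggest the next step from the TN extract.
# $1 = path to spoold.log, $2 = output of "crcontrol --getmode"
tn_next_step() {
    case " $2 " in
        *" PUT=Yes "*) echo "cache loading is complete"; return 0 ;;
    esac
    if cache_load_completed "$1"; then
        echo "wait 5 minutes, then run crcontrol --getmode again"
    else
        echo "message not found in spoold.log"
    fi
}
```

The sketch does not track whether the 5 minutes have already passed; if PUT still reads No after the wait, the extract says to contact support.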

  • Hi Marianne,

    Apologies for that. Here is the Activity Monitor details tab:

     

    01/03/2013 13:51:12 - Info nbjm (pid=6946980) starting backup job (jobid=9455) for client PRUXBK11_bk, policy DEDUPE_test, schedule full_daily
    01/03/2013 13:51:12 - Info nbjm (pid=6946980) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=9455, request id:{9428DBB0-5569-11E2-B10A-439ED7E30000})
    01/03/2013 13:51:12 - requesting resource AY_STU
    01/03/2013 13:51:12 - requesting resource PRUXBK11_bk.NBU_CLIENT.MAXJOBS.PRUXBK11_bk
    01/03/2013 13:51:12 - requesting resource PRUXBK11_bk.NBU_POLICY.MAXJOBS.DEDUPE_test
    01/03/2013 13:51:12 - Info bpbrm (pid=15073484) PRUXBK11_bk is the host to backup data from
    01/03/2013 13:51:12 - Info bpbrm (pid=15073484) reading file list from client
    01/03/2013 13:51:12 - granted resource  PRUXBK11_bk.NBU_CLIENT.MAXJOBS.PRUXBK11_bk
    01/03/2013 13:51:12 - granted resource  PRUXBK11_bk.NBU_POLICY.MAXJOBS.DEDUPE_test
    01/03/2013 13:51:12 - granted resource  MediaID=@aaaaI;DiskVolume=PureDiskVolume;DiskPool=AY_POOL;Path=PureDiskVolume;StorageServer=PRUXBK11_bk;MediaServer=PRUXBK11_bk
    01/03/2013 13:51:12 - granted resource  AY_STU
    01/03/2013 13:51:12 - estimated 0 kbytes needed
    01/03/2013 13:51:12 - Info nbjm (pid=6946980) started backup (backupid=PRUXBK11_bk_1357192272) job for client PRUXBK11_bk, policy DEDUPE_test, schedule full_daily on storage unit AY_STU
    01/03/2013 13:51:12 - started process bpbrm (pid=15073484)
    01/03/2013 13:51:12 - connecting
    01/03/2013 13:51:13 - Info bpbrm (pid=15073484) starting bpbkar on client
    01/03/2013 13:51:13 - Info bpbkar (pid=18088122) Backup started
    01/03/2013 13:51:13 - Info bpbrm (pid=15073484) bptm pid: 14352438
    01/03/2013 13:51:13 - connected; connect time: 0:00:00
    01/03/2013 13:51:14 - Info bptm (pid=14352438) start
    01/03/2013 13:51:14 - Info bptm (pid=14352438) using 262144 data buffer size
    01/03/2013 13:51:14 - Info bptm (pid=14352438) using 30 data buffers
    01/03/2013 13:51:23 - Info bptm (pid=14352438) start backup
    01/03/2013 13:51:29 - Critical bptm (pid=14352438) image open failed: error 2060001: one or more invalid arguments
    01/03/2013 13:51:30 - Info bptm (pid=14352438) EXITING with status 84 <----------
    01/03/2013 13:51:30 - Error bpbrm (pid=15073484) from client PRUXBK11_bk: ERR - bpbkar exiting because backup is aborting
    01/03/2013 13:51:31 - Info bpbkar (pid=18088122) done. status: 84: media write error
    01/03/2013 13:51:31 - end writing
    media write error  (84)
     
    NBU 7.5.0.3 running on AIX 6.
     
    I had found the TN and followed it exactly. I also found these log entries:
     
    Reason: ERROR:  could not extend relation 1663/16387/28280: No space left on device
    HINT:  Check free disk space.
    CONTEXT:  COPY objects2_1, line 5945286
    December 30 12:35:46 ERR [3170]: 4: database query failed.
    Query      : CREATE UNIQUE INDEX idx_objects2_1_1356842146506364_2513780535 ON objects2_1(key);
    Called by  : pqObjects2ChildCopyIndex
    Reason     : ERROR:  current transaction is aborted, commands ignored until end of transaction block
    December 30 12:42:12 ERR [3427]: 4: pqObjects2ChildCopyIndex: Copy command failed.
    Reason: ERROR:  could not extend relation 1663/16387/28286: No space left on device
    HINT:  Check free disk space.
    CONTEXT:  COPY objects2_2, line 5945434
    December 30 12:42:12 ERR [3427]: 4: database query failed.
    Query      : CREATE UNIQUE INDEX idx_objects2_2_1356842532906421_948636706 ON objects2_2(key);
    Called by  : pqObjects2ChildCopyIndex
    Reason     : ERROR:  current transaction is aborted, commands ignored until end of transaction block
    December 30 12:43:50 ERR [3684]: 4: pqObjects2ChildCopyIndex: Copy command failed.
    Reason: ERROR:  could not extend relation 1663/16387/28292: No space left on device
    HINT:  Check free disk space.
    CONTEXT:  COPY objects2_3, line 5945285
    December 30 12:43:50 ERR [3684]: 4: database query failed.
    Query      : CREATE UNIQUE INDEX idx_objects2_3_1356842630817015_2535966694 ON objects2_3(key);
    Called by  : pqObjects2ChildCopyIndex
    Reason     : ERROR:  current transaction is aborted, commands ignored until end of transaction block
     
     
    My storage details:
    ************ Data Store statistics ************
    Data storage      Raw    Size   Used   Avail  Use%
                       3.5T   3.3T 935.9G   2.4T  28%
     
    Number of containers             : 4159
    Average container size           : 234448275 bytes (223.59MB)
    Space allocated for containers   : 975070378437 bytes (908.11GB)
    Space used within containers     : 965789085432 bytes (899.46GB)
    Space available within containers: 9281293005 bytes (8.64GB)
    Space needs compaction           : 44191463 bytes (42.14MB)
    Reserved space                   : 170125094912 bytes (158.44GB)
    Reserved space percentage        : 4.4%
    Records marked for compaction    : 1711
    Active records                   : 26063814
    Total records                    : 26065525
     
    Use "--dsstat 1" to get more accurate statistics
     
     
    I am wondering why I am running out of space when I still have 2.4T available.
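
The figures above are labelled as data store statistics, and the thread does not say which filesystem the "No space left on device" errors refer to. One thing worth checking is every filesystem that the MSDP storage directories sit on. A small sketch, assuming POSIX "df -P" output; the function name and the example paths are hypothetical.

```shell
# Hypothetical helper: read "df -P" output on stdin and print the mount
# point of every filesystem at or above a usage threshold.
# $1 = threshold in percent (for example 95)
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5; sub(/%/, "", pct)      # "100%" -> "100"
        if (pct + 0 >= limit + 0) print $6
    }'
}
```

Usage would look like: df -P /path/to/msdp/data /path/to/msdp/db | full_filesystems 95, with the two placeholder paths replaced by the real storage locations.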
     
     
     
  • You currently have PUT and STORAGED set to No.

    It looks like storaged or queue processing has crashed; after 5 retries it locks itself down and will not allow anything to work again.

    If you look in storaged.log you will probably find a line telling you that, with a nice little comment saying "contact support immediately". That is worth doing anyway to trace what caused your issues, but I guess for now you want to get it up and running, so try the following:

    Check that spoold is actually running. If it is, I would restart it anyway (if in doubt, reboot the server).

    Next, run queue processing manually: /usr/openv/pdde/pdcr/bin/crcontrol --processqueue

    This will re-activate everything for you. Then watch --getmode until PUT and STORAGED both change to Yes (this may well take 20 minutes or so, as it needs to run through its logs to bring itself back online).

    Hope this gets you up and running. If not, let's see some of the storaged.log to see what else is going on.
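
Watching --getmode until both flags flip can be scripted rather than rerun by hand. A minimal sketch, assuming the --getmode output keeps the layout shown earlier in the thread; stores_enabled and wait_for_stores are made-up names, and the retry count and interval are arbitrary choices.

```shell
# Hypothetical helper: succeed only when both PUT and STORAGED report Yes.
# $1 = output of "crcontrol --getmode"
stores_enabled() {
    case " $1 " in
        *" PUT=Yes "*) ;;
        *) return 1 ;;
    esac
    case " $1 " in
        *" STORAGED=Yes "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll until stores are enabled or the attempts run out.
# $1 = command that prints the mode line (split on spaces by sh/bash)
# $2 = number of attempts, $3 = seconds to sleep between attempts
wait_for_stores() {
    attempts=$2
    while [ "$attempts" -gt 0 ]; do
        if stores_enabled "$($1)"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$3"
        fi
    done
    return 1
}
```

For example, wait_for_stores "/usr/openv/pdde/pdcr/bin/crcontrol --getmode" 40 60 would check once a minute for up to about 40 minutes and return non-zero if the flags never change.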