MSDP backup getting error status 84 - image open failed:2060001 one or more invalid arguments

ericmagallanes
Level 3

Hi All,

I need some help with NetBackup MSDP. I am getting status 84, and on going through the logs I found the following:

Spoold.log:

 

ERR [4627]: 25086: A bptm request for agent Store from PRUXBK11:56454 for dsid 2 was denied for the following reason: Store operations are currently not allowed.
 
I tried to issue:
root@PRUXBK11:/usr/openv/pdde/pdcr/bin #./crcontrol --getmode
Mode : GET=Yes PUT=No DEREF=No SYSTEM=Yes STORAGED=No REROUTE=No COMPACTD=Yes RECOVERCRDB=No
root@PRUXBK11:/usr/openv/pdde/pdcr/bin #
The last thing I did was restart the NetBackup services; since then, MSDP backups have been failing with this status.

 


Marianne
Moderator
Partner    VIP    Accredited Certified

NetBackup is the right forum for MSDP queries - I have moved the post.

Please show us all text in Activity Monitor details tab.

Also let us know exact NBU version and patch level - there have been many MSDP fixes over the last 2 years...

Please post bptm log as File attachment.

It seems that you have found TN http://www.symantec.com/docs/TECH183707 , right?

Extract:

 

3. If "put=No," examine the <storage_path>/log/spoold/spoold.log file for the following message:

Storage Cache Manager: load completed


The message means that spoold finished loading the fingerprints. However, the new state probably has not been pushed to all processes yet.

4. If the spoold.log has the message, wait 5 minutes then run "crcontrol --getmode" again.

5. If "put=Yes," cache loading is complete and normal operations will resume.

   If "put=No," contact your Symantec support representative.
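The PUT check in steps 3 to 5 can be scripted. Below is a minimal sketch in Python, assuming the `Mode :` line has the same format as the `crcontrol --getmode` output posted earlier in this thread. The function name `parse_getmode` is made up for illustration and is not part of NetBackup.

```python
def parse_getmode(output):
    """Parse the 'Mode :' line of `crcontrol --getmode` into a dict of booleans."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Mode"):
            # Everything after the first colon is a list of NAME=Yes/No tokens.
            _, _, flags = line.partition(":")
            result = {}
            for token in flags.split():
                name, sep, value = token.partition("=")
                if sep:
                    result[name.upper()] = value.strip().lower() == "yes"
            return result
    raise ValueError("no 'Mode :' line found in output")


# Sample taken from the output posted in this thread.
sample = ("Mode : GET=Yes PUT=No DEREF=No SYSTEM=Yes STORAGED=No "
          "REROUTE=No COMPACTD=Yes RECOVERCRDB=No")
mode = parse_getmode(sample)
print(mode["PUT"], mode["STORAGED"])  # prints: False False
```

With the sample above, both PUT and STORAGED come back as False, which matches the state described in the original post.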

ericmagallanes
Level 3

Hi Marianne,

Apologies for that. Activity monitor tab:

 

01/03/2013 13:51:12 - Info nbjm (pid=6946980) starting backup job (jobid=9455) for client PRUXBK11_bk, policy DEDUPE_test, schedule full_daily
01/03/2013 13:51:12 - Info nbjm (pid=6946980) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=9455, request id:{9428DBB0-5569-11E2-B10A-439ED7E30000})
01/03/2013 13:51:12 - requesting resource AY_STU
01/03/2013 13:51:12 - requesting resource PRUXBK11_bk.NBU_CLIENT.MAXJOBS.PRUXBK11_bk
01/03/2013 13:51:12 - requesting resource PRUXBK11_bk.NBU_POLICY.MAXJOBS.DEDUPE_test
01/03/2013 13:51:12 - Info bpbrm (pid=15073484) PRUXBK11_bk is the host to backup data from
01/03/2013 13:51:12 - Info bpbrm (pid=15073484) reading file list from client
01/03/2013 13:51:12 - granted resource  PRUXBK11_bk.NBU_CLIENT.MAXJOBS.PRUXBK11_bk
01/03/2013 13:51:12 - granted resource  PRUXBK11_bk.NBU_POLICY.MAXJOBS.DEDUPE_test
01/03/2013 13:51:12 - granted resource  MediaID=@aaaaI;DiskVolume=PureDiskVolume;DiskPool=AY_POOL;Path=PureDiskVolume;StorageServer=PRUXBK11_bk;MediaServer=PRUXBK11_bk
01/03/2013 13:51:12 - granted resource  AY_STU
01/03/2013 13:51:12 - estimated 0 kbytes needed
01/03/2013 13:51:12 - Info nbjm (pid=6946980) started backup (backupid=PRUXBK11_bk_1357192272) job for client PRUXBK11_bk, policy DEDUPE_test, schedule full_daily on storage unit AY_STU
01/03/2013 13:51:12 - started process bpbrm (pid=15073484)
01/03/2013 13:51:12 - connecting
01/03/2013 13:51:13 - Info bpbrm (pid=15073484) starting bpbkar on client
01/03/2013 13:51:13 - Info bpbkar (pid=18088122) Backup started
01/03/2013 13:51:13 - Info bpbrm (pid=15073484) bptm pid: 14352438
01/03/2013 13:51:13 - connected; connect time: 0:00:00
01/03/2013 13:51:14 - Info bptm (pid=14352438) start
01/03/2013 13:51:14 - Info bptm (pid=14352438) using 262144 data buffer size
01/03/2013 13:51:14 - Info bptm (pid=14352438) using 30 data buffers
01/03/2013 13:51:23 - Info bptm (pid=14352438) start backup
01/03/2013 13:51:29 - Critical bptm (pid=14352438) image open failed: error 2060001: one or more invalid arguments
01/03/2013 13:51:30 - Info bptm (pid=14352438) EXITING with status 84 <----------
01/03/2013 13:51:30 - Error bpbrm (pid=15073484) from client PRUXBK11_bk: ERR - bpbkar exiting because backup is aborting
01/03/2013 13:51:31 - Info bpbkar (pid=18088122) done. status: 84: media write error
01/03/2013 13:51:31 - end writing
media write error  (84)
 
NBU 7.5.0.3 running on AIX 6.
 
I found the TN and followed it exactly. While doing so, I found the log entries below:
 
Reason: ERROR:  could not extend relation 1663/16387/28280: No space left on device
HINT:  Check free disk space.
CONTEXT:  COPY objects2_1, line 5945286
December 30 12:35:46 ERR [3170]: 4: database query failed.
Query      : CREATE UNIQUE INDEX idx_objects2_1_1356842146506364_2513780535 ON objects2_1(key);
Called by  : pqObjects2ChildCopyIndex
Reason     : ERROR:  current transaction is aborted, commands ignored until end of transaction block
December 30 12:42:12 ERR [3427]: 4: pqObjects2ChildCopyIndex: Copy command failed.
Reason: ERROR:  could not extend relation 1663/16387/28286: No space left on device
HINT:  Check free disk space.
CONTEXT:  COPY objects2_2, line 5945434
December 30 12:42:12 ERR [3427]: 4: database query failed.
Query      : CREATE UNIQUE INDEX idx_objects2_2_1356842532906421_948636706 ON objects2_2(key);
Called by  : pqObjects2ChildCopyIndex
Reason     : ERROR:  current transaction is aborted, commands ignored until end of transaction block
December 30 12:43:50 ERR [3684]: 4: pqObjects2ChildCopyIndex: Copy command failed.
Reason: ERROR:  could not extend relation 1663/16387/28292: No space left on device
HINT:  Check free disk space.
CONTEXT:  COPY objects2_3, line 5945285
December 30 12:43:50 ERR [3684]: 4: database query failed.
Query      : CREATE UNIQUE INDEX idx_objects2_3_1356842630817015_2535966694 ON objects2_3(key);
Called by  : pqObjects2ChildCopyIndex
Reason     : ERROR:  current transaction is aborted, commands ignored until end of transaction block
 
 
My storage details:
************ Data Store statistics ************
Data storage      Raw    Size   Used   Avail  Use%
                   3.5T   3.3T 935.9G   2.4T  28%
 
Number of containers             : 4159
Average container size           : 234448275 bytes (223.59MB)
Space allocated for containers   : 975070378437 bytes (908.11GB)
Space used within containers     : 965789085432 bytes (899.46GB)
Space available within containers: 9281293005 bytes (8.64GB)
Space needs compaction           : 44191463 bytes (42.14MB)
Reserved space                   : 170125094912 bytes (158.44GB)
Reserved space percentage        : 4.4%
Records marked for compaction    : 1711
Active records                   : 26063814
Total records                    : 26065525
 
Use "--dsstat 1" to get more accurate statistics
 
 
I am wondering why I am running out of space when I still have 2.4T available.
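One way to narrow this down: the "could not extend relation" errors are raised by the database, while the statistics above describe the data store, so it may be worth checking free space on each filesystem separately. Below is a minimal sketch in Python. The paths in the example are placeholders only, not the actual MSDP storage or database locations on this server, and `free_space_report` is a made-up helper name.

```python
import os
import shutil
import tempfile


def free_space_report(paths):
    """Return total/free space and used percentage for the filesystem holding each path."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        report[path] = {
            "total_gib": usage.total / 2**30,
            "free_gib": usage.free / 2**30,
            "used_pct": 100.0 * usage.used / usage.total if usage.total else 0.0,
        }
    return report


if __name__ == "__main__":
    # Placeholder paths: substitute the real storage and database directories.
    for path, info in free_space_report([os.getcwd(), tempfile.gettempdir()]).items():
        print(f"{path}: {info['free_gib']:.1f} GiB free, {info['used_pct']:.1f}% used")
```

If the database directory sits on a different, smaller filesystem than the containers, the two figures can differ widely even though the data store itself shows plenty of room. The percentages may not match `df` exactly, since reserved blocks are counted differently.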
 
 
 

Marianne
Moderator
Partner    VIP    Accredited Certified

Seems this is a bug that is resolved in 7.5.0.4.

See this TN: http://www.symantec.com/docs/TECH190714

The TN is for Windows, but probably applies to Unix as well...

**** Hopefully Martin and Mark will be along soon with some more suggestions ****

Mark_Solutions
Level 6
Partner Accredited Certified

You currently have PUT and STORAGED set to No.

It looks like storaged or queue processing has crashed. After 5 retries it locks itself down and will not allow anything to work again.

If you take a look in storaged.log you will probably find a line telling you that, with a nice little comment saying "contact support immediately". That is worth doing anyway to trace what caused your issues, but I guess for now you want to get it up and running, so try the following:

Check that spoold is actually running. If it is, I would restart it anyway (if in doubt, reboot the server).

Next you need to run queue processing manually: /usr/openv/pdde/pdcr/bin/crcontrol --processqueue

This will reactivate everything for you. Then watch the --getmode output until PUT and STORAGED both change to Yes (this may well take 20 minutes or so, as it needs to run through its logs to bring itself back online).
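Watching --getmode by hand can be replaced with a small polling loop. Below is a minimal sketch in Python, assuming the crcontrol path used in this thread and the `NAME=Yes/No` output format shown in the original post. The helper names, the 60-second interval and the 30-attempt limit are arbitrary choices for illustration, not NetBackup defaults.

```python
import subprocess
import time

CRCONTROL = "/usr/openv/pdde/pdcr/bin/crcontrol"  # path as used in this thread


def run_getmode():
    """Run `crcontrol --getmode` and return its standard output as text."""
    done = subprocess.run([CRCONTROL, "--getmode"],
                          capture_output=True, text=True, check=True)
    return done.stdout


def flags_enabled(output, names=("PUT", "STORAGED")):
    """True if every named flag appears as NAME=Yes in the getmode output."""
    tokens = dict(t.split("=", 1) for t in output.split() if "=" in t)
    return all(tokens.get(name, "").lower() == "yes" for name in names)


def wait_until_ready(run=run_getmode, interval=60, attempts=30, sleep=time.sleep):
    """Poll until PUT and STORAGED are both Yes; return False if attempts run out."""
    for attempt in range(attempts):
        if flags_enabled(run()):
            return True
        if attempt < attempts - 1:
            sleep(interval)  # wait before the next check
    return False
```

Calling `wait_until_ready()` on the media server returns True once both flags report Yes. The `run` and `sleep` parameters exist only so the loop can be exercised without crcontrol installed.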

Hope this gets you up and running. If not, let's see some of the storaged.log to find out what else is going on.