Forum Discussion

mrmadej
10 years ago

NDMP on Hitachi Content Platform - error "Too many backups"

Hi All,

I am looking for advice on how to troubleshoot NDMP backups.
I have a Hitachi Content Platform (HCP), which implements NDMP.
The HCP holds a specific kind of data: a huge number of small files. It is an archiving solution.
The filesystems are created dynamically by a script, and at the moment there are about 1100 filesystems, i.e. paths to protect.
According to the HCP documentation, it can run 30 backup sessions at one time, but I limited jobs per client to 6, and the policies have "Allow multiple streams per policy" unchecked.
Even so, I very often see errors like:

09/21/2015 11:12:04 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started
09/21/2015 11:12:34 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.
09/21/2015 11:12:34 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/890/
09/21/2015 11:12:52 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started
09/21/2015 11:13:22 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.
09/21/2015 11:13:22 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/891/
09/21/2015 11:13:46 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started
09/21/2015 11:14:16 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.
09/21/2015 11:14:16 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/892/
09/21/2015 11:14:46 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started
09/21/2015 11:15:17 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.
09/21/2015 11:15:17 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/893/
09/21/2015 11:15:47 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started
09/21/2015 11:16:17 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.
09/21/2015 11:16:17 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/894/
09/21/2015 11:16:49 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started
09/21/2015 11:17:19 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.
09/21/2015 11:17:19 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/895/
09/21/2015 11:17:49 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started
09/21/2015 11:18:19 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.
09/21/2015 11:18:19 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/896/
09/21/2015 11:18:49 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started
09/21/2015 11:19:19 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.
09/21/2015 11:19:19 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/897/
09/21/2015 11:19:50 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started

Or:

09/20/2015 11:53:53 - Error ndmpagent (pid=13132) ndmp_data_start_backup failed, status = 9 (NDMP_ILLEGAL_ARGS_ERR)
09/20/2015 11:53:53 - Error ndmpagent (pid=13132) NDMP backup failed, path = /fcfs_data/objects/561/
09/21/2015 10:09:59 - Info bpbrm (pid=13336) terminating bpbrm child 18600 jobid=63221
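
To see which paths failed for which reason across a long job log, the Job Details text can be grouped by error. The following is a minimal Python sketch, assuming the log lines have exactly the layout quoted above; the function name `failures_by_reason` and the reason labels are made up for illustration and are not part of NetBackup.

```python
import re
from collections import defaultdict

# Layout assumed from the excerpts above:
# "MM/DD/YYYY HH:MM:SS - Level process (pid=N) message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - "
    r"(?P<level>\w+) (?P<proc>\w+) \(pid=(?P<pid>\d+)\) (?P<msg>.*)$"
)
PATH_RE = re.compile(r"NDMP backup failed, path = (?P<path>\S+)")


def failures_by_reason(lines):
    """Group failed backup paths by the error line that preceded them."""
    by_reason = defaultdict(list)
    last_reason = "unknown"
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "Error":
            continue  # skip Info lines and anything in another format
        msg = m.group("msg")
        p = PATH_RE.search(msg)
        if p:
            # The "backup failed" line closes the failure; attach the reason.
            by_reason[last_reason].append(p.group("path"))
            last_reason = "unknown"
        elif "Too many backups" in msg:
            last_reason = "too_many_backups"
        elif "NDMP_ILLEGAL_ARGS_ERR" in msg:
            last_reason = "illegal_args"
    return dict(by_reason)


# Lines copied from the post above.
sample = [
    "09/21/2015 11:12:04 - Info ndmpagent (pid=16920) host.xxx.xx.xx: ArC Backup Started",
    "09/21/2015 11:12:34 - Error ndmpagent (pid=16920) host.xxx.xx.xx: Too many backups.",
    "09/21/2015 11:12:34 - Error ndmpagent (pid=16920) NDMP backup failed, path = /fcfs_data/objects/890/",
    "09/20/2015 11:53:53 - Error ndmpagent (pid=13132) ndmp_data_start_backup failed, status = 9 (NDMP_ILLEGAL_ARGS_ERR)",
    "09/20/2015 11:53:53 - Error ndmpagent (pid=13132) NDMP backup failed, path = /fcfs_data/objects/561/",
]
result = failures_by_reason(sample)
print(result)
```

Counting the paths under each reason over a whole backup window shows whether "Too many backups" dominates or whether the NDMP_ILLEGAL_ARGS_ERR failures are a separate problem.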

When jobs are running and other jobs are waiting for resources, the command below returns a connection error:

tpautoconf.exe -verify host.xxx.xx.xx

Connecting to host "host.xxx.xx.xx" as user "admin"...
Waiting for connect notification message...
Timeout waiting for connect reply
: host "host.xxx.xx.xx" failed
NDMP failed to verify host

I have investigated the connection with the LAN admin and there are no errors. Everything is on the same LAN segment, without any firewall.

I know that I have to open a case with HDS, but I would like to ask you for advice on how to check whether the problem is only on the HCP side.
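
A quick way to narrow this down is to probe the NDMP listener directly, outside NetBackup. The sketch below assumes the HCP listens on the standard NDMP TCP port 10000 and, as the tpautoconf output suggests, that a healthy server sends a connect notification as soon as the TCP session opens. `probe_ndmp` and its return labels are hypothetical names; the function does not decode the NDMP message, it only checks whether any bytes arrive.

```python
import socket


def probe_ndmp(host, port=10000, timeout=10.0):
    """Classify one TCP probe of an NDMP listener.

    Returns one of:
      "unreachable"  TCP connect failed or timed out
      "no_greeting"  TCP connected, but nothing arrived within the timeout
      "closed"       TCP connected, but the peer closed or reset the session
      "greeting"     TCP connected and the server sent at least one byte
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(4)  # first bytes of the server's first message
        except socket.timeout:
            return "no_greeting"
        except OSError:
            return "closed"
        return "greeting" if data else "closed"


# Example call (host name as masked in the post):
# print(probe_ndmp("host.xxx.xx.xx"))
```

If the probe reports `no_greeting` while backups are running but `greeting` when the HCP is idle, that would suggest the HCP is accepting TCP connections without starting new NDMP sessions, which points away from the LAN.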

NBU version is 7.6.0.3 on Windows 2012 R2

NDMP:
Connecting to host "host.xxx.xx.xx" as user "admin"...
Waiting for connect notification message...
Opening session--attempting with NDMP protocol version 4...
Opening session--successful with NDMP protocol version 4
  host supports MD5 authentication
Getting MD5 challenge from host...
Logging in using MD5 method...
Host info is:
  host name "host.xxx.xx.xx"
  os type "ArC"
  os version "6.0.2.77"
  host id "host.xxx.xx.xx"
Login was successful
Host supports 3-way backup/restore

 

Thank you for any suggestions.

Regards

Madej

  • There's some info re NDMP backups for HCP in this doc:

    NetBackup for NDMP: NAS appliance information

    https://support.symantec.com/en_US/article.TECH31885.html

    Plus, I'm curious as to why anyone would want to back up an HCP device. Any restore is going to be fraught with problems. How on earth would you test that the application which writes to HCP is later able to read any restored data? The best way to protect an HCP device is to replicate it to another HCP device, i.e. asynchronously via LAN/WAN. Indeed, HCP is usually sold and configured in pairs, so you may already have a secondary HCP that the primary is replicating to, in which case why even attempt a backup? The same is usually true of EMC Centera: we would replicate it, not attempt a backup.

  • Maybe NDMP sessions are hanging on the filer, or other backup sessions are running from a different domain or manually on the filer?

    Do only 6 jobs attempt to go active at one time in the Activity Monitor? (That would confirm that the 6 jobs per client setting is working.)

    Do your directives contain the NEW_STREAM setting (which would bypass the checkbox for multiple streams)?

    I would double-check with Hitachi; most vendors have a command to show the current setting for max streams or max backups. Perhaps by default it is set lower but "could" support more?


  • Thank you for the response.

    Today I observed that the 'max jobs per client' setting does not work. Earlier I had 10 jobs per client and I reduced it to 6. The entries below appear in the nbrb log when a new job starts:

    Request=1 provider=NamedResourceProvider resourcename=bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx CountedResourceRequest { max=6 }
    2015-09-21 17:34:28.107 V-118-228 [ResBroker_i::passOne] allocation ID {C1A9BF22-9210-4F63-9F6F-B84105555A90}, group ID {00000000-0000-0000-0000-000000000000}, provider NamedResourceProvider, resource bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx
    2015-09-21 17:34:28.107 [ResBroker_i::passOne] allocation: (Allocation: id={C1A9BF22-9210-4F63-9F6F-B84105555A90} provider=NamedResourceProvider resourcename=bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx masterserver=bkp-netbkp-v groupid={00000000-0000-0000-0000-000000000000} userSequence=-1 userid="jobid=63551"  firstuserid="jobid=63551" named resource allocation)

    But when other jobs are waiting, the nbrb log shows:

    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)
    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)
    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)
    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)
    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)
    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)
    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)
    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)
    2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx (current_ref_count=10, max_ref_count=10)

    So it looks like the setting does not affect the jobs, and there are 10 active jobs. But this is still fewer than the 30 supported by the HCP.

    I don't know HCP and I don't have access to its command line, and the GUI offers nothing to check the session count.

    Regards

    Madej
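
The mismatch described in the reply above (a request with max=6 while the provider still reports max_ref_count=10) can be picked out of a larger nbrb log mechanically. This is a rough Python sketch that assumes only the two line shapes quoted in that reply; `limit_mismatches` is a hypothetical helper, and any other nbrb line format is simply ignored.

```python
import re

# Shape of the request line quoted above: "... resourcename=X CountedResourceRequest { max=N }"
REQ_RE = re.compile(
    r"resourcename=(?P<resource>\S+) CountedResourceRequest \{ max=(?P<max>\d+) \}"
)
# Shape of the busy line quoted above: "... resource is busy X (current_ref_count=A, max_ref_count=B)"
BUSY_RE = re.compile(
    r"cannot allocate, resource is busy (?P<resource>\S+) "
    r"\(current_ref_count=(?P<cur>\d+), max_ref_count=(?P<max>\d+)\)"
)


def limit_mismatches(lines):
    """Return {resource: (requested_max, enforced_max)} where the two differ."""
    requested, enforced = {}, {}
    for line in lines:
        m = REQ_RE.search(line)
        if m:
            requested[m.group("resource")] = int(m.group("max"))
        m = BUSY_RE.search(line)
        if m:
            enforced[m.group("resource")] = int(m.group("max"))
    return {
        res: (requested[res], enforced[res])
        for res in requested
        if res in enforced and requested[res] != enforced[res]
    }


# Lines copied from the reply above.
sample = [
    "Request=1 provider=NamedResourceProvider "
    "resourcename=bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx "
    "CountedResourceRequest { max=6 }",
    "2015-09-21 17:42:41.794 [CountedProvider::allocate] cannot allocate, "
    "resource is busy bkp-netbkp-v.NBU_CLIENT.MAXJOBS.host.xxx.xx.xx "
    "(current_ref_count=10, max_ref_count=10)",
]
mismatches = limit_mismatches(sample)
print(mismatches)
```

A non-empty result means the limit being enforced is not the limit currently being requested, which matches the observation that 10 jobs run although 6 are configured.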
