Hamza_H
Moderator
6 years ago
Solved

A.I.R replication & Import jobs don’t start

Hello all,

Our replication and import jobs no longer start. Nothing was changed; they simply stopped on their own.

Restarting NetBackup didn't fix it: still no replication or import.

I can list the incomplete images (nbstlutil stlilist -image_incomplete); their state is sometimes "not started" and sometimes "deferred".
It is a targeted replication (trusted master configured).

AIR is configured in both directions (MasterA<->MasterB).

I spent hours digging through the legacy and unified logs at a high verbosity level (6) and couldn't find anything.

Legacy logs from the master: bpdbm, admin
Unified logs from the master: nbpem, nbstserv, nbrb, nbjm, etc.
Legacy logs from the media servers: bpdm, bptm

Credential validation succeeds (Credentials>storage servers>replication).

It used to work fine, but it has not worked since this weekend.

This isn't the first time I have faced this kind of problem, especially import jobs that don't start.

I have been through every technote, EEB and Vox topic I could find, but nada!

The post-upgrade bundle and hotfix EEBs are already installed at their latest versions on the master and media servers.

Does anyone have a clue?

Thank you.
  • Hello everyone,

    The problem was resolved after contacting Veritas Support.

    The solution is not published anywhere (it is an internal technote), so I want to share it here to help others.

    The fix was to suspend SLP processing, then terminate the nbstserv and nbsvcmon processes (confirm they are gone with bpps piped through grep).

    Then restart nbstserv, wait about 30 seconds, and run the magic command:

    # nbstlutil dropwg 

    (At first I hesitated to do that, but eventually I did. What encouraged me most was that I had seen errors like "index 1 cannot be assigned to a workgroup, status= 130(ImageManager.cpp:2612)".)

    After the dropwg, restart the SLP manager and start the service monitor process. The import and replication jobs then started working again.

    The strange thing is that I only did this on one master, but it fixed both replication and import.

    Hope this helps you guys!

    Best regards.
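
For the "make sure they are terminated" step above, here is a minimal Python sketch of that check. It assumes the process listing (for example, the output of bpps) has already been captured as text. The function name `still_running` is hypothetical, and the matching rule is an assumption about how the daemon names appear in the listing, not a documented bpps format.

```python
import re


def still_running(process_listing, names=("nbstserv", "nbsvcmon")):
    """Return the sorted daemon names that still show up in a process listing."""
    found = set()
    for line in process_listing.splitlines():
        # Ignore the grep process itself, which carries the daemon name as an argument.
        if re.search(r"\bgrep\b", line):
            continue
        for name in names:
            # Match the name only as a whole word or final path component,
            # so "nbstserv" does not match a longer, unrelated name.
            if re.search(r"(?:^|[\s/])" + re.escape(name) + r"(?:\s|$)", line):
                found.add(name)
    return sorted(found)
```

An empty list means neither process appears in the listing, so it should be safe to move on to restarting nbstserv; anything else means a process survived the terminate step.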

9 Replies

  • Hi there, what storage are you using? Is it NBU appliances or open storage? If open storage, I would recommend investigating on the storage side as well.

  • For replications that do not even start, I would expect to see a reason in the nbstserv log, unless there is an issue with nbstserv itself.

    I experienced something similar some years ago at a customer site where duplications did not start.
    https://vox.veritas.com/t5/NetBackup/SLP-s-stopped-processing/td-p/355495

    Long story short, I noticed that the nbstserv process kept on terminating. The last time anything had been logged in the /usr/openv/logs/nbstserv folder was 2 days earlier. I tried to restart the process, but a few seconds later it was no longer running.
    The real issue was a corrupted nblog.conf, caused by the master server disk filling up because logging levels were set too high.
    We copied a 'good' nblog.conf from a media server that was running the same OS and NBU version.
    nbstserv stayed up when I started it and duplications started running.
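
The symptom described above, a log folder that has gone quiet, can be checked with a small Python sketch. It reports how long ago any file in a directory was last written. The helper name `seconds_since_last_write` is hypothetical. What counts as "too old" is left to the reader, since it depends on logging levels and SLP activity.

```python
import os
import time


def seconds_since_last_write(log_dir, now=None):
    """Age in seconds of the newest file in log_dir, or None if it holds no files."""
    now = time.time() if now is None else now
    # Collect modification times of regular files only (subdirectories are skipped).
    mtimes = [
        entry.stat().st_mtime
        for entry in os.scandir(log_dir)
        if entry.is_file()
    ]
    if not mtimes:
        return None
    return now - max(mtimes)
```

Calling `seconds_since_last_write("/usr/openv/logs/nbstserv")` on the master and getting a value of many hours while SLPs should be active would point to the same kind of stalled or repeatedly terminating nbstserv described in this post.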

    For replication that succeeds while imports do not happen, I would compare the date and time on the 2 masters.
    From Admin Guide I:

    Synchronize the clocks of the master servers in the source and the target
    domains so that the master server in the target domain can import the images
    as soon as they are ready. The master server in the target domain cannot import
    an image until the image creation time is reached. Time zone differences are
    not a factor because the images use Coordinated Universal Time (UTC).
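
To make the clock rule above concrete, here is a small Python sketch. It assumes the backup ID has the usual form of client name, underscore, creation time in epoch seconds (as in IDs like client123_123456789). The function name `seconds_until_importable` is hypothetical. Epoch seconds are UTC-based, which is why time zones drop out of the comparison.

```python
import time


def seconds_until_importable(backup_id, target_now=None):
    """Seconds the target master still has to wait before the image's
    creation time is reached on its own clock (0 if already reached)."""
    # The creation time is the part after the last underscore;
    # rsplit keeps client names that contain underscores intact.
    ctime = int(backup_id.rsplit("_", 1)[1])
    now = time.time() if target_now is None else target_now
    return max(0, ctime - now)
```

If the target master's clock runs 5 minutes behind the source, an image created "now" on the source yields about 300 here, meaning the import cannot begin for roughly that long.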

     

    • Hamza_H
      Moderator

      Hello Marianne,

      Thank you for your reply.

      I have already checked that, but still no clue.

      nbstserv is running with no issues, and nblog.conf doesn't seem corrupt to me (access as root is OK).

      The date and time are synchronized between the masters.

      The nbstserv log doesn't show much. This morning I just found this:

      [DsmAccess::getDiskServerTypeByMediaId] failed to get disk volume info: media id = (DsmAccess.cpp:197)
      04/29/2020 17:00:39.072 [ImageMgr::GroupingAssignment] Image client123_123456789, index 1 cannot be assigned to a workgroup, status= 130(ImageManager.cpp:2612)
      04/29/2020 17:00:39.075 [DsmAccess::getDiskServerTypeByMediaId] failed to get disk volume info: media id = (DsmAccess.cpp:197)
      04/29/2020 17:00:39.075 [ImageMgr::GroupingAssignment] Image client346_123456789, index 1 cannot be assigned to a workgroup, status= 130(ImageManager.cpp:2612)
      04/29/2020 17:00:39.078 [DsmAccess::getDiskServerTypeByMediaId] failed to get disk volume info: media id = (DsmAccess.cpp:197)
      04/29/2020 17:00:39.078 [ImageMgr::GroupingAssignment] Image client789_123456789, index 1 cannot be assigned to a workgroup, status= 130(ImageManager.cpp:2612)

      Any ideas?

      Thank you!
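
When many such lines are buried in a large nbstserv log, it can help to pull out just the affected images. The Python sketch below is written against the exact wording of the GroupingAssignment lines quoted in this post. The names `WORKGROUP_ERROR` and `unassigned_images` are hypothetical, and other NetBackup versions may word the message differently.

```python
import re

# Pattern built from the log lines quoted above; the optional space after
# "status=" follows the formatting seen in those lines.
WORKGROUP_ERROR = re.compile(
    r"\[ImageMgr::GroupingAssignment\] Image (?P<image>\S+), index (?P<index>\d+) "
    r"cannot be assigned to a workgroup, status= ?(?P<status>\d+)"
)


def unassigned_images(log_text):
    """Return (backup_id, copy_index, status) for each workgroup assignment failure."""
    return [
        (m["image"], int(m["index"]), int(m["status"]))
        for m in WORKGROUP_ERROR.finditer(log_text)
    ]
```

Run over the excerpt above, this returns one tuple per image, all with status 130, which gives a quick count of how many images are stuck on this error.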

       

       

      • sdo
        Moderator

        Issue three commands for each MSDP disk pool, and check that the storage server, the disk pool and the disk volume are all up:up, i.e. admin up and internal up.

        I think the syntax is something like this (but please forgive me if I'm wrong):

        nbdevquery -liststs -stype PureDisk -storage_server MyStorageServerName

        nbdevquery -listdp -stype PureDisk -dp MyPoolName

        nbdevquery -listdv -stype PureDisk -dv MyDiskVolumeName