
AIR replication and import jobs don't start

Hamza_H
Moderator
   VIP   
Hello all,

We have a problem with replication and import jobs that no longer start. Nothing changed; it just stopped by itself.

Restarting NetBackup didn't fix it: still no replication or import.

I can see the incomplete images (nbstlutil stlilist -image_incomplete), and their state is sometimes «not started» or «deferred».
It is a targeted replication (trusted master server configured).

AIR is configured in both directions (MasterA <-> MasterB).

I spent hours digging through the legacy and unified logs at a high verbosity level (6) and couldn't find anything.

Legacy logs on the master: bpdbm, admin
Unified logs on the master: nbpem, nbstserv, nbrb, nbjm
Legacy logs on the media servers: bpdm, bptm

Credential validation succeeds (Credentials > Storage servers > Replication).

It worked fine until this weekend.

This isn't the first time I have faced this kind of problem, especially with import jobs that don't start.

I have been through every technote, EEB and Vox topic I could find, but nada!

The latest post-upgrade bundle and hotfix EEBs are already installed on the master and media servers.

Does anyone have a clue?

Thank you.

VirgilDobos
Moderator
Partner    VIP    Accredited Certified

Hi there, what storage are you using: NetBackup appliances or OpenStorage? If OpenStorage, I would recommend investigating the storage side as well.

--Virgil

Marianne
Moderator
Partner    VIP    Accredited Certified

For replications that do not even start, I would expect to see a reason in the nbstserv log, unless there is an issue with nbstserv itself.

I have experienced something similar some years ago at a customer site where duplications did not start. 
https://vox.veritas.com/t5/NetBackup/SLP-s-stopped-processing/td-p/355495

Long story short, I noticed that the nbstserv process kept terminating. The last time anything had been logged in the /usr/openv/logs/nbstserv folder was 2 days earlier. I tried to restart the process, but a few seconds later it was no longer running.
The real issue was a corrupted nblog.conf, caused when the master server's disk filled up due to excessively high logging levels.
We copied a 'good' nblog.conf from a media server that was running the same OS and NBU version. 
nbstserv stayed up when I started it and duplications started running.
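To check for the symptom described above (nbstserv stopping and its log going stale), a small helper can report whether anything in the log folder was written recently. This is a minimal sketch: the function name and the 60-minute default are hypothetical, it assumes the default Unix log folder /usr/openv/logs/nbstserv, and it relies on a find that supports -mmin (GNU or BSD).

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR holds at least one file modified
# within the last MINUTES minutes, fail otherwise.
# Assumes the default Unix log folder for nbstserv and a find with -mmin.
log_recently_written() {
  dir=${1:-/usr/openv/logs/nbstserv}
  minutes=${2:-60}
  # Print at most one file modified less than $minutes minutes ago
  recent=$(find "$dir" -type f -mmin "-$minutes" 2>/dev/null | head -n 1)
  [ -n "$recent" ]
}
```

For example: log_recently_written /usr/openv/logs/nbstserv 60 || echo "nbstserv has not logged in the last hour". A stale folder alone does not prove the process died, so it is still worth confirming with bpps.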

Where replication succeeds but imports do not happen, I would compare the date and time on the two masters.
From the NetBackup Administrator's Guide, Volume I:

Synchronize the clocks of the master servers in the source and the target
domains so that the master server in the target domain can import the images
as soon as they are ready. The master server in the target domain cannot import
an image until the image creation time is reached. Time zone differences are
not a factor because the images use Coordinated Universal Time (UTC).
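Because the images carry UTC timestamps, comparing the two masters' clocks as seconds since the epoch sidesteps time zones entirely. A minimal sketch; the helper name, the 5-second tolerance and the masterB host name in the usage line are hypothetical, and it assumes date +%s is available (GNU and BSD date both support it).

```shell
#!/bin/sh
# Hypothetical helper: compare two epoch timestamps (seconds, UTC) and
# succeed only if they differ by at most MAX_SKEW seconds (default 5).
clock_skew_ok() {
  a=$1
  b=$2
  max=${3:-5}
  skew=$((a - b))
  # Absolute value of the difference
  if [ "$skew" -lt 0 ]; then
    skew=$((0 - skew))
  fi
  echo "skew: ${skew}s"
  [ "$skew" -le "$max" ]
}
```

For example: clock_skew_ok "$(date -u +%s)" "$(ssh masterB date -u +%s)" 5. The ssh round trip adds a little apparent skew, which is why a small tolerance is used rather than exact equality.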


Hamza_H
Moderator
   VIP   

Hello @Marianne,

Thank you for your reply.

I had already checked that, but still no clue.

nbstserv is running without issues, and nblog.conf doesn't look corrupt to me (it is accessible as root).

Date and time are synchronized between the masters.

The nbstserv log doesn't show much, but this morning I found this:

[DsmAccess::getDiskServerTypeByMediaId] failed to get disk volume info: media id = (DsmAccess.cpp:197)
04/29/2020 17:00:39.072 [ImageMgr::GroupingAssignment] Image client123_123456789, index 1 cannot be assigned to a workgroup, status= 130(ImageManager.cpp:2612)
04/29/2020 17:00:39.075 [DsmAccess::getDiskServerTypeByMediaId] failed to get disk volume info: media id = (DsmAccess.cpp:197)
04/29/2020 17:00:39.075 [ImageMgr::GroupingAssignment] Image client346_123456789, index 1 cannot be assigned to a workgroup, status= 130(ImageManager.cpp:2612)
04/29/2020 17:00:39.078 [DsmAccess::getDiskServerTypeByMediaId] failed to get disk volume info: media id = (DsmAccess.cpp:197)
04/29/2020 17:00:39.078 [ImageMgr::GroupingAssignment] Image client789_123456789, index 1 cannot be assigned to a workgroup, status= 130(ImageManager.cpp:2612)
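When many images are affected, it helps to pull the distinct image IDs out of the log. A minimal sketch that matches only the message format quoted above; it reads plain log text on stdin, so the unified nbstserv log has to be rendered to text first (for example with vxlogview). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read nbstserv log text on stdin and print each image
# ID reported as "cannot be assigned to a workgroup", once, sorted.
# The pattern is based only on the log lines quoted in this thread.
workgroup_failures() {
  sed -n 's/.*Image \([^,]*\), index [0-9]* cannot be assigned to a workgroup.*/\1/p' |
    sort -u
}
```

For example: workgroup_failures < nbstserv.txt, where nbstserv.txt is a hypothetical text export of the log.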

Any ideas?

Thank you!

sdo
Moderator
Partner    VIP    Certified

Run three commands for each MSDP disk pool, and check that the storage server, disk pool and disk volume are all up:up, i.e. administratively up and internally up.

I think the syntax is something like this (but please forgive me if I'm wrong):

nbdevquery -liststs -stype PureDisk -storage_server MyStorageServerName -U

nbdevquery -listdp -stype PureDisk -dp MyPoolName -U

nbdevquery -listdv -stype PureDisk -dp MyPoolName -dv MyDiskVolumeName -U
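To avoid eyeballing the output of each command, a helper can check for both flags. This is a sketch under an assumption: that the -U (user-readable) output of nbdevquery names the flags AdminUp and InternalUp verbatim. Run it once per storage server, pool or volume, because in a combined listing the flags of one healthy object would mask a down one. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read nbdevquery -U output on stdin and succeed only
# if both the AdminUp and InternalUp flags appear in it.
# ASSUMPTION: the -U output lists these flag names verbatim.
is_up_up() {
  flags=$(cat)
  printf '%s\n' "$flags" | grep -qw AdminUp &&
    printf '%s\n' "$flags" | grep -qw InternalUp
}
```

For example: /usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -stype PureDisk -dp MyPoolName -U | is_up_up && echo "pool is up:up".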

Hamza_H
Moderator
   VIP   
Hello @sdo,
Thank you for your reply.
Yes, I checked that in the first place and they are all up (storage server, AdminUp and InternalUp).
Both storage servers are also flagged as replication targets.
I ran a manual replication and it worked, but the import didn't, so I had to import the image manually as well. The manual «image import» job stays active with no progress; then an import job gets created and completes successfully. But there is still no automatic replication or import.

sdo
Moderator
Partner    VIP    Certified

Did something change recently? If you implemented Data Classifications, this can break existing SLPs. Data Classifications are fairly tricky to set up, and most small and medium shops don't need them.

Other things to check: that the SLPs are not suspended, the SLP windows, and the SLP parameters.

Hamza_H
Moderator
   VIP   

Hi @sdo, nothing has changed recently, including data classifications.

The SLPs are not suspended, and the SLP parameters look fine to me. :\

Hamza_H
Moderator
   VIP   

Hello everyone,

The problem was resolved after contacting Veritas Support.

The solution is not published anywhere (it is in an internal technote), so I want to share it here to help others.

The solution was to suspend SLP processing, then terminate the nbstserv and nbsvcmon processes (confirm with bpps and grep that they have stopped).

Then restart nbstserv, wait about 30 seconds, and run the magic command:

# nbstlutil dropwg 

(At first I hesitated to do that, but eventually I did. What encouraged me most was having seen errors like "index 1 cannot be assigned to a workgroup, status= 130(ImageManager.cpp:2612)".)

After the dropwg, restart the SLP manager and start the service monitor process. The import and replication jobs then started working again.
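The sequence in this post can be sketched as a shell function. This is a hedged reconstruction, not Veritas Support's internal procedure: it assumes a Unix master with the default /usr/openv paths, uses pkill to stop the daemons, uses nbstlutil inactive/active -lifecycle for the suspend and resume steps, assumes the daemons detach when started, and reads "restart the SLP manager" as reactivating the suspended SLPs. The function name and the SLP names are placeholders. With DRY_RUN=1 it only prints the commands, which is a sensible first run.

```shell
#!/bin/sh
# Sketch of the workaround above (Unix master server).
# ASSUMPTIONS: default install path; pkill stops the daemons; nbstserv and
# nbsvcmon detach when started; SLP names passed as arguments are placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.
NBBIN=/usr/openv/netbackup/bin
ADMCMD=$NBBIN/admincmd

run() {
  if [ -n "$DRY_RUN" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_slp_workgroups() {
  # 1. Suspend secondary-operation processing for each SLP named
  for slp in "$@"; do
    run "$ADMCMD/nbstlutil" inactive -lifecycle "$slp"
  done
  # 2. Stop the service monitor first (it may otherwise restart nbstserv)
  run pkill -x nbsvcmon
  run pkill -x nbstserv
  # 3. Confirm with bpps that neither process is still running
  if [ -z "$DRY_RUN" ] && "$NBBIN/bpps" -x | grep -Ew 'nbstserv|nbsvcmon'; then
    echo "nbstserv or nbsvcmon is still running; stopping here" >&2
    return 1
  fi
  # 4. Start nbstserv again and give it about 30 seconds
  run "$NBBIN/nbstserv"
  run sleep 30
  # 5. Drop the SLP workgroups
  run "$ADMCMD/nbstlutil" dropwg
  # 6. "Restart the SLP manager", read here as reactivating the SLPs
  #    suspended in step 1 (an interpretation of the post)
  for slp in "$@"; do
    run "$ADMCMD/nbstlutil" active -lifecycle "$slp"
  done
  # 7. Start the service monitor
  run "$NBBIN/nbsvcmon"
}
```

For example: DRY_RUN=1 reset_slp_workgroups MySLP1 MySLP2 to review the commands, then run it without DRY_RUN once the list looks right.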

The strange thing is that I only did this on one master, but it fixed both replication and import.

Hope this helps, guys!

Best regards.

Marianne
Moderator
Partner    VIP    Accredited Certified

Thanks for the feedback @Hamza_H !