Forum Discussion

tgenova
10 years ago

VVR paused due to network disconnection

Hi all.

I have a global cluster made of 2 minicluster systems running Solaris 10 (SPARC):

primary: MIVDB01S - 172.22.8.132

secondary: MILDB08S - 10.66.11.148               

I shut down the secondary (init 0) for 3 days, then booted it again from the ok prompt. One day later I checked the replication status and found this:

On MIVDB01S (primary), as root:
  vradmin -g datadg printrvg datarvg
    Replicated Data Set: datarvg
    Primary:
        HostName: 172.22.8.132  <localhost>
        RvgName: datarvg
        DgName: datadg
    Secondary:
        HostName: 10.66.11.148
        RvgName: datarvg
        DgName: datadg

  vxrlink -g datadg status datarlk
    Wed Jan 21 09:51:08 2015
    VxVM VVR vxrlink INFO V-5-1-12887 DCM is in use on rlink datarlk. DCM contains 874432 Kbytes (1%) of the Data Volume(s).

  vradmin -g datadg repstatus datarvg 
    Replicated Data Set: datarvg
    Primary:
      Host name:                  172.22.8.132
      RVG name:                   datarvg
      DG name:                    datadg
      RVG state:                  enabled for I/O
      Data volumes:               1
      VSets:                      0
      SRL name:                   srl_vol
      SRL size:                   1.00 G
      Total secondaries:          1

    Secondary:
      Host name:                  10.66.11.148
      RVG name:                   datarvg
      DG name:                    datadg
      Data status:                consistent, behind
      Replication status:         paused due to network disconnection (dcm resynchronization)
      Current mode:               asynchronous
      Logging to:                 DCM (contains 874432 Kbytes) (SRL protection logging)
      Timestamp Information:      N/A

  vxprint -Pl | grep flags
    flags:    write enabled attached consistent disconnected asynchronous dcm_logging resync_paused
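The flags line above is the quickest summary of the problem: "disconnected" together with "dcm_logging" and "resync_paused" means writes are being tracked in the DCM and the resync is waiting for the link to come back. As a hedged sketch, the function below (rlink_has_flags is a hypothetical helper, not a VVR command) succeeds only if every named flag appears on the "flags:" line of vxprint -Pl output; it assumes the output describes a single rlink.

```shell
#!/bin/sh
# Hypothetical helper, not part of VVR. Succeeds only if every flag named
# as an argument appears on the "flags:" line of `vxprint -Pl` output read
# from stdin. Assumes the output describes a single rlink.
rlink_has_flags() {
    awk -v want="$*" '
        $1 == "flags:" { for (i = 2; i <= NF; i++) have[$i] = 1 }
        END {
            n = split(want, w, " ")
            for (i = 1; i <= n; i++)
                if (!(w[i] in have)) exit 1
            exit 0
        }'
}

# Example with the flags line posted above.
flags='flags:    write enabled attached consistent disconnected asynchronous dcm_logging resync_paused'
if echo "$flags" | rlink_has_flags disconnected dcm_logging resync_paused; then
    echo "DCM resync paused: restore the network link first"
fi
```

In practice the input would be something like vxprint -g datadg -Pl datarlk piped into the function.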

-----

1st possible solution

On MIVDB01S (primary), as root:

vradmin -g datadg resync datarvg

-----

2nd possible solution

Stop the VVR daemons on the secondary first, then on the primary:

# /usr/sbin/vxstart_vvr stop

Start the VVR daemons on the secondary first, then on the primary:

# /usr/sbin/vxstart_vvr start

Can you help me?

 


  • Hi,

    One thing is certain: the SRL has overflowed, so DCM logging is active. A DCM resynchronization will be required in any case.

    However, before that, you need to confirm that the network connection is back:

    # vxprint -qthg <diskgroup> | egrep "^rl"

    The rlink should be in the "CONNECT ACTIVE" state on both the primary and the secondary.

    If that is the case, I would suggest running vradmin -g datadg resync datarvg. This should start the DCM resynchronization, which you can monitor with the vxrlink status command.

    However, if the rlink is in the "ENABLED ACTIVE" state, you will need to restart replication using autosync (vradmin startrep or vxrlink attach), which will perform a full resynchronization.
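    The decision above can be sketched as a small POSIX shell function. rlink_next_step is a hypothetical helper, not a Veritas command, and it assumes the "rl" record layout that vxprint -qthrg prints in this thread (rl NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK). It is not meant for vxprint -P output, whose columns differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of VVR. Reads "rl" records in the layout
# printed by:  vxprint -qthrg <diskgroup> | egrep "^rl"
# Assumed columns: rl NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
# and prints the next step suggested in this thread for each rlink.
rlink_next_step() {
    awk '$1 == "rl" {
        kstate = $4; state = $5
        if (kstate == "CONNECT" && state == "ACTIVE")
            action = "resync"                       # vradmin -g <dg> resync <rvg>
        else if (kstate == "ENABLED" && state == "ACTIVE")
            action = "check-network-then-startrep"  # ping, /etc/hosts, firewall; then vradmin -a startrep
        else
            action = "investigate"                  # any other combination
        printf "%s %s/%s -> %s\n", $2, kstate, state, action
    }'
}

# Example: the primary-side record posted in this thread.
echo "rl datarlk datarvg ENABLED ACTIVE 10.66.11.148 datadg datarlk" | rlink_next_step
# prints: datarlk ENABLED/ACTIVE -> check-network-then-startrep
```

    Run it on both the primary and the secondary; only when both report "resync" is the link really back.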

     

    G

  • Hi G.

    I'm in the following situation:

    root@MIVDB01S # vxprint -qthrg datadg | egrep "^rl"
      rl datarlk      datarvg      ENABLED  ACTIVE   10.66.11.148 datadg datarlk

    root@MILDB08S # vxprint -qthrg datadg | egrep "^rl"
      rl datarlk      datarvg      ENABLED  ACTIVE   172.22.8.132 datadg datarlk

    where MIVDB01S is the primary.

    root@MIVDB01S # vradmin -g datadg printrvg datarvg
    Replicated Data Set: datarvg
    Primary:
            HostName: 172.22.8.132  <localhost>
            RvgName: datarvg
            DgName: datadg
    Secondary:
            HostName: 10.66.11.148
            RvgName: datarvg
            DgName: datadg

    So the correct solution should be:

    root@MIVDB01S # vradmin -g datadg -a startrep datarvg

    I'm just not sure about the flag (-a?).

    Can you confirm the command above?

     

    BR.

    Tiziano

     

     

     

  • Hi,

    As the rlinks are not in the CONNECT ACTIVE state, replication is currently broken.

    At this point, first check whether the secondary is reachable from the primary (a ping test).

    Confirm that your /etc/hosts file has the correct hostname/IP address mapping, or that DNS resolves the names correctly.

    Once the secondary is reachable, use the startrep command (your command is correct). The -a flag is for autosync; if the command with -a gives an error, run it without -a, which will do a full sync.

    Once you run the command with -a, wait a few moments and double-check whether the rlink has reached the CONNECT ACTIVE state. If it has, start monitoring with # vxrlink -g datadg -i5 status <rlink_to_secondary> to make sure the amount of data still to be replicated to the secondary is decreasing.
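    The monitoring step amounts to watching a backlog number shrink. A minimal sketch, assuming the three vxrlink status messages quoted in this thread are representative ("DCM contains N Kbytes", "occupying N Kbytes ... on the SRL", "is up to date"); rlink_backlog_kb and backlog_not_growing are hypothetical helpers, not VVR commands.

```shell
#!/bin/sh
# Hypothetical helpers, not part of VVR.
# rlink_backlog_kb: for each `vxrlink status` message on stdin, print the
# backlog in Kbytes (the number just before the word "Kbytes"), or 0 when
# the rlink reports it is up to date. Other lines (e.g. timestamps) are skipped.
rlink_backlog_kb() {
    awk '
        /is up to date/ { print 0; next }
        {
            for (i = 2; i <= NF; i++)
                if ($i == "Kbytes") { print $(i - 1); next }
        }
    '
}

# backlog_not_growing: exit status 0 if the numbers on stdin (one per line)
# never increase from one sample to the next, 1 otherwise.
backlog_not_growing() {
    awk 'NR > 1 && $1 + 0 > prev { grew = 1 }
         { prev = $1 + 0 }
         END { exit (grew ? 1 : 0) }'
}

# Example with two messages quoted in this thread.
printf '%s\n' \
    "VxVM VVR vxrlink INFO V-5-1-4639 Rlink datarlk has 1 outstanding write, occupying 33 Kbytes (0%) on the SRL" \
    "VxVM VVR vxrlink INFO V-5-1-4467 Rlink datarlk is up to date" |
    rlink_backlog_kb
# prints 33, then 0
```

    On a live system the input would come from something like vxrlink -g datadg -i5 status datarlk; note that -i5 keeps sampling until interrupted.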

     

    G

  • Hi.

     

    root@MIVDB01S # vxprint -P
    Disk group: datadg

    TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
    rl datarlk      datarvg      ENABLED  -        -        ACTIVE   -       -
     

    root@MIVDB01S # vradmin -g datadg startrep datarvg
    VxVM VVR vradmin ERROR V-5-52-268 One of the options -a, -c, -f or -b must be used.
    VxVM VVR vradmin INFO V-5-52-258
        Usage: vradmin [-g diskgroup] {-a | -c checkpoint | -f | -b} startrep rvg [sechost]
     

    root@MIVDB01S # vradmin -g datadg -a startrep datarvg
    Message from Primary:
    VxVM VVR vxrlink ERROR V-5-1-3531 Rlink datarlk is already attached


    root@MIVDB01S # vxprint -P
    Disk group: datadg

    TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
    rl datarlk      datarvg      ENABLED  -        -        ACTIVE   -       -

    So I suppose I need something like this (from the primary):

    root@MIVDB01S # vxrlink -g datadg -f det datarlk    

    root@MIVDB01S # vxrlink -g datadg att datarlk   

    What do you think?

    BR

    Tiziano

    Yes, you are right: the rlink still seems to be attached in the kernel, so a detach and attach will be required.

    Note, though, that this will start a complete replication.

    G

  • Hi.

    I noticed something strange in this situation, so before running the last 2 commands (detach and attach), I decided to check for drops on the firewall. My colleagues who manage the firewall answered:

    we have drops on UDP port 4145 between the primary and the secondary.

    They are now investigating, because I had already requested a correct rule for that port.

    Please wait.

    BR

    Tiziano

     

  • Hi.

    I can confirm that, after the work on our firewall (the rules were correct, but they were not working properly), everything is OK now.

    The KSTATE immediately switched from ENABLED to CONNECT and the sync was OK.

    Thank you very much for your support.

    BR

    Tiziano

      vxprint -P
        Disk group: datadg

        TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
        rl datarlk      datarvg      CONNECT  -        -        ACTIVE   -       -

      vxprint -P
        Disk group: datadg

        TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
        rl datarlk      datarvg      CONNECT  -        -        ACTIVE   -       -

    - and then:

      vradmin -g datadg repstatus datarvg
        Replicated Data Set: datarvg
        Primary:
          Host name:                  172.22.8.132
          RVG name:                   datarvg
          DG name:                    datadg
          RVG state:                  enabled for I/O
          Data volumes:               1
          VSets:                      0
          SRL name:                   srl_vol
          SRL size:                   1.00 G
          Total secondaries:          1

        Secondary:
          Host name:                  10.66.11.148
          RVG name:                   datarvg
          DG name:                    datadg
          Data status:                consistent, up-to-date
          Replication status:         replicating (connected)
          Current mode:               asynchronous
          Logging to:                 SRL
          Timestamp Information:      behind by 0h 0m 0s

      vxrlink -g datadg status datarlk
        Thu Jan 29 09:29:02 2015
        VxVM VVR vxrlink INFO V-5-1-4639 Rlink datarlk has 1 outstanding write, occupying 33 Kbytes (0%) on the SRL

      vxrlink -g datadg status datarlk
        Thu Jan 29 09:29:07 2015
        VxVM VVR vxrlink INFO V-5-1-4467 Rlink datarlk is up to date