11-07-2012 06:08 AM
I am trying to create a secondary and I am having all sorts of connection problems. I have 2 routers in between my servers set to a 512k connection. I have a continuous ping going in a Command Prompt and am getting replies back at 2ms with no timeouts. But when I try to connect from server one to server two I get an error v-39-53247-7. After a bunch of attempts I can sometimes get connected and am able to configure the RDS. But once the RDS for the secondary is set up and I click start, the status changes from Connected to Disconnected every few seconds and replication never happens. Where do I need to begin my troubleshooting?
Additional note: This is Windows Server 2008 R2 running SFWHA 5.1 SP2 CP13
11-07-2012 06:34 AM
You could try pinging with a packet size of 8192 to check that the network can handle large packets. For example, in Linux run the following on the primary:
ping -s 8192 sec_host
Choosing the packet size
If you have selected the UDP transport protocol for replication, the UDP packet size used by VVR to communicate between hosts could be an important factor in the replication performance. By default, VVR uses a UDP packet size of 8400 bytes. In certain network environments, such as those that do not support fragmented IP packets, it may be necessary to decrease the packet size.
If the network you are using loses many packets, the effective bandwidth available for replication is reduced. You can tell that this is happening if you run vxrlink stats on the RLINK, and see many timeout errors. In this case, network performance may be improved by reducing the packet size. If the network is losing many packets, it may simply be that each time a large packet is lost, a large retransmission has to take place. In this case, try reducing the packet size until the problem is ameliorated.
You could also try a different protocol: use "vxprint -Pl" to see whether TCP or UDP is being used, and then try the other using:
vxedit -g diskgroup set protocol=UDP|TCP rlink_name
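To put rough numbers on the packet size point above, here is a minimal Python sketch. It assumes an ordinary 1500-byte Ethernet MTU, IPv4 without options, and that the VVR packet size is the UDP payload size; the 1% fragment loss rate is purely illustrative and not measured from this network.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8   # bytes

def fragment_count(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)

def datagram_loss(fragment_loss: float, fragments: int) -> float:
    """Chance the whole datagram is lost when any one fragment is lost."""
    return 1 - (1 - fragment_loss) ** fragments

# 8400 is the default quoted above; 1400 and 1100 are smaller sizes to compare.
for size in (8400, 1400, 1100):
    n = fragment_count(size)
    print(f"packet_size={size}: {n} fragment(s), "
          f"datagram loss ~{datagram_loss(0.01, n):.2%} at 1% fragment loss")
```

Under these assumptions an 8400-byte packet is split into several fragments and every lost fragment costs the whole packet, which is why a smaller packet size can help on a lossy link.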
11-07-2012 10:21 AM
I removed the routers, ran a crossover cable from one server to the other and set the speed to 10 Mbps. Once I changed the primary and secondary to the new IPs everything started working right away.
Is there a Min bandwidth needed from replication?
11-07-2012 12:31 PM
I don't think there is any minimum bandwidth, but if there are lots of timeouts then VVR will disconnect, which is what you seem to be getting. If the packet size used by VVR is too big for the network, so that packets get broken up by switches, there can be a lot of timeouts: I have seen links much quicker than 10Mbits slow to a crawl because of this. So if you only have 512Kbits and you are getting lots of timeouts, then this could be why the rlink is disconnecting. I would run:
vxrlink -g diskgroup stats rlink_name
to see if you are getting a lot of timeouts (use vxprint -P to get name of rlink)
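As a rough illustration of why a slow link is more exposed to timeouts, the Python sketch below works out how long one packet occupies the wire at different link speeds. It assumes "512k" means 512,000 bits per second and ignores protocol headers, latency and queuing, so treat the numbers as a lower bound rather than a prediction of VVR behaviour.

```python
def serialization_ms(packet_bytes: int, link_bits_per_s: int) -> float:
    """Time in milliseconds to put one packet on the wire at a given link speed."""
    return packet_bytes * 8 / link_bits_per_s * 1000

# Link speeds discussed in this thread (exact bit rates are assumptions).
links = (("512 kbit/s", 512_000), ("1 Mbit/s", 1_000_000), ("10 Mbit/s", 10_000_000))

for packet_bytes in (8400, 1400):
    for label, rate in links:
        ms = serialization_ms(packet_bytes, rate)
        print(f"{packet_bytes} bytes over {label}: {ms:.1f} ms")
```

The timeout counters from vxrlink stats remain the real evidence; this only shows how quickly large packets eat into a 512k link.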
Mike
11-08-2012 04:02 AM
Funny you say that, I had tried that command yesterday and received an error.
C:\Users\administrator.AVMAIL>vxprint -lPV
Diskgroup = BasicGroup
Diskgroup = AvMailDiskGroup
Rvg : nycv
state : state=ACTIVE kernel=ENABLED
assoc : datavols=D: srl=\Device\HarddiskDmVolumes\AvMailDiskGroup\rep rlinks=rlk_144_6740
att : rlinks=rlk_144_6740
checkpoint :
flags : primary enabled attached clustered
Rlink : rlk_144_6740
info : timeout=98 packet_size=1100 latency_high_mark=10000 latency_low_mark=9950 bandwidth_limit=1000
state : state=ACTIVE synchronous=override latencyprot=off srlprot=autodcm
assoc : rvg=nycv remote_host=10.10.1.2 remote_dg=AvMailDiskGroup remote_rlink=rlk_mhsws001anp-1_18231 local_host=10.10.1.1
protocol : UDP/IP
flags : write attached consistent connected
Diskgroup = MHSWS001ANP-1-Dg0
C:\Users\administrator.AVMAIL>vxrlink -g avmaildiskgroup startstats rlk_mhsws001anp-1_18231
Failed to perform the operation.
Error V-107-58644-914: RLINK name is not valid.
11-08-2012 05:03 AM
vxrlink -g AvMailDiskGroup stats rlk_144_6740
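The command above uses the local RLINK name (the "Rlink :" entry in the vxprint output), whereas the failed attempt used the remote_rlink value. As a small illustration, here is a Python sketch that pulls both names out of saved vxprint text; the helper name rlink_names is hypothetical, the patterns are based only on the output pasted in this thread, and other VVR versions may format it differently.

```python
import re

def rlink_names(vxprint_output: str) -> dict:
    """Split local RLINK names from remote_rlink values in vxprint -lPV text."""
    # "Rlink : name" introduces a local RLINK; "rlinks=" (lowercase) does not match.
    local = re.findall(r"Rlink\s*:\s*(\S+)", vxprint_output)
    # remote_rlink= names the RLINK on the other host, not usable locally.
    remote = re.findall(r"remote_rlink=(\S+)", vxprint_output)
    return {"local": local, "remote": remote}

# Excerpt copied from the output posted earlier in the thread.
sample = (
    "Rvg : nycv assoc : datavols=D: rlinks=rlk_144_6740 "
    "Rlink : rlk_144_6740 info : timeout=98 packet_size=1100 "
    "assoc : rvg=nycv remote_host=10.10.1.2 remote_dg=AvMailDiskGroup "
    "remote_rlink=rlk_mhsws001anp-1_18231 local_host=10.10.1.1"
)

names = rlink_names(sample)
print("use with vxrlink on this host:", names["local"])
print("belongs to the other host:", names["remote"])
```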
Also, on a previous point where you asked "Is there a Min bandwidth needed from replication?": I have found out that in UNIX the minimum you can set the bandwidth_limit to is 56kbps (https://sort.symantec.com/public/documents/sfha/6.0/solaris/manualpages/html/man/volume_manager/html...) but for Windows this is 1Mbps (see Windows VVR admin guide page 327). So this implies 512kbps would definitely work on UNIX, but I hadn't realised until your last post that you were using Windows. This does not necessarily mean that anything less than 1Mbps will not work on Windows; it could be that you just can't set the bandwidth_limit lower than this. I don't know why there is so much difference between UNIX and Windows in what you can set the bandwidth_limit to, as the difference between 56kbps and 1Mbps is huge.
Have you tried using protocol=TCP yet? I often find this helps when UDP has issues. You can set this retrospectively using vxedit (using vxrds probably won't work if it can't connect), or you can specify protocol=TCP when you create the secondary using "vxrds addsec" (or I guess you probably have this option in the GUI).
Mike
11-08-2012 05:13 AM
Ok, I didn't realize I couldn't check the remote; yes, the local was fine. When I run the command from the secondary server I receive an error that the command can only be run from the primary.
1. The default size for UDP was 1400; I changed it to 1100 to see if it got better.
2. I set the bandwidth limit to 1000 after having it at max for a few days.
3. As I am just trying to get the initial steps for setup figured out I wasn't worried about the rlink name. Once I have a set plan I will rebuild everything to make sure the process is correct.
Again with Chapter 8 :) that was the only chapter I didn't print out. I am going to do that this morning.
I did switch to TCP; once I stopped and started the replication process everything moved over. But again I am back to the point where, when I add something new or remove something, my network monitor shows activity but no changes are made to the secondary. Is there a setting I missed? I did have an error at one point that pointed to the SwiftSync issue. Do you think this may have something to do with my data not replicating?
11-08-2012 05:24 AM
If by "no changes are made to the secondary" you mean you tried to change to TCP and the protocol changed on the primary but not on the secondary, this will be because vxrds (and maybe the GUI) will fail to change the secondary if you are having connection problems, so you will need to use vxedit to change the protocol on the secondary.
If by "no changes are made to the secondary" you mean you add a file on primary while volume is mounted readonly on secondary and you don't see the change, then this is normal, as you shouldn't really mount secondary, but if replication is working and you remount secondary, then you should see the change.
Mike
11-08-2012 05:48 AM
This:
"If by "no changes are made to the secondary" you mean you add a file on primary while volume is mounted readonly on secondary and you don't see the change, then this is normal, as you shouldn't really mount secondary, but if replication is working and you remount secondary, then you should see the change."
Yep, I removed the drive letter, then re-added it, and the data was there. So when I am doing my testing I should leave the drive with no drive letter unless I need to see something?
11-08-2012 06:01 AM
The proper way to check that your secondary data is valid is to take a snapshot and mount the snapshot, or of course you can do a migration of roles to make your secondary writable so that you can mount it. Mounting (giving a drive letter to) your secondary read-only is not supported, so you should not normally have a drive letter assigned at the secondary, but it is ok for reassurance during testing. After a while you will trust VVR: if VVR says it is up-to-date, I have never known this not to be the case, and VVR should tell you if the secondary does not have the data by saying it has "X bytes in the SRL or DCM".
So to summarise, are you saying that you have issues of rlink disconnecting when using UDP mode, but it works ok using TCP?
Mike
11-08-2012 06:09 AM
"So to summarise, are you saying that you have issues of rlink disconnecting when using UDP mode, but it works ok using TCP?"
A couple of things appeared to be the problem in the beginning.
1. My connection was set to 512 Kbps; we now know 1 Mbps is the minimum.
2. I had the drive mounted, so I only saw the initial change when I first set up the replication.
Now that I have a 1 Mbps connection, UDP appears to be fine so long as the drive is not mounted.
Thanks for all the help!