11-09-2020 02:37 AM
Hello
I recently upgraded NBU on this appliance to 3.2 and then applied the Maintenance Release 1 pack. A few days later I noticed that I cannot ping one of its clients. The NBU appliance and this client are in the same subnet, so no routing is involved, yet ping still does not work. I have opened a case with VRTS and am waiting for their input.
The NBU appliance has a bond in 802.3ad mode built from two 10 Gbps NICs, with VLAN 10 configured on top of it.
vlan10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.60.208.47 netmask 255.255.255.0 broadcast 10.60.208.255
inet6 fe80::21e:67ff:fef2:cbd5 prefixlen 64 scopeid 0x20<link>
ether 00:1e:67:f2:cb:d5 txqueuelen 1000 (Ethernet)
routing table
netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.60.208.1 0.0.0.0 UG 0 0 0 vlan10
10.60.208.0 0.0.0.0 255.255.255.0 U 0 0 0 vlan10
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 vlan10
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 vlan20
192.168.229.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
I can ping the gateway:
ping 10.60.208.1
PING 10.60.208.1 (10.60.208.1) 56(84) bytes of data.
64 bytes from 10.60.208.1: icmp_seq=1 ttl=255 time=1.29 ms
^C
--- 10.60.208.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.290/1.290/1.290/0.000 ms
but I cannot ping 10.60.208.5:
PING 10.60.208.5 (10.60.208.5) 56(84) bytes of data.
^C
--- 10.60.208.5 ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 4999ms
Below is the client configuration:
Windows IP Configuration
Ethernet adapter Live Network Connection:
Connection-specific DNS Suffix . : some.com
IPv4 Address. . . . . . . . . . . : 10.60.208.5
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 10.60.208.1
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 10.60.208.1 10.60.208.5 271
10.60.208.0 255.255.255.0 On-link 10.60.208.5 271
10.60.208.5 255.255.255.255 On-link 10.60.208.5 271
10.60.208.255 255.255.255.255 On-link 10.60.208.5 271
127.0.0.0 255.0.0.0 On-link 127.0.0.1 331
127.0.0.1 255.255.255.255 On-link 127.0.0.1 331
127.255.255.255 255.255.255.255 On-link 127.0.0.1 331
224.0.0.0 240.0.0.0 On-link 127.0.0.1 331
224.0.0.0 240.0.0.0 On-link 10.60.208.5 271
255.255.255.255 255.255.255.255 On-link 127.0.0.1 331
255.255.255.255 255.255.255.255 On-link 10.60.208.5 271
===========================================================================
Persistent Routes:
Network Address Netmask Gateway Address Metric
0.0.0.0 0.0.0.0 10.60.208.1 Default
===========================================================================
IPv6 Route Table
===========================================================================
Active Routes:
If Metric Network Destination Gateway
1 331 ::1/128 On-link
1 331 ff00::/8 On-link
===========================================================================
Persistent Routes:
None
There is no firewall on this client; it is a VM running Windows Server 2016.
I can't see the NBU appliance's IP in the arp -a output, and I'm not sure what to do next. Does anyone have an idea?
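As a quick sanity check of the addressing quoted above, the following minimal Python sketch confirms that the appliance, the client, and the gateway all fall inside the same /24. The helper name same_subnet is made up for illustration; the addresses and mask are the ones from the outputs in this post. If the check passes, traffic between the two hosts is delivered directly at layer 2, which is why a missing ARP entry is the interesting symptom here.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """Return True if both addresses belong to the same network for this mask."""
    # strict=False lets us pass a host address and get its containing network.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


# Values taken from the ifconfig / ipconfig output in this post.
APPLIANCE = "10.60.208.47"
CLIENT = "10.60.208.5"
GATEWAY = "10.60.208.1"
MASK = "255.255.255.0"

if __name__ == "__main__":
    print("appliance/client same subnet:", same_subnet(APPLIANCE, CLIENT, MASK))
    print("appliance/gateway same subnet:", same_subnet(APPLIANCE, GATEWAY, MASK))
```

Both checks come out true for these values, so a routing problem can be ruled out and the next place to look is ARP and the layer 2 path (bond, VLAN, switch ports).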
11-09-2020 09:08 AM
11-09-2020 09:14 AM
Hello
Please find my answers below:
1) Can the client ping the appliance back successfully?
No, it cannot.
2) ping is certainly handy but it's not actually required for NBU to function - what kind of backup errors are you seeing? Can you telnet to the PBX, bpcd, and/or vnetd ports Client-->Media Server? Media Server-->Client?
No, I can't connect in either direction, client to master or master to client. When I ping from the client, the master's MAC address does not show up in arp -a. Something is not OK.
3) Can the appliance ping anything else on that subnet above the client's IP (i.e. x.x.x.6, etc.)?
Yes, the appliance can ping the GW and some other clients within the same subnet. This is really weird.
4) Are backups of everything else on that subnet working except for that one particular client?
Almost all clients are VMs and I am using the VMware policy type to back them up. The problem came up when I wanted to restore some files to this client using BAR on it.
5) Have you run any of the bpclntcmd or bptestbpcd commands between the two boxes for more info?
No. If nothing gets through, this will not help.
6) How is the client's connection to the Master?
VM-based backup.
Also, on Friday I rebooted this appliance. After that I could ping everything within the same subnet but not the default GW, so my VM-based backups were in jeopardy. I rebooted it again and ended up back in the previous state: I could back up using VMware but could not restore. I am really lost.
11-09-2020 09:36 AM
If you're using a VMware policy type, then backing up the VM isn't an indicator of network connectivity between the appliance and the VM. I suspect you have a VLAN problem or a vSwitch problem where the vSwitch your client connects through isn't actually connected to the same physical network as the appliance. That's what I take from the client being unable to ping the appliance while the appliance's networking works for other paths.
Are the VM and the appliance going through different gateways? Is there a route anywhere to route those networks?
You say there is no firewall, but with Win2016 I've seen lots of admins insist for DAYS that the firewall is turned off only to find out 'oops, on this one server something went wrong with GPO and it's turned on'. I'd actively go allow NBU ports through, and check that ping responses aren't disabled.
Are you 100% certain the VLAN setup for the appliance NIC(s) and this VM are 100% the same? Again, lots of time troubleshooting things on this branch of the tree to find days later "oops, we fat-fingered VLAN 1040 when it was supposed to be 1049" or the like.
11-09-2020 09:47 AM
11-09-2020 09:56 AM
Hi
Thanks. But I am also part of the admin group; I logged on to this Wintel box and checked firewall.cpl, and everything there is turned off. Both servers are in the same VLAN 10 with the same gateway, 10.60.208.1.
The network team was involved, and according to them everything is OK. Now the Wintel/VM admins are on it. I have asked for that box to be rebooted; it is Windows, so it certainly won't hurt.
11-10-2020 01:01 AM
Today I started mapping which VMs reside on which ESXi host and pinging them. It turned out that I could ping some VMs but not others, even though they were in the same subnet and on the same ESXi host.
I pushed the network team to verify the LACP status on their end, and it turned out that one of the ports in this port channel was suspended (down), while the NBU appliance was showing both ports as UP.
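A likely reason why only some hosts were unreachable is that a bond in 802.3ad mode chooses the outgoing member link per peer using a hash. Frames hashed onto the member whose switch port was suspended would be dropped, while frames hashed onto the healthy member would get through. The Python sketch below is a deliberately simplified model of a layer 2 style hash (XOR of the last MAC byte, modulo the number of links). It is not the exact formula used by the Linux bonding driver or by the switch, and the peer MAC addresses and the choice of which link is broken are hypothetical; only the appliance MAC comes from this thread.

```python
def pick_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Simplified layer 2 hash: XOR the last byte of each MAC, modulo link count."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_links


# Appliance MAC from the vlan10 output in the first post.
APPLIANCE_MAC = "00:1e:67:f2:cb:d5"

# Hypothetical peer MACs, invented for illustration only.
PEERS = {
    "peer-a": "00:50:56:aa:bb:10",
    "peer-b": "00:50:56:aa:bb:11",
}

# Assumption: link index 1 is the member whose switch port was suspended.
BROKEN_LINK = 1

if __name__ == "__main__":
    for name, mac in PEERS.items():
        link = pick_link(APPLIANCE_MAC, mac, n_links=2)
        state = "dropped" if link == BROKEN_LINK else "delivered"
        print(f"{name}: sent on link {link}, frames {state}")
```

In this model two peers whose MACs differ only in the last bit land on different links, which would match seeing some VMs on the same ESXi host reachable and others not. It would also fit the earlier observation that a reboot changed which hosts were reachable, if the reboot changed which member ended up carrying which traffic.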
So I elevated to the CLI and ran ifdown bond0 followed by ifup bond0.
Then I had to add the default GW again.
And everything is working fine again.
I am just wondering why the appliance did not notice that the remote port was down.
Nonetheless, thank you for your assistance.
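One possible explanation for the open question: link monitoring on a bond looks at the local carrier, and a switch port that LACP has suspended can still have physical link, so the member keeps reporting "MII Status: up". On Linux, the bonding driver exposes per-member LACP details in /proc/net/bonding/bond0, and a member that failed negotiation typically sits in a different aggregator than the active one. The Python sketch below parses text in that format and lists such members. The sample text is illustrative and shortened, not output captured from this appliance, and the interface names eth2 and eth3 as well as the function name are hypothetical.

```python
# Illustrative, shortened sample in the style of /proc/net/bonding/bond0.
# NOT real output from the appliance in this thread.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

802.3ad info
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 1

Slave Interface: eth2
MII Status: up
Aggregator ID: 1

Slave Interface: eth3
MII Status: up
Aggregator ID: 2
"""


def slaves_outside_active_aggregator(text: str) -> list:
    """Return member interfaces whose Aggregator ID differs from the active one."""
    active = None           # Aggregator ID from the "Active Aggregator Info" block
    current = None          # member interface currently being parsed
    in_active_block = False
    outside = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Active Aggregator Info"):
            in_active_block = True
        elif line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
            in_active_block = False
        elif line.startswith("Aggregator ID:"):
            agg = int(line.split(":", 1)[1])
            if in_active_block and active is None:
                active = agg
            elif current is not None and active is not None and agg != active:
                outside.append(current)
    return outside


if __name__ == "__main__":
    # On a real host you would read /proc/net/bonding/bond0 instead of SAMPLE.
    print(slaves_outside_active_aggregator(SAMPLE))
```

In the sample, both members show link up, yet eth3 is reported because it is not part of the active aggregator. Checking this on the appliance side would be a way to spot a suspended switch port without waiting for the network team.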