03-05-2013 03:36 PM
Before I describe the problem, here are some details about my setup. The master server is AIX 5.3 running NetBackup 7.5.0.3, and the media server in question is RHEL 6.3, also running 7.5.0.3. This is a new media server install that hasn't been used for anything yet.
When I run "vmoprcmd -h mediaserver -extall" on the master server, it hangs for a long period of time before exiting with "network protocol error (39)". I enabled verbose logging for volmgr in vm.conf, created the log directories under the debug directory, and touched a bunch of "touch files" to enable more debug output, but when this error occurs the only log file created on the media server is under the daemon folder, and there isn't anything useful inside.
If I run tpconfig to display the device config on the media server (via the -l, -d or -dl options; I haven't tried any others), the problem seems to go away temporarily. After some undetermined amount of time, though, if I run "vmoprcmd -h mediaserver -extall" again, it hangs and times out again.
When I first attempted to configure the robot and drives connected to this new media server, my client had issues and crashed. I'm not sure whether this problem is a result of that or the cause. After the first failed attempt to configure the robot and drives, I deleted them from NetBackup and reinstalled the media server. After that I was able to configure the robot and drives successfully.
Here is the output from running the 'vmoprcmd -h mediaserver -extall' command:
03-05-2013 08:11 PM
Which directories did you create under the debug dir? The usual ones are reqlib, tpcommand, daemon & ltid.
Check vmd & ltid on this media server to see whether they go up and down intermittently. They shouldn't, but if they do, check the logs above to find out why.
Are you sharing tape devices (SSO) with other media servers?
Run nbemmcmd -listhosts -verbose from this media server to check for any setting or connectivity issue.
03-06-2013 07:53 AM
I created the following directories in the debug directory: acsssi, daemon, ltid, reqlib, robots, tpcommand. When the problem occurs, no log files are created in any of those directories other than the daemon directory, and that log only contains entries for the 'vmoprcmd -h mediaserver -extall' command, with no errors or information useful to the problem.
The robot and tape drives are local only, not shared.
The output from 'nbemmcmd -listhosts -verbose' looks no different for the media server in question than any other.
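To confirm which of the debug directories listed above actually receive log files, a small check along these lines may help. It is only a sketch: it assumes the usual UNIX/Linux location /usr/openv/volmgr/debug for these directories, and the function name log_file_counts is made up for illustration.

```python
from pathlib import Path

# Assumed default location of the volmgr debug directories on UNIX/Linux.
DEBUG_ROOT = Path("/usr/openv/volmgr/debug")

# Directory names taken from the post above.
LOG_DIRS = ["acsssi", "daemon", "ltid", "reqlib", "robots", "tpcommand"]


def log_file_counts(root, names):
    """Map each directory name to its number of files, or None if it is missing."""
    counts = {}
    for name in names:
        directory = Path(root) / name
        if directory.is_dir():
            counts[name] = sum(1 for entry in directory.iterdir() if entry.is_file())
        else:
            counts[name] = None
    return counts


if __name__ == "__main__":
    for name, count in log_file_counts(DEBUG_ROOT, LOG_DIRS).items():
        status = "missing" if count is None else f"{count} file(s)"
        print(f"{name}: {status}")
```

Running it right after reproducing the hang shows at a glance which directories stayed empty.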
03-06-2013 02:30 PM
It's unusual you can't see any logs in those debug dirs. Have you got VERBOSE in vm.conf?
Check bp.conf to find out whether there are any SERVER or MEDIA_SERVER entries that you can't ping or reach with bpclntcmd.
Also check this technote: http://www.symantec.com/docs/TECH126537
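The bp.conf check suggested above could be scripted roughly as follows. This is a sketch under assumptions: it assumes the default UNIX/Linux path /usr/openv/netbackup/bp.conf and the usual "KEY = value" line format, and it only tests name resolution, so it is a first pass rather than a replacement for ping or bpclntcmd. The helper names server_entries and resolves are hypothetical.

```python
import socket
from pathlib import Path

# Assumed default location of bp.conf on UNIX/Linux.
BP_CONF = "/usr/openv/netbackup/bp.conf"


def server_entries(text):
    """Return host names from SERVER and MEDIA_SERVER lines, in order, without duplicates."""
    hosts = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("SERVER", "MEDIA_SERVER"):
            host = value.strip()
            if host and host not in hosts:
                hosts.append(host)
    return hosts


def resolves(host):
    """Return True if the host name resolves to at least one address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    conf = Path(BP_CONF)
    if conf.exists():
        for host in server_entries(conf.read_text()):
            print(f"{host}: {'resolves' if resolves(host) else 'DOES NOT RESOLVE'}")
    else:
        print(f"{BP_CONF} not found")
```

Any host reported as not resolving is a candidate for the ping and bpclntcmd checks mentioned above.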
03-07-2013 02:02 PM
This seems to be a network issue of some kind. I see a packet leaving the media server, but it never appears at the master server. I'm not sure how running 'tpconfig' first would keep a packet from disappearing.
I'm attempting to engage our network guys and will hopefully be able to dig into this some more tomorrow.
03-08-2013 03:27 PM
This turned out to be a fragmentation problem on an intersite link between data centers that uses a GRE tunnel. Either of two fixes worked: lowering the MTU on the network or enabling tcp_mtu_probing in the Linux kernel.
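To illustrate why a GRE tunnel can cause this, the arithmetic below shows how much room is left for TCP data inside a tunnel. It is a sketch under stated assumptions: a 1500-byte link MTU, plain GRE over IPv4 with the basic 4-byte GRE header (no key, sequence number, or checksum fields), and no IPsec. The thread does not give the actual MTU values, so the numbers are illustrative only. The helper also reads the current tcp_mtu_probing setting from /proc on Linux; the function names are made up for this example.

```python
from pathlib import Path

IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options
GRE_HEADER = 4    # bytes, basic GRE header (assumes no optional fields)


def gre_tunnel_mtu(link_mtu=1500):
    """MTU available to packets carried inside a plain GRE-over-IPv4 tunnel."""
    return link_mtu - IPV4_HEADER - GRE_HEADER


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet of this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def read_tcp_mtu_probing(path="/proc/sys/net/ipv4/tcp_mtu_probing"):
    """Return the kernel setting as an int, or None if the file does not exist.

    0 = disabled, 1 = enabled only after an ICMP black hole is detected,
    2 = always enabled.
    """
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return int(proc_file.read_text().strip())


if __name__ == "__main__":
    tunnel_mtu = gre_tunnel_mtu(1500)
    print("MTU inside tunnel:", tunnel_mtu)
    print("TCP MSS inside tunnel:", tcp_mss(tunnel_mtu))
    print("tcp_mtu_probing:", read_tcp_mtu_probing())
```

Under these assumptions, full-size 1500-byte packets from the media server do not fit through the tunnel, and if the ICMP "fragmentation needed" replies are dropped along the way, large responses disappear while small ones get through. The kernel setting corresponds to the sysctl key net.ipv4.tcp_mtu_probing.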