Odd recurring issue: 5320 eventually times out and nobody can log in until reboot. NBU operations fine
We have many appliances with mostly no issues, but one in particular is a problem child: the master for one of our environments seems to get hosed after a few weeks of operation, and the only solution is to reboot. The problem is that nobody can log in - whether using individual IDs or the admin login via SSH, after typing in the password the session hangs for a while and eventually just times out.
Backup operations continue without issue, and logging in at the console takes a while but eventually works. So far the solution has just been to reboot the thing, but I don't like that. It's not like this is a Windows box that should be bounced "just because", and I have always hated that idea anyway; I would rather get to the bottom of it.
I suspect network issues of some kind, but it's beyond me what they might be, especially since this only seems to happen after a few weeks of activity and backup/duplication/replication operations are all working just fine.
Any clues as to what I can look for? Assuming I can ever log in, that is!
Turns out this had to do with the Active Directory setup; the system kept trying to connect to the DC over and over and timing out, chewing up all the system resources. Very strange, since we set up a bunch of appliances with AD and didn't see this anywhere else, even in the same network segment using the same AD controllers. Guess my suspicion was correct: all of those winbindd entries in /var/log/messages are what led tech support to the smoking gun.
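For anyone who wants to check for the same symptom, here is a rough sketch of how the winbindd entries could be counted per hour, so a flood of them stands out. It assumes /var/log/messages uses the traditional syslog line layout ("Mon DD HH:MM:SS host program[pid]: message"); the function name and the regular expression are my own and not anything from the appliance or from support.

```python
import re
from collections import Counter

# Assumed traditional syslog prefix:
#   "Jun  3 14:05:09 host program[pid]: message"
# Group 1 is the "Mon DD HH" bucket, group 2 is the program name.
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}):\d{2}:\d{2}\s+\S+\s+(\w+)")


def winbindd_per_hour(lines):
    """Count winbindd log lines per 'Mon DD HH' bucket (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line)
        # Lines that do not fit the assumed layout are skipped.
        if match and match.group(2) == "winbindd":
            counts[match.group(1)] += 1
    return counts
```

To use it on a live system, pass it an open file, for example `winbindd_per_hour(open("/var/log/messages", errors="replace"))`, and print the resulting counts. A steady climb or a sudden jump in the hourly numbers would be worth showing to support.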
I would have been here more often and provided more information, but since the WannaCry thing hit in May we have lost all internet access at work, so the only way for me to post is from my own system.