Odd recurring issue: 5320 eventually times out and nobody can log in until reboot. NBU operations fine

HoldTheLine
Level 4

We have many appliances with mostly no issues, but one in particular is a problem child: the master for one of our environments seems to get hosed after a few weeks of operation, and the only solution is to reboot. The problem is that nobody can log in, whether using individual IDs or the admin login via SSH; after typing in the password it hangs for a while and eventually just times out.

Backup operations continue without issue; logging into the console takes a while but eventually works. So far the solution has just been to reboot the thing, but I don't like that idea. It's not like this is a Windows box that should be bounced "just because"; I have always hated that approach and would rather get to the bottom of it.

I suspect network issues of some kind, but it's beyond me what they might be, especially since this only seems to happen after a few weeks of activity and backup/duplication/replication operations are all working just fine.

Any clues as to what I can look for?  Assuming I can ever log in, that is!

 

 

1 ACCEPTED SOLUTION


Turns out this had to do with the Active Directory setup: the system kept trying to connect to the DC over and over while timing out, chewing up all the system resources. Very strange, since we set up a bunch of appliances with AD and didn't see this anywhere else, even in the same network segment using the same AD controllers. Guess my suspicion was correct; all of those winbindd entries in /var/log/messages are what led tech support to the smoking gun.
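For anyone chasing a similar symptom, one way to check whether the appliance can actually reach its domain controller is a quick TCP probe with a short timeout. This is only a minimal sketch, assuming Python 3 is available on the box; the `can_connect` helper and the way the DC hostname is passed on the command line are my own illustration, not anything support ran. The ports are the usual Kerberos, LDAP and SMB ports.

```python
import socket
import sys

# Standard ports commonly used by AD services: Kerberos, LDAP, SMB.
AD_PORTS = {"kerberos": 88, "ldap": 389, "smb": 445}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    dc_host = sys.argv[1]  # hypothetical: pass your own DC hostname here
    for name, port in AD_PORTS.items():
        state = "reachable" if can_connect(dc_host, port) else "NOT reachable"
        print(f"{dc_host} {name}/{port}: {state}")
```

A port that only answers intermittently, or never within the timeout, would at least be consistent with the repeated connection attempts described above, though it does not by itself prove the same root cause.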

I would have been here more often and provided more information, but since the WannaCry thing hit in May we have lost all internet access, so the only way for me to post is from my own system.

 


7 REPLIES

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

I recall a case where the /tmp was being occupied with log files which hangs the system, or at least makes it a pain to log in. Rebooting would clean it out and eventually it would happen again.

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Checked with a colleague, it was actually not /tmp but some other folder. The fact being that, space not being available was the issue. They had to boot into single user mode to find the culprit to release the space. 

Marianne
Level 6
Partner    VIP    Accredited Certified

Seemingly not too much of an issue as @HoldTheLine has not been back to look for replies .... 

"I recall a case where the /tmp was being occupied with log files which hangs the system, or at least makes it a pain to log in. Rebooting would clean it out and eventually it would happen again."

"Checked with a colleague, it was actually not /tmp but some other folder. The fact being that, space not being available was the issue. They had to boot into single user mode to find the culprit to release the space. "

Thanks, this is something I can look into. 

"Seemingly not too much of an issue as @HoldTheLine has not been back to look for replies .... "

While this, not so much. What's with the snark? Do you get paid by the number of replies or something? Seems a bit uncalled for. As I said, NBU is functioning normally with no errors.

 

Definitely take a look at /tmp, /inst and /log to see if they are filling up.  I've seen login issues arise with /tmp filling with tens of thousands of files.
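If you can get a shell, something along these lines gives a quick read on both space and file count for those directories. It is a minimal sketch, assuming Python 3 is on the appliance; the `summarize_dir` name and the 90% / 10,000-file thresholds are arbitrary picks for illustration, not official limits.

```python
import os
import shutil


def summarize_dir(path):
    """Return (percent_used, file_count) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    file_count = 0
    # Skip unreadable subdirectories instead of aborting the walk.
    for _root, _dirs, files in os.walk(path, onerror=lambda err: None):
        file_count += len(files)
    return percent_used, file_count


if __name__ == "__main__":
    for mount in ("/tmp", "/inst", "/log"):
        if not os.path.isdir(mount):
            print(f"{mount}: not present")
            continue
        pct, count = summarize_dir(mount)
        # Thresholds below are illustrative assumptions only.
        flag = "  <-- check" if pct > 90 or count > 10000 else ""
        print(f"{mount}: {pct:.1f}% used, {count} files{flag}")
```

Note that the percentage is for the whole filesystem the directory lives on, while the file count is for the directory tree itself.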

Any support cases on the appliance?

Charles
VCS, NBU & Appliances

I was able to get in, and none of the file systems look out of line. I do see a lot of messages in /var/log/messages about winbindd and too many open files. A coworker is going to open a case; we will see what turns up. I just wanted to see if I could get a head start on it. It's acting very strange: sometimes the policy will not override, sometimes connections time out, and even on the IPMI console it can take a very long time to log in, if at all.
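To put a number on those messages, a rough count can be pulled from the log with a few lines of script. This is a sketch under assumptions: Python 3 is available, the log is readable by the current user, and the two search patterns are simply my guess at what to match (the word winbindd and the phrase "too many open files"); the `count_matches` helper is hypothetical.

```python
import collections
import re

# Patterns are assumptions about what the relevant log lines contain.
PATTERNS = {
    "winbindd": re.compile(r"\bwinbindd\b"),
    "too_many_open_files": re.compile(r"too many open files", re.IGNORECASE),
}


def count_matches(lines):
    """Count how many lines match each pattern."""
    counts = collections.Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/var/log/messages", errors="replace") as log:
            print(dict(count_matches(log)))
    except OSError as exc:
        print(f"could not read log: {exc}")
```

Running it shortly after a reboot and again a week or two later would show whether the counts climb over time, which would fit the "fine for a few weeks, then hosed" pattern.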

 

During all of this NBU continues to function without issue: backups, restores, replications, etc. Very strange.

 

Thanks
