Can anybody help me with my environment (details below)? We are facing frequent failures with errors 40, 13, 20, 50, etc.
One Master server, 165 SAN Media Servers, HP VLS 9000 (VTL).
We use flash backups on all 165 servers and run the backups through a separate policy for each server.
Backups run Monday to Sunday: 6 days of Diff Incr and 1 day of Full backup. Each has a 2-week retention.
I need help with the setup and with improving performance.
Please let me know if any further information is required.
Thanks for your response.
Master server OS:
[/usr/openv/lib]# cat /etc/*-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)
[/usr/openv/lib]# uname -mrs
Linux 2.6.18-53.1.6.el5 x86_64
We are using 7.5.3 on Master as well as all SAN Media servers.
I have made the changes you advised above and will monitor the environment for a couple of days and get back to you.
However, do we need to make any changes to the kernel semaphores to tune performance on the master?
Current Kernel Sem values:
# Syntax of the following paramter: kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI
# 4 values defining limits for System V IPC semaphores.
# These fields are, in order:
# SEMMSL The maximum semaphores per semaphore set.
# SEMMNS A system-wide limit on the number of semaphores in all semaphore sets.
# SEMOPM The maximum number of operations that may be specified in a semop(2) call.
# SEMMNI A system-wide limit on the maximum number of semaphore identifiers.
kernel.sem = 300 32000 64 1024
I ask because I have read about those values in some of the tech notes from Symantec.
Please advise.
Thanks a ton Mark.
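For anyone comparing these values against a tech note, here is a minimal Python sketch that splits the kernel.sem line quoted above into its four named fields. The helper names (SemLimits, parse_kernel_sem, semaphore_ceiling) are made up for illustration and are not part of any Symantec or Red Hat tool. It also prints SEMMSL * SEMMNI, which is the largest number of semaphores the per-set and set-count limits could ever allow, so a SEMMNS above that figure cannot be reached. It does not recommend any particular values.

```python
from typing import NamedTuple


class SemLimits(NamedTuple):
    """The four kernel.sem fields, in the order documented in the post above."""
    semmsl: int  # max semaphores per semaphore set
    semmns: int  # system-wide max semaphores
    semopm: int  # max operations per semop(2) call
    semmni: int  # system-wide max semaphore identifiers (sets)


def parse_kernel_sem(line: str) -> SemLimits:
    """Parse a sysctl-style line such as 'kernel.sem = 300 32000 64 1024'."""
    key, _, value = line.partition("=")
    if key.strip() != "kernel.sem":
        raise ValueError(f"not a kernel.sem line: {line!r}")
    fields = value.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 values, got {len(fields)}")
    return SemLimits(*(int(f) for f in fields))


def semaphore_ceiling(limits: SemLimits) -> int:
    """Upper bound on semaphores implied by SEMMSL and SEMMNI together."""
    return limits.semmsl * limits.semmni


if __name__ == "__main__":
    # Value copied from the post above.
    limits = parse_kernel_sem("kernel.sem = 300 32000 64 1024")
    print(limits)
    print("SEMMSL * SEMMNI =", semaphore_ceiling(limits))
    print("SEMMNS within that bound:", limits.semmns <= semaphore_ceiling(limits))
```

On a running Linux host the same four numbers, in the same order, can be read from /proc/sys/kernel/sem, which is a quick way to confirm that an edit to /etc/sysctl.conf has actually been applied.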
I'd love to help you, but this one is out of my scope, I'm afraid!
If it comes from a Symantec tuning tech note then I can't see that it would hurt, but maybe try one thing at a time so that you know what actually does the trick for you.
Even after tuning the parameters you suggested, backups are still failing with errors 13 and 40.
Please find the job details for both errors below.
4/3/2013 4:09:07 AM - Info bpbkar(pid=9680) 60000 entries sent to bpdbm
4/3/2013 4:09:07 AM - Error bpbrm(pid=9636) from client sz0066a-util.westchester.pa.mail.comcast.net: 1364951613 755297/
4/3/2013 4:09:22 AM - Error bpbrm(pid=9636) db_FLISTsend failed: file read failed (13)
4/3/2013 4:09:24 AM - Error bptm(pid=9694) media manager terminated by parent process
4/3/2013 4:09:28 AM - Info bpbkar(pid=0) done. status: 13: file read failed
4/3/2013 4:09:28 AM - end writing; write time: 03:06:07
file read failed(13)
4/3/2013 4:04:14 AM - Info bpbkar(pid=16939) 100000 entries sent to bpdbm
4/3/2013 4:04:42 AM - Info bpbkar(pid=16939) 105000 entries sent to bpdbm
4/3/2013 4:05:19 AM - Error bptm(pid=16950) media manager exiting because bpbrm is no longer active
4/3/2013 4:05:19 AM - Info bpbkar(pid=16939) 110000 entries sent to bpdbm
4/3/2013 4:05:20 AM - Info bptm(pid=16950) EXITING with status 174 <----------
network connection broken(40)
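When comparing failures across many jobs, it can help to pull the timestamp, severity, process and pid out of each job-detail line and see how long before the first Error the job was still sending entries. The following is only a sketch: the regular expression and the parse_line helper are my own, the month/day/year date order is an assumption, and the severity list is a guess (only Info and Error appear in this thread). The sample lines are copied from the error 40 excerpt above.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout inferred from the excerpts in this thread; not an official format spec.
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - "
    r"(?:(?P<severity>Info|Error|Warning) )?"          # Warning is an assumption
    r"(?:(?P<process>\w+)\(pid=(?P<pid>\d+)\) )?"
    r"(?P<message>.*)$"
)


class JobLogEntry(NamedTuple):
    timestamp: datetime
    severity: Optional[str]
    process: Optional[str]
    pid: Optional[int]
    message: str


def parse_line(line: str) -> Optional[JobLogEntry]:
    """Return a parsed entry, or None for lines without a leading timestamp."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Assumes month/day/year with a 12-hour clock, as the excerpts suggest.
    ts = datetime.strptime(m["ts"], "%m/%d/%Y %I:%M:%S %p")
    pid = int(m["pid"]) if m["pid"] is not None else None
    return JobLogEntry(ts, m["severity"], m["process"], pid, m["message"])


SAMPLE = """\
4/3/2013 4:04:14 AM - Info bpbkar(pid=16939) 100000 entries sent to bpdbm
4/3/2013 4:04:42 AM - Info bpbkar(pid=16939) 105000 entries sent to bpdbm
4/3/2013 4:05:19 AM - Error bptm(pid=16950) media manager exiting because bpbrm is no longer active
4/3/2013 4:05:19 AM - Info bpbkar(pid=16939) 110000 entries sent to bpdbm
4/3/2013 4:05:20 AM - Info bptm(pid=16950) EXITING with status 174 <----------
network connection broken(40)
"""

entries = [e for e in map(parse_line, SAMPLE.splitlines()) if e is not None]
errors = [e for e in entries if e.severity == "Error"]

if __name__ == "__main__":
    for e in errors:
        gap = (e.timestamp - entries[0].timestamp).total_seconds()
        print(f"{e.timestamp} {e.process}(pid={e.pid}): {e.message} "
              f"[{gap:.0f}s after first line in excerpt]")
```

Run against the full job details instead of an excerpt, the same approach gives the elapsed time from job start to the first Error, which is usually the first thing anyone helping will ask for.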
I suspect it's something to do with bpbrm and the media manager causing these errors.
If it's out of your scope, could you please refer me to someone who can help me out with this?
Thanks for your support, by the way.
After how long do they fail?
As for "referring you": this is an open forum. We are all here to help and do it in our own time, so hopefully someone will see this and can assist further. My only referral would be to advise you to open a support case with Symantec.
We need to see the full log (please attach it as a text file rather than pasting it into the thread).
Hi Mark/Wiriadi Wangsa,
Sorry for the long delay in replying to this thread.
We are working with Symantec on this. However, we have one particular client where the Incr backup fails with error 13 or 14, while the Full backup completes successfully without any issues.
I have checked both of the values below, and they are set to the max.