Backup failed with error code 24 !!!!

V_P_S · ‎10-01-2012

Hello Marianne / All

In my environment, I found that many backups fail with error code 24 on weekends (both Unix and Windows clients), while on weekdays I see far fewer. Can you please help me resolve this?

Please share your thoughts..

revarooo · ‎10-01-2012

Check your network. Try and stagger when the backups run.

NetBackup doesn't decide to have a problem on the weekend but not in the week, so the issue is a resource one; as I mentioned, probably the network load.

Marianne · ‎10-01-2012

Please mention NBU details in all new posts?

Can we assume that master is the same as in your previous post?

Master server NBU version:6.5.4

Master OS: SunOS 5.10

Not upgraded yet with NBU 6.x support reaching EOSL in two days' time?

You need to pinpoint the problem and start troubleshooting there.
We do not have enough information to determine if problem is with media server or with clients.
Is any data transferred before backup fails?
Do you have all relevant logs on media server and client(s)? (bpbrm and bptm on media server and bpbkar on client(s))

Is master server also media server?

Are OS patches up-to-date on master and/or media server?

Firewall anywhere in the picture?

We know for sure that status 24 is hardly ever a NetBackup issue.
Logs will help to point you in the right direction.

I remember how we used to see 24 on certain Linux clients at a particular customer all the time.
No matter how we tried, we could not determine the cause.
All that was done was to enable Checkpoint Restart, and backups were restarted over and over until they eventually completed...
Machines were old and must have displayed 'issues' in other areas too. Customer decided to replace old Linux machines with new hardware, newer OS, etc. Same NBU software as before... No more status 24's....

We also know that TCP Chimney on W2003 machines causes status 24's - especially during high I/O.

So - no magic fix for status 24.
NBU is reporting the error... not causing it.


V_P_S · ‎10-01-2012

Hi Revaroo,

Yes, there might be a network load issue. In that case, what should I do?

Marianne · ‎10-01-2012

Extract from Revaroo's post:

Try and stagger when the backups run.


V_P_S · ‎10-01-2012

Hello Marianne.

The clients that fail on weekends all have the same full schedule.

Netbackup master version: VERSION NetBackup 7.1.0.2

OS version: Solaris, Solaris10

Media server NBU : 7.0

Media server OS : RS6000, AIX53

Some of the backups go through the media server and some through the master server.

My only question is why these errors occur more on weekends than on weekdays.

Anonymous · ‎10-01-2012

Increase the retry number on these backups. I have had a status 24 for a couple of years: sometimes it backs up, sometimes it doesn't! The retry catches it.

No amount of searching has fixed it. An OS issue, not NetBackup, as others have commented.

Socket issues.

V_P_S · ‎10-01-2012

Hello Stuart.

The backup does not go into an incomplete state; it fails directly.

revarooo · ‎10-01-2012

Does it fail immediately? Let's see the contents of the Detailed Status from the job.

V_P_S · ‎10-01-2012

Hi Revaroo

No, it does not fail immediately; it fails after transferring some data.

10/01/2012 16:09:54 - Info nbjm (pid=28941) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=835244, request id:{56639030-0BB4-11E2-88BE-002128055BC4})
10/01/2012 16:09:54 - requesting resource Storage_Unit_Group_Vdrives_LoadBalancing
10/01/2012 16:09:54 - requesting resource Master server.NBU_CLIENT.MAXJOBS.Client
10/01/2012 16:09:54 - requesting resource Master server.NBU_POLICY.MAXJOBS.<Policy name>
10/01/2012 16:09:56 - granted resource Master server.NBU_CLIENT.MAXJOBS.Client
10/01/2012 16:09:56 - granted resource Master server.NBU_POLICY.MAXJOBS.Policy name
10/01/2012 16:09:56 - granted resource VT0037
10/01/2012 16:09:56 - granted resource VLS03_drive17
10/01/2012 16:09:56 - granted resource Master server-hcart-robot-tld-7-VLS
10/01/2012 16:09:56 - estimated 18700884 kbytes needed
10/01/2012 16:09:56 - Info nbjm (pid=28941) resumed backup job for client Client, policy Policy name, schedule Full on storage unit tus3bkpnbupin01-hcart-robot-tld-7-VLS
10/01/2012 16:10:00 - connecting
10/01/2012 16:10:01 - connected; connect time: 0:00:00
10/01/2012 16:10:01 - begin writing
10/01/2012 17:14:20 - Error bpbrm (pid=17158) from client tus3crsapppeb60: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
10/01/2012 17:14:21 - Info bpbrm (pid=14663) media manager for backup id tus3crsapppeb60_1349065699 exited with status 150: termination requested by administrator
10/01/2012 17:14:21 - end writing; write time: 1:04:20
socket write failed (24)
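
For anyone comparing several failed jobs, a small script can pull out how long each one was writing before the first error appeared. This is only a sketch: the two lines are copied from the Detailed Status above, the timestamp format is taken to be MM/DD/YYYY (consistent with the job running on 1 October), and the function names are made up for illustration.

```python
from datetime import datetime

# Two lines copied from the Detailed Status above.
STATUS_LINES = [
    "10/01/2012 16:10:01 - begin writing",
    "10/01/2012 17:14:20 - Error bpbrm (pid=17158) from client tus3crsapppeb60: "
    "ERR - Cannot write to STDOUT. Errno = 110: Connection timed out",
]


def parse_timestamp(line):
    """Return the leading 'MM/DD/YYYY HH:MM:SS' stamp of a line as a datetime."""
    return datetime.strptime(line[:19], "%m/%d/%Y %H:%M:%S")


def time_until_first_error(lines):
    """Return the time between 'begin writing' and the first Error line, or None."""
    start = None
    for line in lines:
        if " - begin writing" in line:
            start = parse_timestamp(line)
        elif " - Error " in line and start is not None:
            return parse_timestamp(line) - start
    return None


elapsed = time_until_first_error(STATUS_LINES)
print(elapsed)  # 1:04:19
```

If most failing jobs die after a similar interval, that would point towards a timeout or keep-alive setting; widely varying intervals would fit general overload better.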

revarooo · ‎10-01-2012

So it looks like it's doing some backups then times out mid-way. Guarantee this is a network/load issue if it's fine in the week.

The date you have here though is today, Monday 1st Oct

Do you have the bpbrm log from the media server and the bpbkar log from the client? If so, post them.

Marianne · ‎10-01-2012

Do you have Checkpoint Restart enabled in Policy attributes?


V_P_S · ‎10-01-2012

Hello Revaroo.

They mostly fail on weekends and complete on weekdays.

Yes Marianne, Checkpoint Restart is enabled.

mph999 · ‎10-01-2012

Well I'll guess that weekends is when the fulls run - therefore network load issue.

Try doing 1/7 of your fulls on mon, 1/7 on tue, 1/7 on wed etc ...
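
As a rough illustration of the 1/7 idea, the sketch below deals clients out to the days of the week in round-robin order. The client names are hypothetical and the function is not part of NetBackup; in practice you would then create or adjust one full schedule per day and probably balance by data size rather than by client count.

```python
from itertools import cycle

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def stagger_fulls(clients, days=DAYS):
    """Assign clients to days in round-robin order so each day gets ~1/7 of them."""
    schedule = {day: [] for day in days}
    for client, day in zip(sorted(clients), cycle(days)):
        schedule[day].append(client)
    return schedule


# Hypothetical client names, for illustration only.
clients = [f"client{n:02d}" for n in range(1, 15)]
plan = stagger_fulls(clients)
for day, members in plan.items():
    print(day, members)
```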

Status 24 is 99% not netbackup - very rare for NBU to be the cause, and in 5 years, I've never seen it.

Martin

V_P_S · ‎10-02-2012

Hello mph.

I will try what you suggested and post an update.

mph999 · ‎10-02-2012

Excellent.

Sometimes people forget that a backup environment does have a limit as to what it can do ...

M

Mark_Solutions · ‎10-02-2012

How is your VTL attached?

The 10054 is a network issue so could be an overload, client read timeout or a port usage issue (number of ports available / keep_alive etc.)

Take a look at your backup reports for a weekend - total up how much data you back up (at least when they all work) and then work out the maths based on your network bandwidth to see whether you have outgrown your system and need to add extra media servers.
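
The maths can be sketched roughly as follows. Only the 18700884 kbytes figure comes from the job details earlier in the thread; the 1 Gbit/s link, the 70% usable efficiency, the count of 200 similar clients and the reading of a kbyte as 1024 bytes are all assumptions to replace with your own numbers.

```python
def hours_needed(total_kbytes, link_mbit_per_s, efficiency=0.7):
    """Estimate the hours needed to move total_kbytes over one network link.

    Assumes 1 kbyte = 1024 bytes and that only `efficiency` of the nominal
    link speed is usable for backup traffic (both are assumptions).
    """
    total_bytes = total_kbytes * 1024
    usable_bytes_per_s = link_mbit_per_s * 1_000_000 / 8 * efficiency
    return total_bytes / usable_bytes_per_s / 3600


# Estimated size of the one failed job, from its Detailed Status.
one_client_kbytes = 18700884

# Hypothetical: 200 clients of similar size all running fulls on the weekend.
weekend_kbytes = one_client_kbytes * 200

print(f"one client: {hours_needed(one_client_kbytes, 1000):.2f} h")
print(f"weekend total: {hours_needed(weekend_kbytes, 1000):.1f} h")
```

If the weekend total comes out close to or above the length of your backup window, the link is a likely bottleneck and spreading the fulls or adding media servers would be justified.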

Always best to keep the Master as just a Master as it can use up all available ports - try a netstat during a busy backup period to see the state of your port usage
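
To make the netstat check easier to read, the connection states can be tallied with a few lines of Python. This is a minimal sketch that assumes the state is the last column of each line, as it is in typical `netstat -an` output; the sample text is made up for illustration and is not from this environment.

```python
from collections import Counter

TCP_STATES = {
    "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "FIN_WAIT_1", "FIN_WAIT_2",
    "SYN_SENT", "SYN_RECEIVED", "SYN_RCVD", "LAST_ACK", "CLOSING", "LISTEN",
}


def count_tcp_states(netstat_output):
    """Count connections per TCP state, reading the state from the last column."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


# Made-up sample; in practice, paste or pipe in real `netstat -an` output.
SAMPLE = """\
tcp  0  0  10.0.0.5:40001  10.0.0.9:5000  ESTABLISHED
tcp  0  0  10.0.0.5:40002  10.0.0.9:5000  TIME_WAIT
tcp  0  0  10.0.0.5:40003  10.0.0.9:5000  TIME_WAIT
tcp  0  0  0.0.0.0:5000    0.0.0.0:*      LISTEN
"""

counts = count_tcp_states(SAMPLE)
for state, number in counts.most_common():
    print(state, number)
```

A very large TIME_WAIT or CLOSE_WAIT count during the busy period would suggest the server is running short of ports.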

Hope this helps

V_P_S · ‎10-04-2012

Hello All.

Thanks very much for all your support.

I talked with my project manager about the backups failing on weekends and completing on weekdays, and asked him to change the full backup schedule. He told me to justify the change first but, as usual, did not tell me what exactly I need to justify.

Can you please explain what I have to justify to my manager?

mph999 · ‎10-04-2012

He is asking you to justify why the change may make a difference.

The answer.

If the network is overloaded, spreading out the load over the different days will reduce it.

Martin

rajeshthink · ‎10-10-2012

Did we use the amazing SAS utility, which will pinpoint the problem if it is a network problem?

You need to run it from master to client and vice versa, and then give the output file to Symantec tech support, who will come back with their analysis of where the network is lacking.

Also, did we check whether we are getting status 24 for all the clients in one backup policy?
