
Duplication is failing with status 191 (Broken pipe)

RamNagalla
Moderator
Partner    VIP    Certified

Hello All,

I have a Solaris 10 media server running NetBackup 6.5.6 with a disk storage unit and ACS tape drives. We initially back up to disk and then duplicate to the ACS tapes.

I have been getting duplication failures when trying to duplicate the data:

Error: cannot write data to socket, Broken pipe

But when I duplicate the images using another media server's drives, the duplication succeeds.

I have no clue what needs to be done about this.

 

Detail status:

 

Jul 24, 2011 2:14:24 PM - started process bptm (pid=8226)
Jul 24, 2011 2:14:30 PM - started process bpdm (pid=8252)
Jul 24, 2011 2:14:21 PM - Waiting for scan drive stop L3_0_7_1_6_SL85_02, Media server: wspedb52-ebr
Jul 24, 2011 2:14:22 PM - granted resource  WPL423
Jul 24, 2011 2:14:22 PM - granted resource  L3_0_7_1_6_SL85_02
Jul 24, 2011 2:14:22 PM - granted resource  LTO3_STU
Jul 24, 2011 2:14:30 PM - started process bptm (pid=8226)
Jul 24, 2011 2:14:30 PM - mounting WPL423
Jul 24, 2011 2:14:33 PM - begin reading
Jul 24, 2011 2:23:15 PM - Error bptm (pid=8226) media manager terminated by parent process
Jul 24, 2011 2:23:15 PM - Error bpdm (pid=8284) cannot write data to socket, Broken pipe
Jul 24, 2011 2:24:22 PM - Error bpduplicate (pid=28794) host p1tvt1d2-ebr backup id p1tvt1d2-ebr_1309498654 read failed, socket write failed (24).
Jul 24, 2011 2:24:22 PM - Error bpduplicate (pid=28794) host wspedb52-ebr backupid p1tvt1d2-ebr_1309498654 write failed, media manager killed by signal (82).
Jul 24, 2011 2:24:22 PM - Error bpduplicate (pid=28794) Duplicate of backupid p1tvt1d2-ebr_1309498654 failed, media manager killed by signal (82).
Jul 24, 2011 2:24:23 PM - started process bptm (pid=10305)
Jul 24, 2011 2:24:23 PM - requesting resource LTO3_STU
Jul 24, 2011 2:29:39 PM - started process bpdm (pid=11902)
Jul 24, 2011 2:29:34 PM - granted resource  WPL423
Jul 24, 2011 2:29:34 PM - granted resource  L3_0_7_1_6_SL85_02
Jul 24, 2011 2:29:34 PM - granted resource  LTO3_STU
Jul 24, 2011 2:29:42 PM - begin reading
Jul 24, 2011 2:33:07 PM - Error bptm (pid=10305) media manager terminated by parent process
Jul 24, 2011 2:33:07 PM - Error bpdm (pid=11927) cannot write data to socket, Broken pipe
Jul 24, 2011 2:33:41 PM - Error bpduplicate (pid=28794) host p1tvt1d2-ebr backup id p1tvt1d2-ebr_1309498180 read failed, socket write failed (24).
Jul 24, 2011 2:33:41 PM - Error bpduplicate (pid=28794) host wspedb52-ebr backupid p1tvt1d2-ebr_1309498180 write failed, media manager killed by signal (82).
Jul 24, 2011 2:33:42 PM - Error bpduplicate (pid=28794) Duplicate of backupid p1tvt1d2-ebr_1309498180 failed, media manager killed by signal (82).
Jul 24, 2011 2:33:43 PM - started process bptm (pid=12967)
Jul 24, 2011 2:33:43 PM - requesting resource LTO3_STU
Jul 24, 2011 2:37:47 PM - granted resource  WPL423
Jul 24, 2011 2:37:47 PM - granted resource  L3_0_7_1_6_SL85_02
Jul 24, 2011 2:37:47 PM - granted resource  LTO3_STU
Jul 24, 2011 2:37:52 PM - started process bpdm (pid=13988)
Jul 24, 2011 2:37:55 PM - begin reading
Jul 24, 2011 2:43:08 PM - Error bptm (pid=12967) media manager terminated by parent process
Jul 24, 2011 2:43:08 PM - Error bpdm (pid=14014) cannot write data to socket, Broken pipe
Jul 24, 2011 2:43:54 PM - Error bpduplicate (pid=28794) host p1tvt1d2-ebr backup id p1tvt1d2-ebr_1309497937 read failed, socket write failed (24).
Jul 24, 2011 2:43:54 PM - Error bpduplicate (pid=28794) host wspedb52-ebr backupid p1tvt1d2-ebr_1309497937 write failed, media manager killed by signal (82).
Jul 24, 2011 2:43:55 PM - Error bpduplicate (pid=28794) Duplicate of backupid p1tvt1d2-ebr_1309497937 failed, media manager killed by signal (82).
Jul 24, 2011 2:43:55 PM - started process bptm (pid=15856)
Jul 24, 2011 2:43:56 PM - requesting resource LTO3_STU
Jul 24, 2011 2:50:28 PM - granted resource  WPL423
Jul 24, 2011 2:50:28 PM - granted resource  L3_0_7_1_6_SL85_02
Jul 24, 2011 2:50:28 PM - granted resource  LTO3_STU
Jul 24, 2011 2:50:33 PM - started process bpdm (pid=17322)
Jul 24, 2011 2:50:36 PM - begin reading
Jul 24, 2011 2:52:49 PM - Error bptm (pid=15856) media manager terminated by parent process
Jul 24, 2011 2:53:20 PM - Error bpdm (pid=17358) cannot write data to socket, Broken pipe
Jul 24, 2011 2:53:20 PM - Error bpdm (pid=17322) media manager terminated by parent process
Jul 24, 2011 2:53:21 PM - Error bpduplicate (pid=28794) host p1tvt1d2-ebr backup id p1tvt1d2-ebr_1309497100 read failed, media manager killed by signal (82).
Jul 24, 2011 2:53:21 PM - Error bpduplicate (pid=28794) host wspedb52-ebr backupid p1tvt1d2-ebr_1309497100 write failed, media manager killed by signal (82).
Jul 24, 2011 2:53:21 PM - Error bpduplicate (pid=28794) Duplicate of backupid p1tvt1d2-ebr_1309497100 failed, media manager killed by signal (82).
Jul 24, 2011 2:53:21 PM - Error bpduplicate (pid=28794) Status = no images were successfully processed.
Jul 24, 2011 2:53:22 PM - end Duplicate; elapsed time 0:48:21
no images were successfully processed  (191)
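To check whether each attempt fails at the same point, the timestamps in the Detail status above can be compared per image. Below is a minimal Python sketch (not a NetBackup tool) that assumes the job-detail layout shown above (`Mon DD, YYYY H:MM:SS AM/PM - message`); `summarize_attempts` is a hypothetical helper name. For each failed backup ID it reports the seconds between "begin reading" and the first "Broken pipe" error.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Format of the job "Detail status" timestamps, e.g. "Jul 24, 2011 2:14:33 PM".
TS_FORMAT = "%b %d, %Y %I:%M:%S %p"
LINE_RE = re.compile(r"^(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - (.*)$")
FAILED_RE = re.compile(r"Duplicate of backupid (\S+) failed")


def summarize_attempts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (backup_id, seconds from 'begin reading' to first Broken pipe)."""
    results = []
    begin = broken = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a job-detail timestamp
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        msg = m.group(2)
        if msg == "begin reading":
            begin, broken = ts, None  # a new image starts reading
        elif "Broken pipe" in msg and broken is None:
            broken = ts  # first socket write failure for this image
        else:
            failed = FAILED_RE.search(msg)
            if failed and begin is not None and broken is not None:
                seconds = int((broken - begin).total_seconds())
                results.append((failed.group(1), seconds))
                begin = broken = None
    return results
```

Fed the full Detail status above, this should report 522, 205, 313 and 164 seconds for the four images, i.e. the socket write fails at a different point on each attempt.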
 


Nicolai
Moderator
Partner    VIP   

How do you duplicate the images: SLP, DSSU or Vault?

I think that both "read failed, socket write failed" and "cannot write data to socket, Broken pipe" are indications of an underlying issue. But without knowing how the images are duplicated, it's difficult to provide any direction.

RamNagalla
Moderator
Partner    VIP    Certified

We generally run the duplications from scripts started by cron. The scripts look for images that have only copy 1 and then run the bpduplicate command for those images.

The log that I provided in my first post was from a job that I started manually from the Catalog in the GUI.

This media server has never had a successful duplication job.

Thank you very much for the reply. :)

Nicolai
Moderator
Partner    VIP   

Put VERBOSE=5 in bp.conf on the master and media server. Create the admin directory in /usr/openv/netbackup/logs on the master and the bptm directory in the same location on the media server.

Retry the duplication. Look for clues in the admin log and in the bptm log on the media server.

Lines with <16> and <32> are the interesting ones.
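Picking those lines out of a VERBOSE=5 log by hand is tedious. A minimal Python sketch, assuming the debug-log layout that appears later in this thread (`HH:MM:SS.mmm [pid] <level> ...`); `interesting_lines` is a hypothetical helper, not part of NetBackup.

```python
import re
from typing import Iterable, List, Tuple

# Debug-log lines look like: "12:34:16.123 [12740] <16> bpduplicate: ..."
# Group 1 captures the number inside the angle brackets.
SEVERITY_RE = re.compile(r"^\S+ \[\d+\] <(\d+)>")


def interesting_lines(lines: Iterable[str], levels: Tuple[int, ...] = (16, 32)) -> List[str]:
    """Return only the lines whose <level> tag is one of `levels`."""
    kept = []
    for line in lines:
        m = SEVERITY_RE.match(line)
        if m and int(m.group(1)) in levels:
            kept.append(line.rstrip("\n"))
    return kept
```

Usage: open a file from the admin or bptm log directory with `open(path, errors="replace")` and pass the file object to `interesting_lines`.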

RamNagalla
Moderator
Partner    VIP    Certified

Below is what I found in the admin log on the master server with VERBOSE=5. I did not find any errors in the media server's admin log.


12:23:59.159 [12740] <16> emmlib_GetHost: (0) QueryMachine failed, emmError = 2000000, nbError = 0
12:23:59.159 [12740] <16> bpduplicate: emmlib_GetHost(CEMM_NBU_MEDIA) failed: 2000000
1
12:34:16.123 [12740] <16> bpduplicate: host p1tvt1d2-ebr backup id p1tvt1d2-ebr_1310185896 read failed, socket write failed (24).
12:34:16.124 [12740] <16> bpduplicate: host wspedb52-ebr backupid p1tvt1d2-ebr_1310185896 write failed, media manager killed by signal (82).
12:34:16.579 [12740] <16> bpduplicate: Duplicate of backupid p1tvt1d2-ebr_1310185896 failed, media manager killed by signal (82).
12:34:16.581 [12740] <16> bpduplicate: Status = no images were successfully processed.
12:34:17.222 [12740] <16> free_allocated_resources: Free resource allocations failed.
 

The detailed master admin log is attached.

Please suggest...

Nicolai
Moderator
Partner    VIP   

The next place to look is the bpbrm log on the media server.

Is this media server separated by a firewall? From the master server, can you run "vmoprcmd -h {mediaservername}"?
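vmoprcmd exercises the real NetBackup path, but a plain TCP connect test can also help rule out a firewall. A minimal Python sketch; the port numbers (1556 for PBX, 13724 for vnetd, 13782 for bpcd) are assumed common NetBackup defaults and should be checked against your configuration, and `check_ports` is a hypothetical helper.

```python
import socket
from typing import Dict, Iterable

# Assumption: common NetBackup default ports (1556 = PBX, 13724 = vnetd,
# 13782 = bpcd). Adjust to match your site's configuration.
DEFAULT_PORTS = (1556, 13724, 13782)


def check_ports(host: str, ports: Iterable[int] = DEFAULT_PORTS,
                timeout: float = 5.0) -> Dict[int, bool]:
    """Try a TCP connect to each port; True means the connect succeeded."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, or unresolvable host
            results[port] = False
    return results
```

Run it from the master, e.g. `check_ports("wspedb52-ebr")`. A successful connect only shows that the port is reachable, not that the NetBackup service behind it is healthy.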

RamNagalla
Moderator
Partner    VIP    Certified

The media server is not behind a firewall, and I can run vmoprcmd from the master server. The media server's status in EMM is "active for tape and disk jobs". I will have a look at the bpbrm logs on the media server.

RamNagalla
Moderator
Partner    VIP    Certified

bpbrm has nothing; it is not even creating a log.

This media server is currently not used for backups, so no bpbrm log file is created.

But I found logs in bptm and bpdm.

Please check whether they can help us.

One more thing: the duplication fails after it has written some data, and the amount of data written is different each time. The backup image being processed is also different each time.

mph999
Level 6
Employee Accredited

Nicolai, as always, has made some excellent suggestions.

The admin log shows this:

 

12:23:59.159 [12740] <16> emmlib_GetHost: (0) QueryMachine failed, emmError = 2000000, nbError = 0
12:23:59.159 [12740] <16> bpduplicate: emmlib_GetHost(CEMM_NBU_MEDIA) failed: 2000000
1

...which happens before the status 24:

12:34:16.123 [12740] <16> bpduplicate: host p1tvt1d2-ebr backup id p1tvt1d2-ebr_1310185896 read failed, socket write failed (24).

On this media server, what does this command show?

nbemmcmd -listhosts

Is CEMM_NBU_MEDIA the media server that is having issues?

It is odd: it has problems, but it does at least try to start.

Is the media server at the same version as the master ?

Martin

mph999
Level 6
Employee Accredited

Make sure that /usr/openv/volmgr/vm.conf lists only one media server on this line:

MM_SERVER_NAME = <media server name>

If this line is not in the file, please add it.
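To confirm there is exactly one MM_SERVER_NAME entry, the file can be scanned. A minimal Python sketch, assuming the file is /usr/openv/volmgr/vm.conf and that entries use the `MM_SERVER_NAME = <name>` form shown above; `check_vm_conf` and `mm_server_names` are hypothetical helpers.

```python
import re
from typing import List

# Matches "MM_SERVER_NAME = <name>", tolerating extra whitespace around "=".
# Commented-out lines (starting with "#") do not match.
MM_RE = re.compile(r"^\s*MM_SERVER_NAME\s*=\s*(\S+)\s*$")


def mm_server_names(vm_conf_text: str) -> List[str]:
    """Return every MM_SERVER_NAME value found in the given vm.conf text."""
    return [m.group(1) for m in map(MM_RE.match, vm_conf_text.splitlines()) if m]


def check_vm_conf(vm_conf_text: str) -> str:
    """Summarize whether vm.conf has zero, one, or several MM_SERVER_NAME lines."""
    names = mm_server_names(vm_conf_text)
    if not names:
        return "MISSING: add MM_SERVER_NAME = <media server name>"
    if len(names) > 1:
        return "MULTIPLE: " + ", ".join(names)
    return "OK: " + names[0]
```

Usage: `check_vm_conf(open("/usr/openv/volmgr/vm.conf").read())`.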

Restart the Media Manager services like this (the commands are in /usr/openv/volmgr/bin):

stopltid

ltid -v 

Also, the command in my previous post should have been:

nbemmcmd -listhosts -verbose

 

Regards,

Martin

RamNagalla
Moderator
Partner    VIP    Certified

I have updated vm.conf with the MM_SERVER_NAME entry and ran a test duplication, but it failed again.

Both the master and media servers are at the same version.

nbemmcmd -listhosts -verbose command output:

wspedb52-ebr
        ClusterName = ""
        MachineName = "wspedb52-ebr"
        FQName = "wspedb52-ebr.edc.cingular.net"
        LocalDriveSeed = ""
        MachineDescription = ""
        MachineFlags = 0xf7
        MachineNbuType = media (1)
        MachineState = active for tape and disk jobs (14)
        MasterServerName = "wspebrms01-ebr"
        NetBackupVersion = 6.5.6.0 (656000)
        OperatingSystem = solaris (2)
        ScanAbility = 5.


I am not sure what CEMM_NBU_MEDIA is. Any information about it is greatly appreciated.
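When comparing this output against the list of media servers you expect (as Martin does next), it can help to parse it. A minimal Python sketch, assuming the layout posted above (an unindented host name followed by indented `Key = value` lines); `parse_listhosts_verbose` and `media_servers` are hypothetical helpers.

```python
from typing import Dict, List


def parse_listhosts_verbose(text: str) -> List[Dict[str, str]]:
    """Turn `nbemmcmd -listhosts -verbose` output into one dict per host."""
    hosts: List[Dict[str, str]] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            hosts.append({"Host": raw.strip()})  # start of a new host block
        elif hosts and "=" in raw:
            key, _, value = raw.strip().partition("=")
            hosts[-1][key.strip()] = value.strip().strip('"')
    # Drop unindented lines that had no attributes (e.g. banners or status lines).
    return [h for h in hosts if len(h) > 1]


def media_servers(hosts: List[Dict[str, str]]) -> List[str]:
    """Names of hosts whose MachineNbuType starts with 'media'."""
    return [h["Host"] for h in hosts if h.get("MachineNbuType", "").startswith("media")]
```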

mph999
Level 6
Employee Accredited

Hmm, did you post all of the nbemmcmd output into this thread?

Because you appear to be missing a media server, unless I am mistaken.

Let's double-check: just post the output of nbemmcmd -listhosts.

Thanks

 

Martin

RamNagalla
Moderator
Partner    VIP    Certified

I thought it was only for the media server. I am attaching the complete output now. Thank you.

Edit: the attachment name contains "54", but the output is from the correct media server.

mph999
Level 6
Employee Accredited

My error: CEMM_NBU_MEDIA is an internal label, not a media server. Sorry, I didn't look at the entire admin log previously; I'm looking at it now.

 

mph999
Level 6
Employee Accredited

I see this in the admin log:

 

 

12:16:08.514 [12740] <2> vnet_cached_gethostbyname: vnet_hosts.c.422: gethostbyname failed: gaalpa2digmsdc2-ebr
12:16:08.514 [12740] <2> vnet_cached_gethostbyname: vnet_hosts.c.436: Function failed: 6 0x00000006
12:16:08.563 [12740] <2> vnet_cached_gethostbyname: vnet_hosts.c.421: h_errno: 1 0x00000001
12:16:08.563 [12740] <2> vnet_cached_gethostbyname: vnet_hosts.c.422: gethostbyname failed: gaalpa2digmsdc3-ebr

It might not be the cause, but it is a name resolution issue that should not be there. Can you ensure that the master can resolve all the configured NIC ports on the media server?
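A quick way to test this from the master is to look up every name involved. A minimal Python sketch using the standard `socket.gethostbyname` (IPv4 only); `check_resolution` is a hypothetical helper, and the resolver parameter exists only so it can be tested without DNS.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_resolution(
    names: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each host name to the IPv4 address it resolves to, or None on failure."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror both subclass OSError
            results[name] = None
    return results
```

Usage on the master: `check_resolution(["gaalpa2digmsdc2-ebr", "gaalpa2digmsdc3-ebr", "wspedb52-ebr"])`. Any name that maps to None needs fixing in DNS or the hosts file.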

This:
12:23:59.159 [12740] <16> emmlib_GetHost: (0) QueryMachine failed, emmError = 2000000, nbError = 0
12:23:59.159 [12740] <16> bpduplicate: emmlib_GetHost(CEMM_NBU_MEDIA) failed: 2000000

It is similar to a previous bug, but that should have been fixed by now. The bptm log doesn't show the symptoms of that bug, so we'll discount it for now.

I need to double-check (long day), but I think that for a duplication the parent process of bptm is nbjm. I'm thinking we might need to grab the nbjm, admin and bptm logs that cover the same duplication job. At the moment I'm looking at the bptm log from one time and the admin log from another; you cannot troubleshoot like this, because the logs need to match. This is not a complaint, as you have just posted some logs for info, but to work through logs they need to cover the same job and timeframe.
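Cutting each log down to the same window makes the admin, bptm and nbjm logs easier to line up. A minimal Python sketch, assuming lines start with the `HH:MM:SS.mmm` time of day seen in this thread; it compares time of day only, so use it on logs from the same day. `lines_between` and `parse_clock` are hypothetical helpers.

```python
from datetime import time
from typing import Iterable, List


def parse_clock(stamp: str) -> time:
    """Parse 'HH:MM:SS' or 'HH:MM:SS.mmm' as used by the debug logs in this thread."""
    hms, _, millis = stamp.partition(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return time(hours, minutes, seconds, int(millis or 0) * 1000)


def lines_between(lines: Iterable[str], start: str, end: str) -> List[str]:
    """Keep log lines whose leading time of day falls within [start, end]."""
    lo, hi = parse_clock(start), parse_clock(end)
    kept = []
    for line in lines:
        try:
            stamp = parse_clock(line.split(" ", 1)[0])
        except ValueError:
            continue  # continuation lines or lines without a timestamp
        if lo <= stamp <= hi:
            kept.append(line.rstrip("\n"))
    return kept
```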

We might also increase the library/TAO logging to get more network information, but all these suggestions may not lead to much. Often with status 24 the logs get no better than this:

read failed, socket write failed (24)

If the network suddenly has an issue, NetBackup cannot tell why; it just realises that it can't write anymore.
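At the operating-system level, "Broken pipe" is EPIPE: a write on a stream socket whose other end has already gone away. The Python sketch below only illustrates that errno; it is not NetBackup code. `demonstrate_broken_pipe` is a hypothetical name, and it relies on Python ignoring SIGPIPE by default, so the failed write raises BrokenPipeError instead of killing the process.

```python
import errno
import socket


def demonstrate_broken_pipe() -> str:
    """Write to a stream socket whose peer has already closed; return the errno name."""
    writer, reader = socket.socketpair()  # connected pair of stream sockets
    reader.close()                         # the "remote" side goes away
    try:
        writer.sendall(b"image data")
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        writer.close()
    return "no error"
```

The writer learns only that the peer is gone, not why, which matches the point above that the logs often cannot say more than "socket write failed".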

I think I'll run a duplication test - I'll try tomorrow if I can, and just remind myself what is in each log.

It could be worth trying a backup on the media server (just set up a test job that backs up the media server itself); this would be interesting to see.

Having looked at this, I'm not convinced the EMM 'error' is relevant to this failure since, as pointed out, data does start to be written.

Martin

 

mph999
Level 6
Employee Accredited

OK, take a look at this technote:

http://www.symantec.com/docs/TECH34183

The example here is Solaris, so this should be fine for you.

I am pretty sure this is not a NetBackup issue, and you may wish to speak with your sysadmins about network-related tuning if the technote does not resolve the issue.

You should also read through the performance and tuning guide and apply any recommended settings that match your server.

http://entsupport.symantec.com/docs/307083

Many thanks,

Martin

RamNagalla
Moderator
Partner    VIP    Certified

I will have a look at the above technotes.

Coming to your other point: I can perform backups to these tapes fine. There are no failures when backups are running; I only get this issue when duplications are running.

I am sorry, I should have attached logs from the same time in my previous post.

I am doing that now. Please have a look when you have some time.

I will also check this with the SA.

Thank you very much for all your efforts on this.