YET another NB 6.0 MP3 loss of data

Stumpr2
Level 6
Here is the scenario:
Job is writing to tape E00661 and fails. The job is resubmitted and grabs a different tape, E00800. I do not know why it wanted a different tape.
The details below show that it wants to position to file #25.
Perhaps that is file #25 on tape E00661?

I do have multiplexing, checkpoint restart, and media spanning enabled. These problems were supposed to have been fixed in MP3?


07/08/2006 14:55:50 - requesting resource server-hcart2
07/08/2006 14:55:50 - requesting resource server.NBU_CLIENT.MAXJOBS.client
07/08/2006 14:55:50 - requesting resource server.NBU_POLICY.MAXJOBS.ops_win_monthly
07/08/2006 14:55:50 - granted resource server.NBU_CLIENT.MAXJOBS.client
07/08/2006 14:55:50 - granted resource server.NBU_POLICY.MAXJOBS.ops_win_monthly
07/08/2006 14:55:50 - granted resource E00661
07/08/2006 14:55:50 - granted resource STK9940B_00112_2
07/08/2006 14:55:50 - granted resource server-hcart2
07/08/2006 14:55:54 - connecting
07/08/2006 14:55:55 - begin writing
07/08/2006 14:55:55 - positioned E00661
07/08/2006 14:55:55 - connected; connect time: 0:00:00
07/08/2006 15:16:41 - end writing; write time: 0:20:46
network connection timed out (41)

RETRY:
-------------------
07/08/2006 15:18:30 - requesting resource server-hcart2
07/08/2006 15:18:30 - requesting resource server.NBU_CLIENT.MAXJOBS.client
07/08/2006 15:18:30 - requesting resource server.NBU_POLICY.MAXJOBS.ops_win_monthly
07/08/2006 15:18:30 - granted resource server.NBU_CLIENT.MAXJOBS.client
07/08/2006 15:18:30 - granted resource server.NBU_POLICY.MAXJOBS.ops_win_monthly
07/08/2006 15:18:30 - granted resource E00800
07/08/2006 15:18:30 - granted resource STK9940B_0113_3
07/08/2006 15:18:30 - granted resource server-hcart2
07/08/2006 15:18:33 - started process bpbrm (pid=3042)
07/08/2006 15:22:04 - Error bpbrm (pid=3042) cannot connect to client, No such file or directory (2)
07/08/2006 15:22:05 - connecting
07/08/2006 15:22:07 - mounting E00800
07/08/2006 15:22:42 - mounted E00800; mount time: 0:00:35
07/08/2006 15:22:42 - positioning E00800 to file 25
07/08/2006 15:23:30 - Warning bptm (pid=3048) cannot locate on drive index 3, locate scsi command failed (possibly a command timeout), errno = 9, Bad file number
07/08/2006 15:25:23 - Error bptm (pid=3048) ioctl (MTFSF) failed on media id E00800, drive index 3, I/O error (bptm.c.6730)
07/08/2006 15:25:23 - end writing
media position error (86)

Message was edited by: Bob Stump
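
For anyone comparing attempts like the two above, here is a minimal Python sketch that pulls the granted media ID and any "positioning ... to file N" request out of a pasted job-details excerpt. It is not a NetBackup tool. The function name `summarize_attempt`, the assumed line layout ("MM/DD/YYYY HH:MM:SS - message"), and the rule that a media ID is exactly six uppercase letters or digits are all assumptions taken from the excerpt in this post.

```python
import re

# Assumption: each job-details line looks like "MM/DD/YYYY HH:MM:SS - message",
# as in the excerpt above. Lines without a timestamp are skipped.
LINE_RE = re.compile(r"^(\d{1,2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - (.*)$")
# Assumption: a media ID is exactly six uppercase letters/digits (e.g. E00661).
MEDIA_RE = re.compile(r"^granted resource ([A-Z0-9]{6})$")
POSITION_RE = re.compile(r"^positioning (\S+) to file (\d+)$")


def summarize_attempt(log_text):
    """Return the media IDs granted and the positioning requests in one attempt."""
    media, positions = [], []
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if not line:
            continue
        message = line.group(2)
        granted = MEDIA_RE.match(message)
        if granted:
            media.append(granted.group(1))
        position = POSITION_RE.match(message)
        if position:
            positions.append((position.group(1), int(position.group(2))))
    return {"media": media, "positions": positions}


# Sample lines copied from the two attempts in this post.
first_try = """\
07/08/2006 14:55:50 - granted resource E00661
07/08/2006 14:55:50 - granted resource STK9940B_00112_2
07/08/2006 14:55:50 - granted resource server-hcart2
07/08/2006 14:55:55 - positioned E00661
network connection timed out (41)
"""
retry = """\
07/08/2006 15:18:30 - granted resource E00800
07/08/2006 15:18:30 - granted resource STK9940B_0113_3
07/08/2006 15:22:42 - positioning E00800 to file 25
media position error (86)
"""

first_summary = summarize_attempt(first_try)
retry_summary = summarize_attempt(retry)
print(first_summary)
print(retry_summary)
```

Comparing the two summaries makes the mismatch easy to spot: the retry was granted a different media ID than the first attempt, yet it was asked to position to file 25 on the new tape.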
9 REPLIES

Alasdair_McQuir
Level 4
Yep, this looks all too familiar. I have been asked to increase logging to max and send in the entire log dir. That is 40 GB; even zipped, it's big.

Stumpr2
Level 6
Symantec referred me to Sun. It seems there may be a hardware problem with the Sun-branded Emulex SG-XPCI2FC-EM2.

12:13:28.080 <8> io_position_for_write: cannot locate on drive index 2, locate scsi command failed (possibly a command timeout), errno = 9, Bad file number

Stumpr2
Level 6
The fix was
http://sunsolve.sun.com/search/document.do?assetkey=1-21-120222-10-1

Stumpr2
Level 6
thanks

Stumpr2
Level 6
The solution is provided in my earlier posting in this thread.

Dennis_Strom
Level 6
Dang, I think you just fixed a problem for me....

DavidParker
Level 6
Did that apply to 5.x as well or was this merely a 6.0 'feature'?

Stumpr2
Level 6
For me it was a Solaris 10 change. The new servers came with Sun-branded Emulex cards instead of Emulex-branded Emulex cards.

DavidParker
Level 6
Good to know!
I may be encountering Solaris boxes again in the near future.