TLDD spitting out errors: open failed in io_open

jim_dalton · ‎11-23-2010

Solaris 10 media and master, NetBackup 7.0.1, LTO4 in an SL500 robot, SSO with 4 other Solaris media servers (running 6.5.4) plus two NDMP hosts.

I've turned on logging in vm.conf to investigate an issue with 7.0.1 and have noticed other errors appearing:

Nov 23 14:00:44 wunxxx tldcd[13044]: [ID 616320 daemon.notice] TLD(0) initiating MOVE_MEDIUM from addr 1154 to addr 507
Nov 23 14:00:50 wunxxx ltid[25290]: [ID 527590 daemon.notice] LTID - Sent ROBOTIC request, Type=1, Param2=1
Nov 23 14:00:50 wunxxx tldd[25329]: [ID 789709 daemon.notice] TLD(0) MountTape 004074 on drive 9, from slot 152
Nov 23 14:00:50 wunxxx tldd[13094]: [ID 302958 daemon.notice] TLD(0) open failed in io_open, I/O error
Nov 23 14:00:50 wunxxx tldd[13094]: [ID 320576 daemon.notice] TLD(0) unload==TRUE, but no unload, drive 9 (device 0)
Nov 23 14:00:50 wunxxx tldcd[25405]: [ID 871790 daemon.notice] Processing MOUNT, TLD(0) drive 9, slot 152, barcode 004074 , vsn 004074

This is NOT benign, as reported in some other forum questions. There's literally hundreds of these errors, it can happen any time of day, to any drive (to all drives in fact) and to any tape.

This is a problem when you have restricted active media in a pool, because if one won't mount, NetBackup still honours the maxlive count, and then you find that instead of writing to 3 media, suddenly you're down to 2. The tape remains active but is simply ignored. Something very dodgy is going on.

For a good long while now my three media have been consistently written to without faltering.

Sadly I can't check whether the upgrade to 7 was the source, because I didn't have logging on in 6.5.4, so the messages (probably) won't exist.

On reflection... if we ignore all these io_open errors, the question remains: why does NetBackup not select my media for writing? It's active, writeable, in the right pool, in the robot, has no media errors, is not full, was written to yesterday, and visual inspection shows nothing unusual... errr, what else can there be? Maybe NetBackup can no longer count when it comes to active media in a pool?

Your input appreciated, Jim.

Andy_Welburn · ‎11-23-2010

Are you sure? Looking at your output it initiated the mount request for tape 004074 on drive 9 from slot 152, posted the 'error' & the 'unload' message then processed the mount request. Maybe there is another issue causing the unavailability of media?

"There's literally hundreds of these errors, it can happen any time of day, to any drive (to all drives in fact) and to any tape." Is that merely a consequence of the SSO scan host polling the drives?

About scan hosts on UNIX/Linux
http://www.symantec.com/business/support/index?page=content&id=HOWTO32788
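
One way to see whether the io_open errors follow a steady polling pattern rather than real mount activity is to tally them by hour and by the process that logged them. A minimal Python sketch, assuming the syslog layout shown in the post above with each entry on a single line; the function name tally_io_open_errors is made up for illustration and is not part of NetBackup:

```python
import re
from collections import Counter

# Assumed layout: "<Mon DD HH:MM:SS> <host> <daemon>[<pid>]: [ID <n> <facility>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: \[ID \d+ \S+\] (?P<msg>.*)$"
)


def tally_io_open_errors(lines):
    """Count 'open failed in io_open' entries per hour and per (daemon, pid)."""
    by_hour = Counter()
    by_pid = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or "open failed in io_open" not in m["msg"]:
            continue
        hour = m["ts"][:-6]  # drop ":MM:SS", e.g. "Nov 23 14:00:50" -> "Nov 23 14"
        by_hour[hour] += 1
        by_pid[(m["daemon"], m["pid"])] += 1
    return by_hour, by_pid


# Two entries taken from the log excerpt in the original post.
sample = [
    "Nov 23 14:00:50 wunxxx tldd[25329]: [ID 789709 daemon.notice] "
    "TLD(0) MountTape 004074 on drive 9, from slot 152",
    "Nov 23 14:00:50 wunxxx tldd[13094]: [ID 302958 daemon.notice] "
    "TLD(0) open failed in io_open, I/O error",
]
by_hour, by_pid = tally_io_open_errors(sample)
print(by_hour)
print(by_pid)
```

If the hourly counts are roughly flat around the clock, including times when no backups run, that would fit the polling explanation; counts that spike only during mounts would point elsewhere.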

jim_dalton · ‎11-24-2010

The odd thing is... it's fixed itself. That's to say, the one 'problem' tape that was in the pool has expired, another one from scratch has been put in, and now it's writing to three tapes like it should be. Very peculiar. So AW... there clearly was another issue causing unavailability of media.

Jim

Andy_Welburn · ‎11-24-2010

Just like these things to be inconsiderate & fix themselves before you get to find out the cause!

Glad it all seems ok now. Let's just hope whatever it was doesn't recur & if it does it stays broken until you fix it!
