I hope this is the appropriate place to post this information - my apologies if not.
I work day in and day out with NetBackup. After a 6.5.4 upgrade, we started to experience errors when trying to restore data from a media server that acts as a client for NFS-mounted data (this media server is a client for approximately 20-30 policies). Restores attempted via this client would fail with a "file read failed" error. I went through many Symantec technotes, configuration guidelines, and logs. The environment had been upgraded only about a month earlier. All images, both pre-upgrade and post-upgrade, produced the same "file read failed" error when attempting a restore for this client.
To save everyone much anguish, I thought I would post the root cause of the problem. We had assumed there were no catalog inconsistencies (who would have thought otherwise, after cleaning the catalog pre-upgrade?), but I discovered, with Symantec's help, that there were bad image headers in the /usr/openv/netbackup/db/images/[media_server] directory! After removing those bad image headers, I attempted the restore and "bingo", it worked like a charm. So the media server/client that was producing the "file read failed" error on restore had bad image headers in its images directory.
It's funny how we found the bad image headers. We ran NBCC to check that the catalog was in a consistent state. When we then ran NBCCR to repair inconsistencies, the NBCCR executable would fail outright. Symantec explained this was because of bad image headers. I couldn't understand how, after just going through an upgrade and having the catalog in a consistent state prior to the upgrade, we could have bad image headers. The bad image headers were from legacy images dating back years.
I have asked Symantec why NBCCR did not pick up these bad image headers when we cleaned the catalog at version 5.1MP6. I'm no NBU expert, but I thought I would point out this shortcoming of the NBCCR repair in 5.x compared with 6.5. Maybe I'm a little misled and should know this already, but regardless, I thought I would share this tidbit with everyone so they don't spend the time I did looking for a solution.
I inherited a v5.1 environment a few years back... It took me a while to find out why the catalog was so big. Multiple runs of NBCC never found a problem. In the end I had to write my own tool to compare the output of bpimagelist against the contents of the /db/images folder tree. I found 35 GB of orphaned partial image records - which equated to 15% of the entire catalog. It took an absolute age to get Symantec Support to agree with my findings, and even longer for them to confirm that the orphaned data was safe to "move out". I didn't just delete the data: I had the script move all the orphaned data out to another folder structure, and then saved that orphaned-data folder tree to tape before deleting it.
It just goes to show that NBCC doesn't find all the issues.
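For anyone wanting to try the same kind of comparison, here is a minimal Python sketch of the idea. It is not the original tool. It assumes you already have the set of backup IDs that bpimagelist reports (how you extract them from its output is left out), and it assumes, purely for illustration, that each file under the images tree sits below a per-client directory and carries a 10-digit timestamp in its name, so that a backup ID of the form client_timestamp can be guessed from the path. Check that naming assumption against your own catalog before relying on it. The function names are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumption: a 10-digit Unix timestamp in the file name identifies the image.
TIMESTAMP = re.compile(r"(?<!\d)(\d{10})(?!\d)")


def backup_id_for(path, images_root):
    """Guess '<client>_<timestamp>' for a file below images_root.

    Returns None when the file is not below a client directory or its
    name holds no 10-digit timestamp.
    """
    rel = path.relative_to(images_root)
    match = TIMESTAMP.search(path.name)
    if match is None or len(rel.parts) < 2:
        return None
    # Assumption: the first directory level is the client name.
    return f"{rel.parts[0]}_{match.group(1)}"


def find_orphans(images_root, known_ids):
    """List files whose guessed backup ID is not in known_ids."""
    orphans = []
    for path in sorted(images_root.rglob("*")):
        if not path.is_file():
            continue
        backup_id = backup_id_for(path, images_root)
        # Files with no recognisable ID are skipped, not treated as orphans.
        if backup_id is not None and backup_id not in known_ids:
            orphans.append(path)
    return orphans


def quarantine(orphans, images_root, quarantine_root):
    """Move orphans to quarantine_root, keeping their relative paths."""
    moved = []
    for path in orphans:
        target = quarantine_root / path.relative_to(images_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Moving rather than deleting keeps the step reversible: the quarantine tree mirrors the original layout, so anything moved by mistake can be put back, and the whole tree can be written to tape before it is removed. Run find_orphans on its own first and review the list before calling quarantine.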