Netbackup 7.1 Media error 84

MjG
Level 3

We recently upgraded to version 7.1. Soon after, we started receiving the above error for all of our disk-based backups. The bptm log shows:

] <2> db_end: Need to collect reply
00:00:34.778 [6304.1220] <4> write_disk_header: Data Classification & Storage Lifecycle action 6
00:00:34.778 [6304.1220] <2> construct_sts_isid: master_server MASTERSERVER, client SERVER, backup_time 1302238821, copy_number 1, stream_number 0, fragment_number 0, resume_number 0, spl_name NULL
00:00:35.182 [6304.1220] <16> 11003:bptm:6304:: PDVFS: [1] pdvfs_lib_log: data store failed: could not spool object: try again
00:00:35.183 [6304.1220] <16> 11003:bptm:6304:: PDVFS: [1] pdvfs_lib_log: Synchronization to <SERVERNAME> failed (try again).
00:00:35.183 [6304.1220] <16> 11003:bptm:6304:: PDVFS: [1] pdvfs_release_cr_conn: ERROR CRSync failed: try again
00:00:35.183 [6304.1220] <16> 11003:bptm:6304: PDVFS: [1] pdvfs_release_cr_conn: failed, data=0000000003DB0040, cr_pool=00000000028A5AC0, try again
00:00:35.183 [6304.1220] <16> 11003:bptm:6304: PDVFS: [1] pdvfs_flushfile_norm_cleanup: pdvfs_release_cr_conn write failed 5
00:00:35.183 [6304.1220] <16> 11003:bptm:6304: PDVFS: [1] pdvfs_flushfile: during normal flushfile cleanup, errno=11
00:00:35.183 [6304.1220] <16> 11003:bptm:6304: PDVFS: [1] pdvfs_flushfile: file SERVER_1302238821_C1_HDR.info now marked CORRUPT
00:00:35.183 [6304.1220] <16> 11003:bptm:6304: PDVFS: [1] pdvfs_fcache_close: pdvfs_flushfile failed (Resource temporarily unavailable)
00:00:35.183 [6304.1220] <16> 11003:bptm:6304: PDVFS: [1] pdvfs_close: pdvfs_fcache_close failed, cache_fd=90005
00:00:35.183 [6304.1220] <16> 11003:bptm:6304: PDVFS: [1] PdvfsClose: exit fd=5 failed, res=-1 errno=11
00:00:35.183 [6304.1220] <16> 11003:bptm:6304: impl_set_imh_image_prop error (11 Resource temporarily unavailable) on close
00:00:35.190 [6304.1220] <32> bp_sts_open_image: sts_create_image failed: error 2060012
00:00:35.191 [6304.1220] <32> write_disk_header: image open failed: error 2060012:
00:00:35.207 [6304.1220] <2> delete_image_disk_sts: file SERVER_1302238821_C1_F1 does not exist
00:00:35.207 [6304.1220] <2> ConnectionCache::connectAndCache: Acquiring new connection for host MASTERSERVER, query type 161
00:00:35.218 [6304.1220] <2> vnet_pbxConnect: pbxConnectEx Succeeded
00:00:35.218 [6304.1220] <2> logconnections: BPDBM CONNECT FROM IP.61004 TO IP.1556 fd = 1608
00:00:35.268 [6304.1220] <2> db_end: Need to collect reply
00:00:35.306 [6304.1220] <2> db_end: no DONE from db_getreply(): no entity was found
00:00:35.306 [6304.1220] <2> cleanup_image_disk: image cleanup failed: 227
00:00:35.307 [6304.1220] <2> bptm: Calling tpunmount for media @aaaad

Any suggestions on how to start trying to correct this?

14 REPLIES

SBB
Level 3
Partner Accredited Certified

Have a look at this...

http://www.symantec.com/business/support/index?page=content&id=TECH75230&key=52672&basecat=ALERTS&actp=LIST

 

MjG
Level 3

Thanks. I added the robust logging to the bptm logs and enabled the other logging. It generated a large amount of logs, but nothing like what was listed in that article. I am willing to try changing the PDDO policy, but can't figure out how to do that.
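
In case it helps anyone else, this is roughly what I did to turn the bptm logging up (a rough sketch only; the paths assume a default Windows install of NetBackup, so adjust for your environment):

REM Create the legacy debug log directories on the media server
REM (mklogdir.bat creates the bptm folder along with the others).
cd "C:\Program Files\Veritas\NetBackup\logs"
mklogdir.bat

REM Raise the global logging level, either via Host Properties > Logging
REM or with bpsetconfig, which reads bp.conf-style entries from a file.
echo VERBOSE = 5 > verbose.txt
"C:\Program Files\Veritas\NetBackup\bin\admincmd\bpsetconfig" verbose.txt

After that I re-ran one of the failing disk backups so bptm would write a fresh, more detailed log.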

 

Here is a portion of the logs with the logging level increased.

 

11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: PDVFS: [1] pdvfs_release_cr_conn: ERROR CRSync failed: try again
11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: PDVFS: [1] pdvfs_release_cr_conn: failed, data=00000000039D0040, cr_pool=000000000577ECD0, try again
11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: PDVFS: [1] pdvfs_flushfile_norm_cleanup: pdvfs_release_cr_conn write failed 5
11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: PDVFS: [1] pdvfs_flushfile: during normal flushfile cleanup, errno=11
11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: PDVFS: [1] pdvfs_flushfile: file server_1302797200_C1_HDR.info now marked CORRUPT
11:07:32.772 [6300.2176] <2> 14396:bptm:6300:server: PDVFS: [4] pdvfs_flushfile: complete, errno=11
11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: PDVFS: [1] pdvfs_fcache_close: pdvfs_flushfile failed (Resource temporarily unavailable)
11:07:32.772 [6300.2176] <4> 14396:bptm:6300:server: PDVFS: [3] pdvfs_unlink: unlink corrupt file server_1302797200_C1_HDR.info
11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: PDVFS: [1] pdvfs_close: pdvfs_fcache_close failed, cache_fd=90005
11:07:32.772 [6300.2176] <2> 14396:bptm:6300:server: PDVFS: [4] pdvfs_close: exit fd=5 res=-1 inuse=3
11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: PDVFS: [1] PdvfsClose: exit fd=5 failed, res=-1 errno=11
11:07:32.772 [6300.2176] <16> 14396:bptm:6300:server: impl_set_imh_image_prop error (11 Resource temporarily unavailable) on close
11:07:32.772 [6300.2176] <2> 14396:bptm:6300:server: impl_set_imh_image_prop cleanup server_1302797200_C1_HDR
11:07:32.773 [6300.2176] <4> 14396:bptm:6300:server: PDVFS: [3] PdvfsUnlink: enter /server#1/2/server/SarpyBackupserverFileSys/server_1302797200_C1_HDR.info
11:07:32.773 [6300.2176] <4> 14396:bptm:6300:server: PDVFS: [3] find_entry: ENOENT: no match for (server_1302797200_C1_HDR.info)
11:07:32.773 [6300.2176] <4> 14396:bptm:6300:server: PDVFS: [3] PdvfsUnlink: exit /server#1/2/server/SarpyBackupserverFileSys/server_1302797200_C1_HDR.info failed (No such file or directory)
11:07:32.773 [6300.2176] <4> 14396:bptm:6300:server: PDVFS: [3] PdvfsUnlink: exit /server#1/2/server/SarpyBackupserverFileSys/server_1302797200_C1_HDR.info completed
11:07:32.773 [6300.2176] <4> 14396:bptm:6300:server: PDVFS: [3] PdvfsUnlink: enter /server#1/2/server/SarpyBackupserverFileSys/server_1302797200_C1_HDR.img
11:07:32.774 [6300.2176] <4> 14396:bptm:6300:server: PDVFS: [3] pdvfs_unlink: file server_1302797200_C1_HDR.img still in use, refcnt=1
11:07:32.774 [6300.2176] <4> 14396:bptm:6300:server: PDVFS: [3] PdvfsUnlink: exit /server#1/2/server/SarpyBackupserverFileSys/server_1302797200_C1_HDR.img completed
11:07:32.774 [6300.2176] <2> 14396:bptm:6300:server: impl_set_imh_image_prop exit (2060012:call should be repeated)
11:07:32.774 [6300.2176] <2> 14396:bptm:6300:server: release_svh nhandle=2--
11:07:32.774 [6300.2176] <2> 14396:bptm:6300:server: impl_image_handle exit (2060012:call should be repeated)
11:07:32.774 [6300.2176] <2> 14396:bptm:6300:server: impl_create_image exit (2060012:call should be repeated)
11:07:32.774 [6300.2176] <2> 14396:bptm:6300:server: pi_create_image_v10 exit (2060012:call should be repeated)
11:07:32.774 [6300.2176] <32> bp_sts_open_image: sts_create_image failed: error 2060012
11:07:32.775 [6300.2176] <2> set_job_details: Tfile (14396): LOG 1302797252 32 bptm 6300 image open failed: error 2060012: call should be repeated

11:07:32.775 [6300.2176] <32> write_disk_header: image open failed: error 2060012:

Marianne
Level 6
Partner    VIP    Accredited Certified

Which NBU version did you upgrade from?

What is the PureDisk version?

MjG
Level 3

We upgraded from 7.0.1 to 7.1. It worked great for a couple of weeks, and then the errors started late last week.

Marianne
Level 6
Partner    VIP    Accredited Certified

Did you perhaps have this 7.0.1 EEB installed?

http://www.symantec.com/docs/TECH153190

extract:

This EEB includes some late fixes that are not included in the upcoming NetBackup 7.1 release. These late fixes will be delivered as an EEB bundle, to be installed on top of NetBackup 7.1, soon after NetBackup 7.1 is released. If you install this EEB, please watch for Symantec to release the NetBackup 7.1 EEB bundle, and install the NetBackup 7.1 EEB bundle to avoid regressions after you upgrade to NetBackup 7.1.

MjG
Level 3

Good question. I looked at the install file we used to update to version 7.0.1: eebinstaller.2106924.5.x86.exe. I take it we are stuck until they come out with the version for 7.1? I have a case open (opened on Friday) and I'm still waiting on a call back from support.

Marianne
Level 6
Partner    VIP    Accredited Certified

Hopefully you had a call back by now? It's been a week since you opened the case...

If not, phone in and raise the severity.

Please let us know!

MjG
Level 3

Just received my initial call back last night, after hours. A six-day wait with a severity of Critical. I'll post the solution when I get one.

RonCaplinger
Level 6

We have been having problems getting calls back for over a year now.  Here's my advice (if you aren't already doing this):

Make sure you are calling these tickets in, not opening them through the Symantec website. 

And when you call it in, ask when you should expect a call back. 

Give them exactly that much time; if they have not called you back yet, then call back and ask for the Duty Manager and get the case escalated.

 

This should improve your response times from Symantec.

AAlmroth
Level 6
Partner Accredited

There is now an EEB for NBU 7.1 addressing the same issues (and others) covered by the EEB already proposed for NBU 7.0.1 in the previous reply.

However, that EEB primarily fixes MSDP (PDDE) issues, not so much issues with communication to a PureDisk pool (PDDO).

Make sure you are on PD 6.6.1.2+EEB4rollup1+EEB06 on your PD appliance.

 

/A

 

AAlmroth
Level 6
Partner Accredited

If you are on Basic support, forget about them meeting any deadlines set out in the support contract.

If you are on Essential support, you may have a call back within a day or two (severity 2, Critical), but that still does not meet the deadlines.

If you are on Business Critical, then it is a completely different story, but even then the callback does not always meet the deadlines set out in the contract.

 

As Ron mentions, the only way is to call them as soon as the deadline is up. Continuing to call and escalating via the duty manager may or may not improve your chances. Your other option is to contact your account manager at the partner or at Symantec, so they can escalate internally as per procedure. Your third option is to go with a Symantec support partner at the next renewal; it may or may not improve the callback time.

I find that many of the cases I open concerning MSDP or PureDisk see a 2-3 day delay before initial contact, or before actually reaching someone who knows enough about PureDisk. I open perhaps 8-10 cases per month on behalf of customers.

On the other hand, calling in issues on NBU core/app/db is a completely different story; the response time is much better, and a much more positive experience overall!

Unfortunately, based on their previous experience, many of my customers see the support contract just as an expensive way to get "free" upgrades. At least that sets their expectations at about the right level, perhaps...

Sad story.

 

/A

MjG
Level 3

Thanks for the tips on support. We did call our account manager to help escalate the case, but still had a six-day wait with Essential support. I did express my disappointment in the survey, for whatever good that will do.

 

With version 7.0.1, we were able to run our Disk Pool up to about 95% full (our configured high water mark) without any issues. It appears that with version 7.1, we can only run our Disk Pool up to 89% before it quits working. The solution was to expire some old jobs and free up some space. We expired enough to get back down to 85%.

I asked if this was a "feature" that would be fixed in the EEB fixes for 7.1. I was told that it has always been this way, and they closed the case without responding to my comment about it working previously in version 7.0.1.
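
For reference, this is more or less how we checked utilization and freed the space (a rough sketch; the date range, client name and backup ID are made-up examples, and the commands live under the NetBackup admincmd directory):

REM Show the PDDO disk volumes and their current use percentage
nbdevquery -listdv -stype PureDisk -U

REM List older images that are candidates for expiry
bpimagelist -d 01/01/2011 -e 02/01/2011 -client SERVER

REM Expire one image immediately (backup ID here is hypothetical)
bpexpdate -backupid SERVER_1296000000 -d 0

REM Kick off image cleanup so NetBackup actually deletes the expired fragments
bpimage -cleanup -allclients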


saltbartechnolo
Level 2

We have the exact same situation, and I spent most of the afternoon searching for the cause of this error. I suspected space, as we recently hit the 30 TB threshold, but all the verbose logging sent me on a wild goose chase.

saltbartechnolo
Level 2

FYI,

After much hassle with this issue, it seems we have got on top of it. I am not sure it exactly matches the case at the top of this thread, but the solution may be the same. Our space issue was caused by queue processing (on the MSDP server: ..\pdde> crcontrol --queueinfo) not running; queue processing is what effectively cleans up the expired backup jobs and reclaims your space.

This is due to a bug in 7.1 ( http://www.symantec.com/business/support/index?page=content&id=TECH153084 ) which now seems to be resolved in 7.1.0.1. Since applying the patch we can see that the queue is processing, but space has not yet started to be reclaimed; I think it first needs to get through all of the queue items (1.35 trillion files!). We are now at 800 million after about 6 days and expect to be through in the next few days; what happens after that we're not quite sure. And guess what, I can't get hold of Symantec. I had a great conversation with their engineering department a while back, and since then I haven't been able to get through to them again or get a call back. Still, at least they released the patch for 7.1 as promised, and only 1 day late.
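
For anyone watching the same thing, these are the crcontrol options we have been keeping an eye on it with (a rough sketch; run them from the pdde directory mentioned above, and since I am quoting these from memory, check the usage output of crcontrol on your version first):

REM Show the size and state of the transaction queue
crcontrol --queueinfo

REM Show overall data store usage
crcontrol --dsstat

REM Manually start queue processing if it is not running
crcontrol --processqueue

Once the queue is fully processed, the reclaimed space should start showing up in the data store usage output.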

jt