
Exiting with error 84 - Freezing Media (Backup to tape)

TsAdmin
Level 3

Hi,

This problem has given us a headache: our production backup has been failing for months. We have already contacted Veritas Support, but the issue still has not been solved. Can anyone help us fix this once and for all? Below is the backup job status:

01/23/2018 01:50:55 - Info nbjm (pid=816) starting backup job (jobid=2844) for client dbsvr01, policy oracle_dbbackup_daily_dbsvr01, schedule daily_backup
01/23/2018 01:50:55 - Info nbjm (pid=816) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=2844, request id:{87F95EE7-34AA-4B2A-9A78-757CAA3E2CF3})
01/23/2018 01:50:55 - requesting resource dbsvr01-hcart-robot-tld-0
01/23/2018 01:50:55 - requesting resource bcksvr01.net.NBU_CLIENT.MAXJOBS.dbsvr01
01/23/2018 01:50:55 - requesting resource bcksvr01.net.my.NBU_POLICY.MAXJOBS.oracle_dbbackup_daily_dbsvr01
01/23/2018 01:50:56 - Waiting for scan drive stop HP.ULTRIUM7-SCSI.003, Media server: dbsvr01.net.
01/23/2018 01:50:58 - granted resource bcksvr01.net.NBU_CLIENT.MAXJOBS.dbsvr01
01/23/2018 01:50:58 - granted resource bcksvr01.net.NBU_POLICY.MAXJOBS.oracle_dbbackup_daily_dbsvr01
01/23/2018 01:50:58 - granted resource H019L7
01/23/2018 01:50:58 - granted resource HP.ULTRIUM7-SCSI.003
01/23/2018 01:50:58 - granted resource afis-dbsvr01-hcart-robot-tld-0
01/23/2018 01:50:58 - estimated 0 kbytes needed
01/23/2018 01:50:58 - Info nbjm (pid=816) started backup (backupid=dbsvr01_1516643458) job for client dbsvr01, policy oracle_dbbackup_daily_dbsvr01, schedule daily_backup on storage unit dbsvr01-hcart-robot-tld-0
01/23/2018 01:51:28 - Info bpbrm (pid=27533) dbsvr01 is the host to backup data from
01/23/2018 01:51:28 - Info bpbrm (pid=27533) reading file list for client
01/23/2018 01:51:28 - started process bpbrm (pid=27533)
01/23/2018 01:51:28 - connecting
01/23/2018 01:51:28 - connected; connect time: 0:00:00
01/23/2018 01:51:29 - Info bpbrm (pid=27533) starting bpbkar on client
01/23/2018 01:51:29 - Info bpbkar (pid=27552) Backup started
01/23/2018 01:51:29 - Info bpbrm (pid=27533) bptm pid: 27553
01/23/2018 01:51:29 - Info bptm (pid=27553) start
01/23/2018 01:51:29 - Info bptm (pid=27553) using 65536 data buffer size
01/23/2018 01:51:29 - Info bptm (pid=27553) using 30 data buffers
01/23/2018 01:51:30 - Info bptm (pid=27553) start backup
01/23/2018 01:51:30 - Info bptm (pid=27553) Waiting for mount of media id H019L7 (copy 1) on server dbsvr01.net.
01/23/2018 01:51:30 - mounting H019L7
01/23/2018 01:52:27 - mounted H019L7; mount time: 0:00:57
01/23/2018 01:52:27 - positioning H019L7 to file 5
01/23/2018 01:52:28 - Info bptm (pid=27553) media id H019L7 mounted on drive index 3, drivepath /dev/nst4, drivename HP.ULTRIUM7-SCSI.003, copy 1
01/23/2018 01:54:06 - positioned H019L7; position time: 0:01:39
01/23/2018 01:54:06 - begin writing
01/23/2018 07:54:58 - Error bptm (pid=27553) FREEZING media id H019L7, External event caused rewind during write, all data on media is lost
01/23/2018 07:54:58 - Info bptm (pid=27553) EXITING with status 84 <----------
01/23/2018 07:54:59 - Error bpbrm (pid=27533) from client dbsvr01: ERR - bpbkar exiting because backup is aborting
01/23/2018 07:54:59 - Info bpbkar (pid=27552) done. status: 84: media write error
01/23/2018 07:54:59 - end writing; write time: 6:00:53
media write error (84)

Please reply soon.


Marianne
Moderator
Partner    VIP    Accredited Certified
This is the issue:

"External event caused rewind during write, all data on media is lost."

Please tell us more about your environment.
Are drives shared between media servers and/or NDMP filers?
What is OS on media servers?

What has been seen in OS logs and Veritas debug logs since you started troubleshooting with Support?

Hi Marianne,

This drive is shared with other media servers.

Master: Windows Server 2016

NBU Master: 7.7.3

Media server: Oracle Linux Server release 7.4

There is no error in the OS logs showing why this error occurred. Below is part of the bpdbm log:

07:54:58.469 [9008.15544] <2> logconnections: BPDBM ACCEPT FROM 192.168.xx.xx.55158 TO 192.168.xx.xx.1556 fd = 128
07:54:58.469 [9008.15544] <2> init_resilient_cache: [vnet_nbrntd.c:880] Initialize resilient cache. 0 0x0
07:54:58.469 [9008.15544] <2> vnet_pcache_init_table: [vnet_private.c:214] starting cache size 200 0xc8
07:54:58.469 [9008.15544] <2> vnet_cached_getnameinfo: [vnet_addrinfo.c:2049] found via getnameinfo OUR_HOST=dbsvr01.net.my IPSTR=192.168.xx.xx
07:54:58.485 [9008.15544] <2> db_valid_master_server: dbsvr01.net.my is not a valid server
07:54:58.485 [9008.15544] <2> db_valid_master_server: dbsvr01.net.my is a valid media server

Marianne
Moderator
Partner    VIP    Accredited Certified

bpdbm (database manager) has nothing to do with tape usage.

How much interaction have you had with Veritas Support since logging the call? Do you have the call reference number?

In my experience, Support would have asked you for logs from all media servers: a verbose (level 5) bptm log, various logs under volmgr\debug, a VERBOSE entry in vm.conf, text output of the Windows System and Application logs (after a restart with VERBOSE in vm.conf and once the error has been experienced), and /var/log/messages on the Linux server.

All of these logs will give us information from both the NBU and the OS side.
It could even be an incorrect SSO configuration or incorrect device mapping, so the NBU device and SSO configuration need to be checked.
You really need to work with your Support engineer on all of the above.
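To make the log collection described above repeatable on a Linux media server, here is a minimal sketch. It assumes the default /usr/openv install path and the commonly documented legacy log directory names (bptm, plus the volmgr debug directories daemon, ltid, reqlib, robots and tpcommand); please confirm the list against the NetBackup Logging Reference Guide for your version. It only creates the directories and adds a VERBOSE line to vm.conf; raising the bptm verbosity to 5 and restarting the device daemons are left to you.

```python
from pathlib import Path

# Assumed default install root on a Linux/UNIX media server.
NBU_ROOT = Path("/usr/openv")

# Legacy debug log directories: NetBackup only writes a legacy log
# if its directory already exists. Names assumed from common documentation.
LOG_DIRS = [
    "netbackup/logs/bptm",
    "volmgr/debug/daemon",
    "volmgr/debug/ltid",
    "volmgr/debug/reqlib",
    "volmgr/debug/robots",
    "volmgr/debug/tpcommand",
]


def enable_tape_debug_logs(root: Path = NBU_ROOT) -> list:
    """Create the debug log directories and add VERBOSE to vm.conf.

    Returns the directories that were newly created (empty if all existed).
    """
    created = []
    for rel in LOG_DIRS:
        d = root / rel
        if not d.exists():
            d.mkdir(parents=True)
            created.append(d)

    # Append a VERBOSE line to vm.conf only if it is not already there.
    vm_conf = root / "volmgr" / "vm.conf"
    lines = vm_conf.read_text().splitlines() if vm_conf.exists() else []
    if "VERBOSE" not in (line.strip() for line in lines):
        lines.append("VERBOSE")
        vm_conf.write_text("\n".join(lines) + "\n")
    return created
```

Run it as root (`enable_tape_debug_logs()`), wait for the failure to recur, then collect the new files from those directories together with /var/log/messages.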

But if we look at the error, there is an 'external event' causing the rewind.

This means the cause is possibly outside the servers, such as other servers zoned to the same drive, or all drives and servers in one zone (other discussions here explain best practice for tape zoning; I am not a SAN expert).
It could also be that disk and tape are connected to the same HBA. This will cause all sorts of issues (best practice is available on the web).
All of these need to be checked with your SAN and server admins.


Nicolai
Moderator
Partner    VIP   

This is likely a SAN issue. You won't find a fix in the NetBackup layer.

How are the media servers and tape drives connected?

On the Linux media server, try running the command "dmesg" and inspect /var/log/messages. FC or SCSI related messages will show up there.
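A small sketch for this suggestion: filter kernel log lines (from `dmesg` output or /var/log/messages) down to FC/SCSI-related ones. The keyword list is an assumption for illustration, not an exhaustive or official set; add the driver name of your own HBA.

```python
import re

# Hypothetical keyword list: HBA driver names (qla2xxx for QLogic, lpfc for
# Emulex), SCSI/tape device names and typical error words.
PATTERN = re.compile(
    r"\b(qla2xxx|lpfc|scsi|st\d+|nst\d+|sg\d+|rport|LIP|reset|abort|sense)\b",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Yield (line_number, line) for kernel log lines that mention FC/SCSI events."""
    for n, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            yield n, line.rstrip("\n")
```

For example, `for n, line in suspicious_lines(open("/var/log/messages")): print(n, line)`; entries close to the failure time (07:54:58 in the job above) are the interesting ones.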

mph999
Level 6
Employee Accredited

I'll almost agree with Nicolai ...   ;0)

This probably isn't NBU. Unfortunately, it can be very hard to find.

NBU makes a position check after writing each fragment in a backup. We know how many blocks of data were sent to the tape drive, and thus we know where the tape drive should be positioned. For example, if we send 100 blocks of data to an empty tape, the tape drive should report back that it is at position 100 when it has finished. If we then send a new job with just 50 blocks to the same tape (appending to the end), the drive should report back that it is at position 150 when it has finished (it started at position 100 and wrote 50 more blocks). You get the idea.
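The check is simple arithmetic. This sketch only models the example in the post (blocks sent in, expected position out); it is not NetBackup's code and ignores the tape marks and header blocks that the real accounting also has to include.

```python
def expected_position(start_block: int, blocks_written: int) -> int:
    """Block position the drive should report after writing (simplified model)."""
    return start_block + blocks_written


def position_ok(start_block: int, blocks_written: int, reported: int) -> bool:
    """True if the drive reports exactly the position we expect."""
    return expected_position(start_block, blocks_written) == reported


# The example from the post: 100 blocks to an empty tape, then a 50-block append.
first = expected_position(0, 100)       # 100
second = expected_position(first, 50)   # 150

# Rough scale for the failing job: one 1048576000 KB fragment in 64 KiB blocks.
blocks_per_fragment = 1048576000 * 1024 // 65536  # 16384000
```

At roughly 16 million blocks per fragment, a mismatch of one or two blocks is a tiny relative error, which is why it can be hard to trace.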

If this position check fails, we send out the SCSI rewind message.

Worst case, a SCSI rewind really did happen (hence Marianne asked if NDMP filers were involved). A mismatch between SCSI reservation types on a shared device (for example, media server set to SPC2 and NDMP filer set to persistent) will cause this symptom. In this case, the tape rewinds mid-backup (invisible to NBU and the operating system, as it all happens at the SCSI level) and the tape is overwritten. This is easy to spot, as the tape will not be mountable because there will be no NBU tape header.

What is far more likely is that the position check is just a bit out, e.g. expected position 3455, actual 3456.

Generally, if the position check is out by a large number, it's hardware.

If the position check is out by just a few, it's firmware or driver.
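The rule of thumb above, written as a triage function. The cut-off between "a few" and "a large number" is a hypothetical value chosen for illustration; the post does not give one.

```python
# Hypothetical threshold in blocks; not a Veritas-documented value.
SMALL_OFFSET_BLOCKS = 10


def classify_position_mismatch(expected: int, actual: int) -> str:
    """Rough triage of a failed block position check, per the rule of thumb."""
    diff = abs(actual - expected)
    if diff == 0:
        return "ok"
    if diff <= SMALL_OFFSET_BLOCKS:
        return "small offset: suspect tape drive or HBA firmware/driver"
    return "large offset: suspect hardware"
```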

Firstly, if the drive is shared, make sure that all devices seeing the drive use the same SCSI reservation type. I would suggest using Persistent over SPC2.

Next, I'd change the firmware and driver levels of the tape drive and HBA. If they are already at the latest, go back one version and see if that helps.

I actually have an issue like this at the moment, despite changing firmware/drivers. We eventually found (using the third-party sg3_utils) that 'something' is sending a persistent reservation key to the drives as soon as the server boots. This happens even with the NBU service completely disabled, so there is some OS-level issue going on. Hopefully yours won't be as complex as that one...
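One way to check this is to write down, for every media server and NDMP filer that sees a shared drive, which reservation type it uses, and flag any disagreement. This sketch assumes you collect those settings by hand (in NetBackup the SCSI reserve choice is a media server host property; on filers it is vendor-specific); the host names in the example are hypothetical.

```python
from collections import defaultdict


def reservation_mismatches(settings: dict) -> dict:
    """Group hosts by SCSI reservation type; more than one group means a mismatch.

    `settings` maps each host that sees the drive to its reservation type,
    e.g. "SPC2" or "PERSISTENT". Returns {} when all hosts agree.
    """
    groups = defaultdict(list)
    for host, rtype in settings.items():
        groups[rtype.upper()].append(host)
    return dict(groups) if len(groups) > 1 else {}


# Hypothetical inventory for one shared drive.
print(reservation_mismatches({"mediasrv1": "SPC2", "filer1": "persistent"}))
```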
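For the kind of investigation described above, a minimal sketch of listing the persistent reservation keys registered on a drive, assuming sg3_utils is installed and the drive's generic SCSI node (/dev/sgN) is known. The parser assumes each key is printed on its own line as a hex value; check that against the output of your sg_persist version before relying on it.

```python
import re
import subprocess

# Assumption: sg_persist prints each registered key alone on a line, e.g. "    0x1234abcd".
KEY_LINE = re.compile(r"^[ \t]*(0x[0-9a-fA-F]+)[ \t]*$", re.MULTILINE)


def parse_keys(output: str) -> list:
    """Extract the hex reservation keys from sg_persist text output."""
    return [m.group(1).lower() for m in KEY_LINE.finditer(output)]


def read_reservation_keys(device: str) -> list:
    """Run `sg_persist --in --read-keys <device>` (needs root) and parse the keys."""
    out = subprocess.run(
        ["sg_persist", "--in", "--read-keys", device],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_keys(out)
```

Running `read_reservation_keys` right after boot, before NetBackup starts, shows whether a key is already registered on the drive.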

Hi mph999,

For your information, the SSO did suggest a firmware upgrade for the drive. We updated it and the problem still occurred. Then we changed to a new drive with the previous (minus one) firmware; the backup was successful for 2 days, and on day 3 it failed again with the same error code. We have 4 drives, and whichever drive is used returns the same error.

Hi Marianne,

Yes, the SSO did ask us to collect all the logs mentioned, and we have already passed them back. First it was an issue with SELinux; we disabled SELinux and the backup succeeded. Then the next day it happened again with the same error 84.

As for the SAN zoning, all backup devices are in one zone, the backup zone. We put all drives, the media servers and the master server into one zone. This was done by an HPE SAN engineer.

Below is the bptm log from the failing job:

01:51:29.468 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
01:51:29.468 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
01:51:29.484 [27553] <4> bptm: emmserver_name = bcksvr01.net.my
01:51:29.484 [27553] <4> bptm: emmserver_port = 1556
01:51:29.523 [27553] <8> do_pbx_service: [vnet_connect.c:2156] via PBX VNETD CONNECT FROM 192.168.53.11.33837 TO 192.168.60.91.1556 fd = 7
01:51:29.523 [27553] <8> vnet_vnetd_get_master_useat_info: [vnet_vnetd.c:3135] VN_REQUEST_GET_SECURITY_INFO 9 0x9
01:51:29.585 [27553] <8> vnet_vnetd_disconnect: [vnet_vnetd.c:192] VN_REQUEST_DISCONNECT 1 0x1
01:51:29.620 [27553] <8> do_pbx_service: [vnet_connect.c:2156] via PBX VNETD CONNECT FROM 192.168.xx.xx.14836 TO 192.168.xx.xx.1556 fd = 7
01:51:29.620 [27553] <8> vnet_vnetd_get_master_useat_info: [vnet_vnetd.c:3135] VN_REQUEST_GET_SECURITY_INFO 9 0x9
01:51:29.678 [27553] <8> vnet_vnetd_disconnect: [vnet_vnetd.c:192] VN_REQUEST_DISCONNECT 1 0x1
01:51:29.768 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
01:51:29.768 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
01:51:29.869 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
01:51:29.869 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
01:51:29.964 [27553] <4> report_client: VBRC 2 27553 1 dbsvr01_1516643458 0 oracle_dbbackup_daily_dbsvr01 2 daily_backup 0 1 1
01:51:29.964 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
01:51:29.964 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
01:51:30.520 [27553] <4> create_tpreq_file: symlink to path /dev/nst4
01:51:35.528 [27553] <4> expandpath: /usr/openv/netbackup/db/media/tpreq/drive_HP.ULTRIUM7-SCSI.003
01:52:28.030 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
01:52:28.031 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
01:52:28.148 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 0 0 (bptm.c.18569)
01:54:07.232 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
01:54:07.232 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
01:54:07.340 [27553] <4> write_backup: begin writing backup id dbsvr01_1516643458, copy 1, fragment 1, to media id H019L7 on drive HP.ULTRIUM7-SCSI.003 (index 3)
02:14:11.473 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
02:14:11.473 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
02:14:11.635 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 155200000 155200000 (bptm.c.26346)
02:34:13.657 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
02:34:13.658 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
02:34:13.822 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 96000000 96000000 (bptm.c.26346)
02:54:19.344 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
02:54:19.344 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
02:54:19.520 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 103600000 103600000 (bptm.c.26346)
03:14:23.891 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
03:14:23.892 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
03:14:24.056 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 107600000 107600000 (bptm.c.26346)
03:34:25.616 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
03:34:25.616 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
03:34:25.782 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 115200000 115200000 (bptm.c.26346)
03:54:28.064 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
03:54:28.064 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
03:54:28.186 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 158000000 158000000 (bptm.c.26346)
04:14:28.275 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
04:14:28.275 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
04:14:28.390 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 157200000 157200000 (bptm.c.26346)
04:34:10.639 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
04:34:10.639 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
04:34:10.749 [27553] <4> report_throughput: VBRT 1 27553 5 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 155776000 155776000 (bptm.c.20947)
04:34:10.750 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
04:34:10.750 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
04:34:11.292 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
04:34:11.292 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
04:34:11.394 [27553] <4> write_backup: successfully wrote backup id dbsvr01_1516643458, copy 1, fragment 1, 1048576000 Kbytes at 109189.269 Kbytes/sec
04:34:11.394 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
04:34:11.394 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
04:34:11.494 [27553] <4> write_backup: begin writing backup id dbsvr01_1516643458, copy 1, fragment 2, to media id H019L7 on drive HP.ULTRIUM7-SCSI.003 (index 3)
04:54:13.447 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
04:54:13.447 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
04:54:13.557 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 155600000 155600000 (bptm.c.26346)
05:14:15.289 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
05:14:15.289 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
05:14:15.399 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 158000000 158000000 (bptm.c.26346)
05:34:20.942 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
05:34:20.942 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
05:34:21.053 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 167200000 167200000 (bptm.c.26346)
05:54:23.496 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
05:54:23.496 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
05:54:23.604 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 173200000 173200000 (bptm.c.26346)
06:14:23.791 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
06:14:23.791 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
06:14:23.903 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 168800000 168800000 (bptm.c.26346)
06:34:25.402 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
06:34:25.402 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
06:34:25.512 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 165600000 165600000 (bptm.c.26346)
06:41:42.193 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
06:41:42.193 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
06:41:42.464 [27553] <4> report_throughput: VBRT 1 27553 5 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 60176000 60176000 (bptm.c.20947)
06:41:42.464 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
06:41:42.464 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
06:41:43.002 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
06:41:43.002 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
06:41:43.104 [27553] <4> write_backup: successfully wrote backup id dbsvr01_1516643458, copy 1, fragment 2, 1048576000 Kbytes at 137056.378 Kbytes/sec
06:41:43.104 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
06:41:43.104 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
06:41:43.200 [27553] <4> write_backup: begin writing backup id dbsvr01_1516643458, copy 1, fragment 3, to media id H019L7 on drive HP.ULTRIUM7-SCSI.003 (index 3)
07:01:43.775 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
07:01:43.776 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
07:01:44.049 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 168000000 168000000 (bptm.c.26346)
07:21:43.985 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
07:21:43.986 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
07:21:44.271 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 164400000 164400000 (bptm.c.26346)
07:41:45.046 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
07:41:45.046 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
07:41:45.335 [27553] <4> report_throughput: VBRT 1 27553 1 1 HP.ULTRIUM7-SCSI.003 H019L7 0 1 0 166000000 166000000 (bptm.c.26346)
07:54:58.183 [27553] <16> send_job_file: Failed to send request [job ID 2844, ftype = 3 msg len = 124, msg = LOG 1516665298 16 bptm 27553 FREEZING media id H019L7, External event caused rewind during write, all data on media is lost ]
07:54:58.188 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
07:54:58.188 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
07:54:58.268 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
07:54:58.268 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
07:54:58.383 [27553] <16> write_backup: FREEZING media id H019L7, External event caused rewind during write, all data on media is lost
07:54:58.437 [27553] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
07:54:58.437 [27553] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
07:55:27.149 [27842] <4> bptm: emmserver_name = bcksvr01.net.my
07:55:27.149 [27842] <4> bptm: emmserver_port = 1556
07:55:27.189 [27842] <8> do_pbx_service: [vnet_connect.c:2156] via PBX VNETD CONNECT FROM 192.168.xx.xx.15319 TO 192.168.xx.xx.1556 fd = 1
07:55:27.189 [27842] <8> vnet_vnetd_get_master_useat_info: [vnet_vnetd.c:3135] VN_REQUEST_GET_SECURITY_INFO 9 0x9
07:55:27.245 [27842] <8> vnet_vnetd_disconnect: [vnet_vnetd.c:192] VN_REQUEST_DISCONNECT 1 0x1
07:55:27.280 [27842] <8> do_pbx_service: [vnet_connect.c:2156] via PBX VNETD CONNECT FROM 192.168.xx.xx.44871 TO 192.168.xx.xx.1556 fd = 1
07:55:27.280 [27842] <8> vnet_vnetd_get_master_useat_info: [vnet_vnetd.c:3135] VN_REQUEST_GET_SECURITY_INFO 9 0x9
07:55:27.339 [27842] <8> vnet_vnetd_disconnect: [vnet_vnetd.c:192] VN_REQUEST_DISCONNECT 1 0x1
07:55:27.444 [27842] <4> create_tpreq_file: symlink to path /dev/nst4
07:57:16.439 [27842] <8> vnet_get_user_credential_path: [vnet_vxss.c:1474] status 35 0x23
07:57:16.440 [27842] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3643] vnet_get_user_credential_path failed 35 0x23
07:57:16.760 [27842] <4> report_resource_done: VBRD 1 27842 0 HP.ULTRIUM7-SCSI.003 H019L7
07:57:16.760 [27842] <4> create_tpreq_file: symlink to path /dev/nst4

Hi Nicolai,

There are 4 media servers and 4 tape drives connected, all in the same zone. As for the OS logs, there is no mention of any FC or SCSI errors.

mph999
Level 6
Employee Accredited

If you search through all the bptm logs, do you have messages like this?

write_backup: block position check: actual7461, expected 7462

You should see this just before the 'External event caused rewind' message (at least in my experience you should), so I'm a bit confused as to why it's missing from the log you posted.

On one of the media that has had the issue, could you run:
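To search many bptm logs for these messages at once, a sketch like the following can help. The pattern is based only on the message quoted above and allows an optional space after "actual", since the quoted example has none; the real message text may differ between NetBackup versions.

```python
import re

# Based on the quoted message; whitespace after "actual"/"expected" is optional.
POS_CHECK = re.compile(r"block position check: actual\s*(\d+), expected\s*(\d+)")


def position_check_failures(lines):
    """Yield (actual, expected, line) for each block position check message."""
    for line in lines:
        m = POS_CHECK.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2)), line.rstrip("\n")
```

Feed it each file under the bptm log directory (by default /usr/openv/netbackup/logs/bptm on UNIX) and compare the difference between actual and expected with the rule of thumb given earlier in the thread.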

bpmedialist -mcontents -m <media id>

And post up the output

Nicolai
Moderator
Partner    VIP   

Hi @TsAdmin

"We put all drive and the media server and master server into one zone. "

Don't. This caused major problems at my site in the past. Each tape drive and HBA pair should be one zone; an HBA can be a member of multiple zones. If all drives are in one zone, a SCSI bus reset (or FC LIP) on one tape drive will propagate to all the drives in the same zone (this may be the "external event"). By zoning the server HBA and each tape drive in separate zones, a SCSI bus reset is contained within that zone and only affects the drive itself.

Please consider implementing the zoning suggestion above.
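The layout Nicolai describes (one zone per server HBA and tape drive pair) can be generated from an inventory of ports. This sketch only produces zone names and members; the aliases are hypothetical and the result would still have to be applied with your switch vendor's tools. With 4 media servers (one HBA port each) sharing 4 drives, it yields 16 zones.

```python
from itertools import product


def single_target_zones(hbas, drives):
    """One zone per (server HBA port, tape drive port) pair.

    Each zone holds exactly one initiator and one tape drive, so, per the
    advice above, a reset on one drive stays within the zones that drive is in
    instead of reaching every device in a shared backup zone.
    """
    return {
        f"z_{hba}_{drive}": [hba, drive]
        for hba, drive in product(hbas, drives)
    }


# Hypothetical port aliases.
zones = single_target_zones(
    ["mediasrv1_hba0", "mediasrv2_hba0", "mediasrv3_hba0", "mediasrv4_hba0"],
    ["lto7_drv0", "lto7_drv1", "lto7_drv2", "lto7_drv3"],
)
```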

Best Regards

Nicolai


Hi @mph999,

Here is the output from the command:

media id = H019L7, allocated 01/22/2018 12:39, retention level = 1, Media on Hold = 0

File number 1
Backup id = dbsvr01_1516595950
Creation date = 01/22/2018 12:39
Expiration date = 02/05/2018 12:39
Retention level = 1
Copy number = 1
Fragment number = 1
Block size (in bytes) = 65536

File number 2
Backup id = dbsvr01_1516595950
Creation date = 01/22/2018 12:39
Expiration date = 02/05/2018 12:39
Retention level = 1
Copy number = 1
Fragment number = 2
Block size (in bytes) = 65536

File number 3
Backup id = dbsvr01_1516595950
Creation date = 01/22/2018 12:39
Expiration date = 02/05/2018 12:39
Retention level = 1
Copy number = 1
Fragment number = 3
Block size (in bytes) = 65536

File number 4
Backup id = dbsvr01_1516595950
Creation date = 01/22/2018 12:39
Expiration date = 02/05/2018 12:39
Retention level = 1
Copy number = 1
Fragment number = 4
Block size (in bytes) = 65536

File number 5
Backup id = dbsvr01_1516643458
Creation date = 01/23/2018 01:50
Expiration date = 02/06/2018 01:50
Retention level = 1
Copy number = 1
Fragment number = 1
Block size (in bytes) = 65536

File number 6
Backup id = dbsvr01_1516643458
Creation date = 01/23/2018 01:50
Expiration date = 02/06/2018 01:50
Retention level = 1
Copy number = 1
Fragment number = 2

mph999
Level 6
Employee Accredited

OK, thanks. So we know that there wasn't a real SCSI rewind; otherwise there would be no header and mcontents would have failed.

Any luck finding any position check messages in the logs?
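The mcontents output can also be grouped per backup image, which makes it easy to see what the failed job left on the tape. A minimal parser, assuming the plain-text layout shown above (a "File number N" line followed by "key = value" lines):

```python
from collections import defaultdict


def fragments_by_backup(mcontents: str) -> dict:
    """Map backup id -> [(tape file number, fragment number), ...]."""
    result = defaultdict(list)
    file_no = backup_id = None
    for raw in mcontents.splitlines():
        line = raw.strip()
        key, sep, value = line.partition("=")
        if line.startswith("File number"):
            file_no = int(line.split()[-1])
        elif sep and key.strip() == "Backup id":
            backup_id = value.strip()
        elif sep and key.strip() == "Fragment number":
            result[backup_id].append((file_no, int(value)))
    return dict(result)
```

For H019L7 this yields fragments 1 to 4 of dbsvr01_1516595950 in files 1 to 4, and fragments 1 and 2 of dbsvr01_1516643458 in files 5 and 6; fragment 3, which was being written when the job failed, is absent. That matches the job log, which positioned the tape to file 5 before it began writing.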

Marianne
Moderator
Partner    VIP    Accredited Certified
@TsAdmin

Please consider @mph999's advice regarding scsi reservation : 'use the same scsi reservation type. I would suggest using Persistent over spc2',
and @Nicolai's advice regarding zoning.