11-05-2013 07:11 PM
Hi,
I started using Vault Duplication to Offsite tapes. (all LTO4 tapes)
It seems it either doesn't compress the data onto the duplication tape, or for some other reason doesn't fill it and just continues onto another Offsite tape.
I run a normal backup onto local tape during the night; it's about 1TB.
Vault does a duplication of today's data.
But it starts duplicating onto one tape and then, instead of filling it, starts another duplication onto a new tape.
It usually fills about 500-600GB and then starts a new duplication onto a new tape.
They all have the same retention as the daily tape.
Why isn't it filling up the tape?
I went from inline copy to Vault duplication; with inline copy this wasn't a problem.
- Roland
11-05-2013 08:57 PM
What is the status of media with only 500-600 GB written to it?
Check with 'bpmedialist -m <media-id>'
To force NBU to fill media before selecting a new tape, change the Max Partially Full setting on the Offsite pool to 1.
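If you prefer the command line, something like the following should do it (this is a sketch: check your version's vmpool usage output first to confirm it supports -mpf, and substitute your actual pool name for <pool_name>):

```
# Set "maximum partially full media" to 1 on the offsite pool.
/usr/openv/volmgr/bin/vmpool -update -pn <pool_name> -mpf 1

# Verify the pool settings afterwards.
/usr/openv/volmgr/bin/vmpool -listall
```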
11-06-2013 02:08 PM
Hi,
Seems like it is not compressing at all!
DLT017 is the daily local tape, duplicated by Vault to DRT120 and DRT122.
Starting with DRT020.
Next day's backups continued on the DLT017 tape, so the dates are a bit misleading.
root@ipndms:/nbu/bin# ./admincmd/bpmedialist -m DRT120
Server Host = pnms01
id rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
--------------------------------------------------------------------------------
DRT120 10 90 11/04/2013 15:01 11/04/2013 23:52 hcart 791137984 0
90 12/02/2013 09:57 N/A FULL SUSPENDED
root@ipndms:/nbu/bin# ./admincmd/bpmedialist -m DRT122
Server Host = pnms01
id rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
--------------------------------------------------------------------------------
DRT122 10 23 11/04/2013 23:52 11/05/2013 00:47 hcart 81293600 0
23 12/02/2013 08:12 N/A SUSPENDED
root@ipndms:/nbu/bin# ./admincmd/bpmedialist -m DLT017
Server Host = pnms01
id rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
--------------------------------------------------------------------------------
DLT017 10 407 10/31/2013 08:59 11/05/2013 01:10 hcart 1239776352 0
407 12/03/2013 01:10 11/05/2013 16:34 FULL
root@ipndms:/nbu/bin#
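As a sanity check on those numbers, the kbytes column from bpmedialist can be converted to GB (1 GB = 1024^2 KB). A quick sketch using the three values above:

```shell
# Convert the bpmedialist kbytes figures above into GB.
for kb in 791137984 81293600 1239776352; do
    echo "$kb" | awk '{printf "%d KB = %.0f GB\n", $1, $1 / 1048576}'
done
# -> 791137984 KB = 754 GB
# -> 81293600 KB = 78 GB
# -> 1239776352 KB = 1182 GB
```

So DRT120 was marked FULL at roughly 754GB, while the source DLT017 holds about 1182GB.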
I have now set the Max Partially Full setting to 1.
- Roland
11-06-2013 02:37 PM
I understand that, but it doesn't explain why DLTxxx compresses data to 1.2TB and DRTxxx can't.
It is the same drives, same robot, same driver, same OS.
It is duplicating DLTxxx to DRTxxx tapes using the same everything.
Shouldn't it manage to cram 1.2TB onto the DRTxxx tapes?
Inline copy did this without any problems; duplication is another method, though...
11-06-2013 09:34 PM
Have a look at this similar issue:
https://www-secure.symantec.com/connect/forums/vault-tapes-capacity
11-07-2013 08:40 PM
There is a way to find out if it's really specific to Vault duplication.
Try the same data using SLP duplication; my guess is it would show the same behaviour. I don't think the Vault feature uses a different duplication mechanism, or at least I have not seen a technote about that. If there is indeed some difference, most of us would very much like to hear about it from Symantec.
I am more inclined to see this as something in the tape drive firmware. Are we talking about the same tape drive for all the tests carried out?
11-08-2013 01:23 AM
SLP and Vault both use bpduplicate - so there is no difference.
11-08-2013 02:02 AM
Sorry, I've been busy with other stuff.
I will do some tests next week; I'll get more data over the weekend as we do a full backup of an RMAN directory.
Interestingly, we got a new drive with a lower firmware than the other, which is on the latest.
Solaris 10 and LTO4s.
I'll let you know.
11-10-2013 05:29 PM
Over the weekend we got about 1TB/day of backups.
Now it compressed a bit: it filled the first tape with 935GB and then another 87GB on the next tape.
The original DLT009 got 1.2TB.
I have changed this parameter as suggested by Marianne:
change Max Partially Full setting on Offsite Pool to 1
It seems DLT009 was written on drive 001, the one with the newer firmware, as a normal single-tape backup.
The duplication to DRT125 was written on that same drive, while DLT009 was read in the drive with the older firmware.
root@ipndms:/nbu/bin# ./admincmd/bpmedialist -m DRT125
Server Host = pnms01
id rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
--------------------------------------------------------------------------------
DRT125 10 17 11/09/2013 15:00 11/10/2013 00:00 hcart 935192576 0
17 12/07/2013 05:04 N/A FULL SUSPENDED
root@ipndms:/nbu/bin# ./admincmd/bpmedialist -m DRT128
Server Host = pnms01
id rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
--------------------------------------------------------------------------------
DRT128 10 2 11/10/2013 00:00 11/10/2013 00:16 hcart 87324064 0
2 12/07/2013 05:45 N/A SUSPENDED
root@ipndms:/nbu/bin# ./admincmd/bpmedialist -m DLT009
Server Host = pnms01
id rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
--------------------------------------------------------------------------------
DLT009 10 209 11/09/2013 00:05 11/11/2013 04:01 hcart 1206396032 0
209 12/09/2013 04:01 11/10/2013 16:21
root@ipndms:/nbu/bin#
11-12-2013 12:29 AM
Do you have lines like these in the bptm log? Just interested, no worries if not.
11-15-2013 03:00 PM
I wonder if "preserve multiplexing" in Vault would have any influence?
I'm careful about changing "partially filled" tapes; I've seen backups queued because of this.
11-27-2013 02:05 PM
Hi Guys and Girls,
I've been busy with other stuff and haven't been able to pay the backups enough attention.
I cleared up the other issues and can now work more closely on this.
My main issue with the backups is that they are too slow, so I run out of time in my 24-hour day.
It could be caused by several issues; I have raised a few here on the forum and got really good help.
My backup strategy is as follows.
I have two tape labels:
DLTxxx, which are local tapes sitting in the robot at all times.
DRTxxx, which are remote tapes going offsite every day.
Daily backups start at midnight and, when they are quick, end about 6am.
At 3pm I start my Vault session, which duplicates the daily tapes onto remote tapes that get vaulted.
Sometimes the daily backups are so slow they don't finish until after 3pm; then my backup window is reached, which results in failed backups (196: client backup was not attempted because backup window closed).
Sometimes duplication is so slow that it runs past midnight and starts interfering with the normal daily backups.
I have a Sun SL48 robot with 2 x LTO4 SCSI drives.
Both drives have the latest firmware, and so does the robot.
One of the drives, /dev/rmt/1, has been replaced. I have a case open for the other drive, but I need some evidence that it needs to be replaced.
I have replaced all tapes with new ones, so the oldest is about 3 months old.
I have a master server running in a Solaris LDom and a separate media server running Solaris 10 x86.
Both are well equipped in terms of CPU and memory.
I managed to get the backups to go to both drives without multiplexing within a client, so one client goes to one drive and another client to the other at the same time.
I can't see any real difference in speed between the two drives.
I've tested the network and it shouldn't be a problem.
Outstanding issues with my backups:
1. Duplication from local tapes to remote tapes doesn't seem to compress as well as the daily backups.
When the daily tapes are marked full they hold about 1.3TB.
Remote tapes only get about 800-900GB before being marked full.
2. Duplication is very slow on some days; my backup data is about 800GB, 1.2TB on Saturdays.
On a good day the duplication takes about 3 hours, on a bad day 10 hours.
The worst I've seen is a 16-hour duplication.
3. From another thread:
https://www-secure.symantec.com/connect/forums/how-calculate-sizedatabuffers
I still have big wait and delay times:
11/28/2013 00:42:22 - Info bptm (pid=28893) waited for full buffer 84863 times, delayed 88347 times
But it varies: some days are worse than others, and clients can be quick one day and slow the next.
I think it boils down to the unreplaced drive; everything is so random that it's the only explanation.
But I need some proof that it needs to be replaced (errors in a log).
I'm going to do some manual tar tests on the drive to see if I can get it to complain.
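To compare good and bad days from the bptm logs, the wait/delay counters in lines like the one above can be totalled per log file. A minimal sketch; the sample file here stands in for /usr/openv/netbackup/logs/bptm/log.<date>, and the second sample line is fabricated for illustration:

```shell
# Build a small sample bptm log; the first line is the real message
# quoted above, the second is made up so the sum has something to add.
cat > /tmp/bptm_sample.log <<'EOF'
11/28/2013 00:42:22 - Info bptm (pid=28893) waited for full buffer 84863 times, delayed 88347 times
11/28/2013 03:10:05 - Info bptm (pid=29120) waited for full buffer 1200 times, delayed 1500 times
EOF

# Total the "waited" and "delayed" counters across all matching lines.
awk '/waited for full buffer/ { w += $(NF-4); d += $(NF-1) }
     END { printf "waited=%d delayed=%d\n", w, d }' /tmp/bptm_sample.log
# -> waited=86063 delayed=89847
```

Run against the real log.* files, a consistently higher total on days when /dev/rmt/0 was in use would support the bad-drive theory.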
- Roland
11-27-2013 02:08 PM
Yes I have!
1562 lines with "block position check"
log.112813:04:35:51.762 [5595] <2> write_data: block position check: actual 3471627, expected 3471627
log.112813:04:37:07.163 [5595] <2> io_terminate_tape: block position check: actual 3476424, expected 3476424
log.112813:04:39:16.271 [5711] <2> write_data: block position check: actual 3448452, expected 3448452
log.112813:04:44:54.055 [5711] <2> write_data: block position check: actual 3467490, expected 3467490
log.112813:04:52:14.965 [5711] <2> write_data: block position check: actual 3495477, expected 3495477
log.112813:04:57:30.724 [5711] <2> write_data: block position check: actual 3514438, expected 3514438
log.112813:05:02:33.793 [5711] <2> write_data: block position check: actual 3526295, expected 3526295
log.112813:05:07:28.153 [5711] <2> write_data: block position check: actual 3537221, expected 3537221
log.112813:05:12:35.046 [5711] <2> write_data: block position check: actual 3543504, expected 3543504
log.112813:05:17:30.839 [5711] <2> io_terminate_tape: block position check: actual 3555096, expected 3555096
root@pnms01:/usr/openv/netbackup/logs/bptm# grep 'EOM encountered' *
log.111913:04:29:42.061 [9351] <2> write_backup: EOM encountered --- Fragmenting, TWIN_INDEX 0
log.112113:03:01:04.783 [29439] <2> write_backup: EOM encountered --- Fragmenting, TWIN_INDEX 0
log.112313:00:20:47.103 [20016] <2> write_backup: EOM encountered --- Fragmenting, TWIN_INDEX 0
log.112313:20:52:34.948 [21085] <2> write_backup: EOM encountered --- Fragmenting, TWIN_INDEX 0
log.112513:05:12:27.015 [22901] <2> write_backup: EOM encountered --- Fragmenting, TWIN_INDEX 0
root@pnms01:/usr/openv/netbackup/logs/bptm#
11-27-2013 03:18 PM
We occasionally get OS errors from the tape as well as NBU log errors.
The last OS error, last week, pointed to a specific tape, so I was told to exchange that tape first before they consider replacing the drive; I'm just waiting to get a new error.
From /var/adm/messages, NBU errors logged by the OS:
Nov 25 01:42:32 pnms01 bptm[19833]: [ID 895065 daemon.warning] TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive HP.ULTRIUM4-SCSI.001 (index 1), Media Id DLT003
Nov 25 01:42:32 pnms01 bptm[19833]: [ID 356200 daemon.crit] TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive HP.ULTRIUM4-SCSI.001 (index 1), Media Id DLT003
Nov 25 01:42:32 pnms01 bptm[19833]: [ID 436935 daemon.crit] TapeAlert Code: 0x06, Type: Critical, Flag: WRITE FAILURE, from drive HP.ULTRIUM4-SCSI.001 (index 1), Media Id DLT003
Nov 25 01:42:32 pnms01 bptm[19833]: [ID 185535 daemon.crit] TapeAlert Code: 0x14, Type: Critical, Flag: CLEAN NOW, from drive HP.ULTRIUM4-SCSI.001 (index 1), Media Id DLT003
NBU errors are steadily increasing, as seen with your tperr.sh script.
HP.ULTRIUM4-SCSI.000 is /dev/rmt/1, which has been replaced.
HP.ULTRIUM4-SCSI.001 is /dev/rmt/0, which is the unreplaced drive.
Errors File exists ....
DLT010 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DLT001 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DLT011 has had errors in 1 different drives (Total occurrences (errors) of this volume is 4)
DLT002 has had errors in 1 different drives (Total occurrences (errors) of this volume is 4)
DLT012 has had errors in 1 different drives (Total occurrences (errors) of this volume is 4)
DLT003 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DLT014 has had errors in 1 different drives (Total occurrences (errors) of this volume is 1)
DLT005 has had errors in 1 different drives (Total occurrences (errors) of this volume is 1)
DRT010 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DRT001 has had errors in 1 different drives (Total occurrences (errors) of this volume is 4)
DLT016 has had errors in 1 different drives (Total occurrences (errors) of this volume is 5)
SCR004 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DRT110 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DLT008 has had errors in 1 different drives (Total occurrences (errors) of this volume is 18)
DRT021 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DLT018 has had errors in 2 different drives (Total occurrences (errors) of this volume is 4)
DRT031 has had errors in 1 different drives (Total occurrences (errors) of this volume is 5)
DRT113 has had errors in 1 different drives (Total occurrences (errors) of this volume is 1)
DRT014 has had errors in 1 different drives (Total occurrences (errors) of this volume is 4)
NLT002 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DRT115 has had errors in 2 different drives (Total occurrences (errors) of this volume is 3)
DRT106 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
DRT016 has had errors in 1 different drives (Total occurrences (errors) of this volume is 1)
NRT001 has had errors in 1 different drives (Total occurrences (errors) of this volume is 1)
NRT005 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
MRT006 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
NRT006 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
NRT008 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
NRT009 has had errors in 1 different drives (Total occurrences (errors) of this volume is 1)
HP.ULTRIUM4-SCSI.000 has had errors with 13 different tapes (Total occurrences (errors) for this drive is 33)
HP.ULTRIUM4-SCSI.001 has had errors with 18 different tapes (Total occurrences (errors) for this drive is 54)
The drive with increasing errors is, not surprisingly, HP.ULTRIUM4-SCSI.001.
- Roland
11-28-2013 02:26 PM
I'm trying to find a way of getting some stats.
To me, the only thing we see on a regular basis is that duplication is slow or backups are slow.
We also see that duplication is not compressing well enough.
This theory holds if the slow duplications or slow backups coincide with use of the non-replaced drive.
All this could be derived using logs and bpdbjobs, but it is very cumbersome finding a way of seeing what I want.
Do you have an easy way of getting these columns:
Policy, Type, Schedule, Copy, DSTMEDIA, tape drive?
i.e.
"IPND_papp02" "Backup or Duplication" "Daily" "1" "DRT101" "/dev/rmt/0cbn"
Then I could maybe see some data pointing to /dev/rmt/0 as being slow at reading or writing.
But as I see it, it's not easy getting this output, especially the tape drive.
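For what it's worth, a rough sketch of pulling a few of those columns out of bpdbjobs -report -all_columns output. The comma-separated field positions used below (1 = jobid, 2 = jobtype, 5 = policy, 6 = schedule) are assumptions from memory, so verify them against the Commands Reference for your NBU version; the sample record is fabricated:

```shell
# Fabricated sample of one bpdbjobs -all_columns record (fields are
# comma-separated; the layout here is an assumption, not verified).
line='123456,4,3,0,IPND_papp02,Daily,papp02,pnms01'

# Pull jobid, jobtype, policy and schedule from the record.
echo "$line" | awk -F, '{ printf "%s %s %s %s\n", $1, $2, $5, $6 }'
# -> 123456 4 IPND_papp02 Daily
```

As far as I know the drive path isn't in bpdbjobs at all; the bptm log records which drive each job used, so correlating on PID or job id there may be the only way to get that last column.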
Also, what should I do to reset the stats that tperr shows, so it starts again from 0 errors?