Compromised pool integrity

Trond_Endrestol
Level 3
We are running Novell NetWare 6.5, SP4a, NSS4b and Veritas Backup Exec 9.1.1158.4 with OFM 9.4.01m.

All backup jobs run as expected except when the Open File Option is used.

These are the error messages that appear when attempting to back up the SYS volume:

27/10/2005 17.14.07 : COMN-3.23-34
Severity = 5 Locus = 2 Class = 6
NSS-2.70-5005: Volume NWGTF/SYS_SV user data write
(20204(zio.c)) to block 81060(file block 305)(ZID 189302) failed.

27/10/2005 17.14.07 : COMN-3.23-1092
Severity = 4 Locus = 3 Class = 0
NSS-3.00-5001: Pool NWGTF/SYS_SP is being deactivated. An I/O
error (20204(zio.c)) at block 511879(file block -511879)(ZID 4) has compromised pool integrity.

Even with the latest updates installed for both NetWare and Backup Exec, backup jobs using OFO do not succeed, and the server hangs until I reboot it the hard way.

Luckily, no data has been lost nor does nss VerifyPool complain about anything being wrong with the regular NSS pools.

Does somebody know how to make OFO work under these circumstances?

Regards,
Trond Endrestøl,
SysAdmin,
Gjøvik Technical College,
Gjøvik, Norway

email: Trond.Endrestol@fagskolen.gjovik.no
17 REPLIES

Daniel_Hesse
Level 3
I have the same OS, and same backup exec version....and I have the same problem. I have had it since the day I updated from NW6SP3 to SP4 OES. I am unable to fix it.

Daniel_Hesse
Level 3
Sorry..NW65SP3 to NW65SP4 OES. It also seems to do better if the machine is rebooted and iManager never starts up. I think it has Java issues.

Trond_Endrestol
Level 3
We ran a clean NetWare 6.5 server and upgraded the server straight to SP4.
Next, I installed OFO as shipped with 9.1.1158.4.
Subsequently the server was upgraded to SP4a and NSS4b.
OFO still refuses to work, even with the latest update from St. Bernard (9.4.01m).

Thank you for your input.
Apparently, I'm not the only SA with this problem. ;)

I guess Novell and/or St. Bernard should put their heads into this, but that's not guaranteed.
Maybe we all should upgrade to BENW 9.2 and hope for the best...

Daniel_Hesse
Level 3
Since the SP4 updates, every time BKExec runs, the server runs low on logical address space. I think it is important to note that none of these issues existed prior to the SP4 updates, and I check for updates from Veritas all the time. They claim it will run on NW65SP4, but !!!!

Daniel_Hesse
Level 3
I won't claim to be an expert of any sort for Netware or BackupExec. But I made these changes and I have had 3 successful backups without NSS Pool Corruption or running out of Logical Address Space. In fact ZERO errors and 43.6gb in 1hr 39mins, using OFO without a lock.

My Setup
NW65SP2 patched all the way to SP4 OES, with all the latest updates.
SYS volume = 8 GB, 2 GB free
Data volume = 140 GB, 95 GB free



open Novell Remote Manager "https://Your_Server_IP:8009"
select "Manage eDirectory"
select "NDS iMonitor"
select "Agent Configuration"
select "Database Cache"
Scroll down to Database Cache Configuration
Select "Dynamic Adjust" (I have mine set to 51% of available memory)
click "Submit".


New to NetWare 6.5 Support Pack 3 is the set parameter "set auto tune server memory". This parameter is ON by default and will automatically set the file cache maximum size appropriately for the server during operation. When this feature is enabled, messages will be printed on the system console screen notifying the administrator of this activity. Novell recommends that this parameter stay at its default setting of ON.


Edit these two files

My server is a Novell Small Business setup. I run GroupWise on this machine, so I changed two settings in this file. It is funny; as I reflect, I could have sworn I set this to 30....either way, 10 is what's in there now. When I had it set to 1, I got the NSS SYS_SP I/O error immediately when the backup job started.

FILE - SYS:\ETC\SMS\TSA.CFG
set
Cache Memory Threshold:
10
Enable GroupWise:
1

If you don't run GroupWise, this probably won't affect you.
FILE - SYS:\BKUPEXEC\besrvr.cfg
Enable_GWmail = 1

I also ran the command line of what is in tsa.cfg just to be sure
Load TSAFS /CacheMemoryThreshold=10

I purged the SYS volume
I purged the Data volume (this took a while, it holds GroupWise)

Then Rebooted the server

Opened backup exec and deleted my existing backup job, and recreated it....again using the Open File Option, without a lock.

Runs great!!! Just wish I knew why I feel so strongly that I had set the cache memory threshold to 30.....either way, with 10 in the file, that is what it would have changed to on reboot.

GOOD LUCK!
http://support.novell.com/cgi-bin/search/searchtid.cgi?10091980.htm

http://support.novell.com/cgi-bin/search/searchtid.cgi?/10096642.htm

Trond_Endrestol
Level 3
> But I made these changes and I have
> had 3 successful backups without NSS Pool Corruption
> or running out of Logical Address Space. In fact ZERO
> errors and 43.6gb in 1hr 39mins, using OFO without a
> lock.

Lucky you!

> open Novell Remote Manager
> "https://Your_Server_IP:8009"
> select "Manage eDirectory"
> select "NDS iMonitor"
> select "Agent Configuration"
> select "Database Cache"
> Scroll down to Database Cache Configuration
> Select "Dynamic Adjust" I have mine set to 51% of
> available memory
> click "Submit".

There was no need to change this parameter on my system.

> FILE - SYS:\ETC\SMS\TSA.CFG
> set
> Cache Memory Threshold:
> 10

This file contained the value 25.
I've set it to 10, and I'll attempt an OFO backup later this evening.
Fingers crossed... ;)

> I also ran the command line of what is in tsa.cfg
> just to be sure
> Load TSAFS /CacheMemoryThreshold=10

So did I.

> Runs great!!!

I hope I'll achieve the same success as you.

> GOOD LUCK!

Ta, I might need it. ;)

Thank you for your input!

Daniel_Hesse
Level 3
Bad omen, posting success! Last night my backup failed: "Compromised Pool Integrity"!!! Arghh

I hate backup exec!!! The salesman screwed up the sale and sold me Disaster recovery for a remote server but not the Primary server and all they want to do is charge me more for EDR on the primary, which is the only one I wanted EDR for in the 1st place........now this....

This software is way over rated!!!

Trond_Endrestol
Level 3
> I hate backup exec!!!

Me too, or...

Aside from our common issues with the non-functioning OFO, BE is working more or less as expected in my shop.

> The salesman screwed up the
> sale and sold me Disaster recovery for a remote
> server but not the Primary server and all they want
> to do is charge me more for EDR on the primary, which
> is the only one I wanted EDR for in the 1st
> place........now this....

Hmm, IDR works like a charm.
Though, I have only tested the CD on an old PC sitting in the corner.
I got as far as booting the restored NetWare system, but NetWare then complained about too little memory, and abended on the spot.

> This software is way over rated!!!

I agree with you.

There are some pieces of the software the developers could spend some more time polishing.

For instance, being able to specify both start and end date/time in the holiday calendar.
I had to mark each and every day belonging to the Xmas and Easter holidays.

When doing media rotation jobs, it would be nice to have different media descriptions, depending on whether the tape belongs to the daily, the weekly or the monthly media set.
That change would make it easier to distinguish each tape when doing a restore.

Ordinary backup jobs are currently limited to recurring every n minutes/hours/days/weeks/months or in the first/second/third/fourth/fifth/last week of each month.
Why can't they have the same recurring properties as rotation jobs, if you choose to do so?

Another issue is the attachments sent by email.
The attachments are marked as application/octet-stream when they really are text/plain and should be marked as such.

And that darn UTF-8 compromise of a character set should be abandoned altogether.
Either you stick to single byte characters or you stick to double/quad byte characters, not alternating between the two forms.

I'm missing a configuration option to choose a character set from.
In Norway and the rest of the Western World, ISO 8859-1 is a reasonable character set.

Last but not least, the documentation is way bad.

The documentation claims you need at least 65 % cache buffers, whereas in reality you need only about 300 MiB of cache buffers to be on the safe side.

Comparing the manuals for the Windows version with the NetWare version, you'll soon find out that the common chapters should be merged and polished.

I actually learned a thing or two about BE for NW by reading the Windows manual.

Trond.

Daniel_Hesse
Level 3
More interesting news from the front lines!!!

1st - When I said I thought I had changed TSAFS to 30....well, I had! Problem is, 30 is not a valid value. The valid range is 1 - 25, so I changed mine back to 25.

2nd - Have you looked at this log before??
SYS:\SYSTEM\OFM\OFMNW.LOG
I looked at mine and found some interesting things. I will paste some excerpts.

Before upgrade to NW65SP4
02005 08 16 03:03:39 System Not Synchronized
02005 08 17 01:22:39 Agent requested synchronization.
02005 08 17 01:23:22 NSS Pools synchronized
02005 08 17 01:23:22 Backup Agent Activity Detected
02005 08 17 01:23:22 System Synchronizing
02005 08 17 01:23:22 System Synchronized
02005 08 17 03:06:14 Agent requested stop synchronization.
02005 08 17 03:06:16 Backup Complete
AFTER Update to SP4
02005 11 19 11:45:35 System Not Synchronized
02005 11 22 01:02:48 Agent requested synchronization.
22005 11 22 01:03:16 Pool SYS cache space exhausted while still expanding.
02005 11 22 10:29:13 Creating/appending to the log file SYS:\SYSTEM\OFM\OFMNW.Log
02005 11 22 10:29:13 Open File Manager(TM) 9.4 build 401 Driver Started
12005 11 22 10:32:01 Unsynchronizing Open File Manager.
02005 11 22 10:32:01 Request to stop synchronization received.
02005 11 22 10:32:13 Driver Paused
02005 11 22 10:32:13 Driver Stopped

I did a search on the Pool Sys Cache Space exhausted and found this
http://seer.support.veritas.com/docs/272737.htm

I made the change to 16mb and the 0x30
If you do this you may need to make the file read only so it doesnt change it back....it did it to me.

After the change I ran the backup, it worked, but Netware complained of Running Low on Logical Addresses and
automatically adjusted the File Cache Maximum Size, down about 40% on the server to 1392825344
and OFMNW.LOG complains of cache allocation inefficiency
LOG AFTER CHANGE
02005 11 29 14:52:58 System Not Synchronized
02005 11 29 15:11:58 Agent requested synchronization.
02005 11 29 15:12:25 NSS Pools synchronized
02005 11 29 15:12:25 Backup Agent Activity Detected
02005 11 29 15:12:25 System Synchronizing
02005 11 29 15:12:25 System Synchronized
12005 11 29 15:18:16 Pool SYS fragmentation is causing cache allocation inefficiency.
02005 11 29 15:48:16 Agent requested stop synchronization.
02005 11 29 15:48:17 Backup Complete
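
For anyone who wants to scan OFMNW.LOG for trouble without reading every line, here is a minimal Python sketch. It assumes (this is a guess from the excerpts above, not something taken from the vendor documentation) that the single digit in front of the year is a severity level, with 0 meaning informational and 1 or 2 meaning warning or error. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: one leading digit (severity), then "YYYY MM DD HH:MM:SS", then the message.
LINE = re.compile(
    r"^(?P<level>\d)(?P<stamp>\d{4} \d{2} \d{2} \d{2}:\d{2}:\d{2}) (?P<message>.*)$"
)

def parse_ofm_log(text):
    """Return a list of (level, timestamp, message) tuples for recognised lines."""
    entries = []
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip blank lines and anything that does not fit the pattern
        when = datetime.strptime(match["stamp"], "%Y %m %d %H:%M:%S")
        entries.append((int(match["level"]), when, match["message"]))
    return entries

def problems(entries):
    """Keep only entries whose leading digit is non-zero (assumed warning/error)."""
    return [entry for entry in entries if entry[0] > 0]

sample = """\
02005 11 19 11:45:35 System Not Synchronized
02005 11 22 01:02:48 Agent requested synchronization.
22005 11 22 01:03:16 Pool SYS cache space exhausted while still expanding.
12005 11 22 10:32:01 Unsynchronizing Open File Manager.
"""

for level, when, message in problems(parse_ofm_log(sample)):
    print(level, when.isoformat(sep=" "), message)
```

With the sample lines, only the "cache space exhausted" and "Unsynchronizing" entries are printed, which are the ones that mattered in this thread.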

Subsequent backups - NetWare did not modify the File Max Cache, and the OFMNW log still complains about inefficiency....but two Run Job Now's and a scheduled backup later, it is backing up without compromised pool integrity....
we shall see what tonight brings.

Trond_Endrestol
Level 3
> 2nd - Have you looked at this log before??
> SYS:\SYSTEM\OFM\OFMNW.LOG

Now that you mentioned it, I took a look at this file and found these entries:

02005 10 26 17:22:25 Agent requested synchronization.
22005 10 26 17:22:47 Pool SYS cache space exhausted while still expanding.
12005 10 26 17:26:25 Unsynchronizing Open File Manager.

> I did a search on the Pool Sys Cache Space exhausted
> and found this
> http://seer.support.veritas.com/docs/272737.htm

I find this very interesting.
If tweaking these settings is the cure, I'm really curious to try this myself.
Since I didn't attempt an OFO test backup last evening, I'll attempt a test backup after the regular backups this evening.
Stay tuned!

> I made the change to 16mb and the 0x30
> If you do this you may need to make the file read
> only so it doesnt change it back....it did it to me.

I think I'll reboot the server prior to performing the test backup, and if the contents of the configuration file are reverted to the standard settings, I'll make this file read only.

> Subsequent backups - netware did not modify the File
> Max Cache, OFMNW log still complains about
> inefficiency....but 2 Run Job Now's, and a
> scheduled backup later it is backing up, without
> compromised pool integrity....
> we shall see what tonight brings.

Looks like the best news we've seen since upgrading to SP4a+NSS4b.
Fingers still crossed!

Trond.

Trond_Endrestol
Level 3
I just ran my first successful OFO backup!
All thanks to you and your research on this matter!
Thank you so much!

> 2nd - Have you looked at this log before??
> SYS:\SYSTEM\OFM\OFMNW.LOG

Here are the entries from this evening's OFO test backup:

02005 11 30 17:20:58 Creating/appending to the log file SYS:\SYSTEM\OFM\OFMNW.Log
02005 11 30 17:20:59 Open File Manager(TM) 9.4 build 401 Driver Started
02005 11 30 17:25:03 Agent requested synchronization.
02005 11 30 17:25:28 NSS Pools synchronized
02005 11 30 17:25:28 Backup Agent Activity Detected
02005 11 30 17:25:28 System Synchronizing
02005 11 30 17:25:28 System Synchronized
02005 11 30 17:25:28 Agent requested stop synchronization.
02005 11 30 17:25:34 Backup Complete
02005 11 30 17:25:34 System Not Synchronized

> I did a search on the Pool Sys Cache Space exhausted
> and found this
> http://seer.support.veritas.com/docs/272737.htm

It seems you were on the right track!

> I made the change to 16mb and the 0x30
> If you do this you may need to make the file read
> only so it doesnt change it back....it did it to me.

The fact that we need to set this file read only, suggests to me that the information is generated each time BE is started.
Perhaps the default values are stored somewhere else and copied whenever ofmcdm.cfg is (re)generated.

> After the change I ran the backup, it worked, but
> Netware complained of Running Low on Logical
> Addresses and
> automatically adjusted the File Cache Maximum Size,
> down about 40% on the server to 1392825344
> and OFMNW.LOG complains of cache allocation
> inefficiency

No complaints were displayed on my system as far as I can tell.
However, my server has 1 GiB worth of RAM installed, maybe that's the reason I didn't see any warnings.

> we shall see what tonight brings.

I guess you'll be just as lucky as I was.

I believe I'm ready to attempt regular backups using OFO.
Tomorrow evening will be the real big test at my shop.

Again, thank you for all your effort!

Trond.

Trond_Endrestol
Level 3
Maybe I was too hasty...

After a few days of more or less successful OFO driven backups, BE started complaining about corruption of different files on each occasion.

Last Friday these files were reported corrupt:
SYS:\SYSTEM\lang\ndsimon\errors\en_us\*

On today's backup (today being the following Monday) these files were reported instead:
SYS:\tomcat\4\webapps\nps\portal\modules\base\help\*

(For some reason BE chose to perform a full backup instead of performing an incremental backup.)

I renamed the directories closest to the reported files, and restored the files from the newest non-OFO full backup.

I'll attempt an NSS /PoolVerify=SYS later tonight when everybody is away but me, just to make sure everything is in place.

By searching this forum I found similar cases where it was suggested to upgrade the TSA, enable decompression of compressed files prior to backup, etc.

In SYS:\SYSTEM\OFM\OFMNW.LOG I found these entries:

02005 12 02 16:17:01 Agent requested synchronization.
02005 12 02 16:17:24 NSS Pools synchronized
02005 12 02 16:17:24 Backup Agent Activity Detected
02005 12 02 16:17:24 System Synchronizing
02005 12 02 16:17:24 System Synchronized
12005 12 02 16:17:38 Pool SYS fragmentation is causing cache allocation inefficiency.
22005 12 02 16:30:02 Pool SYS does not have enough free space for preview data
02005 12 02 16:30:02 Request to stop synchronization received.
12005 12 02 16:30:02 Pool SYS_SP has been deactivated.
02005 12 02 16:30:03 Backup Complete
02005 12 02 16:30:03 System Not Synchronized
02005 12 02 16:30:03 Agent requested stop synchronization.

and also these entries:

02005 12 05 16:19:00 Agent requested synchronization.
02005 12 05 16:19:24 NSS Pools synchronized
02005 12 05 16:19:24 Backup Agent Activity Detected
02005 12 05 16:19:24 System Synchronizing
02005 12 05 16:19:24 System Synchronized
22005 12 05 16:33:48 Pool SYS does not have enough free space for preview data
02005 12 05 16:33:48 Request to stop synchronization received.
12005 12 05 16:33:49 Pool SYS_SP has been deactivated.
02005 12 05 16:33:50 Backup Complete
02005 12 05 16:33:50 System Not Synchronized
02005 12 05 16:33:50 Agent requested stop synchronization.

I found these entries in the console log:

2/12/2005 16.30.01 : COMN-3.23-1092
NSS-3.00-5001: Pool NWGTF/SYS_SP is being deactivated.
An I/O error (20204(zio.c)) at block 460009(file block -460009)(ZID
1) has compromised pool integrity.

2/12/2005 16.30.01 : COMN-3.23-33
NSS-2.70-5004: Volume NWGTF/SYS_SV is being deactivated.
An I/O error (20204(zio.c)) at block 460009(file block -460009)(ZID
1) has compromised volume integrity.
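
If you need to pull the pool name, block number and ZID out of many such console messages, a small Python sketch like the following might help. It assumes the message wording stays exactly as in the excerpts in this thread, and it joins wrapped lines before matching because the console log breaks the text in the middle of "(ZID 1)". The function name is hypothetical.

```python
import re

# Pattern built only from the messages quoted in this thread; other NSS
# versions may word the message differently.
NSS_IO_ERROR = re.compile(
    r"(?P<kind>Pool|Volume) (?P<name>\S+) is being deactivated\. "
    r"An I/O error \((?P<code>\d+)\((?P<source>[^)]+)\)\) "
    r"at block (?P<block>\d+)\(file block (?P<file_block>-?\d+)\)"
    r"\(ZID (?P<zid>\d+)\)"
)

def parse_nss_io_errors(text):
    """Return one dict per deactivation message found in the console log text."""
    flat = " ".join(text.split())  # undo line wrapping, collapse whitespace
    results = []
    for match in NSS_IO_ERROR.finditer(flat):
        info = match.groupdict()
        for key in ("code", "block", "file_block", "zid"):
            info[key] = int(info[key])  # numeric fields as integers
        results.append(info)
    return results

sample = """\
2/12/2005 16.30.01 : COMN-3.23-1092
NSS-3.00-5001: Pool NWGTF/SYS_SP is being deactivated.
An I/O error (20204(zio.c)) at block 460009(file block -460009)(ZID
1) has compromised pool integrity.

2/12/2005 16.30.01 : COMN-3.23-33
NSS-2.70-5004: Volume NWGTF/SYS_SV is being deactivated.
An I/O error (20204(zio.c)) at block 460009(file block -460009)(ZID
1) has compromised volume integrity.
"""

for error in parse_nss_io_errors(sample):
    print(error["kind"], error["name"], "block", error["block"], "ZID", error["zid"])
```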

It seems I might need to purge deleted files or delete unwanted files to free up more space.
I have yet to see this error with regard to the other and larger NSS pools on this server.

To resolve this matter we have several options (but note I'm no expert in this field):

1. Upgrade the TSA, if available.
2. Increase or decrease the InitialCacheSize and/or CacheSizeExpandThreshold parameters in SYS:\SYSTEM\OFMCDM.CFG.
3. Recreate the NSS pools and in the same process create the SYS pool larger than the size of the current SYS pool. Then restore all files from the latest backup.
4. Stop using OFO.

If anyone has something to contribute, please come forward. :)

Trond_Endrestol
Level 3
File corruption was reported by BE after a few days of otherwise normal behaviour. See my posting on the subject.

Daniel_Hesse
Level 3
I don't know if this will help you or not.

When I made the change to 16mb, I told you I had received the error
12005 11 29 15:18:16 Pool SYS fragmentation is causing cache allocation inefficiency.
02005 11 29 15:48:16 Agent requested stop synchronization.

Well, I thought I would take a chance and made an un-educated guess that "0x00000004" would be 4mb.
I made the change so OFMCDM.cfg looks like this.


MinFreeSpace = 0x0000000000400000
InitialCacheSize = 0x00000004
CacheSizeExpandThreshold = 0x30
CacheSizeFailThreshold = 0x5
Status = 0x79

Does it actually mean 4mb.....don't know!!!
Backups are working like a charm though, and I have not received any more messages about inefficient cache.

I am curious....when your backup job has completed, if you run "monitor".......what is your utilization??? If it is running 20 - 25 %, you can probably do an
ap2webdn
tc4stop
bestop
check your utilization....if it is still 20 -25% try a "java -exit"
Does that fix it???

What version of iManager are you running??

Daniel_Hesse
Level 3
Are you using OFO with a Lock, or OFO without a Lock..........I had issues with some database software and OFO with a Lock, so now I only use OFO without a Lock....

Trond_Endrestol
Level 3
> Well, I thought I would take a chance and made an
> un-educated guess that "0x00000004" would be 4mb.
> I made the change so OFMCDM.cfg looks like this.
>
>
> MinFreeSpace = 0x0000000000400000
> InitialCacheSize = 0x00000004
> CacheSizeExpandThreshold = 0x30
> CacheSizeFailThreshold = 0x5
> Status = 0x79

0x1000000 is the same as the decimal number 16777216.

Divide this number by 1024 and you'll get a value expressed as kibibytes, divide this value again by 1024 and you'll get a value expressed as mebibytes.

16777216 / 1024 / 1024 = 16

16777216 B = 16384 KiB = 16 MiB

For an explanation of kibibytes and mebibytes, take a look at
http://en.wikipedia.org/wiki/Binary_prefix and
http://en.wikipedia.org/wiki/IEEE_1541
If hexadecimal notation is somewhat mysterious to you, take a look at http://en.wikipedia.org/wiki/Hexadecimal

> Does it actually mean 4mb.....don't know!!!

This leaves the InitialCacheSize at 4 bytes.
What you need is 4 * 1024 * 1024 = 4194304.
In hexadecimal notation this number is 0x400000.

If you take a look at the original value, which is 0x200000, and note that this equals 2 MiB, the transition to 4 MiB is as simple as multiplying that value by 2.
Time to pull out that old but trustworthy HP 48GX!
Though the calculator in Windows will suffice ;)
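
The same arithmetic can be checked with a few lines of Python. This is only a sketch of the conversion; it assumes, as reasoned above, that the values in OFMCDM.CFG are plain byte counts written in hexadecimal, which has not been confirmed against the vendor documentation. The helper names are made up for illustration.

```python
MIB = 1024 * 1024  # bytes in one mebibyte

def hex_to_bytes(text):
    """Convert a hexadecimal string such as '0x400000' to an integer byte count."""
    return int(text, 16)

def mib_to_hex(mib):
    """Return the hexadecimal byte count for a whole number of MiB."""
    return hex(mib * MIB)

for label, text in [
    ("original value", "0x200000"),
    ("16 MiB setting", "0x1000000"),
    ("value as posted", "0x00000004"),
]:
    size = hex_to_bytes(text)
    print(f"{label}: {text} = {size} bytes = {size / MIB:g} MiB")

print("4 MiB as hex:", mib_to_hex(4))  # prints 0x400000
```

Under that assumption, the posted 0x00000004 comes out as 4 bytes rather than 4 MiB.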

> Backups are working like a charm though, and I have
> not received any more messages about inefficient
> cache

Right, I guess tuning these parameters will eventually get me on the right track.

> I am curious....when your backup job has completed,
> if you run "monitor".......what is your
> utilization??? If it is running 20 - 25 %, you can
> probably do an
> ap2webdn
> tc4stop
> bestop
> check your utilization....if it is still 20 -25% try
> a "java -exit"
> Does that fix it???

I'll check this after today's backups.
Since I rebooted the server last night, I'll let the backups run with OFO enabled anyway.

The NSS pools are just fine.
The corruption reported by BE happens only with the files stored on tape, naturally.

> What version of iManager are you running??

iManager Version 2.0.2.

Trond_Endrestol
Level 3
It seems you need to have a certain amount of free space in each regular pool.

The SYS pool on my system had about 900 MiB free space.

After archiving and deleting SYS:\SYSTEM\BACKSP4, the SYS pool gained about 800 MiB of free space.

The SYS pool now has about 1.62 GiB of free space.
This seems to be more than enough for OFO to succeed.

My other pools have far more free space, but when they fill up, I'll probably run into the same problems as before.