Items building up in watchfile and journalarchive tables
I have a customer with EV 9.0.2 that has built up a large backlog of items awaiting backup. I took a look at how they were doing their partition backups and identified some problems but that didn't explain the high number (over 7.5 million items). I ran queries against the watchfile and journalarchive tables to see what was there and came up with this:
From the journalarchive query:

    indexcommited | backupcomplete |   count
                0 |              0 |   64044
                1 |              0 | 7623934
                0 |              1 |   63882

From the watchfile query:

    itemsecured |   count
              0 |  591032
              1 | 7096946
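The counts above would come from simple GROUP BY aggregates along these lines (a sketch using the table and column names already mentioned, not the exact statements that were run):

    SELECT IndexCommited, BackupComplete, COUNT(*) AS [count]
    FROM   dbo.JournalArchive
    GROUP BY IndexCommited, BackupComplete;

    SELECT ItemSecured, COUNT(*) AS [count]
    FROM   dbo.WatchFile
    GROUP BY ItemSecured;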
I tried to replicate what was going on in my lab but couldn't. When I test in the lab, I find that as items get archived, rows are added to watchfile with itemsecured=0 and to journalarchive with backupcomplete=0. After the partitions are backed up, the watchfile rows get changed to itemsecured=1 first, and then the corresponding journalarchive rows get updated to backupcomplete=1 and the watchfile rows are deleted. That's not what I'm seeing in the customer data: for some reason the journalarchive rows never get set to backupcomplete=1 and the watchfile rows never get purged. This would seem to indicate that the trigger files were picked up and processed, but the second pass through the tables never occurred. Has anyone seen this before, and does anyone know of a cause/resolution?
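To spell out the mismatch, a query along these lines should list the stuck items: rows where the first pass clearly ran (itemsecured=1 in watchfile) but backupcomplete was never set in journalarchive. This is only a sketch; it assumes the two tables join on SavesetIdentity, so check the actual key in your schema.

    SELECT TOP 100 ja.*
    FROM   dbo.JournalArchive AS ja
           INNER JOIN dbo.WatchFile AS wf
                   ON wf.SavesetIdentity = ja.SavesetIdentity  -- assumed join key
    WHERE  wf.ItemSecured = 1
      AND  ja.BackupComplete = 0;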
thanks,
Mark
The ItemSecured column in the WatchFile table just tells EV whether the DVS file has been backed up or not. BackupComplete in the JournalArchive table, when set, indicates that any/all associated SISParts in remote VSDBs have also been backed up; you should be able to confirm this from the 'Secured' column in the Fingerprint DB for any associated SISParts (obviously you need to track them all through).
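Purely as an illustration of that check (the table holding the SISPart rows in the Fingerprint DB varies, so the name below is a placeholder; Secured is the column referred to above):

    SELECT Secured, COUNT(*) AS SisPartCount
    FROM   dbo.SisPartFingerprints   -- placeholder: substitute the actual SISPart table(s) in the Fingerprint DB
    GROUP BY Secured;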
In some scenarios it is possible for the Secured column in the Fingerprint DB for a SISPart to remain 0 even though the DVS/SISPart files in the remote (shared) VSDB have been backed up. When this happens on earlier (pre-10.0.2) EV versions, the code and SQL stored procedures that handle post-processing of these records had some shortcomings: depending on the range of sequence numbers to be processed, they could churn through massive amounts of processing without ever achieving anything, and might never get to recently archived items because the process spends all its time looking for legacy content to process. This can become a vicious cycle that just gets worse over time.
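A rough way to see whether that pattern fits here is to look at how old the unsecured JournalArchive rows are; if the backlog is dominated by legacy content, the age distribution will show it. This sketch assumes an ArchivedDate column, so adjust the name if it differs in that EV version.

    SELECT YEAR(ArchivedDate) AS ArchivedYear, COUNT(*) AS UnsecuredItems
    FROM   dbo.JournalArchive
    WHERE  BackupComplete = 0
    GROUP BY YEAR(ArchivedDate)
    ORDER BY ArchivedYear;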
I am speculating a bit based on the description here, but it is a possibility. You would really need to see some Dtrace output as well as the entries in the databases to confirm. I suggest raising a support call, or upgrading.