Pre-ingestion OCRing

KeithL
Level 3

I've seen a few articles out in the forum along these lines, but they're all quite old so I thought I'd put the question out there again -

Has anybody come up with any clever solutions for OCRing fax documents, graphics, etc before ingesting them into the vault by FSA, SPA, etc?  Alternatively, has anybody heard rumours of this type of functionality being slated for incorporation into EV?

At the moment, the best idea I've come up with is to write a program that goes out and OCRs the relevant filetypes, writing a .txt file with the same base filename in the same directory - that way DA would at least be able to find the .txt file, which could then point the user in the right direction.  I'm hoping somebody has come up with something more elegant though!
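For illustration, the sidecar idea above could be sketched roughly as follows. This is only a minimal sketch under assumptions, not a tested EV integration: the suffix list, the function names `write_sidecars` and `tesseract_ocr`, and the choice of Tesseract via the `pytesseract` package are all mine, and any OCR engine could be swapped in by passing a different `ocr` callable.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Assumed set of "relevant filetypes"; adjust to whatever the file server holds.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def tesseract_ocr(image_path: Path) -> str:
    """Default OCR backend (assumption: pytesseract and Tesseract are installed)."""
    import pytesseract  # imported lazily so another backend can be used instead

    return pytesseract.image_to_string(str(image_path))


def write_sidecars(
    root: Path,
    ocr: Callable[[Path], str] = tesseract_ocr,
    suffixes: Iterable[str] = IMAGE_SUFFIXES,
) -> List[Path]:
    """OCR every matching file under root and write <basename>.txt next to it."""
    wanted = {s.lower() for s in suffixes}
    written: List[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        sidecar = path.with_suffix(".txt")  # same directory, same base filename
        if sidecar.exists():
            continue  # never overwrite an existing .txt (or an earlier run's output)
        sidecar.write_text(ocr(path), encoding="utf-8")
        written.append(sidecar)
    return written
```

Run before the archiving task picks the folder up, e.g. `write_sidecars(Path(r"\\server\share\faxes"))` (a made-up path). Note that two images sharing a base filename in one directory would map to the same .txt, so only the first one gets a sidecar in this sketch.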

Thanks,

Keith

4 REPLIES

Liam_Finn1
Level 6
Employee Accredited Certified

EV has the ability to do some text recognition on PDF and image documents, but this depends on how the document was created.

A PDF file is a good example. If the PDF is created by Adobe PDF writer, the content can be recognized and will be indexed. However, if the PDF is created by some PDF printer app, or scanned into PDF format by a scanner, then in most cases the text can't be recognized because it is seen as just an image.

KeithL
Level 3

Good point, but I'd thought of that already... Many of our newer scanned documents are stored in exactly that way but, unfortunately, most of our older stuff is TIFF, JPG, etc., so that won't help in this particular case.

I suppose I could look at one of the tools that are out there to convert image files into PDF format, while performing an OCR at the same time.  I was just wondering whether someone had come up with a framework to make this work in the context of EV.
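As a rough idea of what that conversion step might look like, here is a small sketch. It assumes Tesseract through the `pytesseract` package, whose `image_to_pdf_or_hocr` call returns the bytes of a PDF with a text layer; the helper names `tesseract_pdf` and `image_to_searchable_pdf` and the separate output folder are hypothetical, and nothing here is EV-specific.

```python
from pathlib import Path
from typing import Callable


def tesseract_pdf(image_path: Path) -> bytes:
    """Render one image as a searchable PDF (assumption: pytesseract + Tesseract installed)."""
    import pytesseract  # lazy import so a different converter can be plugged in

    return pytesseract.image_to_pdf_or_hocr(str(image_path), extension="pdf")


def image_to_searchable_pdf(
    image_path: Path,
    out_dir: Path,
    render: Callable[[Path], bytes] = tesseract_pdf,
) -> Path:
    """Write <basename>.pdf into out_dir and return its path; the source image is left alone."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (image_path.stem + ".pdf")
    target.write_bytes(render(image_path))
    return target
```

The resulting PDFs would then be what gets archived, instead of (or alongside) the original TIFF/JPG files.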

Thanks though!

Liam_Finn1
Level 6
Employee Accredited Certified

Sorry, I know of no framework that will do this for you. If you really need these documents OCR'd, then you'll have to go about it the long way and archive the result.

MichelZ
Level 6
Partner Accredited Certified

Hi

I know that we have looked into this, and even had a product called "scan2EV", which would OCR your scans and then ingest them into EV. I'm not sure whether there has been other development activity around this. Your best bet is probably to contact our EVTools group (http://www.evtools.net); maybe there is something we can do for you.

Cheers


cloudficient - EV Migration, creators of EVComplete.