
Lack of Loadfile import AND OCR

It would be nice to be able to use load files with metadata AND be able to OCR image files. The way I understand it, you can only do one or the other with CW. However, we receive loadfiles that don't necessarily include the text files, so it's more advantageous to just load the image files.

Shouldn't there be an option to do both? Is there one and I'm just missing it?

 

Thanks!

1 Solution

Accepted Solutions
Accepted Solution!

You are correct, we don't OCR content on load from LFI. If you have the native files, collect them into Clearwell as if they were normal files on a disk and then OCR the content.

 

I believe the reasoning behind this is that data imported through LFI should already contain the text needed in the case, and if it does not, it needs to be handled like a standard collection. You may also talk with whoever created the load file export and ask them to redo it with the TXT files included, so that you do get the OCR data they extracted. If they did not extract the text from the images, then you need to process it as you would any content from a drive.
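
Before deciding which route to take, it can help to check whether the load file you received actually links to text files. The following is only a rough sketch in Python, not a Clearwell feature: it assumes a Concordance-style DAT using the default delimiters (ASCII 20 as field separator, þ as text qualifier), and the column names BEGDOC and TEXTPATH are hypothetical placeholders, since every production names these fields differently.

```python
import csv
import io

FIELD_SEP = "\x14"  # Concordance default field separator (ASCII 20)
QUOTE = "\u00fe"    # Concordance default text qualifier (the thorn character)


def records_missing_text(dat_text, id_field="BEGDOC", text_field="TEXTPATH"):
    """Return the IDs of records whose text-link field is absent or empty.

    id_field and text_field are hypothetical column names; adjust them to
    match the header row of the load file you actually received.
    """
    # Drop a leading byte-order mark so the first qualifier is recognized.
    dat_text = dat_text.lstrip("\ufeff")
    reader = csv.reader(
        io.StringIO(dat_text, newline=""),
        delimiter=FIELD_SEP,
        quotechar=QUOTE,
    )
    rows = [row for row in reader if row]
    if not rows:
        return []

    header = [name.strip() for name in rows[0]]
    if id_field not in header:
        raise ValueError(f"ID column {id_field!r} not found in header")
    id_idx = header.index(id_field)

    # No text column at all: every record lacks linked text.
    if text_field not in header:
        return [row[id_idx] for row in rows[1:]]
    text_idx = header.index(text_field)

    missing = []
    for row in rows[1:]:
        # Short rows are treated as having an empty text link.
        value = row[text_idx] if text_idx < len(row) else ""
        if not value.strip():
            missing.append(row[id_idx])
    return missing
```

If this returns every record, the production came without text, and you are in the situation described above: either ask the producing party for the TXT files or process the images as a standard collection so they can be OCR'd.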


3 Replies

In the loadfile configuration, you can select whether text should be taken from the loadfile or extracted from the files. Select File first and it should do the rest. At least it did with my last NUIX import.

 

Hi, thanks for the response. I haven't had a NUIX production yet; this is a Concordance loadfile.

 

What version are you running? In 7.1.2, under the Processing tab, when adding a loadfile source, I see three choices under Indexable Text:

 

No text content in load files or text files

Load file contains a link to an external text file:

Text content is in the load file:

 

And on page 37 of the Load File Import Guide:

OCR

Image files will not be processed for OCR during import, or available for OCR once the file has been imported. If you need the image file to be OCR processed you must import the image file as a native file.

 

Basically, how I read this: if documents are produced to you without text somewhere, Clearwell won't OCR an image to provide the text like it would otherwise. So you either load the image loadfile and lose the ability to search the document's text, or sacrifice the metadata and load the images as regular non-loadfile native images to get OCR. Why in the world would you not be able to do both, or at least have a choice?

 

Again, I'm probably just missing something very simple!! Will keep looking, thanks eeuerlings!

 
