
Lack of Loadfile import AND OCR

ODS_Rob
Level 3

It would be nice to be able to use load files with metadata AND be able to OCR image files...the way I understand it, you can only do one or the other with CW. However, we receive loadfiles that don't necessarily include the text files...so it's more advantageous to just load the image files.

Shouldn't there be an option to do both? Is there and I'm missing it?

 

Thanks!

1 ACCEPTED SOLUTION

Liam_Finn1
Level 6
Employee Accredited Certified

You are correct, we don't OCR content on load from LFI. If you have the native files, collect them into Clearwell as if they were normal files on a disk and then OCR the content.

 

I believe the reasoning behind this is that data imported through LFI should already contain the text needed in the case, and if not, then it needs to be handled like a standard collection. You may also talk with whoever created the load file export and ask them to redo it with the TXT files included, so that you get the OCR data they extracted. If they did not extract the text from the images, then you need to process it as you would any content from a drive.
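Before deciding which route to take, it can help to find out which records in the load file actually lack text. Below is a minimal Python sketch for that, not anything Clearwell provides. It assumes a Concordance-style DAT file with the default delimiters (ASCII 20 as the field separator, þ as the text qualifier) and hypothetical column names `DOCID` and `TEXTPATH`; adjust those to match your production.

```python
from pathlib import Path

# Default Concordance delimiters (assumed; check your own load file).
FIELD_SEP = "\x14"   # ASCII 20, field separator
QUOTE = "\xfe"       # þ, text qualifier


def parse_dat_line(line):
    """Split one DAT line into fields and strip the þ qualifiers."""
    line = line.lstrip("\ufeff").rstrip("\r\n")
    return [field.strip(QUOTE) for field in line.split(FIELD_SEP)]


def records_missing_text(dat_lines, base_dir=".",
                         id_field="DOCID", text_field="TEXTPATH"):
    """Return IDs of records whose text path is blank or points to no file.

    id_field and text_field are hypothetical column names.
    """
    lines = iter(dat_lines)
    header = parse_dat_line(next(lines))
    missing = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(header, parse_dat_line(line)))
        rel_path = row.get(text_field, "")
        # Load files usually carry Windows-style relative paths.
        text_file = Path(base_dir) / rel_path.replace("\\", "/")
        if not rel_path or not text_file.is_file():
            missing.append(row.get(id_field, ""))
    return missing
```

To use it, open the DAT file with the right encoding and pass the file object in, for example `records_missing_text(open("production.dat", encoding="utf-8"), base_dir="production")` (the file and folder names here are placeholders). The records it returns are the ones you would either request text for from the producing party or collect as native image files for OCR.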


3 REPLIES 3

eeuerlings
Level 2

In the loadfile configuration, you can select whether text should be taken from the loadfile or extracted from the files. Select File first and it should do the rest. At least it did with my last NUIX import.

 

ODS_Rob
Level 3

Hi, thanks for the response. Haven't had a NUIX production yet...this is a Concordance loadfile.

 

What version are you running? In 7.1.2, under the Processing tab, when adding a loadfile source, I see three choices under Indexable Text:

 

No text content in load files or text files

Load file contains a link to an external text file:

Text content is in the load file:

 

And on page 37 of the Load File Import Guide:

OCR

Image files will not be processed for OCR during import, or available for OCR once the file has been imported. If you need the image file to be OCR processed you must import the image file as a native file.

 

Basically, how I read this: if the images are produced to you without text somewhere, Clearwell won't OCR them to provide the text...like it would otherwise. So, you either load the image loadfile and lose the ability to search the document's text, or you sacrifice the metadata and load the images as regular non-loadfile native files to get OCR. Why in the world would you not be able to do both, or at least have a choice?

 

Again, I'm probably just missing something very simple!! Will keep looking, thanks eeuerlings!

 
