05-21-2013 08:03 AM
It would be nice to be able to use load files with metadata AND OCR the image files. The way I understand it, with CW you can do one or the other, but not both. However, we receive load files that don't always include the text files, so it's more advantageous to just load the image files.
Shouldn't there be an option to do both? Is there one and I'm missing it?
Thanks!
05-23-2013 12:49 PM
You are correct: we don't OCR content loaded through LFI. If you have the native files, collect them into Clearwell as if they were normal files on a disk and then OCR the content.
I believe the reasoning is that data imported through LFI should already contain the text needed in the case, and if it doesn't, it needs to be handled like a standard collection. You could also talk with whoever created the load file export and ask them to include the TXT files, so that you get the OCR text they extracted. If they did not extract the text from the images, then you need to process the images as you would any content from a drive.
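Before choosing between re-requesting the export and collecting natively, it can help to know which records actually arrived without text. A rough sketch follows; it is not a Clearwell feature, just a standalone Python check. It assumes the load file uses the common Concordance DAT defaults (field separator ASCII 20, text qualifier þ / ASCII 254) and that the header has an ID column and a text-path column; the names `BEGDOC` and `TEXTPATH` are hypothetical placeholders, so substitute whatever your header row contains.

```python
import csv
import io
from pathlib import Path
from typing import List, Optional

# Assumed Concordance DAT defaults; confirm against the producing party's export settings.
FIELD_SEP = "\x14"  # ASCII 20, often displayed as a pilcrow
QUOTE = "\xfe"      # ASCII 254, þ


def records_missing_text(dat_text: str,
                         id_field: str = "BEGDOC",     # hypothetical column name
                         text_field: str = "TEXTPATH",  # hypothetical column name
                         base_dir: Optional[Path] = None) -> List[str]:
    """Return the IDs of records with no usable extracted text.

    A record counts as missing text if its text-path field is empty, if the
    load file has no text-path column at all, or (when base_dir is given)
    if the referenced TXT file was not delivered.
    """
    reader = csv.reader(io.StringIO(dat_text), delimiter=FIELD_SEP, quotechar=QUOTE)
    header = next(reader)
    id_idx = header.index(id_field)
    text_idx = header.index(text_field) if text_field in header else None

    missing = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        has_col = text_idx is not None and text_idx < len(row)
        path = row[text_idx].strip() if has_col else ""
        if not path:
            missing.append(row[id_idx])  # no text link at all
        elif base_dir is not None and not (base_dir / path.replace("\\", "/")).exists():
            missing.append(row[id_idx])  # link present, but the TXT file is not on disk
    return missing
```

Usage would be something like `records_missing_text(Path("VOL001.dat").read_text(encoding="utf-8-sig"), base_dir=Path("VOL001"))`; the encoding is a guess, since some exports are ANSI rather than UTF-8. The resulting ID list is what you would send back to the producing party, or pull as native images for OCR.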
05-23-2013 04:13 AM
In the load file configuration, you can select whether text should be taken from the load file or extracted from the files. Select File first and it should do the rest. At least it did with my last NUIX import.
05-23-2013 05:51 AM
Hi, thanks for the response. I haven't had a NUIX production yet; this is a Concordance load file.
What version are you running? In 7.1.2, on the Processing tab when adding a load file source, I see three choices under Indexable Text:
No text content in load files or text files
Load file contains a link to an external text file:
Text content is in the load file:
And on page 37 of the Load File Import Guide:
OCR
Image files will not be processed for OCR during import, or available for OCR once the file has been imported. If you need the image file to be OCR processed you must import the image file as a native file.
Basically, how I read this: if the production doesn't include the text somewhere, Clearwell won't OCR an image to provide the text, as it otherwise would. So you either import the images through the load file and lose the ability to search the documents' text, or sacrifice the metadata and load the images as regular (non-load-file) native images to get OCR. Why in the world would you not be able to do both, or at least have a choice?
Again, I'm probably just missing something very simple! Will keep looking. Thanks, eeuerlings!