"Unable to convert item content" for some PDF files in EV 7.5

Johnmen
Level 3
Hi there,

Every day, some PDF attachments in emails fail to be converted to HTML/text for indexing on my client's EV server. Here is one example:

--------------------------------------------------------------------------------------------------
Event Type: Warning
Event Source: Enterprise Vault Converters
Event Category: Storage Archive 
Event ID: 6629
Date: 11/07/2010
Time: 7:00:56 p.m.
User: N/A
Computer: EV_Server
Description:
Unable to convert item content 
 
Reason: Unspecified error  [0x80004005] 
Supplementary Info: file is corrupt (9) [OHD] 
 
Item: Saveset Message for Mailbox /O=Russell/OU=NET/cn=Recipients/cn=Audy.Smith, Message titled: Online Travel Request - Client Statement - for Mr John Smith  281313 - Itinerary Commences 23 Mar 2010 
Subject: Online Travel Request - Client Statement - for Mr John Smith  281313 - Itinerary Commences 23 Mar 2010 
Attachment: Client_Statement_AKG280313_3314.pdf 
Type: pdf 
 
This item will be archived without a preview being available to the web application and the  content will not be indexed. It is not possible to search on the content but the item can be restored  as normal 
 
For more information, see Help and Support Center at http://evevent.symantec.com/rosetta/showevent.asp
--------------------------------------------------------------------------------------------------

I traced the warnings back over the past week. Every day, one or more warnings are logged for PDF files that could not be converted to HTML, so indexing failed for them. So far only the PDF file type is affected.
 
According to Symantec's article "File types that Enterprise Vault can convert for indexing", the Storage Service accepts items for archiving from the archiving tasks. If possible, it generates a text or HTML version of each item, which the Indexing Service uses to compile indexing data for the item. The warnings therefore mean that content searches will not find the affected PDF files, because no index data was generated for them. The archiving itself still succeeded, so the items can be restored as normal. It is not a big issue, and it only affects search operations against the affected PDF files.
 
We do not yet know why the conversion fails for those files. Because it does not happen to all PDF files, the most likely cause is that the affected files differ from typical PDFs in some way that causes problems for the EV Storage Service during conversion.
 
The difference could come from a number of factors, including:
  • The PDF was generated by a third-party application rather than Adobe Acrobat.
  • The PDF contains a large amount of data or complex content (not our case, since all affected PDFs are small).
  • The PDF contains images (not our case, since some affected PDFs contain no images).
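One rough way to compare affected and unaffected PDFs on the factors above is a small script that reads a few structural facts straight from the file bytes. The following is only a sketch using the Python standard library. The function names `inspect_pdf` and `inspect_file` are made up for illustration, and the checks do not replicate whatever the EV converter does internally. The `/Producer` lookup in particular only works when the document information dictionary is stored uncompressed and unencrypted, so a missing value proves nothing.

```python
import re
from pathlib import Path


def inspect_pdf(data: bytes) -> dict:
    """Return a few structural facts about a PDF given its raw bytes."""
    head = data[:1024]    # the %PDF-x.y header should appear near the start
    tail = data[-2048:]   # the trailer and %%EOF marker should appear near the end
    version = re.search(rb"%PDF-(\d\.\d)", head)
    # Naive match: assumes an uncompressed literal string without nested parentheses.
    producer = re.search(rb"/Producer\s*\(([^)]*)\)", data)
    return {
        "version": version.group(1).decode("ascii") if version else None,
        "has_eof_marker": b"%%EOF" in tail,
        "has_startxref": b"startxref" in tail,
        "encrypted": b"/Encrypt" in data,
        "producer": producer.group(1).decode("latin-1") if producer else None,
    }


def inspect_file(path: str) -> dict:
    """Convenience wrapper (hypothetical helper) that reads the file first."""
    return inspect_pdf(Path(path).read_bytes())
```

Running `inspect_file` over a handful of attachments that triggered event 6629 and a handful that converted cleanly may show whether the failures share a producer, a PDF version, encryption, or a missing end-of-file marker. A truncated file would typically report `has_eof_marker` as `False`.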

If I want to get to the bottom of this issue, how should I go about investigating it?

Regards!
Johnmen
1 REPLY

Rob_Wilcox1
Level 6
Partner
Your best bet is to contact Support, and work through a Support case with them.

We have tools available which can "just" convert the PDF to Text/HTML.  
Working for cloudficient.com