Alfresco uses OpenOffice to convert documents to PDF, but by default it doesn’t generate tagged PDF. This note describes how to configure Alfresco so that it does produce tagged PDF.
So what is a tagged PDF? Well, it’s a PDF that contains structural information about the content, e.g. reading order, the presence of tables etc. This allows screen readers to read the PDF document, which makes the PDF accessible. To get the most out of the conversion process, as much structural information as possible needs to be present in the original document. I came across tagged PDFs recently when doing some work for a local authority that is using Alfresco.
So how do you configure Alfresco to produce tagged PDF? Open up the file ‘openoffice-document-formats.xml’, which is located in <tomcat_home>/webapps/alfresco/WEB-INF/classes/alfresco/mimetype/, locate the Portable Document Format document format section (it should be at the top of the file) and modify it so it looks like this:
<document-format>
  <name>Portable Document Format</name>
  <mime-type>application/pdf</mime-type>
  <file-extension>pdf</file-extension>
  <export-filters>
    <entry><family>Presentation</family><string>impress_pdf_Export</string></entry>
    <entry><family>Spreadsheet</family><string>calc_pdf_Export</string></entry>
    <entry><family>Text</family><string>writer_pdf_Export</string></entry>
  </export-filters>
  <export-options>
    <entry><string>EnableTextAccessForAccessibilityTools</string><boolean>true</boolean></entry>
    <entry><string>UseTaggedPDF</string><boolean>true</boolean></entry>
  </export-options>
</document-format>
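If you would rather script the change than edit the file by hand, something along these lines might work. This is only a rough sketch using Python's standard ElementTree module; the function name enable_tagged_pdf is made up for illustration, and it assumes the file contains a document-format element whose name is exactly "Portable Document Format", as in the section shown above.

```python
import xml.etree.ElementTree as ET

PDF_FORMAT_NAME = "Portable Document Format"
REQUIRED_OPTIONS = ("EnableTextAccessForAccessibilityTools", "UseTaggedPDF")


def enable_tagged_pdf(xml_text):
    """Return the XML with both export options set to true on the PDF format.

    Hypothetical helper: assumes the PDF <document-format> is identified by
    its <name> text, as in the configuration section shown in this note.
    """
    root = ET.fromstring(xml_text)
    # iter() also visits the root itself, so a bare <document-format> works too.
    for fmt in root.iter("document-format"):
        if fmt.findtext("name") != PDF_FORMAT_NAME:
            continue
        options = fmt.find("export-options")
        if options is None:
            options = ET.SubElement(fmt, "export-options")
        # Index the existing option entries by their <string> key.
        existing = {e.findtext("string"): e for e in options.findall("entry")}
        for option in REQUIRED_OPTIONS:
            entry = existing.get(option)
            if entry is None:
                entry = ET.SubElement(options, "entry")
                ET.SubElement(entry, "string").text = option
            flag = entry.find("boolean")
            if flag is None:
                flag = ET.SubElement(entry, "boolean")
            flag.text = "true"
    return ET.tostring(root, encoding="unicode")
```

Bear in mind that ElementTree discards comments and the original layout when it writes the XML back out, so it is safest to run this on a copy of the file and compare the result with the original before replacing it.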
Restart Alfresco. That’s it! The next time you convert a document to PDF it should be tagged. You can test that the conversion worked (on the Mac) by using Adobe Reader 8.0. Open up the PDF file. Go to Document -> Security -> Show Security Properties. Click on the ‘Description’ tab. The ‘Tagged PDF’ entry should be set to ‘Yes’ if the conversion worked correctly. You can also check the document for accessibility by clicking on Document -> Accessibility Quick Check.
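If you don’t have Adobe Reader to hand, a quick and dirty check can be scripted. A tagged PDF normally carries a /StructTreeRoot entry and a /MarkInfo dictionary with /Marked set to true in its document catalog, so the sketch below simply searches the raw bytes for those markers. It is a heuristic rather than a real PDF parser: the function name looks_tagged is made up, and it assumes the catalog is stored uncompressed with /MarkInfo written as an inline dictionary, which may not hold for every PDF producer.

```python
import re

# Matches an inline MarkInfo dictionary such as "/MarkInfo<</Marked true>>".
MARKED_TRUE = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")


def looks_tagged(pdf_bytes):
    """Heuristic: True if the raw PDF bytes contain the tagged-PDF markers.

    Assumption: the catalog is not inside a compressed object stream and
    /MarkInfo is a direct dictionary, not an indirect reference.
    """
    has_structure_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = MARKED_TRUE.search(pdf_bytes) is not None
    return has_structure_tree and is_marked
```

To use it, read the converted file in binary mode and pass the bytes in, e.g. looks_tagged(open("converted.pdf", "rb").read()), where converted.pdf is a placeholder file name. A False result from this check is not conclusive, so confirm with Adobe Reader before deciding the configuration didn’t take.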
You can download the modified configuration file here.
I am not familiar with http://www.alfresco.com/. Could you email me a tagged PDF file so I can examine it? Of course, be sure there are objects in the PDF that have useful tags!