Cloud Translation Blog
5 PDF Translation Tips for Scanned Documents
Translating scanned PDF (Portable Document Format) files can be one of the most frustrating tasks for anyone who produces translations on behalf of an organization or its clients.
This is because scanned PDF documents often present a multitude of problems. These are just a few of the factors that affect the outcome of a PDF translation and the time spent on it:
- The quality of the original document
- The power of the OCR (Optical Character Recognition) tool you use
- The productivity tools included with the machine translation tool you use
However, with the right approach and technologies, you can minimize these issues, achieve a high-quality translation and speed up the process.
Here are 5 tips for PDF translation of business documents. These tips are actionable and will help you prepare your PDFs for translation in no time. We even recommend the best PDF translation software to use at the end of this article.
Top 5 Tips for Translating PDF Documents
1. Examine scanned PDF translation readiness using Adobe Acrobat
Not all scanned PDF documents can be successfully translated using software. For instance, if your file was scanned at too low a resolution, you’ll need to enhance it. If there are obstructions, you’ll want to remove them. To get the best translation results, you may need to improve the quality of your scanned PDF before you translate it.
2. Remove all text obstructions
With PDF translation, it’s common to receive sub-par results if your file contains any of the following obstructions that make it difficult for OCR to read the characters in your file:
- Text covered by stamps
- Text covered by watermarks
- Text covered by handwriting
- Drastic creasing that obstructs your text
- Text showing through from the other side
- Another document layered on top of it
- Text within embedded images
- Very low contrast between the background & text
- Graphics on dark backgrounds
- Ornate / heavily scripted text
- Handwritten text (OCR technology cannot yet accurately recognize it)
- Text cut off mid-sentence
3. Improve the PDF resolution before using machine translation
Translating PDF documents becomes a lot easier when you improve the quality of the resolution prior to uploading them to machine translation software.
This makes it easier for the OCR engine in your translation software to read the text before it’s translated. OCR is what enables the language translation tool to recognize the characters in an image or scanned PDF.
You *might* even be able to bypass *some* of the text obstructions listed in Tip #2 if your PDF has a resolution of 300 DPI or higher.
How To Find Out the Resolution
To find out if your document is high resolution to begin with, check the file size in your Finder window or file explorer. If it’s high resolution, each page should be approximately 1 MB or larger (this isn’t always the case, but most often it is).
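If you have many files to check, the file-size rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not a measurement of actual resolution. The function names and the 1 MB default threshold are assumptions taken from the guideline in this tip, and you supply the page count yourself.

```python
BYTES_PER_MB = 1024 * 1024


def average_mb_per_page(file_size_bytes: int, page_count: int) -> float:
    """Return the average file size per page, in megabytes."""
    if page_count <= 0:
        raise ValueError("page_count must be a positive number")
    return file_size_bytes / page_count / BYTES_PER_MB


def looks_high_resolution(file_size_bytes: int, page_count: int,
                          threshold_mb: float = 1.0) -> bool:
    """Rough heuristic only: about 1 MB or more per page suggests a high-resolution scan.

    The 1 MB threshold is the rule of thumb from this article, not a standard.
    """
    return average_mb_per_page(file_size_bytes, page_count) >= threshold_mb
```

For a real file, you could pass in the value returned by `os.path.getsize` along with the number of pages shown in your PDF viewer. As noted above, the result is a hint rather than a guarantee.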
How To Improve the Resolution
If you have a hard copy, rescan it at a resolution of 300 DPI or higher. You can do this by opening your scanner’s settings on your computer before you scan and changing the resolution manually.
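If you know the pixel dimensions of a scanned page and the size of the paper it came from, you can estimate the scan resolution directly. The sketch below assumes you already have those numbers (for example, from your image viewer). The function names are hypothetical, and the 300 DPI target is the figure recommended in this tip.

```python
MIN_OCR_DPI = 300  # target resolution recommended in this article


def effective_dpi(pixel_width: int, pixel_height: int,
                  page_width_in: float, page_height_in: float) -> float:
    """Estimate scan resolution in dots per inch from pixel and paper dimensions.

    Uses the smaller of the horizontal and vertical values to stay conservative.
    """
    if page_width_in <= 0 or page_height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def needs_rescan(pixel_width: int, pixel_height: int,
                 page_width_in: float, page_height_in: float) -> bool:
    """Return True when the estimated resolution falls below the 300 DPI target."""
    dpi = effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in)
    return dpi < MIN_OCR_DPI
```

For example, a US Letter page (8.5 x 11 inches) scanned to 2550 x 3300 pixels works out to 300 DPI, while the same page at 1275 x 1650 pixels is only 150 DPI and would be worth rescanning.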
4. Play around with file formats
When it comes to successful PDF translation, sometimes converting a PDF to JPEG or DOC can yield better results.
You might translate the PDF and not be satisfied with the result, then convert it to JPEG, translate that and see improvements. Some documents will translate better as DOCX and others will translate better as PDF, so try a few formats and compare.
Disclaimer: No translation software will translate every PDF file. Your translation results will vary from file to file and language to language. Some PDF files are better served by being translated by a human translator.
5. Use premium machine translation software
Now that you’ve prepped your PDF to be translated, it’s time to consider the translation tool you’ll use. Translating PDF documents becomes easier if you choose the best machine translation software for the job.
Here’s what you’ll want to look for:
- Easy-to-use online interface
- System optimized for PDFs and Microsoft Office (Excel, Outlook, PowerPoint, Word) files
- A more robust OCR engine than competitive solutions
- Helps you manage translations and collaborate with colleagues across the world
- Dynamic Machine Learning so you can easily improve translation quality
- Saves you a significant amount of time and money with long-term use
- Secures your data, is confidential, compliant and has passed a third-party security audit
Get Started with PDF Translation
Pairaphrase is a web-based translation management system for enterprises, and it has everything you see in the list above. It’s optimized for PDF translation and is specifically built to meet the PDF needs of enterprises.
Pairaphrase is a great solution for translating scanned documents. However, there are limitations, and you may need to do some testing with the tips above to achieve the best results. After all, the lower the quality of the document, the harder it is for Pairaphrase (or any translation solution) to deliver a quality translation.
Choose a package or recurring subscription based on your PDF translation needs.
Or, schedule a live demo with a sales rep today and get your questions answered.
Jun 18, 2020