Cloud Translation Blog
How to Get the Highest Quality Scanned Document Translation Using OCR
Posted on: Jul 13, 2018
Pairaphrase is a web-based translation management system for enterprises. One of its features is the ability to translate a scanned document from one language to another while retaining as much of the original layout and formatting as possible.
But to get an accurate translation of a scanned document, the document needs to be readable by Pairaphrase. Otherwise, the PDF translation will come out as gibberish instead of your target language.
This is where OCR and resolution come into play.
Continue reading to learn how to get the highest quality PDF translation using OCR and high resolution scanning.
How to Optimize a Scanned Document for Translation
OCR stands for Optical Character Recognition. It is the technology that allows your computer to read text characters in an image or PDF.
OCR needs to be applied to your scanned document before you upload it to Pairaphrase.
If you have access to the original paper document, ideally you would start by re-scanning it at a higher resolution. Otherwise, the scan might not be readable enough, even if it visually appears to have great resolution when you look at the file on your computer.
So in summary, before you upload a scanned document to Pairaphrase, you need to:
- Set the scan at a high enough resolution to be read by Pairaphrase (aim for 300-600 dpi). This is in your scanner’s settings.
- Apply OCR
Here’s more information on how to achieve this.
1. Re-Scan the Original Document at High Resolution
The higher the resolution, the better the translation results. So if at all possible, try to get your hands on the original paper document and re-scan it at a high resolution.
Aim for a 300-600 dpi resolution for the best results. The file size will be fairly large, but you will get a much better translation if you go this route.
The scanner software on your computer has settings that you can change, so you can set it to scan the document as a PDF at high resolution. Save the file after you scan it.
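If you're not sure what resolution an existing scan actually has, you can estimate it from the image's pixel dimensions and the physical size of the page. Here is a minimal Python sketch of that arithmetic. The function names, the US Letter page size, and the 3-bytes-per-pixel color assumption are ours for illustration; none of this is part of Pairaphrase or your scanner software.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Estimate scan resolution as the lower of horizontal and vertical dots per inch."""
    if min(pixel_width, pixel_height) <= 0 or min(page_width_in, page_height_in) <= 0:
        raise ValueError("pixel and page dimensions must be positive")
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def meets_recommendation(dpi, minimum_dpi=300):
    """Check the estimate against the low end of the 300-600 dpi range."""
    return dpi >= minimum_dpi


def uncompressed_megabytes(pixel_width, pixel_height, bytes_per_pixel=3):
    """Rough raw image size, assuming 24-bit color (3 bytes per pixel)."""
    return pixel_width * pixel_height * bytes_per_pixel / 1_000_000


# Example: a US Letter page (8.5 x 11 in) saved as a 2550 x 3300 pixel image.
dpi = effective_dpi(2550, 3300, 8.5, 11.0)
print(dpi)                                # 300.0
print(meets_recommendation(dpi))          # True
print(uncompressed_megabytes(2550, 3300)) # about 25.2 MB before compression
```

The same page scanned at 1275 x 1650 pixels works out to 150 dpi, which falls below the recommended range. The size estimate also shows why high-resolution scans produce fairly large files.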
2. Apply OCR to the Scanned Document
Some scanners automatically OCR your document as you scan it in as a PDF, but many others don’t. It’s safer to assume yours doesn’t.
For those that don’t automatically OCR your document, you’ll need to download Adobe Acrobat and open your scanned document with the program.
To apply OCR, navigate to Tools > Enhance Scans. Make sure the language setting is correct.
If your document’s source language is English, you don’t need to make any changes to the language setting. This is because the default setting in Adobe Acrobat is English.
If your source language is not English, change the language setting to match it.
Next, find the Recognize Text button, click it, and select “In this File.” Save the file, and then continue to Pairaphrase to upload it for translation.
This process will help you achieve a higher-quality translation, because the source file will now be set up to be read properly by Pairaphrase.
How to Translate a Scanned PDF that’s Been Optimized with OCR
Upload your PDF to Pairaphrase and set the source and target languages.
Then, click “Translate File.”
When the file is done translating, you will be brought to the Active Documents tab. Once the Status says “Draft,” you can hover over the gear icon and select “Download Original.” This will show you how Pairaphrase has read your PDF.
Even with a high-resolution file that has been OCR’d, you might still see some misinterpretations of characters. For instance, there might be a couple of spaces missing between characters or a “v” read as an “r.”
However, most of the file’s characters will have been interpreted correctly if you optimize your file in the ways we’ve specified. You can edit and correct any OCR mistakes and translate the file again.
But this just goes to show you how fragile scanned documents are.
Is there an Easier Way to Get High-Quality Translation of a Scanned Document?
Unfortunately, this is currently the only way to prepare a scanned file for the highest quality translation possible. This is why we recommend getting your hands on the original digital file that the paper version was printed from, if at all possible.
We hope this tutorial helps you to translate scanned documents more successfully and accurately in the future.
Please visit Pairaphrase.com to learn more about our web-based translation management system for enterprises, or request a free demo today!