Cloud Translation Blog
How to Get the Highest Quality Scanned Document Translation
Pairaphrase is a web-based translation management system for enterprises. One of its features is the ability to translate a scanned document from one language to another while retaining as much of the original layout and formatting as possible.
To translate a scanned document accurately, Pairaphrase must be able to read it. The platform uses OCR (Optical Character Recognition) to read your file’s text. If the text is not readable, the PDF translation will come out as gibberish instead of your target language.
In this article, we’ll discuss the importance of removing text obstructions and ensuring high resolution of your file in order to get the best results.
Continue reading to learn how to get the highest quality PDF translation with troubleshooting tips from the pros.
How to Optimize a Scanned Document for OCR & Translation
1. Check for Text Obstructions
One of the first steps toward the highest quality scanned document translation is to remove any text obstructions that exist in your file. OCR can only recognize characters it can see clearly, so anything that covers or distorts the text will reduce accuracy.
Text obstructions can include any of the following:
- Text covered by stamps
- Text covered by watermarks
- Text covered by handwriting
- Drastic creasing that obstructs your text
- Text showing through from the other side
- Another document layered on top of it
- Text within embedded images
- Very low contrast between the background & text
- Graphics on dark backgrounds
- Ornate / heavily scripted text
- Handwritten text (OCR technology cannot yet accurately recognize it)
- Text cut off mid-sentence
2. Confirm Your File is High Resolution
If you have access to the original paper document, start by scanning it again at a higher resolution (300 DPI or higher). A lower-resolution scan might not be readable enough for OCR, even if it appears sharp when you look at the file on your computer.
Hint: To find out whether your document is high resolution, check the file size in your file browser. A high-resolution scan is usually around 1 MB per page or larger (this isn’t always the case, but most often it is).
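The file-size hint above can be turned into a quick check. The following is a minimal Python sketch, not part of Pairaphrase: the function names are hypothetical, the 1 MB threshold is simply the rule of thumb from the hint, and you supply the page count yourself. Treat the result as a rough signal only, since file size depends on compression as well as resolution.

```python
import os

# Rule of thumb from the hint above: roughly 1 MB per page or more.
# This is a heuristic, not a guarantee of OCR readability.
MIN_BYTES_PER_PAGE = 1_000_000


def bytes_per_page(total_bytes: int, page_count: int) -> float:
    """Return the average number of bytes per page."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return total_bytes / page_count


def looks_high_resolution(total_bytes: int, page_count: int) -> bool:
    """Return True if the average page size meets the rule of thumb."""
    return bytes_per_page(total_bytes, page_count) >= MIN_BYTES_PER_PAGE


def check_file(path: str, page_count: int) -> bool:
    """Apply the heuristic to a file on disk (page count supplied by you)."""
    return looks_high_resolution(os.path.getsize(path), page_count)
```

For example, a 10-page scan that is only 2 MB in total averages about 0.2 MB per page, so it would be flagged as a candidate for re-scanning.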
How to Scan Your Document in at a Higher Resolution
If at all possible, get your hands on the original paper document and re-scan it at a resolution high enough to be read by Pairaphrase.
The scanner software on your computer has settings that you can change. Use them to scan the document as a PDF at a resolution of 300-600 DPI for the best results.
The file size will be fairly large, but you will get a much better translation if you go this route. Save the file after you scan it.
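To see what the 300-600 DPI recommendation means in practice, it helps to work through the arithmetic: DPI is dots (pixels) per inch, so pixel dimensions are the paper size in inches multiplied by the DPI. The short Python sketch below illustrates this. The function names are hypothetical, and the US Letter page size (8.5 x 11 inches) is an assumption chosen for the example, not something specified by Pairaphrase.

```python
def pixels_at_dpi(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(pixel_width: int, paper_width_in: float) -> float:
    """Estimate the scan resolution from an image's pixel width
    and the physical width of the paper it was scanned from."""
    return pixel_width / paper_width_in


# Assumed page size for illustration: US Letter, 8.5 x 11 inches.
LETTER_WIDTH_IN = 8.5
LETTER_HEIGHT_IN = 11.0

print(pixels_at_dpi(LETTER_WIDTH_IN, LETTER_HEIGHT_IN, 300))  # (2550, 3300)
print(pixels_at_dpi(LETTER_WIDTH_IN, LETTER_HEIGHT_IN, 600))  # (5100, 6600)
print(effective_dpi(1275, LETTER_WIDTH_IN))                   # 150.0
```

The last line shows the reverse check: a Letter-size page image that is only 1275 pixels wide was scanned at about 150 DPI, which is below the recommended range.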
How to Translate a Scanned PDF That Has Been Optimized
Upload your PDF to Pairaphrase and set the source and target languages.
Then, click “Translate File.”
When the file status says “Draft,” hover over the gear icon to the right and select “Open.” You will be brought to the Active Documents tab. Find the “Actions” button in the upper-right-hand corner of the screen and hover over it. Next, select “Download Original.” This will show you how Pairaphrase has read your PDF.
Even with a high-resolution file, you might still see some misinterpretations of characters. For instance, there might be a couple of spaces missing between characters or a “v” read as an “r.”
However, most of the characters will be interpreted correctly if you optimize your file in the ways we’ve specified. You can edit and correct any OCR mistakes and then translate the file again.
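If you want to see exactly which characters were misread, you can compare the OCR text with your corrected version. The following is a minimal Python sketch using the standard library’s difflib module. It is an illustration only and not a Pairaphrase feature: the function name and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def ocr_substitutions(ocr_text: str, corrected_text: str) -> list[tuple[str, str]]:
    """List (ocr_fragment, corrected_fragment) pairs wherever the texts differ.

    An empty first element means the OCR dropped something (such as a space);
    an empty second element means the OCR added something extra.
    """
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((ocr_text[i1:i2], corrected_text[j1:j2]))
    return changes


# Hypothetical sample: a "v" read as an "r".
print(ocr_substitutions("reriew", "review"))
# Hypothetical sample: a missing space between words.
print(ocr_substitutions("scanneddocument", "scanned document"))
```

A list like this makes it easier to spot recurring misreads, which usually point back to one of the obstructions or resolution problems described earlier.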
But this just goes to show you how fragile scanned documents are.
Is There an Easier Way to Get High-Quality Translation of a Scanned Document?
Unfortunately, this is currently the only way to prepare a scanned file for the highest quality translation possible. This is why we recommend getting your hands on the original digital file that the paper version was printed from, if at all possible.
We hope this tutorial helps you to translate scanned documents more successfully and accurately in the future.
Please visit Pairaphrase to learn more about our web-based translation management system for enterprises.
Aug 20, 2021
Aug 18, 2021