How to Get the Highest Quality Scanned Document Translation with OCR
Need to translate scanned documents but frequently encounter quality issues? When you translate a scanned PDF document from one language to another, it first needs to be readable by your file translator.
In this post, we’ll use Pairaphrase, a web-based translation management system for enterprises, as an example. Like most scanned PDF translators, it uses OCR (Optical Character Recognition) to read the text in your scanned PDF file. Without OCR, the translator can’t read that text, and the translation result will look like gibberish.
In this article, we’ll focus on the importance of resolution and text clarity in order to achieve the most accurate scanned document translations. Continue reading to learn how to get the highest quality PDF translation with these OCR troubleshooting tips.
How to Optimize a Scanned Document for OCR & Translation
1. Check for Text Obstructions
One of the first steps toward getting the highest quality scanned document translation possible is to remove any text obstructions from your file.
Text obstructions can include any of the following:
- Stamps that cover text
- Watermarks that cover text
- Handwriting that covers text
- Drastic creasing that obstructs your text
- Text showing through from the other side
- Another document layered on top of it
- Text within embedded images
- Very low contrast between the background & text
- Graphics on dark backgrounds
- Ornate / heavily scripted text
- Handwritten text (OCR technology still struggles to recognize it accurately)
- Text cut off mid-sentence
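How low is “very low contrast”? There’s no official OCR cutoff, but if you know the colors of the text and the background (an eyedropper tool in most image editors will show them), one rough proxy is the WCAG contrast ratio. Keep in mind it’s a web-accessibility metric designed for human readers, not for OCR. Here’s a minimal Python sketch using only the standard library; the function names are our own and aren’t part of Pairaphrase or any other tool.

```python
def _linearize(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color: 0.0 is black, 1.0 is white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(text_rgb: tuple[int, int, int],
                   background_rgb: tuple[int, int, int]) -> float:
    """WCAG contrast ratio: 1.0 means no contrast, 21.0 is black on white."""
    lighter, darker = sorted(
        (relative_luminance(text_rgb), relative_luminance(background_rgb)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Black text on a white page: the best case.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))  # 21.0
# Mid-gray text on a light-gray background: a much weaker signal.
print(round(contrast_ratio((150, 150, 150), (200, 200, 200)), 2))
```

For reference, WCAG’s AA guideline asks for at least 4.5:1 for normal body text. Treat that as a human-legibility yardstick rather than a guaranteed OCR threshold: if your document falls well below it, expect OCR trouble.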
2. Confirm Your File Is High Resolution
If you have access to the original paper document, scan it again at a higher resolution (300 DPI or higher). Otherwise, OCR might not be able to read it reliably, even if it looks sharp when you view the file on your computer.
Hint: To find out whether your document is high resolution, check its file size in Finder (Mac) or File Explorer (Windows). A high-resolution scan is usually around 1MB or more per page. This isn’t always the case, but most often it is.
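The file-size hint is a rule of thumb. If your image viewer shows a page’s pixel dimensions, you can calculate the resolution directly: divide the pixel width by the page’s physical width in inches (8.5 in for US Letter, about 8.27 in for A4). The Python sketch below applies both checks. The `check_scan` name, its default thresholds, and counting 1MB as 1,000,000 bytes are our own assumptions for illustration.

```python
def check_scan(pixel_width: int, page_width_in: float,
               file_size_bytes: int, page_count: int,
               min_dpi: float = 300, min_mb_per_page: float = 1.0) -> dict:
    """Apply two checks to a scanned PDF (hypothetical helper):
    effective DPI, and the rough ~1MB-per-page file-size rule of thumb."""
    dpi = pixel_width / page_width_in                        # pixels per inch
    mb_per_page = file_size_bytes / page_count / 1_000_000   # decimal MB
    return {
        "dpi": dpi,
        "dpi_ok": dpi >= min_dpi,
        "mb_per_page": mb_per_page,
        "size_ok": mb_per_page >= min_mb_per_page,
    }


# A 10-page, 12,000,000-byte PDF whose pages are 2550 px wide on US Letter paper.
report = check_scan(pixel_width=2550, page_width_in=8.5,
                    file_size_bytes=12_000_000, page_count=10)
print(report)  # {'dpi': 300.0, 'dpi_ok': True, 'mb_per_page': 1.2, 'size_ok': True}
```

For example, a page 2550 pixels wide on US Letter paper works out to exactly 300 DPI, while a page 1700 pixels wide works out to 200 DPI, below the 300 DPI target. When the two checks disagree, trust the DPI figure over file size.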
How to Scan Your Document at a Higher Resolution
If at all possible, get your hands on the original paper document and re-scan it at a high resolution. The scanner software on your computer has settings you can change: set it to scan the document as a PDF, at a resolution high enough for Pairaphrase to read. Aim for 300-600 DPI for the best results, and save the file after you scan it.
The file size will be fairly large, but you will get a much better translation if you go this route.
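Why does the file get so large? The pixel count grows with the square of the resolution, so doubling the DPI quadruples the number of pixels. This quick Python calculation shows the pixel dimensions and uncompressed image size of a US Letter page (8.5 x 11 in) at a few settings. The helper names are hypothetical, and real PDFs are compressed, so your saved file will be much smaller than these raw figures; the point is how fast size grows.

```python
def scan_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_mb(width_in: float, height_in: float, dpi: int,
                bytes_per_pixel: int = 3) -> float:
    """Uncompressed image size in decimal MB.
    3 bytes/pixel = 24-bit color; 1 byte/pixel = 8-bit grayscale."""
    w, h = scan_pixels(width_in, height_in, dpi)
    return w * h * bytes_per_pixel / 1_000_000


for dpi in (150, 300, 600):
    w, h = scan_pixels(8.5, 11, dpi)
    print(f"{dpi} DPI: {w} x {h} px, ~{raw_size_mb(8.5, 11, dpi):.1f} MB uncompressed")
# 150 DPI: 1275 x 1650 px, ~6.3 MB uncompressed
# 300 DPI: 2550 x 3300 px, ~25.2 MB uncompressed
# 600 DPI: 5100 x 6600 px, ~101.0 MB uncompressed
```

Going from 300 to 600 DPI multiplies the raw data by four, which is why 300 DPI is usually a sensible floor and 600 DPI a reasonable ceiling for text documents.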
How to Translate a Scanned PDF That Has Been Optimized
- Upload your PDF to Pairaphrase and set the source and target languages.
- Click “Translate File” and wait until the file status says “Draft.”
- Hover over the gear icon to the right and select “Open.”
- In the next screen (Active Documents), hover over the “Actions” button in the upper-right-hand corner.
- Select “Download Original.” (This will show you how Pairaphrase has read your PDF.)
Even with a high-resolution file, you might still see some misinterpretations of characters. For instance, there might be a couple of spaces missing between characters or a “v” read as an “r.”
However, if you optimize your file in the ways described above, most of its characters should be interpreted correctly. You can edit and correct any OCR mistakes, then translate the file again.
But this just goes to show you how fragile scanned documents are.
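If you want to put a number on how well a page was read, a common OCR metric is character error rate (CER): the number of single-character edits (insertions, deletions, substitutions) needed to turn the OCR output into the correct text, divided by the length of the correct text. Type a short passage exactly as it appears on paper, copy the same passage from the “Download Original” file, and compare the two. This is a minimal, standard-library Python sketch of a generic measurement, not a Pairaphrase feature.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions that turn `a` into `b` (dynamic-programming edit distance)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_text: str) -> float:
    """Edits needed to fix the OCR text, divided by the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_text) / len(reference)


reference = "very good service"   # typed from the paper original
ocr_output = "rery goodservice"   # "v" read as "r", and one space dropped
print(f"CER: {character_error_rate(reference, ocr_output):.1%}")  # CER: 11.8%
```

A lower CER is better. Comparing the CER of the same passage before and after re-scanning at 300+ DPI is a simple way to confirm the new scan actually helped.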
Is there an Easier Way to Get High-Quality Translation of a Scanned Document?
Unfortunately, this is currently the only way to prepare a scanned file for the highest quality translation possible. That’s why, if at all possible, you should track down the original digital file that the paper version was printed from, since a native digital file doesn’t need OCR at all.
We hope this tutorial helps you to translate scanned documents more successfully and accurately in the future.
Translate Scanned Documents with Powerful OCR
With Pairaphrase, you can translate a scanned document into more than 100 languages and 10,000+ language pairs. This includes English, Spanish, French, German, Italian, Portuguese, Arabic, Hindi, Swedish, Dutch, Korean, Japanese, Chinese and more.
Schedule a demo or share this article with a colleague.