How to Create Effective Termbase Glossaries for Machine Translation

Need to create a termbase glossary for machine translation? A glossary can noticeably improve translation quality when your software inserts your company’s approved terms directly into your machine translations.

In other words, creating a termbase glossary for machine translation is well worth your time. However, there are some key tips to keep in mind during glossary development if you want the glossary to serve its purpose in terminology management.

That’s why we compiled this list of 5 tips for creating termbase glossaries for machine translation.

5 Tips for Creating Termbase Glossaries for Machine Translation

1. Machine translation glossaries are different from human translation glossaries

If your company is new to using machine translation within a translation management system, this is important to know. 

Until the past several years, companies developed glossaries for human translators to consult while working in translation memory software. Now that artificial intelligence keeps improving machine translation, we have to rethink how to build those glossaries. A glossary that a machine translation engine applies automatically requires a different strategy than one a human translator reads and interprets.

Now that you know this key fact, we’ll continue on with more specific glossary development tips.

2. Be precise when creating translation glossaries for machine translation

In machine translation glossary development, terms and their translations need to be exact matches. In other words, they can’t be ambiguous or have more than one meaning. 

You can’t include more than one translation for one term, because the machine translation engine won’t be able to discern which translation to use. Only include terms in the glossary that have one meaning, to be used one way.

Example: “assembly”

“Assembly” is a term that is sometimes used to describe an action (the act of putting components together) and sometimes used to describe a thing (a unit composed of multiple parts). There can also be multiple meanings within one grammatical category. For example, “assembly” as a noun can also refer to a gathering of people in one place. In short, a single spelling can stand for several different terms, and each may need a different translation.
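
One way to catch this kind of ambiguity before you upload a glossary is to scan it for source terms that appear with more than one translation. The following is a minimal Python sketch, not a feature of any particular translation tool. The function name find_ambiguous_terms and the English-to-Spanish sample entries are hypothetical and only for illustration.

```python
from collections import defaultdict


def find_ambiguous_terms(entries):
    """Return source terms that map to more than one distinct translation."""
    translations = defaultdict(set)
    for source, target in entries:
        translations[source].add(target)
    return {
        term: sorted(targets)
        for term, targets in translations.items()
        if len(targets) > 1
    }


# Hypothetical English-to-Spanish entries, for illustration only.
entries = [
    ("assembly", "ensamblaje"),   # a unit composed of multiple parts
    ("assembly", "asamblea"),     # a gathering of people
    ("torque wrench", "llave dinamométrica"),
]

print(find_ambiguous_terms(entries))
# {'assembly': ['asamblea', 'ensamblaje']}
```

Any term this check reports should either be removed from the machine translation glossary or narrowed to a longer, unambiguous phrase.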

3. Include various word shapes in your glossary

While the following logic doesn’t currently apply to building glossaries for all translation engines, it applies to building one for Amazon’s machine translation engine. Amazon allows you to use glossaries to create custom translations.

Put simply, a “word shape” is one specific written form of a word: singular vs. plural, initial cap vs. lowercase vs. all caps, or one inflected form vs. another.

Example: eats, eat, eaten and ate

When developing a machine translation glossary, you’ll need to be mindful of word shapes. The more word shapes you include in your termbase, the better matches you receive. 

To produce custom translations with Amazon’s translation engine, you’ll need to include all the different word shapes for each term. If you want Amazon’s engine to replace a term it finds in your text, the term needs to exist in that exact same word shape within your glossary.

For example, if the term in the glossary is initial caps (“Eat”), but the term in the text is lowercase (“eat”), it won’t pick it up. See tip #5 for more on the subject of replacing terms.
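
The casing part of this work can be automated. The sketch below is a rough illustration in Python and does not call any Amazon API. The function name expand_glossary is hypothetical, and the sketch assumes you want the translation to mirror the casing of the source term, which may not suit every language pair. Inflected shapes such as “eats” or “ate” still have to be added by hand, each with its own translation.

```python
# Casing shapes to generate: lowercase, initial cap, all caps.
SHAPES = (str.lower, str.capitalize, str.upper)


def expand_glossary(glossary):
    """Add casing variants of every entry; entries already present win."""
    expanded = dict(glossary)
    for source, target in glossary.items():
        for shape in SHAPES:
            # Assumption: the target should follow the source's casing.
            expanded.setdefault(shape(source), shape(target))
    return expanded


# Hypothetical English-to-Spanish entry, for illustration only.
glossary = {"eat": "comer"}
expanded = expand_glossary(glossary)

print("Eat" in glossary)   # False: an exact-match lookup would miss it
print(expanded)
# {'eat': 'comer', 'Eat': 'Comer', 'EAT': 'COMER'}
```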

4. Understand the difference between glossaries and translation memories

What is a termbase / glossary?

A termbase is a bilingual (source and target) repository in CSV or TBX file format. It contains specific terms and terminology that regularly appear in a company’s content. Termbases typically include product names, company names, brand names, acronyms and other repetitive terminology. 

Termbases are also known as glossaries. Once your glossary has been imported into a translation management system, you can look up terms while you are editing and reviewing translations. Some systems will automatically replace terms in the translation for you.
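
To make the CSV format concrete, here is a minimal Python sketch that reads a small two-column termbase. The layout is an assumption for illustration: a header row holding the source and target language codes, followed by one term pair per row. Your translation management system may expect different columns, so check its documentation before uploading. The function name load_termbase and the sample rows are hypothetical.

```python
import csv
import io

# Assumed layout: header row of language codes, then one term pair per row.
SAMPLE_CSV = """en,es
torque wrench,llave dinamométrica
Pairaphrase,Pairaphrase
"""


def load_termbase(csv_file):
    """Read a two-column termbase and return (source_lang, target_lang, terms)."""
    reader = csv.reader(csv_file)
    source_lang, target_lang = next(reader)
    terms = {}
    for row in reader:
        # Skip blank or malformed rows instead of guessing at their meaning.
        if len(row) != 2 or not row[0].strip():
            continue
        source, target = (cell.strip() for cell in row)
        terms[source] = target
    return source_lang, target_lang, terms


source_lang, target_lang, terms = load_termbase(io.StringIO(SAMPLE_CSV))
print(source_lang, target_lang, len(terms))
```

Note that a brand name such as “Pairaphrase” maps to itself, which tells the engine to leave it untranslated.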

What is a translation memory?

A translation memory is a bilingual (source and target language) repository in TMX file format. It consists of text that has been translated from one language into another, and it is stored on a server for future reuse. Translation memories can deliver nearly instant quality improvements and reduce translation costs and effort.

What’s the difference between translation memory and a glossary?

A translation memory consists of sentences, phrases and segments, whereas a glossary consists of keywords, names and acronyms.
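
The difference shows up in how each resource is queried. The simplified Python sketch below assumes exact matching only (real systems also offer fuzzy matching), and the names translation_memory, glossary and lookup are hypothetical. A translation memory is keyed by whole segments, while a glossary is keyed by individual terms that may appear inside any segment.

```python
# Translation memory: whole segments mapped to their stored translations.
translation_memory = {
    "Tighten the bolt with a torque wrench.":
        "Apriete el perno con una llave dinamométrica.",
}

# Glossary: individual terms mapped to their approved translations.
glossary = {"torque wrench": "llave dinamométrica"}


def lookup(segment):
    """Prefer an exact translation memory hit; otherwise list glossary terms found."""
    if segment in translation_memory:
        return ("tm", translation_memory[segment])
    hits = {term: target for term, target in glossary.items() if term in segment}
    return ("glossary", hits)


print(lookup("Tighten the bolt with a torque wrench."))
print(lookup("Store the torque wrench in its case."))
```

The first call reuses a complete stored translation. The second segment has never been translated before, so only the glossary term is available to guide the new translation.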

5. You can now use termbases for more than term lookups

Translation management systems are beginning to enable users to leverage their termbase glossaries for more than just looking up terms. 

For example, Pairaphrase now allows users to also insert glossary terms into the text during a term lookup. This enhances productivity within the human translation workflow while improving the quality and consistency of translations. So when you create a translation glossary, take some extra time to include as many terms as possible. 

This upfront investment will significantly reduce translation-related costs down the line.

According to Pairaphrase CTO Rick Woyde in an interview with Enterprise Viewpoint, “the next wave of AI-led development will include the ability to use terminology glossaries to interactively improve machine translations and create custom translations.” This is something to keep in mind as you build your glossary, which your company will use for years to come. 


Get More Out Of Your Translation Glossary File with Pairaphrase

Want a translation management system that can insert glossary terms into your text during a term lookup? Pairaphrase delivers a suite of easy-to-use (yet powerful) translation tools in a secure web-based platform, all while enhancing translation productivity. We also make it possible to upload termbases via CSV.

Schedule a Demo or share this article with a colleague.
