What Is Post-Editing? Here Are 3 Approaches

by Isabella Massardo | Aug 8, 2018

Post-editing of Machine Translation (PEMT) wasn’t born yesterday (MT and post-editing are mature in Wordbee, for example). On the contrary, it’s as old as machine translation (MT) itself. And although we now have a large amount of material on the topic at our disposal, the nuances of the discussion are such that in some cases we risk losing sight of what PEMT really is.

PEMT: No Need for Creativity

Let’s start by saying that PEMT has nothing to do with revision. Nor does it require the “creativity” of, let’s say, transcreation. In spite of all the articles written on the subject and a brand-new ISO standard, up to now the most accurate definition of post-editing comes from the 2010 TAUS Post-editing in Practice report: “Post-editing is the process of improving a machine-generated translation with a minimum of manual labour.”

The keywords in this definition are “a minimum of manual labour.”

While revision is based on a contrastive analysis of source and target texts and requires the reviser to check and edit terminology, style, and grammar, PEMT is characterized by higher productivity and limited cognitive bilingual effort. The main changes will concern mechanical errors (capitalization and punctuation), grammar errors, terminology inconsistencies, omissions (e.g. missing words), and other issues that are often the product of a poor source text and result in poor readability of the target text. A post-editor is not expected to rewrite entire sentences (unless those sentences are obvious nonsense or word salad); they should amend only what’s necessary to make a sentence clearer to the reader.

The skills required of a reviser and a post-editor are also different: A reviser must have a sound knowledge of both source and target languages, of translation techniques, and of a specific domain. A post-editor, on the other hand, may even be monolingual. No matter what, though, a post-editor must have a strong knowledge of the target language and of the specific domain, and, ideally, an idea of how machine translation works.

3 PEMT Approaches in Practice

PEMT and the Enterprise

Let’s take an enterprise that has developed its own engine. The PEMT task could take place in a CAT environment or, in the case of enterprises that have their own MT engines but no translation department as such, it could be entrusted to external language service providers (LSPs).

Because in this instance we’re dealing with a customized engine, the MT output will be of good to high quality. The PEMT guidelines will be very specific and rigorously based on the error typologies produced by the engine in question. It will also be necessary to indicate the level of PEMT required (light or full post-editing), the purpose of the text, and the target group. Glossaries are essential if the MT engine has just been put into use and has shown some terminological teething problems.

PEMT and the LSP

Only a few LSPs have the financial and technical resources needed to develop client-specific MT engines. Most LSPs will resort to using vertical (domain-specific) engines developed by MT technology providers and available in SaaS mode according to a pay-per-use model. The MT output will be sent to internal or external post-editors. Alternatively, post-editors might receive an API key to use a vertical MT engine in a CAT tool environment. In this specific case, post-editing becomes an interactive task.

Some LSPs will pre-translate a source text with a general MT engine, for example Google Translate or DeepL. This is a viable financial choice when starting out with MT, translating small texts or, again, facing a lack of financial/technical resources.

In this approach, because the post-editing level and goals will change from project to project or from client to client, LSPs always need to provide information about the final use of the translation and accurate guidelines on how to conduct the task. PEMT projects could be split among many post-editors: The specificity and strictness of the guidelines will ensure a certain level of consistency. It’s also important to provide a client-specific glossary to ensure consistent use of terminology, especially in the case of a public engine.
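As a rough illustration of how a client-specific glossary can support terminology consistency, the sketch below flags segments where a glossary source term appears in the source text but the expected target term is missing from the post-edited text. The function name, the glossary format (a plain dictionary), and the naive case-insensitive substring matching are all assumptions made for this example; real tools typically handle inflection and tokenization.

```python
def glossary_violations(source, target, glossary):
    """Return (source_term, expected_target_term) pairs that look inconsistent.

    A pair is reported when the source term occurs in the source segment
    but the expected target term does not occur in the target segment.
    Matching is a naive case-insensitive substring test (an assumption of
    this sketch; it ignores inflected forms).
    """
    src = source.lower()
    tgt = target.lower()
    return [
        (s_term, t_term)
        for s_term, t_term in glossary.items()
        if s_term.lower() in src and t_term.lower() not in tgt
    ]


# Hypothetical English-French glossary supplied by the client.
glossary = {"post-editing": "post-édition", "engine": "moteur"}

issues = glossary_violations(
    "Post-editing of the engine output",
    "Post-édition de la sortie de la machine",
    glossary,
)
print(issues)
```

A check like this only points the post-editor at possible problems; the decision to change the term remains theirs.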

PEMT and the Freelancer

Gone are the days of freelancers’ rage against machine translation. Nowadays, most freelancers use MT as a helpful tool that provides translation suggestions. The choice is usually Google Translate or DeepL (web version or with an API key).

There are no precise PEMT guidelines in this instance. The freelancer using one or more general MT systems is free to decide which MT tool to use, how to use it, and how much of the MT output to use. From an ethical point of view, they should inform the client about the use of a public MT engine, or in any case, ask the client if there are specific criteria that might prevent the use of public MT engines. Think medical records, legal documents (involving sensitive or personal data; in a word, GDPR), and confidential or IP-protected documents.

One thing to remember: When using a public MT engine through an API key in a CAT tool environment, the segments containing MT output might in some cases be tagged with AT or MT, thereby revealing their origins.
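To make the point about origin tags concrete, here is a minimal sketch of how MT-tagged segments could be picked out of a translated file. The `Segment` structure and its `origin` field are hypothetical; each CAT tool stores this information in its own way, so treat the labels below as placeholders.

```python
from dataclasses import dataclass

# Origin labels treated as "machine translated" in this sketch.
# AT and MT are the tags mentioned above; how a given tool stores them is tool-specific.
MT_ORIGINS = {"MT", "AT"}


@dataclass
class Segment:
    source: str
    target: str
    origin: str  # hypothetical field, e.g. "MT", "AT", "TM", or "" for human translation


def mt_origin_segments(segments):
    """Return the segments whose origin label marks them as MT output."""
    return [s for s in segments if s.origin.strip().upper() in MT_ORIGINS]


def mt_share(segments):
    """Return the fraction of segments that carry an MT origin label."""
    if not segments:
        return 0.0
    return len(mt_origin_segments(segments)) / len(segments)


segments = [
    Segment("Hello", "Bonjour", "MT"),
    Segment("Thanks", "Merci", "TM"),
    Segment("Goodbye", "Au revoir", "at"),
    Segment("Welcome", "Bienvenue", ""),
]
print(len(mt_origin_segments(segments)), mt_share(segments))
```

Anyone with access to the bilingual file can run a similar filter, which is one more reason to be open with the client about MT use.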

PEMT in the Translation Workflow

It is also worth noting that an automatic translation is not something created by a machine with free will and an independent (albeit electronic) brain. MT output (the technical term for automatic translation) is generated by an algorithm through stochastic calculation, as in statistical and neural machine translation; the algorithm exploits bilingual translation corpora and, more generally, language data produced by humans. There can be various reasons for the high or low quality of MT output: lack of clean language data, insufficient technical and financial resources for the development of an MT system, inadequate quality of the source text, etc. But the common factor in all these reasons is human.

PEMT should replace the two main phases of the TEP (translation, editing, proofreading) workflow. In order to implement an MT engine, it’s necessary to adapt the translation workflow to this technology and make sure that your TMS has what it takes to manage the PEMT phase efficiently.

If you have developed your own MT engine, your TMS provider should give you the ability to connect it through an API key.

Post-editors should have linguistic resources at hand: spell-check dictionaries, termbases, and the ability to upload the PEMT guidelines as reference material.

Project managers should have at their disposal functionalities like automated QA and 100% match blocking, adding labels to specific segments, and so on.
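The functionalities listed for project managers can be pictured with a small sketch: segments with a 100% match are locked so post-editors skip them, and a few automated QA checks attach labels to the remaining segments. The `Segment` fields, the label names, and the three checks are assumptions chosen for illustration, not the behavior of any particular TMS.

```python
from dataclasses import dataclass, field

END_PUNCTUATION = (".", "!", "?")


@dataclass
class Segment:
    source: str
    target: str
    match_percent: int = 0  # hypothetical translation-memory match score, 0 to 100
    locked: bool = False
    labels: list = field(default_factory=list)


def block_exact_matches(segments):
    """Lock every segment with a 100% match so it is excluded from post-editing."""
    for seg in segments:
        if seg.match_percent == 100:
            seg.locked = True
    return segments


def run_qa(segments):
    """Attach simple QA labels to the unlocked segments."""
    for seg in segments:
        if seg.locked:
            continue
        if not seg.target.strip():
            seg.labels.append("empty-target")
        if "  " in seg.target:
            seg.labels.append("double-space")
        # Flag targets that drop the sentence-final punctuation of the source.
        if (seg.source.strip().endswith(END_PUNCTUATION)
                and seg.target.strip()
                and not seg.target.strip().endswith(END_PUNCTUATION)):
            seg.labels.append("missing-end-punctuation")
    return segments


segments = [
    Segment("Hello.", "Bonjour.", match_percent=100),
    Segment("Press the button.", "Appuyez sur  le bouton"),
    Segment("Title", "", match_percent=80),
]
for seg in run_qa(block_exact_matches(segments)):
    print(seg.locked, seg.labels)
```

Labels of this kind let a project manager route only the flagged segments back to the post-editor instead of the whole file.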

Training post-editors

The PEMT courses available nowadays provide general training on how machine translation works and give a few examples of the differences between revision and post-editing. There’s no need to provide lengthy training using the MT output of Google Translate and DeepL, as the differences in many cases are minimal.

Whether you’re an LSP or an enterprise, to help your post-editors become more efficient it’s important to provide a second level of training based on customized engines, with specific domains, language pairs, and text types.

Don’t forget the basics of automatic post-editing: Instruct your post-editors on which functionalities and checks are available for the MT output, for example how to display suggestions and how to modify them within your work environment.