Computer Assisted Translation
Computer Assisted Translation (CAT) is often used to localize applications. In this article we’ll share our experience with a multi-language project. We needed to localize a mobile application into more than 20 languages. Additionally, we needed to translate a website and the server’s strings. We decided to start by trying machine translation from English into Russian, Indonesian, Arabic and Farsi. This would be enough for a first attempt, and the experience would help us find the most efficient way to translate the remaining languages.
Two Challenges: Large Number of Words and Different Formats
We thought we could translate the strings quickly. Nevertheless, even machine translation into just 4 languages was not as fast as we wished. To understand the problem, consider that the English version of the application contains 2000+ words. Now double that, because the app is being developed for two platforms (Android and iOS). Then add nearly 500 words for the server and another 500 words for the English version of the website.
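To make the scale concrete, here is the arithmetic above as a tiny Python sketch. The figures are the rounded counts from the paragraph ("2000+" is treated as 2000, "nearly 500" as 500), so the result is a rough lower bound rather than an exact word count.

```python
# Rough word budget based on the approximate figures quoted above.
APP_WORDS = 2000        # English strings in one mobile app ("2000+")
PLATFORMS = 2           # Android and iOS
SERVER_WORDS = 500      # "nearly 500"
WEBSITE_WORDS = 500
TARGET_LANGUAGES = 4    # Russian, Indonesian, Arabic, Farsi

words_per_language = APP_WORDS * PLATFORMS + SERVER_WORDS + WEBSITE_WORDS
total_words = words_per_language * TARGET_LANGUAGES

print(words_per_language, total_words)  # 5000 20000
```

In other words, roughly 5,000 words per language, or about 20,000 words for the first four languages.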
The second challenge is the completely different formats and approaches used to store the strings. An Android application works with strings.xml files, each of which is ordinary XML. An iOS app uses a set of files called Localizable.strings, whose syntax is familiar from the Swift programming language. Web pages are stored in ordinary XHTML format. Finally, server strings are stored in JSON format, as we use Node.js.
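The difference between the formats is easiest to see when the same string is read from each of them. The following Python sketch is only an illustration: the sample strings, the key name "greeting" and the reader function names are made up for this example, and the .strings reader is a simplified regular expression rather than a full parser. XHTML is left out because page text has no key/value structure.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical one-string samples of three of the formats.
ANDROID_SAMPLE = '<resources><string name="greeting">Hello, %s!</string></resources>'
IOS_SAMPLE = '/* Greeting */\n"greeting" = "Hello, %@!";\n'
SERVER_SAMPLE = '{"greeting": "Hello, {name}!"}'

# Simplified pattern for one `"key" = "value";` line of a .strings file.
IOS_PAIR = re.compile(
    r'^\s*"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;', re.MULTILINE
)


def read_android(text):
    """Return {name: text} for every <string> element of a strings.xml document."""
    root = ET.fromstring(text)
    return {node.get("name"): node.text or "" for node in root.iter("string")}


def read_ios(text):
    """Return {key: value} for every key/value line of a .strings document."""
    return dict(IOS_PAIR.findall(text))


def read_json(text):
    """Return the flat {key: value} mapping stored in a JSON document."""
    return dict(json.loads(text))


print(read_android(ANDROID_SAMPLE))
print(read_ios(IOS_SAMPLE))
print(read_json(SERVER_SAMPLE))
```

Note that even the placeholder syntax differs between the samples (%s, %@, {name}), which is part of what makes a single home-grown tool hard to get right.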
Our first attempt was a self-written utility that used the Google translation API. Its main disadvantage was that it handled variables and special characters incorrectly. For instance, “%d” was converted to “% d”, “%@” to “٪ @” (for Arabic), “/n” to “/N”, et cetera. We ended up spending too much time editing the translated text. Additionally, the utility could work with XML files but not with other formats. So we started to search for another approach.
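One common way to reduce this kind of damage is to replace placeholders with opaque tokens before the text is sent for translation and to restore them afterwards. The Python sketch below illustrates the idea; it is not the utility described above. The token format __PHn__, the function names and the list of recognized placeholders are assumptions made for this example, and whether a particular translation service leaves such tokens untouched has to be verified in practice.

```python
import re

# Recognizes a small, assumed set of placeholders: printf-style ones such as
# %d, %s, %@, %1$s, %.2f, and the escape sequences \n and \t.
PLACEHOLDER = re.compile(r'%(?:\d+\$)?(?:\.\d+)?[@dfis]|\\[nt]')


def mask(text):
    """Replace each placeholder with a token __PHn__; return (masked text, originals)."""
    found = []

    def stash(match):
        found.append(match.group(0))
        return f"__PH{len(found) - 1}__"

    return PLACEHOLDER.sub(stash, text), found


def unmask(text, found):
    """Put the original placeholders back in place of the __PHn__ tokens."""
    return re.sub(r'__PH(\d+)__', lambda m: found[int(m.group(1))], text)


masked, found = mask("You have %d new messages from %@\\n")
# `masked` is what would be sent to the translation service;
# afterwards, unmask(translated_text, found) restores the placeholders.
print(masked)
print(unmask(masked, found))
```

Because the tokens carry an index, the placeholders are restored correctly even if the translation changes their order in the sentence.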
MateCAT: The Translation Tool
There are a lot of tools for translating content in various formats. But some of them are proprietary, and the rest offer too little functionality to be really helpful. MateCAT is a 3-year research project funded by the European Union’s Seventh Framework Programme. The project consortium is led by FBK (Fondazione Bruno Kessler). The objective of MateCAT is to improve the translation workflow by integrating Machine Translation (MT) and human translation within the so-called CAT framework, as its wiki description puts it.
Image: MateCAT has analyzed the English version and is ready to translate it to Persian and Indonesian
MateCAT Main Benefits
MateCAT combines excellent usability, high translation quality and availability (it’s free and online). We saw the following main benefits of MateCAT, which helped us translate quickly and with good quality.
- Supports 68 different formats.
- Allows the upload of several files at once (very useful for iOS *.strings files).
- Supports splitting a large task into subtasks, so translations can be done in parallel.
- Translates only weighted words. MateCAT skips tags, comments and other overhead information. The destination (translated) file contains both the translated content and the untouched overhead data.
- Remembers approved translations and reuses them for later strings.
- Offers to hand the translation over to professional translators, quoting terms and cost.
- It is available online.
Files Synchronization Prior to Translation
Before translating, make sure your language files are synchronized. Our project was already partly translated, and, for example, the sets of English and Indonesian strings were not equivalent. That is why we needed to synchronize the language versions before translation. For this task we used:
- BartyCrouch for adding missing strings to iOS files
- a self-written script for Android and other platforms.
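A synchronization check of this kind can be sketched in a few lines of Python. This is an illustration only, not the script used in the project: the function names and the sample files are hypothetical, and the sketch covers only plain string elements in Android strings.xml files.

```python
import xml.etree.ElementTree as ET


def string_keys(xml_text):
    """Return the name attribute of every <string> element, in document order."""
    root = ET.fromstring(xml_text)
    return [node.get("name") for node in root.iter("string")]


def missing_keys(base_xml, target_xml):
    """Return the keys that exist in the base language but not in the target one."""
    present = set(string_keys(target_xml))
    return [key for key in string_keys(base_xml) if key not in present]


# Hypothetical English (base) and Indonesian (target) files.
ENGLISH = (
    '<resources>'
    '<string name="ok">OK</string>'
    '<string name="cancel">Cancel</string>'
    '<string name="retry">Retry</string>'
    '</resources>'
)
INDONESIAN = '<resources><string name="cancel">Batal</string></resources>'

print(missing_keys(ENGLISH, INDONESIAN))  # ['ok', 'retry']
```

The missing keys can then be copied into the target file with their English text, so that every language file goes into translation with the same set of strings.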
Our Experience in Translation
As mentioned before, we needed to translate four types of files: Android strings.xml, iOS Localizable.strings, XHTML web pages and JSON server strings.
The procedure was exactly the same for each format. It makes no difference whether you work with HTML or with Swift. The translation procedure was as follows:
- Open the MateCAT site, press “Add files“, choose the English files from the local drive.
- Choose the target languages.
- Go to “Settings” and set “Pre-translate 100% matches from” to ON. Also set other options according to the specific translation.
- Press the “Analyze” button and wait a little while MateCAT determines how many weighted words need to be translated.
- Split the job if the total number of words is 300+.
- Press “Translate” and then “Open“.
- Translate each sentence one by one.
- Download the translation and check that everything is fine (encoding, etc.).
- Push the translated files to the Git repository or to the site.
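The encoding check in the download step can be automated. The Python sketch below assumes that a downloaded file is either valid UTF-8 or UTF-16 with a byte order mark; this is an assumption for illustration, not a statement about what MateCAT actually produces, and the function name is hypothetical.

```python
import codecs


def to_utf8(raw):
    """Return the file content as UTF-8 bytes.

    Assumes the input is either UTF-16 with a byte order mark or UTF-8.
    Raises UnicodeDecodeError for anything else, so that unexpected
    encodings are noticed instead of being silently corrupted.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the byte order mark and drops it.
        return raw.decode("utf-16").encode("utf-8")
    raw.decode("utf-8")  # validation only; raises if the bytes are not UTF-8
    return raw


downloaded = "Привет".encode("utf-16")  # stand-in for the bytes of a downloaded file
print(to_utf8(downloaded).decode("utf-8"))
```

Running such a check before committing keeps files with an unexpected encoding out of the repository.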
Image: Fragment of MateCAT Translation UI
How to Avoid Translation Problems
MateCAT is cool, but you may run into small issues. Here are some suggestions on how to translate faster and with higher-quality results.
- Set the option “Machine pre-translation” to ON after you upload the file(s). It makes translation faster.
- Split a large job into sub-jobs. This keeps the translation manageable. Additionally, you can hand different parts to different translators. Perhaps your team has several native speakers of the desired languages.
- Translate into several languages at once if possible. This option is especially useful for a smaller number of words.
- Do not rush. MateCAT translates fast, but you should stay attentive. Confirm each translation (use hotkeys or the ‘Translate’ button).
- Ctrl+Shift+Enter (or the “T+>>” button) is a very useful feature. You will save a lot of time if machine translation has already translated some strings.
- Use the suggested translation variants for long sentences. This can be faster than editing a translated sentence by hand.
Of course, we also noticed some disadvantages. They are not deal breakers, but they are unfortunate, since we had to figure out how to work around them.
- Translated files cannot be downloaded in UTF-8 encoding.
- Sometimes not all translated strings were saved. It happened mainly for large files. That is why it is better to split large jobs.
- Sometimes MateCAT handled escape characters and quotes incorrectly. For example, \” was changed to “ \, and “ was changed to ‘.
- If you rush through the process, some of your work may be lost 🙂
- Analysis stops if empty files were uploaded. In that case all files need to be re-uploaded.
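Problems with escape characters and unsaved strings are easy to miss by eye, so it can help to compare the source and translated files automatically after download. The following Python sketch flags strings that are missing from the translation or whose placeholders and escape sequences no longer match the source. The function names, the sample data and the set of recognized tokens are assumptions for this example, and the input is expected to be already parsed into key/value dictionaries.

```python
import re
from collections import Counter

# Assumed set of tokens that must survive translation unchanged:
# printf-style placeholders and the escapes \" \n \t.
TOKEN = re.compile(r'%(?:\d+\$)?(?:\.\d+)?[@dfis]|\\["nt]')


def tokens(text):
    """Count every protected token in a string."""
    return Counter(TOKEN.findall(text))


def find_missing(source, translated):
    """Return the keys that were not saved in the translation."""
    return [key for key in source if key not in translated]


def find_damaged(source, translated):
    """Return the keys whose protected tokens differ between source and translation."""
    return [
        key
        for key, text in source.items()
        if key in translated and tokens(text) != tokens(translated[key])
    ]


# Hypothetical data: "quote" lost its escaped quotes, "bye" was not saved.
source = {"quote": 'Press \\"OK\\"', "count": "%d files", "bye": "Goodbye"}
translated = {"quote": 'Нажмите " \\OK" \\', "count": "%d файлов"}

print(find_missing(source, translated))  # ['bye']
print(find_damaged(source, translated))  # ['quote']
```

Only the flagged strings then need a manual review, which is much faster than rereading the whole file.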
But even these minor issues did not ruin our good impression of MateCAT since we were able to find ways to work around them.
And of course, if you find any better tool for translation please let us know! We would love to test it out. 🙂