Thanks a lot for the contribution and demonstration. As a French speaker, I was quite sceptical reading this thread, and you exposed the key issues magnificently. DITA adds semantic markup to technical documentation, and does it well, but modeling natural language is a different matter entirely.
At NXP we are considering using variables for product names, but they consist of type numbers (such as TEA1721AT), not actual words. I keep looking for potential pitfalls, but so far I have found none. I plan to collaborate with our local offices (mainly in Asia) to provide the translation agencies with language-specific guidelines regarding the translation of these variables.
From: firstname.lastname@example.org [mailto:email@example.com]
On Behalf Of Andrzej Zydron
Sent: Sunday, January 20, 2013 10:48 PM
Subject: Re: [dita] Product names and reuse: a very serious anti-pattern when translating documents
Hi Troy, Mark and Kristen,
Thank you for your questions, replies and comments, and apologies for my tardiness in replying: I was very busy last week.
The short answer is:
1. Coping with noun inflection changes will add between 10% and 20% to your translation costs.
2. You cannot escape the adjectival agreement trap, which will produce ungrammatical output in most languages.
Translating into one language costs roughly the same as writing the original. If you translate into the typical 21 to 40 languages, the increase in cost will be substantial. You are creating a rod for your own back.
The long answer:
What is suggested is a very serious anti-pattern. As I laid out in my previous post, what is proposed works reasonably well in English and possibly a few other languages, like Mandarin, that have a primitive morphology. These languages are unfortunately atypical.
Languages with a primitive morphology belong predominantly to a category of language termed creole: they are formed by a fusion of two or more languages. The English we use today was formed during the 15th century by a fusion of medieval French and Old English.
The impact of French on the English that we use today should not be understated - it was immense.
The vast majority of human languages have a rich or, in the case of Slavonic languages, an extremely rich morphology. English nouns do not have gender association and their morphology is only expressed in the possessive and plural forms. An obvious consequence of this is that word order in sentences is of paramount importance, which is not true of morphologically rich languages.
Let us now move on to the substantial flaw that is caused by treating product names, or any other noun, as a variable when it comes to translation: the noun inflection and the adjectival agreement trap.
1. Noun inflection
The only inflections for nouns in English are the possessive and plural forms. Other languages can have many more forms depending on the role that the noun is playing in the sentence. Take my mother tongue, Polish. There are 7 noun cases in Polish: nominative, genitive, dative, accusative, instrumental, locative and vocative, each with a possibly different ending. It is very difficult for monolingual English speakers to grasp the fact that nouns can have so many different forms. Why is this a problem for automatic noun substitution? Because 7 does not go into 2 (English nominative and possessive). Let us look at a practical example in the following sentence, where the noun 'spanner' is 'klucz' in Polish:
Please undo the bolt using a spanner.
Proszę odkręcić śrubę kluczem.
Please note the inflection of the noun in Polish as it takes on its instrumental form, which results in adding an 'em' ending. You can redo the translation so that spanner uses the nominative form:
Używając klucz, proszę odkręcić śrubę.
The English equivalent is:
Using a spanner please undo the bolt.
This imposes an extra burden on the translator which you will have to pay for. The translator has to rearrange the sentence: this is an extra task and will increase the cost of translation. It can also result in a very strange style for the document as a whole.
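To make the spanner example concrete, here is a minimal Python sketch of what a naive variable substitution does with it. The `{tool}` placeholder syntax, the function names and the `KLUCZ_FORMS` table are all assumptions made up for illustration; they are not part of DITA or of any tool mentioned in this thread. Only the two Polish forms quoted above are filled in.

```python
# Hypothetical table of case forms for one noun. A real table for Polish
# would need up to 7 entries per noun (and more for plurals).
KLUCZ_FORMS = {
    "nominative": "klucz",
    "instrumental": "kluczem",
}


def substitute(template: str, value: str) -> str:
    """Naive substitution: the same string goes into every slot."""
    return template.replace("{tool}", value)


def substitute_with_case(template: str, forms: dict, case: str) -> str:
    """Substitution where someone has marked which case the slot needs."""
    return template.replace("{tool}", forms[case])


POLISH_TEMPLATE = "Proszę odkręcić śrubę {tool}."

# English is unaffected: one form fits the slot.
english = substitute("Please undo the bolt using a {tool}.", "spanner")

# Polish with the single stored (nominative) value: ungrammatical.
naive_polish = substitute(POLISH_TEMPLATE, KLUCZ_FORMS["nominative"])

# Polish with the case chosen per slot: grammatical, but every slot in
# every sentence now needs a case annotation from the translator.
correct_polish = substitute_with_case(POLISH_TEMPLATE, KLUCZ_FORMS, "instrumental")

print(english)
print(naive_polish)
print(correct_polish)
```

The point of the sketch is the extra argument: a single stored value is enough for English, whereas the Polish slot needs a per-occurrence case decision that a plain variable cannot carry.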
2. Adjectival agreement
Nouns in English do not express gender. This is quite unusual. Most other languages associate a particular gender with each noun and require that any adjective accompanying a noun agree with it in gender and, in most instances, also in case. Let us take a simple example of automotive product names from Ford of Europe: Fiesta, Mondeo and Focus. Let us also take the example of Polish, which is typical of all Slavonic languages. Nouns in Polish can have three genders: masculine, feminine and neuter.
Fiesta in Polish is automatically assigned feminine gender because it ends with an 'a'. Mondeo is automatically associated with neuter as it ends in an 'o'. Focus is masculine, mainly because it ends in neither 'a' nor 'o'. Now let us look at the simple noun phrase 'new model':
a) Nowa Fiesta
b) Nowe Mondeo
c) Nowy Focus
Please note that all three models force different endings on the adjective 'new'. Add to this the fact that the adjective also has to take on the inflection of the noun, and we get the following examples:
Driving the new 'model' is a great experience:
a) Jazda nową Fiestą jest wspaniałym przeżyciem.
b) Jazda nowym Mondeo jest wspaniałym przeżyciem.
c) Jazda nowym Focus'em jest wspaniałym przeżyciem.
As you can see, even if we forced the use of the nominative case for the model name, which may result in a stilted translation, we cannot escape the gender trap. You will end up with ungrammatical text which, depending on your target audience, may not be the result you desired.
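The gender trap can be sketched in a few lines of Python. The gender assignments and the nominative forms of 'new' are taken from the 'new model' examples above; the dictionaries and function names are hypothetical and only illustrate the idea.

```python
# Gender of each model name in Polish, as described above.
PRODUCT_GENDER = {
    "Fiesta": "feminine",
    "Mondeo": "neuter",
    "Focus": "masculine",
}

# Nominative forms of the adjective 'new', one per gender.
NEW_NOMINATIVE = {
    "feminine": "Nowa",
    "neuter": "Nowe",
    "masculine": "Nowy",
}


def naive_new_model(product: str) -> str:
    """The text around the variable is translated once, so the adjective is fixed."""
    return f"Nowy {product}"


def agreed_new_model(product: str) -> str:
    """The adjective is chosen to agree with the gender of the product name."""
    return f"{NEW_NOMINATIVE[PRODUCT_GENDER[product]]} {product}"


for name in PRODUCT_GENDER:
    print(f"naive: {naive_new_model(name):<14} agreed: {agreed_new_model(name)}")
```

Swapping the product name alone leaves 'Nowy Fiesta' and 'Nowy Mondeo', both ungrammatical. Getting agreement right means the surrounding sentence has to change with the variable, and this sketch covers only one case of one adjective.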
The examples given above were obviously for a single target language, but Polish is fairly typical of most morphologically rich languages. Other languages have different traits, such as Finnish, which has 15 inflections for nouns and no gender, but requires adjectival agreement. French has an even more primitive noun morphology than English, but has a very strong gender requirement on adjectives and particles, e.g. nouveau, nouvelle, du, de la, le, la. Hebrew also requires adjectival agreement for gender and has three cases for nouns as, unsurprisingly, does Arabic.
To sum up, it is ill-advised to use any mechanism that provides for individual word or noun phrase substitution if you are going to translate your output into any language with a richer morphology than English, unless you are prepared for the extra cost and the possibly low quality of the resultant output. Human language is too rich and varied to be treated in simple word substitution terms.
XTM International Ltd.
PO Box 2167, Gerrards Cross, SL9 8XF, UK
Tel: +44 (0) 1753 480 479
Mob: +44 (0) 7966 477 181
On 18/01/2013 17:20, Troy Klukewich wrote:
At the former company, we attempted to restrict product names as untranslated, and this initially worked for western languages where we started first, but then it didn't work later for some eastern languages that already had a special trademark for that country,
from what I remember. I think we had some problems too around feature names that in some cases simply had to be translated even for cultural reasons.
We had Information Developer training around how to use product and feature name variables effectively by restricting the way we wrote around them to avoid common translation issues.
Finally, we avoided references to the product in general going forward. Fortunately, in software, we can often refer generically to the product as "the system" or application in context. Still, we had some high level material that required the product name
and we would use the variable there.
In most cases, we used the majority of variables for nasty, constantly changing feature names. :-)
In another case, we had a specialized product that required variables for business object names that could be overridden by the customer, which we then output to in-place help variables with a dynamic mapping file. (Kind of cool, actually, though little seen
In short, I don't think there is an easy answer that avoids changes to schemas, workflow, writing practices, and output processing for holding and processing an effective product or feature element along the lines of a variable, but once implemented, the automation
and customization is very powerful.
I have played around with content references with some success to simulate variables as used in a previous company. I forget what the specific limitations are, but I wasn't totally happy with it.
If I were to create a variables architecture for DITA today using OT as a reference processor, I would create a generic schema for variables, then specialize attributes as needed for products and features (at minimum). Use a mapping file in XML to hold the
resolved values and keys. Then create a processing plug-in for those variables, which might include special graphics associated with the product name, if needed (has actually happened).
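As a rough illustration of the mapping-file idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `<variables>`/`<variable>` element names, the `key` and `type` attributes, the `{{key}}` placeholder syntax and the sample values are hypothetical, and this is not how the DITA Open Toolkit or any existing plug-in works.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping file: one entry per variable, typed as product or feature.
MAPPING_XML = """
<variables xml:lang="en">
  <variable key="product.name" type="product">ExampleProduct</variable>
  <variable key="feature.sync" type="feature">Example Sync</variable>
</variables>
"""

# Hypothetical placeholder syntax used in the source text: {{some.key}}
VAR_PATTERN = re.compile(r"\{\{([\w.]+)\}\}")


def load_mapping(xml_text: str) -> dict:
    """Read key -> resolved value pairs from the mapping file."""
    root = ET.fromstring(xml_text.strip())
    return {v.get("key"): (v.text or "") for v in root.findall("variable")}


def resolve(text: str, mapping: dict) -> str:
    """Replace each placeholder with its value; unknown keys are left untouched."""
    return VAR_PATTERN.sub(lambda m: mapping.get(m.group(1), m.group(0)), text)


mapping = load_mapping(MAPPING_XML)
print(resolve("Install {{product.name}} and enable {{feature.sync}}.", mapping))
print(resolve("See {{unknown.key}} for details.", mapping))
```

Leaving unknown keys in place rather than raising is a design choice for the sketch: it makes missing entries visible in the output. One mapping file per language would be translated alongside the content.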
On 1/16/2013 9:03 AM, Kristen James Eberlein wrote:
Troy, there were a couple of questions that I want to ask. Did the company's style guide restrict the content developers as to how they used
the product names, for example, using product names only in the nominative case?
Were the product names translated or left in English? I think translated, based on your post, but wanted to check -- some companies handle the problem of reusing company names by NOT translating them.
You mentioned using attributes to indicate plurals or possessives; did you do anything to handle case or part of speech?
Thanks for your interesting post!
Kristen James Eberlein
Principal consultant, Eberlein Consulting
Co-chair, OASIS DITA Technical Committee
Charter member, OASIS DITA Adoption Committee
+1 919 682-2290; kriseberlein (skype)
On 1/16/2013 8:47 AM, Troy Klukewich wrote:
We translated the English XML content and mapping files into multiple languages, including Chinese, Japanese, and Arabic, which are probably some of the more difficult languages to translate. The agencies worked with our build kit and generated translated versions
per language, including PDF. Granted, we had some great development resources to program the XSLT and XSL:FO appropriately to handle multiple languages.
I do remember some iterations where translation had to tweak the PDF output until we programmed a solution (indexing was challenging for Japanese, right-to-left languages, etc.). In any case, the amount of total manual work translation had to perform versus
previous iterations was massively reduced with subsequent cost savings and rapid turn-around. From what I recall, we reached 100% automation for all languages handled.
If you could be more specific about which language morphologies cannot work with variables, I'd be interested. In some cases, the best solution might be to dump out an XLIFF with resolved values for variables, so that the base XML isn't translated but the XLIFF version of it is, with a two-way transformation back to the build kit to produce translated deliverables.
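A minimal Python sketch of the export half of that idea, assuming the variables have already been resolved into plain sentences. The segment ids, the file name `topic.dita` and the function name are hypothetical; the skeleton follows XLIFF 1.2 (`xliff`, `file`, `body`, `trans-unit`, `source`) and leaves out the reverse transformation entirely.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def build_xliff(segments: dict, source_lang: str = "en",
                original: str = "topic.dita") -> str:
    """Wrap already-resolved source sentences in a bare XLIFF 1.2 skeleton."""
    ET.register_namespace("", XLIFF_NS)  # serialize with a default namespace
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(
        xliff,
        f"{{{XLIFF_NS}}}file",
        {"original": original, "source-language": source_lang, "datatype": "xml"},
    )
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for seg_id, text in segments.items():
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": seg_id})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(xliff, encoding="unicode")


# Hypothetical resolved segments: the product name is already plain text,
# so the translator sees (and can inflect) the whole sentence.
resolved_segments = {
    "s1": "Install ExampleProduct before you start.",
    "s2": "The new ExampleProduct is faster.",
}

xliff_text = build_xliff(resolved_segments)
print(xliff_text)
```

Because the translator receives whole sentences, inflection and agreement can be handled per sentence. The cost is that the translated text no longer contains a variable, so a later name change means retranslating those segments.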
Some translation groups will work with build kits, some don't, so this also needs to be factored into workflow.
So I suspect that there may be no perfect global solution to handle all possible languages from base DITA XML, but a technical solution that handles most and then an alternate workflow for those languages that cannot work with variables as such.
On 1/15/2013 1:34 PM, Andrzej Zydron wrote:
Thank you for this interesting post. Your mechanism will work for English, and for the small group of languages with a similarly primitive morphology. Unfortunately it will fall apart when you come to translate the XML content into any language with a richer morphology: the result will be ungrammatical output, and the cost of recovery from this will be extensive.
English is a linguistic freak (a fact that is lost on most monolingual English speakers), which allows for the relatively easy substitutions that you described. If you plan to translate your content into other languages, this is not a practical possibility.