Subject: Generated text in the DITA Open Toolkit
Hello all -

I was asked to clarify how the DITA Open Toolkit works with generated text, so here it is.

As most on this list know, generated text is any standard text that is not stored in the source files but appears in the output. For example, the text "Related information" that appears above links is generated by the transform, as is the text "Note" that appears based on a <note> element. As with most other XML systems, DITA encourages users to keep this common text outside of the source files, and outside of the core transforms themselves. The common text is retrieved when the DITA content is published, based on the language setting in each document (or on a default set in the transforms).

In the toolkit, all generated text is kept in XML files outside of the transform code. If you look in the xsl/common/ directory of the toolkit, you will see one string file for each supported locale. These files are named strings-XX-YY.xml, where XX-YY is the locale value. On the advice of many of our translators, we chose a separate file for each locale rather than storing every language in a single file.

There is also a file named strings.xml. This file defines which languages are available, and which file should be used for each language. For example, it indicates that a locale value of "en", "en-us", or "en-gb" should all use the file strings-en-us.xml for lookup. If the generated text for en-gb ever needs to differ, a new file can be created, and the reference in strings.xml will change. The primary reason for this redirection is to make it easy to find out which languages are available, without trying to open files that do not exist. For example, the toolkit does not yet have language support for the "sa-in" locale.
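To make the redirection concrete, here is a sketch of what an English entry in strings.xml could look like. The <lang> element form matches the Swedish example shown later in this note; the root element name used here is hypothetical, not necessarily what the toolkit ships.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical excerpt of strings.xml: maps locale values to string files.
     All three English locales redirect to the same file, so only one copy
     of the English strings needs to exist until en-gb actually diverges. -->
<langlist>
  <lang xml:lang="en"    filename="strings-en-us.xml"/>
  <lang xml:lang="en-us" filename="strings-en-us.xml"/>
  <lang xml:lang="en-gb" filename="strings-en-us.xml"/>
</langlist>
```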
If that language is encountered, the XSLT should not try to open the strings-sa-in.xml file without first checking whether it is available; otherwise, most parsers will generate missing-file warnings. We did not want to keep the list of supported languages in every XSLT transform, for a couple of reasons. First, if any non-XSLT programs use the translations, then the supported languages would have to be maintained in multiple locations. Second, user extensions (with new translations) may not support the same set of languages as the base toolkit.

HOW THE LOOKUP IS PERFORMED

There is a common XSLT template called getString, which is used to look up each translation. It is called with the name of the lookup string as a parameter. For example, when generating a heading for the link to the next topic, it is called as:

  <xsl:call-template name="getString">
    <xsl:with-param name="stringName" select="'Next topic'"/>
  </xsl:call-template>

The getString template determines the currently active language. In most cases we expect this to be set at the level of the <topic> element, but it is taken from the closest ancestor with an xml:lang attribute. Assume for this explanation that the current topic is Swedish, so the language is either "sv" or "sv-se". The getString template also has a parameter that tells it where to look for string information; by default, this is the strings.xml file. It searches that file, and finds that strings-sv-se.xml is the correct place to look for the current string:

  <lang xml:lang="sv" filename="strings-sv-se.xml"/>
  <lang xml:lang="sv-se" filename="strings-sv-se.xml"/>

The indicated file contains the line:

  <str name="Next topic">Nästa avsnitt</str>

So, the getString template returns "Nästa avsnitt" as the translation.

ADDING NEW TRANSLATIONS

This mechanism was designed to make it possible to add new translations, particularly for specializations, without having to rewrite the lookup code.
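The lookup described above can be sketched outside of XSLT. Here is a minimal Python sketch of the same two-step resolution, with in-memory dicts standing in for strings.xml and the per-locale string files; the names and the fallback behavior shown are illustrative, not the toolkit's actual code.

```python
# strings.xml stand-in: locale value -> string-file name.
LANG_FILE_MAP = {
    "sv": "strings-sv-se.xml",
    "sv-se": "strings-sv-se.xml",
    "en": "strings-en-us.xml",
    "en-us": "strings-en-us.xml",
}

# Per-locale string file stand-ins: lookup name -> translated text.
STRING_FILES = {
    "strings-sv-se.xml": {"Next topic": "Nästa avsnitt"},
    "strings-en-us.xml": {"Next topic": "Next topic"},
}

DEFAULT_LANG = "en-us"

def get_string(string_name, lang):
    """Resolve a generated-text string for the given xml:lang value."""
    # Step 1: consult the lookup file. Unknown locales (e.g. "sa-in") fall
    # back to the default language, so we never try to open a string file
    # that does not exist.
    filename = LANG_FILE_MAP.get(lang.lower()) or LANG_FILE_MAP[DEFAULT_LANG]
    # Step 2: look the string up in the indicated file.
    return STRING_FILES[filename][string_name]
```

With the Swedish topic from the example, get_string("Next topic", "sv-se") returns "Nästa avsnitt", while an unsupported locale such as "sa-in" quietly falls back to the US English string.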
For example, assume that I have a music specialization to describe my music collection. I have a table of bands and albums, so I want to generate the headers "Group" and "Albums". For my selection of Swedish music, I've set the table to xml:lang="sv-se". So, how is this done?

I've placed all of my XSL and string files in the toolkit directory demo/music/xsl. When I call the getString template, I need to pass in two parameters: the first tells the template where to look for translations (relative to the getString template), and the second (as before) is the string name. So, I pass in:

  <xsl:call-template name="getString">
    <xsl:with-param name="stringFileList">../../demo/music/xsl/musicstrings.xml</xsl:with-param>
    <xsl:with-param name="stringName">Group</xsl:with-param>
  </xsl:call-template>

The standard getString template looks in my stringFileList instead of the default location. That file tells it where to go for Swedish translations. Note that my specialization can support the same languages as the toolkit, or a subset, or a superset, depending on my needs. The file contains this:

  <lang xml:lang="sv-se" filename="music-sv-se.xml"/>

The lookup then moves to the file music-sv-se.xml, and comes up with "Grupp":

  <str name="Group">Grupp</str>

Of course, all of this would be much easier to understand with a working example to look at. Erik Hennum has one as part of his API Reference specialization, which will be available soon as a plugin to the toolkit.

OTHER LOCALE PROBLEMS

The toolkit currently accounts for a couple of other locale issues when generating text. The first is the need to rearrange word order. Currently, this is only done for Hungarian captions; for example, "Table 1" in English becomes "1 Táblázat" when translated.
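The specialization override can be sketched the same way. In this Python sketch, each dict stands in for one lookup file (the base strings.xml routing, or the specialization's musicstrings.xml routing); whether the base toolkit strings are also consulted as a fallback is left to the caller here, which is an assumption of the sketch rather than documented toolkit behavior.

```python
# Base toolkit strings, as routed through strings.xml (stand-in).
BASE_STRINGS = {
    "sv-se": {"Next topic": "Nästa avsnitt"},
}

# Specialization strings, as routed through musicstrings.xml (stand-in).
# This set of languages may be a subset or superset of the base toolkit's.
MUSIC_STRINGS = {
    "sv-se": {"Group": "Grupp"},
}

def get_string(string_name, lang, string_file_list=(BASE_STRINGS,)):
    """Check each lookup table in order and return the first match."""
    for strings in string_file_list:
        table = strings.get(lang, {})
        if string_name in table:
            return table[string_name]
    raise KeyError(f"No translation for {string_name!r} in {lang!r}")
```

Calling get_string("Group", "sv-se", (MUSIC_STRINGS, BASE_STRINGS)) returns "Grupp" from the specialization's file, while base-toolkit strings still resolve as before.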
This is currently handled directly in the XSLT code for tables and figures: when the language is Hungarian, we generate the number, followed by a space, followed by the table string; otherwise, we use the string, followed by a space, followed by the number.

The other issue is for French text, where colons in text like "Note:" must be preceded by a space. In this case, we treat the colon as generated text, which is retrieved by looking up the value for "ColonSymbol". For French locales, this is a space followed by a colon, while for other languages it is simply a colon.

DEFAULT LANGUAGES

The Open Toolkit currently uses a default language of US English. This is set using the DEFAULTLANG parameter (inside the dita-utilities.xsl file, which also contains getString). To use a different default language, it is only necessary to reset this value or to pass a new locale value to the transform as a parameter.

The toolkit today supports 47 locales, representing 39 unique languages. It appears that we ship some files today that are not actually referenced: for example, all English locales point to strings-en-us.xml in the lookup file, but we also ship string files for UK English and Canadian English. Those extra files are not used by the transform today.

New translations that are added as user extensions should be kept together with the transform code that extends the toolkit. It is not a good idea to place new translations in the toolkit's own string files, simply because those may get updated with new releases of the toolkit. As stated above, user extensions can support as many or as few languages as needed.

I understand that this is all rather long and convoluted - so, I expect there to be questions...

Robert D Anderson
IBM Authoring Tools Development
Chief Architect, DITA Open Toolkit
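P.S. The two locale quirks above (Hungarian caption order and the French "ColonSymbol") can be sketched in a few lines of Python; this is illustrative pseudologic, not the toolkit's XSLT.

```python
def table_caption(number, lang, table_word):
    """Assemble a table caption with locale-dependent word order."""
    # Hungarian puts the number first: "1 Táblázat"; other languages put
    # the word first: "Table 1".
    if lang.lower().startswith("hu"):
        return f"{number} {table_word}"
    return f"{table_word} {number}"

def colon_symbol(lang):
    """Return the generated 'ColonSymbol' text for a locale."""
    # French requires a space before the colon, so the colon itself is
    # treated as generated text and looked up per locale.
    return " :" if lang.lower().startswith("fr") else ":"
```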