Subject: Re: [docbook-apps] UI strings vs manual strings ?
From: Jean-Christophe Helary <lists@traduction-libre.org>
To: Jan Tosovsky <j.tosovsky@email.cz>
Date: Wed, 7 Dec 2022 09:09:30 +0900

Thank you all for the replies so far.

Let me reply in one mail.

> On Dec 6, 2022, at 21:44, Tony Graham <tgraham@antenna.co.jp> wrote:
> 
>> Problem at hand:
>> - a Java application with ~2k UI strings (not all users facing), in
>> a Bundle.properties file
> 
> Java also has an XML format for properties files.

Interesting. It could be part of a solution (esp. considering Florimond's reply).

>> - a ~80K words DocBook manual
>> It is not trivial to keep track of the whole string set (searches, etc.)
>> Also, the l10n process takes place on the DocBook sources, not on
>> the HTML output, so tricks like <link linkend endterm/> don't work because translators don't see the target terms.
> 
> Before translation, replace each <link/> with the replacement text from
> the XML properties file wrapped in a well-known element that still
> carries the identifier for the properties file entry.
> 
> After translation, if necessary, convert the well-known elements back
> into <link/> and also do something to handle the strings that have been
> translated differently in different places.

The problem is that this is not possible for a lot of languages. Many of them need inflected forms that change the text of the "endterm" part, and the translation targets three dozen languages, including BiDi documents.

That process would add another layer of transformation+verification.

Or maybe I missed something?
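
To check my understanding, here is a minimal sketch of the pre-translation step as I read it. It is written in Python only for illustration. The assumptions are mine: the `endterm` value is identical to the properties key, the "well-known element" is a `<phrase>` whose `role` carries the key with a `ui:` prefix, and `expand_links` is a hypothetical name. The reverse conversion after translation is not shown.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
ET.register_namespace("", DB)  # serialize DocBook as the default namespace

def expand_links(docbook_xml, ui_strings):
    """Turn empty <link endterm="KEY"/> into <phrase role="ui:KEY">text</phrase>."""
    root = ET.fromstring(docbook_xml)
    for el in root.iter(f"{{{DB}}}link"):
        key = el.get("endterm")
        # Leave ordinary links (no endterm, unknown key, or own content) untouched.
        if key not in ui_strings or len(el) or (el.text or "").strip():
            continue
        # Mutate in place so the text following the link (el.tail) is preserved.
        el.tag = f"{{{DB}}}phrase"
        el.attrib.clear()
        el.set("role", "ui:" + key)   # assumed convention for carrying the key
        el.text = ui_strings[key]
    return ET.tostring(root, encoding="unicode")
```

With this, translators would see the actual string in context, but every inflected translation of it then differs from the properties entry, which is where the extra verification layer comes from.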

> Once you have the properties file for a second language, you could
> insert the translated strings in place of <link/> when preparing for
> translation.  Alternatively, or as well, you could set up your
> computer-aided translation tool to not translate the well-known elements
> for the strings and insert the translated strings after everything else
> is translated.

It looks feasible but only with a small set of target languages.

>> I'm left with having to rewrite the strings explicitly and that's a pain, and also adds risks of mistakes in translations.
> 
> The more that you can automate, the better.

Hence the question ;-)


> On Dec 6, 2022, at 22:04, Alemps Florimond <ntuflorimond@yahoo.com> wrote:
> 
> Hello,
> 
> I would transform the bundle.properties into a document (article, book, section, whatever).
> Each line of the file corresponds to something like:
> <simpara><guilabel xml:id="messageId">My message</guilabel></simpara>
> 
> One simpara element per guilabel is not strictly needed: it is just there to make the file parse as DocBook.

Interesting.

Considering that Java properties can also be expressed as XML there could be some automation here.
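
A minimal sketch of that automation, in Python only for illustration (XSLT or Java would do just as well). The reader is deliberately simplified: it ignores line continuations and \uXXXX escapes, which a real Bundle.properties may contain. The function names, the section wrapper, and its title are hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

def parse_properties(text):
    """Tiny .properties reader (simplified: no line continuations, no \\uXXXX escapes)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":   # blank line or comment
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries

def to_docbook(entries):
    """Emit one <simpara><guilabel xml:id="KEY">...</guilabel></simpara> per entry."""
    out = ['<section xmlns="http://docbook.org/ns/docbook" version="5.0">',
           "  <title>UI strings</title>"]
    for key, value in entries.items():
        # Rough NCName check: an xml:id must not start with a digit.
        if not re.fullmatch(r"[A-Za-z_][\w.\-]*", key):
            raise ValueError(f"not usable as xml:id: {key!r}")
        out.append(f"  <simpara><guilabel xml:id={quoteattr(key)}>"
                   f"{escape(value)}</guilabel></simpara>")
    out.append("</section>")
    return "\n".join(out)
```

Running the same conversion on each translated Bundle_xx.properties would give one such file per language.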

> In the document, you include the message, something like:
> <para>You should see <xi:include href="bundle.properties.xml" xpointer="messageId"/> after clicking on the button.</para>
> 
> The French, English, and German versions of the document will take advantage of the corresponding translated versions of bundle.properties.xml

Why only those 3 languages?

My understanding of xi:include is that it is not required to be resolved before the actual documentation build process.

Which means that the document to translate (and the way it is displayed in the tool) is actually

> <para>You should see <arbitrary link shortcut> after clicking on the button.</para>

Which is not different from what we have now with <link linkend endterm/>

> As long as no message id starts with a digit (an xml:id must be an NCName), you are OK.
> With an XSLT 2.0 processor, it might even be possible to transform the bundle.properties into XML.

It looks like Java properties can be expressed as XML natively (see above) so there is something to explore here.

> On Dec 7, 2022, at 5:13, Jan Tosovsky <j.tosovsky@email.cz> wrote:
> 
> On 05/12/2022 23:05, Jean-Christophe Helary wrote:
>> What's the best way in a DocBook centered process to ensure that the 
>> list of terms used in a software UI is (semi-automatically?) taken 
>> into account in the DocBook sources that describe that software?
> 
> In your document you can use <guilabel> and other <gui* related elements
> which can indicate the content must match the GUI label. You can then
> instruct the localization agency to follow this rule.
> But there is no way to avoid human error so this still has to be checked
> manually which is inefficient. 

The problem is not instructions, the problem is to lower the burden of the translators by explicitly displaying the strings in the DocBook sources.

Creating a normative glossary from the UI strings first could be something, but there are Windows/Linux mnemonic (&) characters in the strings, so we'd need to remove them to create that glossary, and that would add another step (which can be automated, I guess).
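
Removing the mnemonics could look like the sketch below (Python, for illustration only). The conventions are assumptions on my part, not something checked against every string in the bundle: a single & marks the mnemonic letter, a doubled && stands for a literal ampersand, and a parenthesised marker such as (&S) is dropped entirely. Both function names are hypothetical.

```python
import re

def strip_mnemonic(label):
    """Remove mnemonic markers from a UI label (assumed conventions, see above)."""
    # "Save(&S)" style: drop the whole parenthesised marker.
    label = re.sub(r"\s*\(&\w\)", "", label)
    # "&File" -> "File"; "&&" -> "&" (the character after & is kept as-is).
    return re.sub(r"&(.)", r"\1", label)

def build_glossary(entries):
    """Sorted, de-duplicated list of cleaned labels from a key -> label mapping."""
    return sorted({strip_mnemonic(v) for v in entries.values() if v})
```

For example, `strip_mnemonic("Search && Replace")` gives `Search & Replace`, and `&File` and `File` collapse into a single glossary entry.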

Full disclosure: the manual is for OmegaT, a free software solution for translators that supports both DocBook and Java properties out of the box. I am the project leader and also in charge of the manual. I rewrote almost all of it this summer/fall to prepare for our next release, but I know that the solution I chose (link linkend endterm) is not optimal, because the link target text is not available for the inflections that some languages require. (I also happen to be a translation company, so I understand those issues quite well, but this was my first time on the DocBook authoring side. The last time I wrote the manual, it was in HTML.)

As mentioned above in the reply to Tony, the issue with some strings is that they must be explicitly available for translation because some languages need to modify them (grammatical inflections, etc.)

> I've seen an interesting approach where any guilabel had a dedicated
> attribute storing a termbase ID. While a guilabel value was present, it was
> just informative (for the author, to understand the context). The actual
> value was taken from the termbase during generating outputs.
> 
> So if GUI labels are linked to the same termbase, this system ensures your
> document will never diverge. Moreover, it is ensured also to all translated
> documents. 

I'm not sure I understand how the termbase is linked to the document.

> The hardest step is consolidating such a termbase and establishing processes
> on DEV and DOC sides so both departments use the termbase as a single source
> of truth.

:-)



-- 
Jean-Christophe Helary @jchelary@emacs.ch
https://traductaire-libre.org
https://mac4translators.blogspot.com
https://sr.ht/~brandelune/omegat-as-a-book/