OASIS Mailing List Archives

docbook-apps message


Subject: Re: [docbook-apps] Preserving entities and language translation


On Tue, 19 Jan 2016, Shaun McCance wrote:

I'm not sure if you've successfully made the switch to itstool yet for
your PO round-tripping. I think we talked a bit about entity expansion
at the Open Help Conference last year.

Yes, we have:
https://www.freebsd.org/news/status/report-2015-07-2015-09.html#PO-Translation-Project

You are even mentioned there. :)

The default behavior in itstool is that it expands entities, but does
not do XInclude. So I recommend using entities for words, phrases, and
other mid-sentence substitutions. Use XInclude for entire blocks or
sections.

But again, doesn't that limit when we can use XInclude? Right now, we use entities for entire chapters in books. Those chapters can include inline entities that need to be translated, too. Here is our documentation manual in the repository:
https://svnweb.freebsd.org/doc/head/en_US.ISO8859-1/books/fdp-primer/

The chapters are defined as system entities here:
https://svnweb.freebsd.org/doc/head/en_US.ISO8859-1/books/fdp-primer/chapters.ent?view=markup

The main book.xml uses those entities at the end:
https://svnweb.freebsd.org/doc/head/en_US.ISO8859-1/books/fdp-primer/book.xml?view=markup

The idea is that translators have a difficult time with mid-sentence
substitutions when they need to do inflections and declensions on words.
For example:

<!ENTITY b "button">
<para>Click the &b;.</para>
<para>The &b; is blue.</para>

A translator can't reliably translate this without entity substitution,
because she might need to translate "button" differently in each case.
You can override this with the -k option, but I don't recommend it.
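To illustrate what translators see once entities are expanded, here is a minimal Python sketch. It is not how itstool works internally; it only relies on the standard library parser expanding an internal entity declared in the DOCTYPE. The surrounding <article> wrapper is an assumption added to make the fragment a complete document.

```python
import xml.etree.ElementTree as ET

# The entity and the two paragraphs come from the example above;
# the <article> wrapper is an assumption so the document is well-formed.
DOC = """<!DOCTYPE article [
<!ENTITY b "button">
]>
<article>
<para>Click the &b;.</para>
<para>The &b; is blue.</para>
</article>"""


def expanded_paragraphs(xml_text):
    """Parse the document and return each paragraph's text.

    The parser replaces &b; with its declared value, so every
    string returned here is a complete sentence.
    """
    root = ET.fromstring(xml_text)
    return [para.text for para in root.iter("para")]


for sentence in expanded_paragraphs(DOC):
    print(sentence)
```

Each sentence reaches the translator whole, so "button" can be inflected differently in the two paragraphs.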

I never did get -k to work. It left entities unexpanded but then would choke on them, possibly due to our use of XML catalogs. With that and the translation issues, we just went with expanding all entities. That is fine except for these special-case uses for big lists of things that should *not* be translated. Like the PGP keys entities: https://svnweb.freebsd.org/doc/head/share/pgpkeys/pgpkeys.ent?view=markup

If we could somehow figure out which entities are inline and which are
block-level, that might help. One heuristic: leave SYSTEM entities
unexpanded (likely used for large blocks), expand regular entities
(likely used for words and phrases), and expand SYSTEM parameter
entities (likely used to pull in entity definitions).

Given the way our stuff is defined, that might work for some things.
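As a rough sketch of that heuristic, the Python below scans entity declarations with a regular expression and assigns each one an "expand" or "preserve" action. This is only an illustration: neither xmllint nor itstool is known to offer such a mode, a regex is not a real DTD parser, and the entity names and file paths in SAMPLE are hypothetical.

```python
import re

# Matches <!ENTITY [%] name [SYSTEM|PUBLIC] "literal" ...>
# A simplification: comments, conditional sections, and NDATA are ignored.
ENTITY_DECL = re.compile(
    r"<!ENTITY\s+(?P<param>%\s+)?(?P<name>[^\s%]+)\s+"
    r"(?:(?P<external>SYSTEM|PUBLIC)\s+)?"
    r"(?P<literal>\"[^\"]*\"|'[^']*')"
)


def classify(dtd_text):
    """Map each declared entity to 'expand' or 'preserve'.

    Parameter entities are keyed with a leading '%' because they
    live in a separate namespace from general entities.
    """
    plan = {}
    for m in ENTITY_DECL.finditer(dtd_text):
        if m.group("param"):
            # Parameter entities usually pull in more declarations.
            plan["%" + m.group("name")] = "expand"
        elif m.group("external"):
            # External general entities are likely large blocks.
            plan[m.group("name")] = "preserve"
        else:
            # Internal general entities are likely words or phrases.
            plan[m.group("name")] = "expand"
    return plan


# Hypothetical declarations, for illustration only.
SAMPLE = """
<!ENTITY % chapters SYSTEM "chapters.ent">
<!ENTITY chap.overview SYSTEM "overview/chapter.xml">
<!ENTITY b "button">
"""

print(classify(SAMPLE))
```

The resulting plan could then drive a separate pre-processing step; the classification itself does not change any file.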

I don't know if libxml2's API allows for that. libxml2 can do a lot that xmllint doesn't expose.

For us, xmllint is not so much a linter as an XML processing tool. Maybe for others, too. A mechanism for it to use PIs to say "transform these entities this way" seems logical. (Er, that's "transform" in the text or PCRE sense, not in the XSLT "dude, let's make XML into a programming language" sense.) Right now it only has --noent, so it's all or nothing.

Even being able to give xmllint a list of entities to preserve/not expand would be fine. Those could be changed to non-entities by postprocessing the output file before the PO information is extracted.

On Mon, 2016-01-18 at 21:42 -0700, Warren Block wrote:
Some of the articles in the FreeBSD documentation use entities to
include large blocks of data.  For example, one article is just a very
large list of PGP keys for developers:
https://www.freebsd.org/doc/en_US.ISO8859-1/articles/pgpkeys/index.html

The DocBook article.xml is only 2K, because it does things like this:

   <sect1 xml:id="pgpkeys-officers">
     <title>Officers</title>

     &section.pgpkeys-officers;
   </sect1>

We use xmllint to normalize the article into a single XML file for use
with PO translation tools.  Of course, all entities are expanded into
text at that point.

It would be really nice to mark that particular entity as one that
should be preserved in the translated file.  Is it possible to do that
with processing instructions or some other method?  For example:

   <sect1 xml:id="pgpkeys-officers">
     <title>Officers</title>

     <?translate off?>
     &section.pgpkeys-officers;
     <?translate on?>
   </sect1>

In the normalized file, it could be a string to indicate to translators
that it should be left alone:

   <sect1 xml:id="pgpkeys-officers">
     <title>Officers</title>

     <?translate off?>
     do not translate: section.pgpkeys-officers
     <?translate on?>
   </sect1>

Of course, it has to be changed back when the translated XML file is
generated.
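The round trip described above could be sketched as a pair of plain text substitutions, run on the raw source before normalization and again on the translated output. This is only a sketch under assumptions: the <?translate off?> and <?translate on?> processing instructions are the proposal from this message, not something xmllint or itstool is known to interpret, and the function names are hypothetical.

```python
import re

# An entity reference sitting alone between the two proposed PIs.
PROTECT = re.compile(
    r"(<\?translate off\?>\s*)&([^;\s]+);(\s*<\?translate on\?>)"
)
# The placeholder text that protect() leaves in its place.
RESTORE = re.compile(
    r"(<\?translate off\?>\s*)do not translate: ([^\s<]+)(\s*<\?translate on\?>)"
)


def protect(xml_text):
    """Replace marked entity references with a do-not-translate string."""
    return PROTECT.sub(r"\1do not translate: \2\3", xml_text)


def restore(xml_text):
    """Turn the do-not-translate strings back into entity references."""
    return RESTORE.sub(r"\1&\2;\3", xml_text)


SOURCE = """<sect1 xml:id="pgpkeys-officers">
  <title>Officers</title>

  <?translate off?>
  &section.pgpkeys-officers;
  <?translate on?>
</sect1>"""

protected = protect(SOURCE)
print(protected)
print(restore(protected) == SOURCE)
```

Because the substitution happens on text rather than on a parsed tree, the entity is never expanded, and whitespace around it survives the round trip unchanged.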

Is there a standard or elegant way to do this?

