Subject: Re: [docbook-apps] Preserving entities and language translation
On Mon, 18 Jan 2016, Richard Hamilton wrote:
Hi Warren,

I'm not sure if I can come up with a standard, or truly elegant way of doing this, but here are a few possibilities:

1) If the only entities in your files are the ones that shouldn't be translated, just don't have xmllint resolve the entities when you normalize the file, then use it again to expand entities when you get the translated files back. Of course, this option won't work if you have other entities that contain content you need to translate.
Unfortunately, we use lots of entities for lots of things, and almost all of them need to be translated.
2) Place the large blocks in files that would be included using xinclude. You could then translate the file before resolving the inclusions. Of course, this one doesn't work if you have other xincludes that contain content you need to translate.
The good aspect of that is that we currently don't have anything that uses xinclude. It seems like we would not be able to use xinclude for anything else afterward, though.
3) Create a simple XSL stylesheet or script (Perl, PHP, ...) that would convert the code as you have shown below, leaving everything else as is, then another script that converts the translated file back. You could then run xmllint to resolve those entities. This is a five-step process:

- Run the first script to convert the entities between the translate on/off processing instructions into plain text.
- Run xmllint to normalize everything else.
- Translate.
- Run the second script on the translated file to restore the entities between the translate instructions.
- Run xmllint to resolve those entities.
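The two conversion scripts in option 3 might look roughly like the Python sketch below. It is only an illustration: the function names protect_entities and restore_entities are hypothetical, the marker and placeholder strings are copied from the example in the original question, and it uses regular expressions rather than a real XML parser, so it assumes the processing instructions are written exactly as shown and that a placeholder name is not immediately followed by a letter, digit, period, or hyphen.

```python
import re

# Marker and placeholder strings, taken from the example in the original question.
OFF = "<?translate off?>"
ON = "<?translate on?>"
PLACEHOLDER = "do not translate: "

# Everything between a translate-off instruction and the next translate-on.
_BLOCK = re.compile(re.escape(OFF) + r"(.*?)" + re.escape(ON), re.DOTALL)
# A general entity reference such as &section.pgpkeys-officers;
_ENTITY = re.compile(r"&([A-Za-z_][\w.\-]*);")
# A placeholder previously written by protect_entities().
_MARKED = re.compile(re.escape(PLACEHOLDER) + r"([A-Za-z_][\w.\-]*)")


def protect_entities(xml_text):
    """Turn entity references inside translate-off blocks into plain text."""
    def rewrite(block):
        inner = _ENTITY.sub(lambda m: PLACEHOLDER + m.group(1), block.group(1))
        return OFF + inner + ON
    return _BLOCK.sub(rewrite, xml_text)


def restore_entities(xml_text):
    """Turn the placeholders inside translate-off blocks back into entities."""
    def rewrite(block):
        inner = _MARKED.sub(lambda m: "&" + m.group(1) + ";", block.group(1))
        return OFF + inner + ON
    return _BLOCK.sub(rewrite, xml_text)
```

Entities outside the marked blocks are left untouched, so xmllint still expands them for translation. The round trip only works if the translation tools pass both the processing instructions and the placeholder text through unchanged.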
Right. It would be really nice if xmllint could call external programs to handle processing instruction filters. Then the entity-to-text and text-to-entity conversion would be done in memory.
Otherwise, we would have to temporarily modify the original source files on disk (!), run xmllint, then restore them:
- make backup copies of document XML files
- transform marked entities to text in original files
- create normalized file with xmllint
- restore backups
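A rough Python sketch of that backup, transform, normalize, restore sequence follows. Everything named here is an assumption for illustration: the temporarily_rewritten helper, the ".orig" backup suffix, the default encoding, and the xmllint options (--noent to expand entities, --output to write the result), which may not match the options the documentation build actually uses.

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporarily_rewritten(paths, transform, encoding="utf-8"):
    """Rewrite each file with transform(text); restore the originals on exit."""
    backups = []
    try:
        for path in map(Path, paths):
            backup = path.with_name(path.name + ".orig")  # hypothetical suffix
            backup.write_bytes(path.read_bytes())
            backups.append((path, backup))
            text = path.read_text(encoding=encoding)
            path.write_text(transform(text), encoding=encoding)
        yield
    finally:
        # Put the originals back even if the transform or xmllint failed.
        for path, backup in backups:
            backup.replace(path)


def normalize(article, output):
    """Expand the remaining entities into a single file with xmllint."""
    subprocess.run(
        ["xmllint", "--noent", "--output", str(output), str(article)],
        check=True,
    )
```

The idea is to call normalize() inside a "with temporarily_rewritten(files, some_transform):" block, so the source files are only modified while xmllint runs. Writing the modified copies to a scratch directory instead would avoid touching the originals at all, at the cost of copying everything the article references.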
I hope that helps.
It does, yes. Thank you!

PS: I added my FreeBSD.org address to Cc. Apologies if my other address bounced anyone's mail.
On Jan 18, 2016, at 20:42, Warren Block <email@example.com> wrote:

Some of the articles in the FreeBSD documentation use entities to include large blocks of data. For example, one article is just a very large list of PGP keys for developers:

https://www.freebsd.org/doc/en_US.ISO8859-1/articles/pgpkeys/index.html

The DocBook article.xml is only 2K, because it does things like this:

    <sect1 xml:id="pgpkeys-officers">
      <title>Officers</title>
      &section.pgpkeys-officers;
    </sect1>

We use xmllint to normalize the article into a single XML file for use with PO translation tools. Of course, all entities are expanded into text at that point.

It would be really nice to mark that particular entity as one that should be preserved in the translated file. Is it possible to do that with processing instructions or some other method? For example:

    <sect1 xml:id="pgpkeys-officers">
      <title>Officers</title>
      <?translate off?>
      &section.pgpkeys-officers;
      <?translate on?>
    </sect1>

In the normalized file, it could be a string to indicate to translators that it should be left alone:

    <sect1 xml:id="pgpkeys-officers">
      <title>Officers</title>
      <?translate off?>
      do not translate: section.pgpkeys-officers
      <?translate on?>
    </sect1>

Of course, it has to be changed back when the translated XML file is generated.

Is there a standard or elegant way to do this?