+1 on that. It’s true that there are probably not enough examples in the specification.
Some of them, however, do use HTML, especially in the section on inline codes.
For instance the examples for the sub-flows: http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html#subflowsdesc
If it helps, a few more examples can be found in the Okapi Framework implementation.
There are two samples in HTML, with the originals and the XLIFF2 outputs:
The few pointers I can think of, from experience:
- Do not just extract the HTML content into a CDATA section.
- Only the inline codes should go inside XLIFF units (as XLIFF inline codes), that is: <b> but not <p>.
- If possible, use a sub-flow for text embedded in HTML tags (e.g. alt or title attribute text).
- If possible, don’t use <ph/> for paired codes; use <pc>…</pc> instead.
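
To make the pointers above a bit more concrete, here is a rough Python sketch. It is not taken from Okapi or from the specification, just an illustration under a few assumptions: the set of tags treated as inline codes (INLINE_TAGS) is my own choice, the function and class names are made up, and the input is assumed to be well nested. Overlapping or unpaired tags would need something like <sc/>/<ec/> or <ph/>, and sub-flows for alt/title text are not handled here; any tag outside INLINE_TAGS is simply left out of the unit, on the assumption that it belongs to the skeleton.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Assumption: which HTML tags count as inline codes. Adjust for your content.
INLINE_TAGS = {"a", "b", "em", "i", "span", "strong", "u"}


class InlineCodeCollector(HTMLParser):
    """Maps paired inline HTML tags to <pc> and records the original markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.source_parts = []   # pieces of the <source> content
        self.original_data = []  # (data id, original HTML markup) pairs
        self.open_tags = []      # stack of currently open inline tags
        self.next_id = 1

    def handle_starttag(self, tag, attrs):
        if tag not in INLINE_TAGS:
            return  # block-level tags such as <p> stay out of the unit
        n = self.next_id
        self.next_id += 1
        # Keep the native markup in <originalData>, not in the text itself.
        self.original_data.append((f"d{n}s", self.get_starttag_text()))
        self.original_data.append((f"d{n}e", f"</{tag}>"))
        self.source_parts.append(
            f'<pc id="{n}" dataRefStart="d{n}s" dataRefEnd="d{n}e">'
        )
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only close a <pc> for the innermost open inline tag (well-nested input).
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.source_parts.append("</pc>")

    def handle_data(self, data):
        self.source_parts.append(escape(data))


def html_fragment_to_unit(unit_id, html):
    """Builds one XLIFF 2.0 <unit> string from the content of one HTML block."""
    collector = InlineCodeCollector()
    collector.feed(html)
    collector.close()
    lines = [f"<unit id={quoteattr(unit_id)}>"]
    if collector.original_data:
        lines.append("  <originalData>")
        for data_id, markup in collector.original_data:
            lines.append(f'    <data id="{data_id}">{escape(markup)}</data>')
        lines.append("  </originalData>")
    lines.append("  <segment>")
    lines.append(f"    <source>{''.join(collector.source_parts)}</source>")
    lines.append("  </segment>")
    lines.append("</unit>")
    return "\n".join(lines)
```

Calling html_fragment_to_unit("u1", "<p>Click <b>here</b> now</p>") gives a unit where the <p> is gone from the text, the <b>…</b> pair has become a single <pc> element, and the original <b> and </b> markup sits escaped in <originalData> rather than in a CDATA section.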
Also, the draft version of the old “XLIFF 1.2 Representation Guide for HTML” is available. It was written for XLIFF 1.2, but most of the principles are the same for 2.0. You can find it here: http://docs.oasis-open.org/xliff/v1.2/xliff-profile-html/xliff-profile-html-1.2-cd02.html
I hope that helps.
I’m implementing an extractor from a CMS, and from reading the specification it’s not very clear what the right way to extract a piece of HTML to XLIFF is.
I understand that extraction is a very personal and application-specific matter, so it probably should not be standardised in the spec. Still, it would be helpful to add somewhere, either as notes or in the test suite, examples of how HTML fragments are to be converted into XLIFF.