Subject: Re: [xliff-comment] HTML extraction examples
One more thing: with “extraction/merging best practices”, do you refer to this? http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd03/xliff-core-v2.1-csprd03.html#d0e10990
On 9 May 2017 at 19:00:28, Ján Husarčík (jan.husarcik@gmail.com) wrote:
Hello all,
here are my few cents as somebody on the "receiving end" (LSP).
The printed version of XLIFF 2.0 has 135 pages, compared to 71 for v1.2. Inserting examples directly into the specs would further extend the length and might prove difficult to manage.
Putting them in the SVN, alongside the existing test suite (as proposed in the original post), would be more maintainable. This way it could contain not just fragments, but whole (commented) files at different stages of the life cycle. Different file types could also be included.
Other than that, Yves already listed the best practices, and I'm glad he put the CDATA bit in first place :). Using CDATA might seem like a simple way to handle inline codes; however, you lose all the advantages of proper extraction.
You can represent block elements using (nested) groups and units (table/row/cell as group/group/unit), and inline codes using <ph/>, a <pc> pair, or <sc/>/<ec/>. Please consider the update on extraction/merging best practices in the latest XLIFF 2.1 draft.
Do not forget the editing hints, which will help you prevent technical issues during merging, and the context attributes (e.g. disp*, type), which will simplify life for the translator.
However, a lot depends on your particular situation:
- Will you merge with a skeleton or reconstruct the target file from what's available in the XLIFF?
- Will the extractor also perform segmentation? (e.g. <segment>paragraph</segment> vs. <segment>sentence</segment>)
- CMS capabilities (content fragmentation, multimedia, available metadata; is the native format well-formed?)
- CAT tool/LSP capabilities (does it support the features/modules you plan to use?)
- Your technology stack (do you plan just to extract/merge? Do you have terminology management, use translation candidates, or use some enricher? Can you make use of ITS metadata?)
Regards,
Jan
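A minimal Python sketch of the block-level mapping described above (table/row/cell as group/group/unit), with the paragraph-vs-sentence segmentation choice exposed as a switch. It assumes a well-formed table whose cells hold plain text only; the function names, the id scheme and the regex sentence splitter are illustrative assumptions, not part of the specification or of any existing tool. When splitting, the whitespace between sentences is kept in <ignorable> so the target can be rebuilt on merge.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XLF_NS = "urn:oasis:names:tc:xliff:document:2.0"
# Naive sentence boundary: whitespace following ".", "!" or "?" (illustrative).
SENTENCE_BREAK = re.compile(r"(?<=[.!?])(\s+)")


def q(tag):
    """Qualify a tag name with the XLIFF 2.0 core namespace."""
    return f"{{{XLF_NS}}}{tag}"


class TableCollector(HTMLParser):
    """Collects the text of each <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text chunks of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def fill_unit(unit, text, split_sentences):
    """Add one <segment> for the whole cell, or one per sentence.

    When splitting, the whitespace between sentences goes into <ignorable>
    so the original text can be rebuilt exactly on merge.
    """
    parts = SENTENCE_BREAK.split(text) if split_sentences else [text]
    for i, part in enumerate(parts):
        # With a capturing group, re.split puts sentences at even indexes
        # and the captured whitespace at odd indexes.
        kind = "segment" if i % 2 == 0 else "ignorable"
        ET.SubElement(ET.SubElement(unit, q(kind)), q("source")).text = part


def table_to_xliff(html, src_lang="en", split_sentences=False):
    """Map <table>/<tr>/<td> to nested <group>/<group>/<unit>."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()

    ET.register_namespace("", XLF_NS)  # serialize without an ns0: prefix
    root = ET.Element(q("xliff"), version="2.0", srcLang=src_lang)
    file_el = ET.SubElement(root, q("file"), id="f1")
    table = ET.SubElement(file_el, q("group"), id="g1")  # the <table>
    for r, cells in enumerate(parser.rows, start=1):
        row = ET.SubElement(table, q("group"), id=f"g1-r{r}")  # a <tr>
        for c, text in enumerate(cells, start=1):
            unit = ET.SubElement(row, q("unit"), id=f"u{r}-{c}")  # a <td>
            fill_unit(unit, text, split_sentences)
    return ET.tostring(root, encoding="unicode")
```

With split_sentences=False each cell becomes a single paragraph-level segment; with True, a cell such as "Fresh. Local." becomes two segments with one <ignorable> between them. A real extractor would also need to handle inline markup inside cells and keep a skeleton (or equivalent) for the non-translatable parts.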
On Tue, May 9, 2017 at 1:46 PM, Simone Chiaretta <simone@piyosailing.com> wrote:
Thank you very much for the pointers. Some I had already found and used, but since finding them requires a lot of hunting around, my suggestion was to add them more prominently somewhere in the specs. It’s true that the general principles stay the same in v1.2 and 2.0, but v2 adds a lot more possibilities, like originalData, the references, and the FS module. Adding more possibilities is one thing; explaining how best to use them is another :)
Simone
On 9 May 2017 at 13:39:08, Yves (yves@opentag.com) wrote:
Hi Simone,
+1 on that. It’s true that there are probably not enough examples in the specification.
Some of them, however, use HTML, especially in the section on inline codes.
For instance, the examples for the sub-flows: http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html#subflowsdesc
If it can help, a few other examples can be found in the Okapi Framework implementation.
There are two samples in HTML, with the originals and the XLIFF2 outputs:
The few pointers I can think of, from experience:
- Do not just extract the HTML content into a CDATA section.
- Only the inline codes should go into XLIFF units (as XLIFF codes), that is: <b>, not <p>.
- If possible, use sub-flows for text embedded in HTML tags (e.g. alt or title text).
- If possible, don’t use <ph/> for paired codes; use <pc>…</pc>.
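The pointers above can be sketched in Python as follows: the content of one block element (e.g. a <p>) becomes a unit, paired inline tags become <pc>, empty tags become <ph/>, the native markup goes into <originalData>, and alt text is moved into a separate unit referenced through subFlows. This is a hypothetical helper, not the Okapi implementation; the tag sets, the id scheme, the handling of alt only (not title), and the "{unit-id}" placeholder written into the original data are assumptions made for the example. Well-nested input is assumed.

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XLF_NS = "urn:oasis:names:tc:xliff:document:2.0"
PAIRED = {"b", "i", "em", "strong", "a", "span"}  # become <pc>...</pc>
EMPTY = {"img", "br"}                              # become <ph/>


def q(tag):
    """Qualify a tag name with the XLIFF 2.0 core namespace."""
    return f"{{{XLF_NS}}}{tag}"


def append_text(el, text):
    """Append text after el's last child (as its tail), or to el.text."""
    if len(el):
        el[-1].tail = (el[-1].tail or "") + text
    else:
        el.text = (el.text or "") + text


class ParagraphExtractor(HTMLParser):
    """Turns the content of one block element (e.g. a <p>) into units."""

    def __init__(self, unit_id):
        super().__init__()
        self.unit_id = unit_id
        self.source = ET.Element(q("source"))
        self.stack = [self.source]  # <source> plus the currently open <pc>s
        self.data = {}              # <originalData>: data id -> native markup
        self.subflows = []          # (unit id, text) pairs for alt text
        self.codes = 0              # last inline code id used

    def _store(self, markup):
        data_id = f"d{len(self.data) + 1}"
        self.data[data_id] = markup
        return data_id

    def handle_starttag(self, tag, attrs):
        if tag in PAIRED:
            self.codes += 1
            pc = ET.SubElement(self.stack[-1], q("pc"), {
                "id": str(self.codes),
                "type": "link" if tag == "a" else "fmt",
                "dataRefStart": self._store(self.get_starttag_text()),
            })
            self.stack.append(pc)
        elif tag in EMPTY:
            self.codes += 1
            ph = ET.SubElement(self.stack[-1], q("ph"), {
                "id": str(self.codes),
                "type": "image" if tag == "img" else "fmt",
            })
            alt = dict(attrs).get("alt")
            if alt:
                # Move the alt text into its own unit and reference it.
                sub_id = f"{self.unit_id}-sf{len(self.subflows) + 1}"
                self.subflows.append((sub_id, alt))
                ph.set("subFlows", sub_id)
                attrs = [(k, "{" + sub_id + "}" if k == "alt" else v)
                         for k, v in attrs]
            markup = "<" + tag + "".join(
                f' {k}="{escape(v or "")}"' for k, v in attrs) + ">"
            ph.set("dataRef", self._store(markup))
        # Other tags (e.g. the enclosing <p>) are not inline codes: skipped.

    def handle_endtag(self, tag):
        # Empty tags were fully handled in handle_starttag.
        if tag in PAIRED and len(self.stack) > 1:
            self.stack.pop().set("dataRefEnd", self._store(f"</{tag}>"))

    def handle_data(self, data):
        append_text(self.stack[-1], data)

    def units(self):
        unit = ET.Element(q("unit"), id=self.unit_id)
        if self.data:
            original = ET.SubElement(unit, q("originalData"))
            for data_id, markup in self.data.items():
                ET.SubElement(original, q("data"), id=data_id).text = markup
        ET.SubElement(unit, q("segment")).append(self.source)
        result = [unit]
        for sub_id, text in self.subflows:
            sub = ET.Element(q("unit"), id=sub_id)
            ET.SubElement(ET.SubElement(sub, q("segment")), q("source")).text = text
            result.append(sub)
        return result


def paragraph_to_units(inner_html, unit_id):
    parser = ParagraphExtractor(unit_id)
    parser.feed(inner_html)
    parser.close()
    return parser.units()
```

For example, paragraph_to_units('Click <b>here</b> to see <img src="logo.png" alt="Our logo"/>.', "u1") returns unit u1 (with a <pc> around "here" and a <ph/> for the image) plus the sub-flow unit u1-sf1 holding "Our logo". Calling ET.register_namespace("", XLF_NS) before ET.tostring serializes them without an ns0: prefix.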
Also, the draft version of the old “XLIFF 1.2 Representation Guide for HTML” is available. It was done for XLIFF 1.2, but most principles are the same for 2.0. You can find it here: http://docs.oasis-open.org/xliff/v1.2/xliff-profile-html/xliff-profile-html-1.2-cd02.html
I hope that helps.
-yves
From: Simone Chiaretta [mailto:simone@piyosailing.com]
Sent: Tuesday, May 9, 2017 4:58 AM
To: xliff-comment@lists.oasis-open.org
Subject: [xliff-comment] HTML extraction examples
Dear all,
I’m implementing an extractor from a CMS and by reading the specifications it’s not super-clear which is the right way to extract a piece of HTML to XLIFF.
I understand that extraction is a very personal and application-specific matter, so probably not something to standardise in the specs, but it would be helpful to add examples somewhere, either as notes or even in the test suite, of how HTML fragments are to be converted into XLIFF.
Regards,
Simone