

Subject: RE: [xliff-inline] Proposed requirement for inline SC: XML well-formed-ness as a design goal for XLIFF 2.0 inline markup


Hi Yves,

I think we are pretty much in agreement. (And I will do my best not to let this turn into one of the classic debates from *years of yore*, when philosophers argued over how many angels could dance on the head of a pin ;-)

To jump to the most important nasty side effect, and the biggest bummer with <bpt> (I'll refrain from using the overloaded phrase 'escaping XML markup' for the moment), which is also the point you make that I wholeheartedly agree with:

> I think the bigger problem, beyond escaping things, is that using 
> something like <bpt> means the text content of the segment is 
> sometimes real text, sometimes native code.

Yes! (my own TMX files suffer from this and it's caused me and my company grief)

I should probably just end here, but you know me . . . 

This is why I say that when processing XML, (1) as a best practice, use XML-friendly methods to represent XML markup whenever possible (in XLIFF vernacular, <g> and <x>). And (2) if you must deal with malformed XML, or with inline elements that span segments, and you must use a method like <bpt>/<ept> (again in XLIFF vernacular), it is a best practice not to include the XML markup: (a) <bpt ctype="italic" /> is better than (b) <bpt ctype="italic">&lt;i&gt;</bpt>. I acknowledge your point that some tools might need the syntactic sugar in (2)(b), but I'm afraid those tools might skimp and create something like (c) <bpt>&lt;i&gt;</bpt>, with no ctype at all, so the only clue to its meaning is escaped native code: a nightmare scenario for XML-dependent processes.

I bring all of this up because this SC hopes to produce good work to pass on to the LISA TMX group. I despaired when I learned that TMX is leaning toward disallowing a mechanism like (1) and seems to favor (2)(b), maybe even (2)(c). I do not mean to put words in their mouths; I'm just stating my perception.

If the trend is toward XML publishing (DITA, XHTML, DocBook, docx, XUL, MXML, XAML, SVG, etc.), I think the lowest-common-denominator approach of (2) (speaking again from what I've heard about the leanings of TMX) starts to look like something that excludes, or is at least unfriendly to, a growing group of users.

- Bryan



-----Original Message-----
From: Yves Savourel [mailto:ysavourel@translate.com] 
Sent: Tuesday, May 11, 2010 2:05 PM
To: xliff-inline@lists.oasis-open.org
Cc: arle@lisa.org
Subject: RE: [xliff-inline] Proposed requirement for inline SC: XML well-formed-ness as a design goal for XLIFF 2.0 inline markup

Hi Bryan,

>> frankly I don't know how we could represent '<' in
>> XML content as anything but some kind of
>> escaped sequence.
>
> I am not objecting to the practice of preserving code. 
> I am specifically objecting to the practice of escaping XML markup.
> Escaping XML markup (<bpt ctype="italic">&lt;i&gt;</bpt>yippy<ept>&lt;/i&gt;</ept>)
> is nothing but syntactic sugar. 
> ...

If '<' is part of the original code and that original code is stored in a <bpt>, it must be escaped (short of using a CDATA section). It's not an optional choice. Escaping '>' is optional, except in the sequence ']]>'. See http://www.w3.org/TR/REC-xml/#syntax.
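
A minimal sketch of that rule, using Python's standard xml.etree.ElementTree purely as an illustration (the <bpt> values are hypothetical, and no particular XLIFF tool is assumed): an XML API escapes '<' for you, a literal '<' is not well-formed, a literal '>' is accepted, and CDATA is the one way to keep '<' unescaped.

```python
import xml.etree.ElementTree as ET

# Hypothetical native code to preserve inside a <bpt>.
native = "<i>"

# Building the element through an XML API escapes the markup automatically.
# (ElementTree also escapes '>', even though XML does not require it.)
bpt = ET.Element("bpt", id="1")
bpt.text = native
serialized = ET.tostring(bpt, encoding="unicode")
print(serialized)  # <bpt id="1">&lt;i&gt;</bpt>

# Writing '<' literally is not well-formed: the parser sees a real <i> tag.
try:
    ET.fromstring('<bpt id="1"><i></bpt>')
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)  # False

# A literal '>' is allowed, so only '<' strictly needs escaping here.
gt_literal = ET.fromstring('<bpt id="1">&lt;i></bpt>')

# A CDATA section is the one way to keep '<' unescaped.
cdata = ET.fromstring('<bpt id="1"><![CDATA[<i>]]></bpt>')

# All spellings parse back to the same native code.
print(ET.fromstring(serialized).text == gt_literal.text == cdata.text == native)  # True
```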



> Omitting the escaped XML (in this example) is no less clear
> (<bpt ctype="italic" />yippy<ept />, of course in both examples
> the required id would need to be included).

I see now: what you seem to be alluding to is whether including the original code is needed/wanted or not. A fair question, but (I think) unrelated to escaping XML. That is, I don't think the decision to put the native codes inside <bpt> should be driven by the type of native format or by whether the codes will need to be escaped. To me it's a decision based on how the tool deals with native codes: some tools, for whatever reason, need to preserve them, while others do not.

I think the bigger problem, beyond escaping things, is that using something like <bpt> means the text content of the segment is sometimes real text, sometimes native code. That is, if we were to call getText() on the segment node, the text returned would be a mix of real text and native codes. Ideally we would get only real text.
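
To make the getText() point concrete, here is a minimal sketch in Python with the standard xml.etree.ElementTree module. itertext() stands in for a generic getText(); the three segment strings, the CODE_TAGS set (XLIFF 1.2's native-code elements) and the get_real_text() helper are hypothetical illustrations, not part of any XLIFF tool or of the specification.

```python
import xml.etree.ElementTree as ET

# Three hypothetical encodings of the italicized segment text "yippy".
WITH_BPT_CODE = ('<source><bpt id="1" ctype="italic">&lt;i&gt;</bpt>yippy'
                 '<ept id="1">&lt;/i&gt;</ept></source>')
WITH_EMPTY_BPT = '<source><bpt id="1" ctype="italic"/>yippy<ept id="1"/></source>'
WITH_G = '<source><g id="1" ctype="italic">yippy</g></source>'

# XLIFF 1.2 elements whose content is native code rather than real text.
# (Any <sub> children, which can hold real text, are ignored for simplicity.)
CODE_TAGS = {"bpt", "ept", "it", "ph"}


def get_text(xml):
    """Naive getText(): concatenate every text node under the segment."""
    return "".join(ET.fromstring(xml).itertext())


def get_real_text(xml):
    """Concatenate text nodes, skipping the content of native-code elements."""
    parts = []

    def walk(elem):
        if elem.tag in CODE_TAGS:
            return  # native code; the element's tail is collected by its parent
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(ET.fromstring(xml))
    return "".join(parts)


for name, xml in [("bpt with code", WITH_BPT_CODE),
                  ("empty bpt", WITH_EMPTY_BPT),
                  ("g", WITH_G)]:
    print(f"{name}: {get_text(xml)!r} / {get_real_text(xml)!r}")
# bpt with code: '<i>yippy</i>' / 'yippy'
# empty bpt: 'yippy' / 'yippy'
# g: 'yippy' / 'yippy'
```

Only the segment that stores native code inside <bpt>/<ept> makes the naive extraction differ from the real text; with <g>, or with empty <bpt>/<ept>, every consumer has to know about code elements just to get "yippy".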


> ...We use XML because XML is the mechanism we chose to solve 
> the I (interchange) challenge. If we identify escaping the 
> XML as a good practice, we're a bit hypocritical.

I'm afraid I'm still not grasping the full measure of the escaping issue :) For me it's just a side effect, a symptom. The real cause is that we have constructs like <bpt> instead of <g>. Maybe we will find a solution this time around...


Have a good evening Bryan,
-ys


