Subject: FW: [xliff-inline] XLIFF Inline Markup Subcommittee Teleconference - Dec-13-2011 - Summary
On behalf of Rodolfo (who should have his memory elected to the hall of fame) . . .

From: Rodolfo M. Raya [mailto:rmraya@maxprograms.com]

Hi All,

As Bryan says, we should not allow having <sc> and the corresponding <ec> in the same segment. The use of an <sc>/<ec> pair should be reserved for tags that start and end in different segments. As I expect that a <pc> element would not be able to span multiple segments, we would have a very clear set of use cases for <pc> and <sc>/<ec>.

My memory goes back to a face-to-face meeting at TEKOM (November 2006), when Magnus presented his proposal to “simplify, extend and clarify” XLIFF, a proposal recently renamed the “modular approach”. We all liked his proposal and agreed to make the next version of XLIFF user friendly to make Magnus’ proposal possible.

In 2007, when Tony was still chairing, we created the list with 3 sections that we now have in the wiki.

In May of 2007 Bryan’s XLIFF RoundTrip Tool was envisioned as one of the deliverables for XLIFF 2.0. That tool needed XLIFF to be XML and XSLT friendly. We were working on getting 1.2 approved as a Committee Specification at that time (see the minutes of 5/1/2007).

We then revisited the XML-friendly philosophy in another face-to-face meeting in Berlin (November 2009). Bryan requested XLIFF to also be XSLT-friendly and we agreed.

At an unknown moment in time, the representation guide for HTML was produced by the TC. In that guide the authors decided to use an approach that would later be called “maximalist”. The approach was based on an XSLT-friendly process that placed the HTML structure in attributes of different elements. That approach was approved by the TC when the guide was approved for publishing.

When Bryan recently requested to add an option in XLIFF 2.0 for preserving attributes (listed in the wiki as a feature to implement) I remembered that we agreed on making XLIFF 2.0 XML-friendly 5 years ago in Wiesbaden.
Searching in old minutes would not be a trivial task, especially when the decision was taken in a face-to-face session that didn’t have formal minutes added to KAVI.

Hope this helps,
Rodolfo

--

From: Schnabel, Bryan S [mailto:bryan.s.schnabel@tektronix.com]

Hi Andrew,

(this is actually quite fun for me - taking my mind off of mundane end of year firefighting and getting philosophical - thanks for this thread)

> First, with <pc> plus <sc>/<ec>, you're giving them two
> concepts to implement instead of one.

In my mind this statement is too incomplete to tell the whole story. For me, it helps to add a bit more for completeness:

With <pc> plus <sc>/<ec>, you're giving them two concepts to implement instead of one. While each could technically be used for all use cases, each is optimized for one case, and ugly for the other.

Case 1 (90 percent), an inline contained in a single segment: <pc> is good. <sc>/<ec> is less good because it adds complexity (using empty elements to artificially emulate start and end tags is ugly - especially glaring when it is in a single segment).

Case 2 (10 percent), an inline that starts in one segment and ends in a different segment: <sc>/<ec> is good. <pc> is less good because it adds complexity (extra attributes representing start and end are ugly).

So my thinking is that exclusively choosing one or the other means there will always be an ugly solution in use for one of the use cases. By offering both, but stipulating (in the conformance clause) that (a) to use <pc> in Case 2 is a violation, and (b) to use <sc>/<ec> in Case 1 is a violation, we avoid ugliness. And we do not give a lazy/sloppy implementer (who wants to pass the conformance test) a way to mess up.

And if we do need to choose just one, I would choose the one that is ugly only 10 percent of the time.

I completely agree with the rest of your argument about human nature.

> I guess I missed the XML philosophy decision while
> reviewing the past TC activity.
> Can you give me any pointers to it?

Sigh, I searched for about 20 minutes yesterday on the TC list. I tried all kinds of string searches ("XML Philosophy," "Operating Principle," "Principle ballot," I even tried misspelling (in case it was me who took notes that day) "Operating Principal"). While I found all kinds of interesting results, I did not find the record of this vote. I considered leaving it out, but I remembered that Rodolfo recently referred to this vote as well. So if he can find the record of that vote (which I know I'm not dreaming), we'll have it to refer to. If he cannot, I will formally withdraw that part of my rant (and apologize for the extra bandwidth).

Rodolfo, do you remember where the record of this vote can be found?

Thanks,
Bryan

From: Andrew Pimlott [mailto:andrew@spartanconsultinginc.com]

I have a bit of romantic admiration for Ted Nelson, though he's certainly been off the mark on many things. He once said, "the web is what we were trying to prevent"! Maybe a little out of touch with the real world. :-)

Let me restate my point about implementer sloppiness. First, with <pc> plus <sc>/<ec>, you're giving them two concepts to implement instead of one. That is inherently more complicated. Second, the first concept is simpler to understand and covers 9x% of cases; the second requires more explanation and is much less common. Which do you think implementers will spend their time on and get right? So I think that in the real world <sc>/<ec> will be poorly understood and badly implemented by XLIFF processors, no matter how well you state the requirements or try to enforce conformance. It's just my gut, but based on my experience with localization standards.

I admit there is some appeal to <pc>. However, I would note that the inclusion of dispEnd in the <pc> start tag (as I understand is part of the <pc> proposal) looks ugly to me, because it separates the end markup from where it logically belongs.
That's enough for me not to be attached to <pc> on aesthetic grounds.

I guess I missed the XML philosophy decision while reviewing the past TC activity. Can you give me any pointers to it? I found one message from you titled "Philosophical Goal for XLIFF: design for the 'lowest common denominator'". But I can't find the follow-up or the decision.

Andrew

On Wed, Dec 14, 2011 at 4:30 PM, Schnabel, Bryan S <bryan.s.schnabel@tektronix.com> wrote:

Hi Andrew,

I think it's great that you are hitting the ground running. Super cool that you bring your thoughts to the table - and extra credit to you for being so organized, concise, detailed and coherent (and citing Ted Nelson was a nice touch). You make it a tough debate, but I have a few thoughts from the other point of view.

Redundancy is not always evil - and is sometimes needed. Painters only *really* need a tube of Blue, a tube of Red, and a tube of Yellow - and they can mix any color they want. But giving them Purple, Orange, and Green isn't so bad I think. Or more to the point, in HTML, since you have <b>, <i>, <br> and <font size="x">, you don't really need <h1>, <h2>, <h3> etc. But giving web designers these extra <hx> tags probably is a good thing.

I think characterizing <pc> as entirely redundant does not cover every use case. Yes, I can use <sc>/<ec> to model all cases. But processing <sc>/<ec> adds overhead in many cases. I might even say in most cases (as you say "Overlapping markup may not be the common case . . ."). I think the vast majority of inline cases could be handled by <pc> (though it's sometimes said that I live in an idyllic world where all the XLIFF I deal with has the beautiful luxury of being for well-formed XML - I wish!). In my mind <pc> has less overhead, and more application. Forcing all cases to use <sc>/<ec> is to force the ugly solution of the few onto the many.

I think we can mitigate the risk of the lazy implementer's sloppiness by stating a very clear conformance requirement.
Use <pc> when you can, and <sc>/<ec> when you must. Actually, I'm not sure how that leads to sloppiness. I don't think it's at all difficult to know when to use one or the other.

I'm glad you brought up the XML-friendly matter (my turn to get philosophical). A while ago the TC came to a unanimous decision that we adopt an operating principle that we try to model features and processing in a way that is (to the extent that we can) compliant with XML philosophy. I don't think any of us were motivated by a love for XML, or an expectation that all, or even most of what XLIFF processes will be XML source. For me, it was because XML is a pretty simple (rational) common denominator.

And I think some have even pondered the X in XLIFF. I think it says more than that the vocabulary of our Localization Interchange File Format is XML. I think nearly all Open Standards choose XML as their vocabulary. I personally have always looked at the X as representing the methodology more than the vocabulary. So even though I did not coin the phrase, I tend to agree with the statement that we should not mandate a single method that is "XML hostile." In fact, XLIFF 1.2 was nearly not passed because of the way one of the OASIS voting members read our characterization of XML.

And to pick one final nit, I don't think we all necessarily agree on the specific necessity of <sc>/<ec>. I think, more accurately, we all agree that we need to accommodate spans that either cross segments or overlap. Plenty of examples have been shown that can support this requirement by adding attributes to either <pc> or <ph>. But I think most of us see that as kind of a bad idea. And speaking for myself, it seems as bad an idea as using <sc>/<ec> in cases where it simply isn't needed.

Just a different point of view . . .,
Bryan

From: Andrew Pimlott [mailto:andrew@spartanconsultinginc.com]

As a newcomer, I am not yet eligible to vote, but this is a chance to start sharing my views.
So here's my position on this ballot:

- It looks like most everyone agrees on the necessity of <sc>/<ec>. So if <pc> were also included, it would be entirely redundant. Everything that can be expressed with <pc> can also be expressed with <sc>/<ec>, and <pc> becomes a "shorthand" for some forms of <sc>/<ec>. I think this is bad design for several reasons:

- Consider how you would handle this as an implementor. When reading XLIFF, you would almost certainly map both constructs to the same one internally. When writing XLIFF, you have two choices: 1) always write <sc>/<ec>; or 2) figure out when you can use <pc> and write <pc> in those cases, otherwise write <sc>/<ec>. Practically, why would anyone do (2)? The result is everyone writes <sc>/<ec>. <pc> is useless overhead.

- Lazy implementors (and they're all lazy) are likely to implement <pc> right, and <sc>/<ec> sloppily. This is a serious practical problem, and if you think about the industry's experience with standards, I think you'll agree that implementors routinely do the easy parts and skip the hard parts. We're much more likely to get correct implementations if there is only one way to do it. (And of course we offer good developer guidance.)

- Imagine a future specification for "XLIFF normalization" (which will be necessary someday). The obvious thing to do is normalize <pc> to <sc>/<ec>. So <pc> is just extra work.

- I'm not entirely against shorthands. They are good for human-writable documents, and for saving space. But I don't think either consideration applies here.

- Those who like <pc> for XML seem to agree that <sc>/<ec> are necessary even for XML, in some cases. So nobody gets to live in a fairy land without <sc>/<ec>. Why pretend? ;-) Maybe you can go a little way with just <pc>, but then some day you have a use case that requires <sc>/<ec>. You'll curse the day you started with <pc>!

- "XML friendliness" has little value for me.
XLIFF is designed for localizing any content, and while we should make sure it works well for XML, I would be wary of any bias toward working "best" with XML. Besides, there's enough work between us and XLIFF 2.0 that giving special attention to XML doesn't seem to be a good use of effort.

- (Ok, I can't resist veering into the philosophical.) I don't think <sc>/<ec> is hackish at all. The right question to ask here is whether overlapping markup has value in the real world. I think the answer is obvious. Consider a word processor. The user selects words 1-3 and presses "bold", then selects words 2-4 and presses "italics". How should the word processor represent this? Overlapping markup is clearly a reasonable option, and nested markup starts to look like a hack. Overlapping markup may not be the common case, but it's not a freak either. Don't treat it like one! Some ideas borrowed from Ted Nelson:

Andrew

On Tue, Dec 13, 2011 at 8:44 AM, Schnabel, Bryan S <bryan.s.schnabel@tektronix.com> wrote:

Hi Yves,
--
Andrew Pimlott
Chief Technology Office
Spartan Consulting
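
Editorial illustration (not part of the original messages): the conformance stipulation discussed in the thread, that using <sc>/<ec> for an inline contained in a single segment would be a violation, could be checked mechanically. The following is only a minimal sketch under stated assumptions. The <segment> and <source> wrappers, the id attribute on <sc>, and the startRef attribute on <ec> are hypothetical names chosen for illustration; the thread does not define the attribute syntax. The function name same_segment_pairs is also invented here.

```python
import xml.etree.ElementTree as ET
from typing import List


def same_segment_pairs(segment_xml: str) -> List[str]:
    """Return the ids of <sc> codes whose matching <ec> sits in the same segment.

    Assumption (hypothetical syntax): <sc id="N"/> opens a span and
    <ec startRef="N"/> closes it. Under the rule proposed in the thread,
    every id returned here would count as a violation, because the span
    starts and ends inside one segment and could have used <pc> instead.
    """
    segment = ET.fromstring(segment_xml)
    # Collect the ids of all start codes in this segment (ignore missing ids).
    started = {sc.get("id") for sc in segment.iter("sc")} - {None}
    # An end code that refers to one of those ids closes a span opened here.
    return sorted(
        ec.get("startRef")
        for ec in segment.iter("ec")
        if ec.get("startRef") in started
    )


# Case 1: the span opens and closes in one segment, so the pair is flagged.
case1 = '<segment><source>Click <sc id="1"/>Save<ec startRef="1"/> now.</source></segment>'
# Case 2: the span opens here and closes in a later segment, so nothing is flagged.
case2 = '<segment><source>First <sc id="1"/>bold starts here.</source></segment>'

print(same_segment_pairs(case1))
print(same_segment_pairs(case2))
```

The sketch covers only the cross-segment rule. It deliberately ignores overlapping spans inside a single segment (the bold on words 1-3, italics on words 2-4 example), which the thread also identifies as a case that needs <sc>/<ec>.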