RE: [xliff] Re: [xliff-inline] XLIFF Inline Markup Subcommittee Teleconf

Hi Andrew,

Interoperability is an overrated word. I read comments about lack of interoperability with XLIFF in many places. Interestingly, people blame the standard, not the implementers or the buyers of known faulty tools.

The XLIFF standard may have defects but that doesn’t mean the standard cannot be used in an interoperable way today.

Interoperability problems that users face today are not problems with the standard, they are the consequences of deliberate decisions made by some tool vendors to lock users. The less noticed detail is that affected users are victims of their own choice of tools. Those users purchase tools that are known to cause problems instead of buying tools that are known to support the standard in a decent way. Would you blame vendors when users are fully supporting them with money?

Very often I see tool developers complaining about the standard when the real problem they have is their own lack of ability to implement and support the standard. Sometimes I don’t know if they are acting when they say in public conferences that the standard is impossible to support or if they are really that bad as developers.

Today a CAT tool developer said in a conference (NLDITA 2011) that DITA files are impossible to translate properly because the specifications are faulty, blah, blah, blah. Fifteen minutes later, a tool that is capable of translating DITA files as the users present in the conference expected was demonstrated. All problems that the audience pointed out in the first tool and that the developer said that were due to DITA bad design were properly handled by the tool used in the second showcase. With XLIFF happens the same.

Defining processing expectations for XLIFF elements would be a healthy choice. Don’t hold your breath and expect that developers will follow all expectations. Better think of applications as black boxes that should return XLIFF documents in a certain state, compatible with the process applied to the document. Check the state of the document, forget about the application.

Regards,

Rodolfo

--
Rodolfo M. Raya rmraya@maxprograms.com
Maxprograms http://www.maxprograms.com

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Andrew Pimlott
Sent: Thursday, December 15, 2011 10:10 PM
To: Rodolfo M. Raya
Cc: Schnabel, Bryan S; Yves Savourel; xliff@lists.oasis-open.org
Subject: [xliff] Re: [xliff-inline] XLIFF Inline Markup Subcommittee Teleconference - Dec-13-2011 - Summary

[Hope nobody minds me moving this to the main xliff list. It's veered away form inline markup, and it's awkward for Rodolfo to participate since he's not on the inline SC. The thread is quoted below for those who missed it.]

You're right, we cannot enforce application conformance, and we should be careful what we say about application conformance.

But I strongly believe we cannot punt on this. This is exactly where interoperability falls down in practice: Applications accept and produce valid XLIFF, but one doesn't do what the other expects. We will never get tools to work together by treating them as black boxes. We have to say clearly and as precisely as possible what they should do.

Andrew

On Thu, Dec 15, 2011 at 3:58 PM, Rodolfo M. Raya <rmraya@maxprograms.com> wrote:

Hello Andrew,

We can test conformance of XLIFF documents. We cannot enforce application conformance or full support for XLIFF.

Treat applications as black boxes. If an application receives a valid XLIFF file and exports a valid XLIFF file, we have nothing to say about it. How it handles the XLIFF document internally is not our business.

If an application that receives an XLIFF file exports an invalid or damaged XLIFF file, the only thing we can do is tell the user what is wrong in the faulty XLIFF file. The user is responsible for his/her choice of tools, not the XLIFF TC.

We cannot say that an application is not conformant to XLIFF standard. What would the maker of a non-conformant tool do if we publish an statement saying that the tool doesn’t comply with the standard? I wouldn’t personally risk a law suit.

Regards,

Rodolfo

--
Rodolfo M. Raya rmraya@maxprograms.com
Maxprograms http://www.maxprograms.com

From: Andrew Pimlott [mailto:andrew@spartanconsultinginc.com]
Sent: Thursday, December 15, 2011 9:49 PM
To: Rodolfo M. Raya
Cc: Schnabel, Bryan S; Yves Savourel

Subject: Re: [xliff-inline] XLIFF Inline Markup Subcommittee Teleconference - Dec-13-2011 - Summary

[Rodolfo, your message has been forwarded to the list by Bryan. Everyone keep Rodolfo in the Cc list.]

I agree you can always test whether an XLIFF document produced by an application is correct, using a sufficiently good conformance checker.

But you cannot test in general whether an application behaves correctly after reading an XLIFF document. Even if the application subsequently produces a correct XLIFF document, you cannot be sure it behaved correctly. To take an example totally removed from the current discussion, if it an XLIFF-based workbench failed to offer <alt-trans> proposals, you would have to agree it does not fully support XLIFF. But it would be impossible to tell from the outside (just looking at inputs and outputs) that the application behaved incorrectly.

The only way to police conformance in cases like this is to state clear (but informal) expectations for how applications should behave. No automated test can do it. These is where the wording of the standard must be especially precise.

Andrew

On Thu, Dec 15, 2011 at 3:13 PM, Rodolfo M. Raya <rmraya@maxprograms.com> wrote:

Hi Andrew,

I designed XLIFFChecker to check those cases in which simple XML validation with a DTD or Schema is not enough. We can add conformance criteria to any element of XLIFF and I would certainly try to update XLIFFChecker to test conformance as dictated by the specs.

So far XLIFFChecker has detected lots of errors in many tools. It served to help developers improve their code.

As I’m not a member of the Inline SC, please forward this reply to the SC mailing list. My previous reply bounced, so please forward it too.

Regards,

Rodolfo

--
Rodolfo M. Raya rmraya@maxprograms.com
Maxprograms http://www.maxprograms.com

From: Andrew Pimlott [mailto:andrew@spartanconsultinginc.com]
Sent: Thursday, December 15, 2011 8:52 PM
To: Schnabel, Bryan S
Cc: Yves Savourel; xliff-inline@lists.oasis-open.org; Rodolfo M. Raya

Subject: Re: [xliff-inline] XLIFF Inline Markup Subcommittee Teleconference - Dec-13-2011 - Summary

On Thu, Dec 15, 2011 at 2:12 PM, Schnabel, Bryan S <bryan.s.schnabel@tektronix.com> wrote:

> (this is actually quite fun for me - taking my mind off of mundane end of year firefighting and getting philosophical - thanks for this thread)

No problem!

Just one clarification, where we might be thinking about different things. I think you're focused on applications that produce XLIFF. I'm mostly thinking about applications that process XLIFF. An XLIFF-based workbench is an obviously example. This kind of application has to handle both <pc> and <sc>/<ec> in full generality when opening an XLIFF file.

Also, you can't conformance test everything. For example, the XLIFF-based workbench might display <sc>/<ec> codes in a way that's nonsensical to the user, but no automated conformance test could catch this. This is where I'm worried that <sc>/<ec> will be implemented poorly.

Andrew

On Thu, Dec 15, 2011 at 2:12 PM, Schnabel, Bryan S <bryan.s.schnabel@tektronix.com> wrote:

Hi Andrew,

(this is actually quite fun for me - taking my mind off of mundane end of year firefighting and getting philosophical - thanks for this thread)

> First, with <pc> plus <sc>/<ec>, you're give them two

> concepts to implement instead of one.

In my mind this statement is too incomplete to tell the whole story. For me, it helps to add a bit more for completeness:

With <pc> plus <sc>/<ec>, you're give them two concepts to implement instead of one. While each could technically be used for all use cases, each is optimized for one case, and ugly for the other.

Case 1 (90 percent), an inline contained in a single segment.

<pc> is good. <sc>/<ec> is less good because it adds complexity (using empty elements to artificially emulate start and end tags is ugly - especially glaring when it is in a single segment).

Case 2 (10 percent), an inline starts in one segment and ends in a different segment

<sc>/<ec> is good. <pc> is less good because it adds complexity (extra attributes representing start and end are ugly).

So my thinking is to exclusively choose one or the other means there will always be an ugly solution being used in one of the use cases. By offering both, but stipulating (in the conformance clause) that (a) to use <pc> in Case 2 is a violation, and (b) to use <sc>/<ec> in Case 1 is a violation, we avoid ugliness. And we do not give a lazy/sloppy implementer (who want to pass the conformance test) a way to mess up.

And if we do need to choose just one, I would choose the one that is ugly only 10 percent of the time.

I completely agree with the rest of your argument about human nature.

> I guess I missed the XML philosophy decision while

> reviewing the past TC activity. Can you give me any pointers to it?

Sigh, I searched for about 20 minutes yesterday on the TC list. I tried all kinds of string searches ("XML Philosophy," "Operating Principle," "Principle ballot," I even tried misspelling (in case it was me who took notes that day) "Operating Principal"). While I found all kind of interesting results, I did not find the record of this vote. I considered leaving it out, but I remembered that Rodolfo recently referred to this vote as well. So if he can find the record of that vote (which I know I'm not dreaming), we'll have it to refer to. If he cannot, I will formally withdraw that part of my rant (and apologize for the extra bandwidth). Rodolfo, do you remember where the record of this vote can be found?

Thanks,

Bryan

From: Andrew Pimlott [mailto:andrew@spartanconsultinginc.com]
Sent: Thursday, December 15, 2011 1:22 PM

To: Schnabel, Bryan S
Cc: Yves Savourel; xliff-inline@lists.oasis-open.org
Subject: Re: [xliff-inline] XLIFF Inline Markup Subcommittee Teleconference - Dec-13-2011 - Summary

I have a bit of romantic admiration for Ted Nelson, though he's certainly been off the mark on many things. He once said, "the web is what we were trying to prevent"! Maybe a little out of touch with the real world. :-)

Let me restate my point about implementer sloppiness. First, with <pc> plus <sc>/<ec>, you're give them two concepts to implement instead of one. That is inherently more complicated. Second, the first concept is simpler to understand and covers 9x% of cases; the second requires more explanation and is much less common. Which do you think implementers will spend their time on and get right? So I think that in the real world <sc>/<ec> will be poorly understood and badly implemented by XLIFF processors, no matter how well you state the requirements or try to enforce conformance. It's just my gut, but based on my experience with localization standards.

I admit there is some appeal to <pc>. However, I would note that the inclusion of dispEnd in the <pc> start tag (as I understand is part of the <pc> proposal) looks ugly to me, because it separates the end markup from where it logically belongs. That's enough for me not to be attached to <pc> on aesthetic grounds.

I guess I missed the XML philosophy decision while reviewing the past TC activity. Can you give me any pointers to it? I found one message from you titled "Philosophical Goal for XLIFF: design for the 'lowest common denominator'". But I can't find the follow-up or the decision.

Andrew

On Wed, Dec 14, 2011 at 4:30 PM, Schnabel, Bryan S <bryan.s.schnabel@tektronix.com> wrote:

Hi Andrew,

I think it's great that you are hitting the ground running. Super cool that you bring your thoughts to the table - and extra credit to you for being so organized, concise, detailed and coherent (and citing Ted Nelson was a nice touch).

You make it a tough debate, but I have a few thoughts from the other point of view.

Redundancy is not always evil - and is sometimes needed. Painters only *really* need a tube of Blue, a tube of Red, and a tube of Yellow - and they can mix any color they want. But giving them Purple, Orange, and Green isn't so bad I think. Or more to the point, in HTML, since you have <b>, <i>, <br> and <font size="x">, you don't really need <h1>, <h2>, <h3> etc. But giving web designers these extra <hx> tags probably is a good thing.

I think characterizing <pc> as entirely redundant does not cover every use case. Yes, I can use <sc>/<ec> to model all cases. But processing <sc>/<ec> adds overhead in many cases. I might even say in most cases (as you say "Overlapping markup may not be the common case . . ."). I think the vast majority of inline cases could be handled by <pc> (though it's sometimes said that I live in an idyllic world where all XLIFF I deal has the beautiful luxury of being for well-formed XML - I wish!). In my mind <pc> has less overhead, and more application. Forcing all cases to use <sc>/<ec> is to force the ugly solution of the few onto the many.

I think we can mitigate the risk of the lazy implementer's sloppiness by stating a very clear conformance requirement. Use <pc> when you can, and <sc>/<ec> when you must. Actually, I'm not sure how that leads to sloppiness. I don't think it's at all difficult to know when to use one or the other.

I'm glad you brought up the XML-friendly matter (my turn to get philosophical). A while ago the TC came to a unanimous decision that we adopt an operating principle that we try to model features and processing in a way that is (to the extent that we can) compliant with XML philosophy. I don't think any of us were motivated by a love for XML, or an expectation that all, or even most of what XLIFF processes will be XML source. For me, it was because XML is a pretty simple (rational) common denominator. And I think some have even pondered the X in XLIFF. I think it is more than to say that the vocabulary of our Localization Interchange File Format is XML. I think nearly all Open Standards choose XML as their vocabulary. I personally have always look at the X to represent the methodology more than the vocabulary. So even though I did not coin the phrase, I tend to agree with the statement that we should not mandate a single method that is "XML hostile." In fact, XLIFF 1.2 was nearly not passed because of the way one of the OASIS voting members read our characterization of XML.

And to pick one final nit, I don't think we all necessarily agree on the specific necessity of <sc>/<ec>. I think, more accurately, we all agree that we need to accommodate spans that either cross segments or overlap. Plenty of examples have been shown that can support this requirement by adding attributes to either <pc> or <ph>. But I think most of us see that as kind of a bad idea. And speaking for myself, it seems as bad an idea as using <sc>/<ec> in cases where it simply isn't needed.

Just a different point of view . . .,

Bryan

From: Andrew Pimlott [mailto:andrew@spartanconsultinginc.com]
Sent: Tuesday, December 13, 2011 1:46 PM
To: Schnabel, Bryan S
Cc: Yves Savourel; xliff-inline@lists.oasis-open.org
Subject: Re: [xliff-inline] XLIFF Inline Markup Subcommittee Teleconference - Dec-13-2011 - Summary

As a newcomer, I am not yet eligible to vote, but this is a chance to start

sharing my views. So here's my position on this ballot:

- It looks like most everyone agrees on the necessity of <sc>/<ec>. So if

<pc> were also included, it would be entirely redundant. Everything that

can be expressed with <pc> can also be expressed with <sc>/<ec>, and <pc>

becomes a "shorthand" for some forms of <sc>/<ec>. I think this is bad

design for several reasons:

- Consider how you would handle this as an implementor. When reading

XLIFF, you would almost certainly map both constructs to the same one

internally. When writing XLIFF, you have two choices: 1) always write

<sc>/<ec>; or 2) figure out when you can use <pc> and use write <pc> in

those cases, otherwise write <sc>/<ec>. Practically, why would anyone

do (2)? The result is everyone writes <sc>/<ec>. <pc> is useless

overhead.

- Lazy implementors (and they're all lazy) are likely to implement <pc>

right, and <sc>/<ec> sloppily. This is a serious practical problem, and

if you think about the industry's experience with standards, I think

you'll agree that implementors routinely do the easy parts and skip the

hard parts. We're much more likely to get correct implementations if

there is only one way to do it. (And of course we offer good developer

guidance.)

- Imagine a future specification for "XLIFF normalization" (which will be

necessary someday). The obvious thing to do is normalize <pc> to

<sc>/<ec>. So <pc> is just extra work.

- I'm not entirely against shorthands. They are good for human-writable

documents, and for saving space. But I don't think either consideration

applies here.

- Those who like <pc> for XML seem to agree that <sc>/<ec> are necessary

even for XML, in some cases. So nobody gets to live in a fairy land

without <sc>/<ec>. Why pretend? ;-) Maybe you can go a little way with

just <pc>, but then some day you have a use case that requires <sc>/<ec>.

You'll curse the day you started with <pc>!

- "XML friendliness" has little value for me. XLIFF is designed for

localizing any content, and while we should make sure it works well for

XML, I would be wary of any bias to working "best" with XML. Besides,

there's enough work between us and XLIFF 2.0 that giving special attention

to XML doesn't seem to be a good use of effort.

- (Ok, I can't resist veering into the philosophical.) I don't think

<sc>/<ec> is hackish at all. The right question to ask here is whether

overlapping markup has value in the real world. I think the answer is

obvious. Consider a word processor. They user selects words 1-3 and

presses "bold", then selects words 2-4 and presses "italics". How should

the word processor represent this? Overlapping markup is clearly a

reasonable option, and nested markup starts to look like a hack.

Overlapping markup may not be the common case, but it's not a freak

either. Don't treat it like one!

Some ideas borrowed from Ted Nelson:

http://www.xml.com/pub/a/w3j/s3.nelson.html

Andrew

On Tue, Dec 13, 2011 at 8:44 AM, Schnabel, Bryan S <bryan.s.schnabel@tektronix.com> wrote:

Hi Yves,

I wanted very much to attend this meeting to add my point of view on the "Type of codes Decision on what set of codes to use: a) ph/pc/sc/ec, or b) ph/sc/ec, or c) ph/pc, or d) ph."

But an unexpected work emergency kept me otherwise occupied (sounds strange that one could have a Web CMS emergency - but so it goes . . . ).

Though the attendees spoke in favor of b) - I just wanted to add that there was also a strong argument made against b) in the email thread (and in the interest of full disclosure, I am included in that camp). See

http://lists.oasis-open.org/archives/xliff-inline/201112/msg00008.html
http://lists.oasis-open.org/archives/xliff-inline/201112/msg00004.html

So I am grateful that the action is to have a ballot.

Thanks,

Bryan

________________________________________
From: xliff-inline@lists.oasis-open.org [xliff-inline@lists.oasis-open.org] On Behalf Of Yves Savourel [ysavourel@enlaso.com]
Sent: Tuesday, December 13, 2011 8:18 AM
To: xliff-inline@lists.oasis-open.org
Subject: [xliff-inline] XLIFF Inline Markup Subcommittee Teleconference - Dec-13-2011 - Summary

XLIFF Inline Markup Subcommittee Teleconference

Present: Fredrik, Yves, Andrew-P, Christian, Arle.
Regrets: Andrew-S

The summary of our main items is here:
http://wiki.oasis-open.org/xliff/OneContentModel/Requirements

Draft is under SVN, here:
http://tools.oasis-open.org/version-control/svn/xliff/trunk/inline-markup/
(This will be replaced by DocBook version soon)

=== Type of codes

Decision on what set of codes to use: a) ph/pc/sc/ec, or b) ph/sc/ec, or c) ph/pc, or d) ph.
http://lists.oasis-open.org/archives/xliff-inline/201111/msg00004.html

Fredrik: pc/sc/ec means you have to both types to handle -> more work for implementers.
And one cannot avoid sc/ec so that one is not optional.
Simpler and clearer.
Andrew-P: agree with Fredrik: redundant representation is more dangerous for various reasons.
Christian: no input.
ACTION ITEM: Yves will create a ballot.

=== Annotations representation
(req 1.5: Must allow to associate spans of content with metadata)

Current proposed representation:
http://lists.oasis-open.org/archives/xliff-inline/201110/msg00013.html

<mrk [id='id'] type='type' [value='some value'] [ref='someId'] [translate='yes|no']>...</mrk>

--- Overlapping mrk:

-> split them and merge them as needed.
May require an attribute to make the distinction between lone consecutive annotations and split annotations.
ACTION ITEM: Yves to try implementation.
Fredrik: actual best representation may depend on whether there is pc or not.

--- annotation with only translate attribute.
What type value to use?
-> 'generic' (with only translate attribute allowed in that case?)
Fredrik: yes, understood that last time.
Christian: Is it wise to create dependencies between attributes and values etc?
This is tricky. Leads to schema question and also checking implementation for conformance.
Fredrik: Kind of agree with that.
Maybe wording can be changed. E.g. if type='generic' then ignore other attribute is possible.
Christian: working with MAY can be tricky.
Andrew-P: probably ok to have the dependencies
Implementing conformance could be done.
Fredrik: We could say MUST ignore.

Yves was to bring the question of schema version 1.1 vs 1.0 to the TC.
-> Done here:
http://lists.oasis-open.org/archives/xliff/201111/msg00044.html
One answer:
http://lists.oasis-open.org/archives/xliff/201111/msg00046.html

Andrew-P: Agree with Rodolfo: need probably more than a schema to do proper validation.
Fredrik: Would prefer to validate as much as possible with schema.
Avoid to rely on a specific tool.
Christian: Maybe this is a general guidelines: Try first with 1.0, and if you can't: document it.
Being able to rely only on 1.0 would be very nice.
Fredrik: But then we would need to re-introduce things like <it>
Arle: Validation for TBX was hard with 1.0. so we used RNG
Fredrik: yes key of 1.1 is the better support for conditions, etc. But 1.1 is not final.

ACTION ITEM: Yves to bring that back to TC.

=== What about crc and assoc?

Those are not needed was last meeting consensus.

Yves had the action item to ask to TC if anyone is using crc or assoc and report on current status for those.
-> Done here:
http://lists.oasis-open.org/archives/xliff/201111/msg00051.html

Consensus would be to just drop then in 2.0.
ACTION ITEM: Yves double check presence of assoc.

=== Uniqueness of attribute names (added)

See:
http://lists.oasis-open.org/archives/xliff-inline/201112/msg00001.html

Andrew-P: Context based seem fine.
Fredrik: Agree too.

ACTION ITEM: Yves to escalate the question to TC.

Fredrik: Note on 'id' attribute. How to define uniqueness within a scope.
For example trans-unit id unique within <file>, etc.
We need to be very clear on those definitions.
Christian: what about xml:id?
Yves: xml:id needing to be unique at document level doesn't fit our id requirement.
Christian: Possibly have different namespaces for different type of ids. E.g. different prefix.
Would make check easier.

=== Representation of editing hints

Need to have notation for 'permissions'
Current permission we thought about:

1: a code can be deleted
2: a code can be replicated
3: a code can be moved out of its original order
4 (possibly): a code can be moved outside its original parent span

All were to think about representation for hints.

See: http://lists.oasis-open.org/archives/xliff-inline/201111/msg00008.html

Arle: Like the letter based representation.
Fredrik: Skeptical about bit-field or letters.
Second option better (human readable)
But problem when extending it. For example with default value.
One attribute per info allows flexibility in extending things.

Discussion on how to represent defaults for new info.

Andrew-P: Using one attribute for each info is simpler/non-ambiguous
Yves: that list should be more or less done with for 2.0
Could also use new attributes for new info
Fredrik: possible addition: can be edited
Arle: hopefully this should be a close list.

=== Bi-direction markers

Several tools use Unicode characters in 1.2.
Goes against W3C recommendation but seems common practice.
-> need to know about other tools
-> ask for info to W3C
Sent (but not passed the W3C check yet).

Some reference material:
- http://www.w3.org/International/questions/qa-bidi-controls
- http://www.w3.org/International/wiki/BidiProposal

Arle: asked the same question to Felix and Mark, both were against Unicode chararacters.
Reason: bad because they are 'invisible'. Tags are visible.
Andrew-P: experience with implementing Unicode chars: it works well.
Fredrik: in some case xml 'looks' wrong. But with using proper editor things are ok.
I've heard argument about using tags. Basically it's for editing xml with non-rendering editor.

=== Codes and effect on segmentation
(req 1.16: Inline codes should have a way to store information about the effect on the segmentation)

Related email:
http://lists.oasis-open.org/archives/xliff-inline/201109/msg00005.html

Skipped.

=== Representation of added codes

Need to clarify use of duplicated code.
e.g. why not re-use same code?
(See http://lists.oasis-open.org/archives/xliff-inline/201110/msg00021.html)

Skipped.

=== Any other business

--- I assume Andrew-P is in Pacific time zone.
That would make now 3 members for 7am.
Do we need/is it possible/ to shift the time of the SC 1 hour later?
Fredrik: fine with me.
Arle: ULI meeting is at that time.
Yves: will have to stay at this time for now then.

-end

---------------------------------------------------------------------
To unsubscribe, e-mail: xliff-inline-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: xliff-inline-help@lists.oasis-open.org

---------------------------------------------------------------------
To unsubscribe, e-mail: xliff-inline-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: xliff-inline-help@lists.oasis-open.org