xliff message

Subject: RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

From: Yves Savourel <ysavourel@enlaso.com>
To: <xliff@lists.oasis-open.org>
Date: Thu, 14 Nov 2013 14:31:42 -0700

> if we were to try to transform escaped XML into real XML in the output stream 

I'm afraid I still don't understand the problem on the input side.
The input is not "escaped XML"; it's normal HTML stored in an XML attribute.


> Since we cannot validate the escaped XML prior to processing it...

If the content of the fs attribute does not follow the XML syntax the XSLT processor can't process that file. There is no need to
"validate" it.

It sounds like you're saying XSLT processors don't parse/resolve character entity references.

I think your issue seems to be with the output, where XSLT doesn't let you mix plain text and HTML.
So your solution is to construct those HTML elements from structured information you get from the FS attributes, instead of simply
transferring raw HTML. XSLT is probably the only programming language with that issue.


> Also, thanks for pointing out that by allowing escaped 
> whole elements in @fs we'd lose our constraint. 
> I think that is huge. Without that constraint we would risk 
> writers introducing non-HTML elements via @fs. I see your 
> point that they can already introduce non-HTML attributes. 
> However we do have processing requirements that prevent that.

I didn't see any PR that provides restriction on attributes.
How would you decide what is a risky element/attribute from a non-risky one?
One advantage of FS is to be able to provide more complex interactive preview of the document, using for example scripts and onclick
attributes.

It seems to me either the FS module will be very restrictive and not very appealing (one could do almost the same with an XSLT
stylesheet hosted outside the XLIFF document), or it'll be powerful but need the user to accept the risks.

Cheers,
-yves



From: Schnabel, Bryan S [mailto:bryan.s.schnabel@tektronix.com] 
Sent: Thursday, November 14, 2013 1:44 PM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

Hi Yves,

In general, XSLT is philosophically obligated to create well-formed output. Since we cannot validate the escaped XML prior to
processing it, if we were to try to transform escaped XML into real XML in the output stream, we would have no way to ensure that
the result would be well formed.

XSLT 1.0 allowed a way around this, disable-output-escaping. But, rightfully so, in XSLT 2.0 they deprecated it
(http://www.w3.org/TR/xslt20/#disable-output-escaping). Some XSLT processors still support it, but the spec does not require them
to. Hence the reluctance for XML developers to support escaped XML in the input as a means to generate real XML in the output. I
hope this sheds a little light on my dogmatic opinions over the years.

Also, thanks for pointing out that by allowing escaped whole elements in @fs we'd lose our constraint. I think that is huge. Without
that constraint we would risk writers introducing non-HTML elements via @fs. I see your point that they can already introduce
non-HTML attributes. However we do have processing requirements that prevent that.

Thanks,

Bryan
PS: do I understand your vote in the straw poll to be for (1)? Or is it a non-vote?

From: Yves Savourel [mailto:ysavourel@enlaso.com] 
Sent: Thursday, November 14, 2013 11:48 AM
To: Schnabel, Bryan S; xliff@lists.oasis-open.org
Subject: RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

Hi Bryan,

Thanks for reminding me of the choice for subFs. It's OK with me if we go that way. I was just wondering if using separate attributes
was not making the work harder.

There are considerations in favor of subFs, but they are not strong: 
- without it you have to derive the closing tag from the opening one (or introduce startFs/endFs)
- without it we cannot constrain what goes in fs anymore (but then that's already happening with subFs).

As for escaping the content in a potential fs with the whole tag: I'm not sure I understand your reluctance here. It's not like
you have any choice. < and & (and " or ') must be escaped as part of the XML syntax, not as any XLIFF-specific choice.

Why is it a problem? Doesn't an XSLT processor give you the normal text content when you want it? A DOM engine would give me the
value of fs="&lt;&amp;>" as "<&>".
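The point about the parser returning plain text can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree rather than an XSLT processor or a DOM engine, and the <ph> element and its fs value are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical <ph> element: the fs attribute carries an HTML tag whose
# <, >, " and & are escaped because XML attribute syntax requires it.
doc = '<ph id="p1" fs="&lt;img alt=&quot;A &amp; B&quot;/&gt;"/>'

ph = ET.fromstring(doc)

# The parser resolves the entity references before handing over the value,
# so the application sees ordinary text, not "escaped XML".
fs_value = ph.get("fs")
print(fs_value)
```

The escaping only exists in the serialized file. What the application does with the resolved string afterwards (emit it raw, or rebuild elements from it) is a separate question on the output side.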

-ys



From: Schnabel, Bryan S [mailto:bryan.s.schnabel@tektronix.com] 
Sent: Thursday, November 14, 2013 12:24 PM
To: Schnabel, Bryan S; xliff@lists.oasis-open.org
Cc: Yves Savourel
Subject: RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

All,

I now realize that it would be easy to not notice that I continue my explanation of the two options, and make my request for
participation in a straw poll, at the end of Yves' note (past the second set of +++++++++++++++++++++++++). Please read beyond Yves'
note to see the whole explanation.

Or if that was already clear to you, sorry for the extra email.

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Schnabel, Bryan S
Sent: Thursday, November 14, 2013 10:39 AM
To: xliff@lists.oasis-open.org
Cc: Yves Savourel (ysavourel@enlaso.com)
Subject: [xliff] Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

Hi Yves,

Regarding: item 142: https://lists.oasis-open.org/archives/xliff-comment/201310/msg00031.html - I guess it comes down to (1) keeping
the subFs and adding a delimiter between attribute name/value pairs, or (2) eliminating @subFs and adding escaped XML to @fs.

++++++++++++++++++++++++++
subFs value and spaces
Hi all,

The definition of subFs says:

[[
The subFs MUST only be used to carry attribute name/value comma-delimited pairs for attributes that are valid for the HTML element
identified by the accompanied fs attribute.
Example: fs:fs="img" fs:subFs="src,smileface.png"
]]

It is unclear to me if you can have more than one pair of name/value per subFs. I assume you can because a) the definition uses
plural here with "the subFs" (so: one subFs with many pairs); and b) it wouldn't make sense to restrict attributes to a single one.
But it should be a lot clearer.

Also the example shows that the delimiter comma is used to separate the two parts of a pair, but what is the delimiter between pairs?
If I assume it is space, then there is no way to define a value containing a space since only \ and , are escaped.
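A minimal sketch of that ambiguity, assuming (hypothetically, since the definition does not say) that pairs are separated by a space. The function name and the sample value are illustrative only:

```python
def split_pairs_on_space(sub_fs: str) -> list[list[str]]:
    """Split a subFs value into pairs on spaces, then each pair on its first comma."""
    return [pair.split(",", 1) for pair in sub_fs.split(" ")]


# The alt value contains spaces, so it is cut into separate bogus "pairs".
pairs = split_pairs_on_space("src,smile.png alt,My Happy Smile")
print(pairs)
```

With only \ and , escapable, there is no way to mark the spaces inside "My Happy Smile" as literal.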

Overall I think it would be a lot simpler to have only one fs attribute that holds the full element to use. Is there a reason why
not?

Regards,
-yves

++++++++++++++++++++++++++

You are asking for us to eliminate @subFs and just put the whole element, including the attribute name/value pair(s), in the @fs.

I think when we debated this back when I wrote the module, that same idea was proposed, and ultimately voted down, in favor of the
@subFs method. 

I do not have all the details of that debate fresh in my mind, nor have I researched the prior debate much. But no doubt my
broken-record-objections to escaping XML were part of it.

So in crafting my counter-proposal to dropping the @subFs, I recalled the idea of delimiting each name/value pair with a backslash
(\). This is because the spec already says to escape "," and "\" with a backslash, and we say to use a comma to separate attribute name
from value.

Let's call this proposal (1).

<ph id="p1" fs="img" subFs="src,smile.png\alt,My Happy Smile\title,Smiling faces are nice" />

Resolves to this:

<img src="smile.png" alt="My Happy Smile" title="Smiling faces are nice" />

Pros: as long as there are no escaped commas or backslashes in the value, this is quite easy to parse (I tried with XSLT, Perl, and
Java). 

But if you have something like this:

<ph id="p1" fs="img" subFs="src,c:\\docs\\images\\smile.png\alt,My Happy Smile\title,Smiling faces\, are nice" />

And want this:

<img src="c:\docs\images\smile.png" alt="My Happy Smile" title="Smiling faces, are nice" />

You add complexity.

Cons: when you encounter an escaped comma or backslash it gets very complex (but doable) to parse the string. I was eventually able
to do it with XSLT, but it took two call-templates, and a lot of string parsing with XPath expressions.
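For comparison, here is a minimal Python sketch of a parser for proposal (1). It is not the XSLT, Perl, or Java code mentioned above. It assumes these rules, inferred from the examples: "\\" is a literal backslash, "\," is a literal comma, any other backslash separates pairs, and the first unescaped comma in a pair separates name from value. The function name parse_subfs is hypothetical.

```python
def parse_subfs(value: str) -> list[tuple[str, str]]:
    """Parse a subFs string into (name, value) pairs under the assumed rules."""
    pairs = []
    fields = [[]]  # character buffers for the current pair: [name] or [name, value]
    i = 0
    while i < len(value):
        ch = value[i]
        nxt = value[i + 1] if i + 1 < len(value) else ""
        if ch == "\\" and nxt in ("\\", ","):
            fields[-1].append(nxt)  # escaped backslash or comma: keep it literally
            i += 2
            continue
        if ch == "\\":
            pairs.append(fields)    # lone backslash: the current pair ends here
            fields = [[]]
        elif ch == "," and len(fields) == 1:
            fields.append([])       # first unescaped comma: name ends, value starts
        else:
            fields[-1].append(ch)   # ordinary character (assumption: a later
                                    # unescaped comma is kept as part of the value)
        i += 1
    pairs.append(fields)
    return [("".join(f[0]), "".join(f[1]) if len(f) > 1 else "") for f in pairs]


sub_fs = r"src,c:\\docs\\images\\smile.png\alt,My Happy Smile\title,Smiling faces\, are nice"
for name, val in parse_subfs(sub_fs):
    print(name, "=", val)
```

A single left-to-right scan is enough because an escape is always exactly two characters. The cost is that a language without a convenient character loop, such as XSLT 1.0, has to emulate the scan with recursive templates.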

So let's revisit your idea to eliminate @subFs and (sorry if this is an unfriendly term) overload @fs, call it proposal (2).

We could do this (putting aside my disdain for escaped XML for now):

<ph id="p1" fs="&lt;img src='c:\docs\images\smile.png' alt='My Happy Smile' title='Smiling faces, are nice' />" />

Pros: one less attribute to parse.

Cons: I think all the complexity is still there. It is just not as easy to spot. Plus we are escaping XML. This is very unfriendly
for XML processing (XSLT).

There are ambiguities. In the example above I substituted single quotes for double quotes. But what if the attribute had an
apostrophe, like title='Smiling faces, can't be denied'? I suppose we could propose escaping single quotes with &apos; or U+0027,
or something, but can we anticipate every escape that needs to be specified? I could probably come up with other tricky use
cases that are every bit as stumping.
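One way to probe which escapes are needed is to let an XML serializer do the quoting and see whether the value survives a round trip. The sketch below uses Python's xml.sax.saxutils.quoteattr and xml.etree.ElementTree. The img tag and the ph wrapper are illustrative values, not anything the spec prescribes:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical HTML tag containing both double quotes and an apostrophe.
html_tag = '''<img src="smile.png" title="Smiling faces, can't be denied" />'''

# quoteattr escapes &, < and >, picks a quote character, and escapes any
# conflicting quotes, returning the value already wrapped in quotes.
fs_attr = quoteattr(html_tag)
doc = f'<ph id="p1" fs={fs_attr} />'
print(doc)

# Parsing the document back resolves the escapes again.
round_tripped = ET.fromstring(doc).get("fs")
print(round_tripped == html_tag)
```

In this sketch the set of escapes comes from XML attribute syntax itself, so the writer does not have to enumerate them. Whether that settles the question for XSLT-based processing is a separate matter.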

This is not an easy one to solve. But I vote for (1).

So to Yves, and please, to others in the TC (Yves and I usually find ourselves in a binary debate on the topic of escaping XML; I
think we would both welcome fresh points of view), do you vote for (1) or (2), or some even better alternative not yet considered?

I will support the winner of the straw poll, and not gripe if my preference does not prevail.

Thanks,

Bryan
