Subject: RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)
In general, XSLT is philosophically obligated to create well-formed output. Since we cannot validate the escaped XML prior to processing it, if we were to try to transform escaped XML into real XML in the output stream, we would have no way to ensure that the result would be well formed.
XSLT 1.0 allowed a way around this: disable-output-escaping. But, rightfully so, XSLT 2.0 deprecated it (http://www.w3.org/TR/xslt20/#disable-output-escaping). Some XSLT processors still support it, but the spec does not require them to. Hence the reluctance of XML developers to support escaped XML in the input as a means of generating real XML in the output. I hope this sheds a little light on my dogmatic opinions over the years.
Also, thanks for pointing out that by allowing escaped whole elements in @fs we’d lose our constraint. I think that is huge. Without that constraint we would risk writers introducing non-HTML elements via @fs. I see your point that they can already introduce non-HTML attributes. However we do have processing requirements that prevent that.
P.S. Do I understand your vote in the straw poll to be for (1)? Or is it a non-vote?
Thanks for reminding me of the choice for subFs. It’s ok with me if we go that way. I was just wondering whether using separate attributes was making the work harder.
There are considerations in favor of subFs, but they are not strong:
- without it you have to derive the closing tag from the opening one (or introduce startFs/endFs)
- without it we cannot constrain what goes in fs anymore (but then that’s already happening with subFs).
As for escaping the content in a potential fs that holds the whole tag: I’m not sure I understand your reluctance here. It’s not like you have any choice. < and & (and " or ') must be escaped as part of the XML syntax, not as an XLIFF-specific choice.
Why is it a problem? Doesn’t an XSLT processor give you the normal text content when you want it? A DOM engine would give me the value of fs="<&apos>" as "<&>".
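To make the DOM point concrete, here is a minimal sketch using Python's standard xml.dom.minidom. The <ph> markup and the helper name read_fs are made up for illustration; the only claim is that a conforming parser hands back the unescaped attribute value.

```python
from xml.dom.minidom import parseString


def read_fs(ph_markup: str) -> str:
    """Return the fs attribute as the parser reports it (hypothetical helper)."""
    # The parser resolves the predefined entities while reading the attribute,
    # so the caller never sees &lt;, &amp; or &apos; in the returned string.
    return parseString(ph_markup).documentElement.getAttribute("fs")


# Illustrative input: the three characters < & ' written in escaped form.
print(read_fs('<ph id="p1" fs="&lt;&amp;&apos;" />'))  # prints <&'
```

This only shows that reading the value is painless. It does not address the separate concern of turning that string back into real markup in an XSLT output stream.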
I now realize that it would be easy to not notice that I continue my explanation of the two options, and make my request for participation in a straw poll, at the end of Yves’ note (past the second set of +++++++++++++++++++++++++). Please read beyond Yves’ note to see the whole explanation.
Or if that was already clear to you, sorry for the extra email.
Regarding: item 142: https://lists.oasis-open.org/archives/xliff-comment/201310/msg00031.html - I guess it comes down to (1) keeping the subFs and adding a delimiter between attribute name/value pairs, or (2) eliminating @subFs and adding escaped XML to @fs.
The definition of subFs says:
The subFs MUST only be used to carry attribute name/value comma-delimited pairs for attributes that are valid for the HTML element
identified by the accompanied fs attribute.
Example: fs:fs="img" fs:subFs="src,smileface.png"
It is unclear to me if you can have more than one pair of name/value per subFs. I assume you can because a) the definition uses
plural here with "the subFs" (so: one subFs with many pairs); and b) it wouldn't make sense to restrict attributes to a single one.
But it should be a lot clearer.
Also the example shows that the delimiter comma is used to separate the two parts of a pair, but what is the delimiter between pairs?
If I assume it is space, then there is no way to define a value containing a space since only \ and , are escaped.
Overall I think it would be a lot simpler to have only one fs attribute that holds the full element to use. Is there a reason why
You are asking us to eliminate @subFs and just put the whole element, including the attribute name/value pair(s), in the @fs.
I think when we debated this back when I wrote the module, that same idea was proposed, and ultimately voted down, in favor of the @subFs method.
I do not have all the details of that debate fresh in my mind, nor have I researched the prior debate much. But no doubt my broken-record-objections to escaping XML were part of it.
So in crafting my counter-proposal to dropping the @subFs, I recalled the idea of delimiting each name/value pair with a backslash (\). This is because the spec already says to escape “,” and “\” with a backslash, and we say to use a comma to separate attribute name from value.
Let’s call this proposal (1).
<ph id="p1" fs="img" subFs="src,smile.png\alt,My Happy Smile\title,Smiling faces are nice" />
Resolves to this:
<img src="smile.png" alt="My Happy Smile" title="Smiling faces are nice" />
Pros: as long as there are no escaped commas or backslashes in the value, this is quite easy to parse (I tried with XSLT, Perl, and Java).
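For the easy case, a rough Python sketch (not one of the XSLT, Perl, or Java versions mentioned above) might look like the following. The function name resolve_simple is hypothetical, and the code assumes no value contains an escaped comma or backslash.

```python
from xml.sax.saxutils import quoteattr


def resolve_simple(fs: str, sub_fs: str) -> str:
    """Build the element for proposal (1), assuming no escapes in sub_fs."""
    # A backslash separates name/value pairs; the first comma in each pair
    # separates the attribute name from its value.
    pairs = [pair.split(",", 1) for pair in sub_fs.split("\\")]
    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in pairs)
    return f"<{fs} {rendered} />"


print(resolve_simple(
    "img", r"src,smile.png\alt,My Happy Smile\title,Smiling faces are nice"))
```

Two split calls are enough here, which is why this case is easy. The same code mis-parses a value containing \\ or \, because it splits on every backslash.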
But if you have something like this:
<ph id="p1" fs="img" subFs="src,c:\\docs\\images\\smile.png\alt,My Happy Smile\title,Smiling faces\, are nice" />
And want this:
<img src="c:\docs\images\smile.png" alt="My Happy Smile" title="Smiling faces, are nice" />
You add complexity.
Cons: when you encounter an escaped comma or backslash it gets very complex (but doable) to parse the string. I was eventually able to do it with XSLT, but it took two call-templates, and a lot of string parsing with XPath expressions.
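As a point of comparison with the XSLT effort, here is a sketch of the escape-aware case in Python, scanning left to right. It assumes the rules as I read proposal (1): \\ and \, are escapes, any other backslash ends a pair, and an unescaped comma separates name from value. The name parse_sub_fs is made up for this example.

```python
def parse_sub_fs(sub_fs: str) -> list[tuple[str, str]]:
    """Split a proposal (1) subFs value into (name, value) pairs."""
    pairs = []      # finished pairs, each a list of fields
    fields = [[]]   # fields of the pair being read, as character lists
    i = 0
    while i < len(sub_fs):
        ch = sub_fs[i]
        if ch == "\\" and i + 1 < len(sub_fs) and sub_fs[i + 1] in "\\,":
            # Escaped backslash or comma: keep the literal character.
            fields[-1].append(sub_fs[i + 1])
            i += 2
            continue
        if ch == "\\":
            # Unescaped backslash: the current pair is complete.
            pairs.append(fields)
            fields = [[]]
        elif ch == ",":
            # Unescaped comma: switch from the name to the value.
            fields.append([])
        else:
            fields[-1].append(ch)
        i += 1
    pairs.append(fields)

    result = []
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError("each pair needs exactly one unescaped comma")
        result.append(("".join(pair[0]), "".join(pair[1])))
    return result


example = (r"src,c:\\docs\\images\\smile.png\alt,My Happy Smile"
           r"\title,Smiling faces\, are nice")
for name, value in parse_sub_fs(example):
    print(name, "=", value)
```

In a language with a mutable loop index this is a single pass. The pain in XSLT 1.0 comes from having to express the same scan with recursive templates and substring functions.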
So let’s revisit your idea to eliminate @subFs and (sorry if this is an unfriendly term) overload @fs, call it proposal (2).
We could do this (putting aside my disdain for escaped XML for now):
<ph id="p1" fs="&lt;img src='c:\docs\images\smile.png' alt='My Happy Smile' title='Smiling faces, are nice' /&gt;" />
Pros: one less attribute to parse.
Cons: I think all the complexity is still there. It is just not as easy to spot. Plus we are escaping XML. This is very unfriendly for XML processing (XSLT).
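To show where the work moves under proposal (2), here is a sketch in Python of what a non-XSLT consumer might do: read @fs, then parse its value a second time as markup. The helpers wrap_in_fs and unwrap_fs are hypothetical, and the sketch assumes the value is always one complete, well-formed element.

```python
from xml.dom.minidom import parseString
from xml.sax.saxutils import quoteattr


def wrap_in_fs(inner_markup: str) -> str:
    """Producer side: escape a whole element into the fs attribute."""
    # quoteattr escapes &, < and > and picks a safe quoting style.
    return f"<ph id=\"p1\" fs={quoteattr(inner_markup)} />"


def unwrap_fs(ph_markup: str) -> tuple[str, dict[str, str]]:
    """Consumer side: first parse the <ph>, then parse the fs value itself."""
    fs_value = parseString(ph_markup).documentElement.getAttribute("fs")
    inner = parseString(fs_value).documentElement  # second parse
    return inner.tagName, dict(inner.attributes.items())


ph = wrap_in_fs('<img src="smile.png" alt="My Happy Smile" />')
print(ph)
print(unwrap_fs(ph))
```

The second parse is the step that has no direct equivalent in XSLT 1.0 without disable-output-escaping or a processor extension, which is the unfriendliness referred to above.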
There are ambiguities. In the example above I replaced the double quotes with single quotes. But what if the attribute value had an apostrophe, like title="Smiling faces, can't be denied"? I suppose we could propose escaping single quotes with &apos; or U+0027, or something, but can we guess all the escapes that would need to be specified? I could probably come up with other tricky use cases that are every bit as stumping.
This is not an easy one to solve. But I vote for (1).
So to Yves, and please, to others in the TC (Yves and I usually find ourselves in a binary debate on the topic of escaping XML; I think we would both welcome fresh points of view), do you vote for (1) or (2), or some even better alternative not yet considered?
I will support the winner of the straw poll, and not gripe if my preference does not prevail.