RE: [xliff] RE: Re: subFs value and spaces (item 142)

Hi Bryan,

> From XML's point of view (never mind XSLT for now),

> this is not an element:

> &lt;p>

Ah thanks! Now I see what you meant.

Yes, "&lt;p>" is not an HTML element, but it's valid XML that your processor will resolve as "<p>".

You were talking about the content (escaped HTML, not XML), while to me XML is the container (XML as the format that needs to escape meta characters). That's clear now.

So your issue is that you want to make sure the HTML output generated is valid.

The more I look at FS, the less I think that is possible, except with very strict restrictions.

The subFs content would still have the second part of each pair potentially with syntax errors that you would have to validate. And if you can validate those, you can probably validate any snippet of raw HTML stored in fs as well.

> It sounds like either solution will work.

Yes.

> One advantage of (1) is that it lets us constrain

> that only our prescribed set of HTML elements are used.

Yes and no: one could still set PR restrictions on raw HTML content.

It would just be harder to verify.

> And a second is that it is XML-friendly

> (for what it's worth).

I assume that by "XML-friendly" you mean "It will be easier to check the HTML data and create a valid HTML output" (IMO that has nothing to do with the format of the container: You would get the same values in JSON for example).

On that aspect:

In case (1) you have to read two attributes, check them (and one has to be tokenized and then validated like an HTML snippet), reconstruct things, and finally do the output.

In case (2) you get one value, validate the snippet and output it.
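A minimal sketch of the "validate the snippet" step in case (2), assuming the fs value holds a single opening tag and that the allowed elements are known in advance. The ALLOWED set and the check_fs_snippet name are hypothetical, not taken from the FS module.

```python
from html.parser import HTMLParser

# Hypothetical allow-list; the real one would be the set the FS module prescribes.
ALLOWED = {"b", "i", "p", "span", "img"}


class StartTagChecker(HTMLParser):
    """Records start tags and flags anything else (text, end tags)."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.other = False

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_startendtag(self, tag, attrs):
        # Treat <img ... /> like a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.other = True

    def handle_data(self, data):
        if data.strip():
            self.other = True


def check_fs_snippet(snippet):
    """Return True if the snippet is exactly one start tag of an allowed element."""
    checker = StartTagChecker()
    checker.feed(snippet)
    checker.close()
    if checker.other or len(checker.tags) != 1:
        return False
    tag, _attrs = checker.tags[0]
    return tag in ALLOWED
```

The same check could be applied to the element name taken from fs in case (1); the difference is only in how much has to be reassembled first.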

Cheers,

-yves

-----Original Message-----

From: Schnabel, Bryan S [mailto:bryan.s.schnabel@tektronix.com]

Sent: Thursday, November 14, 2013 3:50 PM

To: Yves Savourel; xliff@lists.oasis-open.org

Subject: RE: [xliff] RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

From XML's point of view (never mind XSLT for now), this is not an element:

&lt;p>

This is a string of text that has an entity, followed by the letter p, followed by greater than.

I can use Perl, for example, to convert it to <p>. Probably because Perl does not care if the input or output are XML.

With XSLT 1.0 I could use d-o-e to do the same.

But there is no compliant XML parser that will treat that string as an element.

Further, this:

&lt;888>

is perfectly valid markup.

But if I used d-o-e in XSLT to serialize this, it would try to make an element like this <888>.

And that is not a valid element name.
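The point can be checked with any XML parser. The sketch below uses Python's xml.etree.ElementTree; the wrapper element x and the is_well_formed helper are made up for illustration. It shows that an escaped "<" followed by 888 and ">" parses as plain text, while the same characters written as real markup are rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Escaped form: the parser sees text, not an element.
doc = ET.fromstring("<x>&lt;888></x>")
text_seen = doc.text        # the resolved character data
child_count = len(doc)      # number of child elements

# Unescaped form: 888 cannot be an element name, so parsing fails.
raw_ok = is_well_formed("<x><888></888></x>")
```

So a serializer that writes the resolved text without escaping has no guarantee that the result is still XML.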

Whether we care about all of this or not is another matter. It sounds like either solution will work. One advantage of (1) is that it lets us constrain that only our prescribed set of HTML elements are used. And a second is that it is XML-friendly (for what it's worth).

________________________________________

From: xliff@lists.oasis-open.org [xliff@lists.oasis-open.org] on behalf of Yves Savourel [ysavourel@enlaso.com]

Sent: Thursday, November 14, 2013 1:31 PM

To: xliff@lists.oasis-open.org

Subject: [xliff] RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

> if we were to try to transform escaped XML into real XML in the output

> stream

I'm afraid I still don't understand the problem on the input side.

The input is not "escaped XML"; it's normal HTML stored in an XML attribute.

> Since we cannot validate the escaped XML prior to processing it...

If the content of the fs attribute does not follow the XML syntax the XSLT processor can't process that file. There is no need to "validate" it.

It sounds like you're saying XSLT processors don't parse/resolve character entity references.

I think your issue seems to be with the output, where XSLT doesn't let you mix plain text and HTML.

So your solution is to construct those HTML elements from structured information you get from the FS attributes, instead of simply transferring raw HTML. XSLT is probably the only programming language with that issue.

> Also, thanks for pointing out that by allowing escaped whole elements

> in @fs we’d lose our constraint.

> I think that is huge. Without that constraint we would risk writers

> introducing non-HTML elements via @fs. I see your point that they can

> already introduce non-HTML attributes.

> However we do have processing requirements that prevent that.

I didn't see any PR that provides restrictions on attributes.

How would you decide what is a risky element/attribute from a non-risky one?

One advantage of FS is to be able to provide more complex interactive preview of the document, using for example scripts and onclick attributes.

It seems to me either the FS module will be very restrictive and not very appealing (one could do almost the same with an XSLT stylesheet hosted outside the XLIFF document), or it'll be powerful but need the user to accept the risks.

Cheers,

-yves

From: Schnabel, Bryan S [mailto:bryan.s.schnabel@tektronix.com]

Sent: Thursday, November 14, 2013 1:44 PM

To: Yves Savourel; xliff@lists.oasis-open.org

Subject: RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

Hi Yves,

In general, XSLT is philosophically obligated to create well-formed output. Since we cannot validate the escaped XML prior to processing it, if we were to try to transform escaped XML into real XML in the output stream, we would have no way to ensure that the result would be well formed.

XSLT 1.0 allowed a way around this, disable-output-escaping. But, rightfully so, in XSLT 2.0 they deprecated it (http://www.w3.org/TR/xslt20/#disable-output-escaping). Some XSLT processors still support it, but the spec does not require them to. Hence the reluctance for XML developers to support escaped XML in the input as a means to generate real XML in the output. I hope this sheds a little light on my dogmatic opinions over the years.

Also, thanks for pointing out that by allowing escaped whole elements in @fs we’d lose our constraint. I think that is huge. Without that constraint we would risk writers introducing non-HTML elements via @fs. I see your point that they can already introduce non-HTML attributes. However we do have processing requirements that prevent that.

Thanks,

Bryan

Ps, do I understand your vote in the straw poll to be for (1)? Or is it a non-vote?

From: Yves Savourel [mailto:ysavourel@enlaso.com]

Sent: Thursday, November 14, 2013 11:48 AM

To: Schnabel, Bryan S; xliff@lists.oasis-open.org

Subject: RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

Hi Bryan,

Thanks for reminding me of the choice for subFs. It’s ok with me if we go that way. I was just wondering if using separate attributes was not making the work harder.

There are considerations in favor of subFs, but they are not strong:

- without it you have to derive the closing tag from the opening one (or introduce startFs/endFs)

- without it we cannot constrain what goes in fs anymore (but then that’s already happening with subFs).

As for escaping the content in a potential fs with the whole tag: I’m not sure I understand your reluctance here. It’s not like you have any choice. < and & (and " or ') must be escaped as part of the XML syntax, not any XLIFF specific choice.

Why is it a problem? Doesn’t an XSLT processor give you the normal text content when you want it? A DOM engine would give me the value of fs="<&apos>" as "<&>".
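As a small illustration of that point, here is a sketch with Python's xml.etree.ElementTree. The pc element and the un-namespaced fs attribute are simplifications chosen for the example. The parser hands back the attribute value with the entities already resolved, and the serializer re-escapes it on output.

```python
import xml.etree.ElementTree as ET

# The attribute value is ordinary HTML, escaped only because XML syntax requires it.
source = '<pc fs="&lt;img src=&apos;smileface.png&apos;&gt;"/>'

element = ET.fromstring(source)
fs_value = element.get("fs")   # entities are resolved by the parser

# Writing the element back out escapes the markup characters again.
serialized = ET.tostring(element, encoding="unicode")
```

The application code never deals with the escaped form; it only sees the resolved string in fs_value.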

-ys

From: Schnabel, Bryan S [mailto:bryan.s.schnabel@tektronix.com]

Sent: Thursday, November 14, 2013 12:24 PM

To: Schnabel, Bryan S; xliff@lists.oasis-open.org

Cc: Yves Savourel

Subject: RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

All,

I now realize that it would be easy to not notice that I continue my explanation of the two options, and make my request for participation in a straw poll, at the end of Yves’ note (past the second set of +++++++++++++++++++++++++). Please read beyond Yves’ note to see the whole explanation.

Or if that was already clear to you, sorry for the extra email.

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Schnabel, Bryan S

Sent: Thursday, November 14, 2013 10:39 AM

To: xliff@lists.oasis-open.org

Cc: Yves Savourel (ysavourel@enlaso.com)

Subject: [xliff] Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)

Hi Yves,

Regarding: item 142: https://lists.oasis-open.org/archives/xliff-comment/201310/msg00031.html - I guess it comes down to (1) keeping the subFs and adding a delimiter between attribute name/value pairs, or (2) eliminating @subFs and adding escaped XML to @fs.

++++++++++++++++++++++++++

subFs value and spaces

Hi all,

The definition of subFs says:

[[

The subFs MUST only be used to carry attribute name/value comma-delimited pairs for attributes that are valid for the HTML element identified by the accompanied fs attribute.

Example: fs:fs="img" fs:subFs="src,smileface.png"

]]

It is unclear to me if you can have more than one pair of name/value per subFs. I assume you can because a) the definition uses plural here with "the subFs" (so: one subFs with many pairs); and b) it wouldn't make sense to restrict attributes to a single one.

But it should be a lot clearer.

Also the example shows that the delimiter comma is used to separate the two parts of a pair, but what is the delimiter between pairs?

If I assume it is space, then there is no way to define a value containing a space since only \ and , are escaped.

Overall I think it would be a lot simpler to have only one fs attribute that holds the full element to use. Is there a reason why not?

Regards,

-yves

++++++++++++++++++++++++++

You are asking for us to eliminate @subFs and just put the whole element, including the attribute name/value pair(s), in the @fs.

I think when we debated this back when I wrote the module, that same idea was proposed, and ultimately voted down, in favor of the @subFs method.

I do not have all the details of that debate fresh in my mind, nor have I researched the prior debate much. But no doubt my broken-record-objections to escaping XML were part of it.

So in crafting my counter-proposal to dropping the @subFs, I recalled the idea of delimiting each name/value pair with a backslash (\). This is because the spec already says to escape “,” and “\” with a backslash, and we say to use a comma to separate attribute name from value.

Let’s call this proposal (1).

Resolves to this:

Pros: as long as there are no escaped commas or backslashes in the value, this is quite easy to parse (I tried with XSLT, Perl, and Java).

But if you have something like this:

And want this:

You add complexity.

Cons: when you encounter an escaped comma or backslash it gets very complex (but doable) to parse the string. I was eventually able to do it with XSLT, but it took two call-templates, and a lot of string parsing with XPath expressions.
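For comparison, here is a rough sketch of a parser for proposal (1) in Python. The syntax it assumes is my reading of the description above rather than spec text: "\," and "\\" are escaped literals, an unescaped comma separates a name from its value, and any other backslash separates one pair from the next. The function name parse_sub_fs is hypothetical.

```python
def parse_sub_fs(sub_fs):
    """Split a subFs string into a list of (name, value) tuples."""
    pairs = []
    fields = [""]                      # fields of the pair being read
    i = 0
    while i < len(sub_fs):
        ch = sub_fs[i]
        nxt = sub_fs[i + 1] if i + 1 < len(sub_fs) else ""
        if ch == "\\" and nxt in (",", "\\"):
            fields[-1] += nxt          # escaped comma or backslash: keep literally
            i += 2
            continue
        if ch == "\\":
            pairs.append(fields)       # lone backslash: end of the current pair
            fields = [""]
        elif ch == ",":
            fields.append("")          # unescaped comma: name/value separator
        else:
            fields[-1] += ch
        i += 1
    pairs.append(fields)

    result = []
    for pair in pairs:
        if len(pair) != 2 or not pair[0]:
            raise ValueError(f"malformed name/value pair: {pair!r}")
        result.append((pair[0], pair[1]))
    return result
```

A single left-to-right scan is enough here because the character after a backslash decides its meaning. That relies on attribute names never starting with a comma or a backslash, which is an assumption of this sketch.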

So let’s revisit your idea to eliminate @subFs and (sorry if this is an unfriendly term) overload @fs, call it proposal (2).

We could do this (putting aside my disdain for escaped XML for now):

Pros: one less attribute to parse.

Cons: I think all the complexity is still there. It is just not as easy to spot. Plus we are escaping XML. This is very unfriendly for XML processing (XSLT).

There are ambiguities. In the example above I substituted double quotes for single quotes. But what if the attribute had an apostrophe, like title=”Smiling faces, can’t be denied”? I suppose we could propose escaping single quotes with ' or U+0027, or something, but can we guess what all the escapes that need to be specified are? I could probably come up with other tricky use cases that are every bit as stumping.

This is not an easy one to solve. But I vote for (1).

So to Yves, and please, to others in the TC (Yves and I usually find ourselves in a binary debate on the topic of escaping XML – I think we would both welcome fresh points of view), do you vote for (1) or (2), or some even better alternative not yet considered?

I will support the winner of the straw poll, and not gripe if my preference does not prevail.

Thanks,

Bryan
