OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 



xliff message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [xliff] RE: Re: subFs value and spaces (item 142) - not an official ballot, but please vote (straw poll?)


Hi Bryan, all,

Please see comments below, or just skip to the bottom, after the -- conclusion -- mark.


>> that your processor will resolve as "<p>".
>
> Not true. Any compliant XML processor will resolve this not as "<p>"
> - but as "&lt;p>" (it will literally output the entity, not the "<").

An *XML* parser (that's what I meant by XML processor) will resolve "&lt;p>" as "<p>".

An *XSLT* processor will first resolve "&lt;p>" as "<p>" and then (if you make it do so) output that string in the format you
selected (xml, html, text, etc.).

If I have:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<doc>
 <elem attr="&lt;p>"/>
</doc>

With 

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html"/>
 <xsl:template match="//elem">
output=<xsl:value-of select="@attr" />
 </xsl:template>
</xsl:stylesheet>

I get:

output=&lt;p&gt;

If I replace method="html" by method="text", I get:

output=<p>

This shows that XSLT does read "&lt;p>" correctly as "<p>" and is perfectly capable of outputting it as-is or escaped, depending on
your choice.
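
The same behavior can be checked outside XSLT. Here is a minimal sketch in Python, using the standard xml.etree.ElementTree parser on the same <doc> sample. The names SOURCE and read_attr are mine, just for illustration. It shows the parser handing back "<p>", which the caller can then write as-is or escaped:

```python
import xml.etree.ElementTree as ET
from html import escape

# Same document as in the XSLT test above (names here are illustrative only).
SOURCE = '<doc><elem attr="&lt;p>"/></doc>'

def read_attr(xml_text):
    """Return the parsed value of elem/@attr; the parser resolves &lt; itself."""
    root = ET.fromstring(xml_text)
    return root.find("elem").get("attr")

value = read_attr(SOURCE)
print("as text:", value)                       # comparable to method="text"
print("escaped:", escape(value, quote=False))  # comparable to method="html"
```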

The fact that XSLT is not very flexible in doing mixed output is a problem specific to XSLT.
Maybe (instead of outputting as html and fighting to un-escape the HTML tags) it's possible to output as text and have an extra
function to do the escaping for the HTML content (just like other programming languages).
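
As a rough illustration of what I mean by "like other programming languages" (this is Python, not XSLT, and render_preview is a made-up helper, not anything from the specification): the markup is written literally and only the content goes through an escaping function.

```python
from html import escape

def render_preview(tag, text):
    """Write the tag literally and escape only the text content.

    Hypothetical helper: it assumes `tag` is already a trusted HTML
    element name and does no validation of it.
    """
    return f"<{tag}>{escape(text, quote=False)}</{tag}>"

print(render_preview("p", "a < b & c"))
```

Whether an XSLT processor offers an equivalent depends on the processor, so take this only as a picture of the approach.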


>> you want to make sure the HTML output generated is valid.
>> The more I look at FS, the less I think that is possible
>
> It is abundantly possible with the backslash separated name/value 
> pair proposal I called option (2). 
> And it is even more easy to do with Tom's latest proposal.

Sure, if you decompose (pre-tokenize) the HTML to output into basic parts: a) you bypass any need for tokenization; b) you can
control the parts relatively easily; and c) you can output in XSLT without dealing with escaping/un-escaping.

Tom's proposal has those advantages.

But it has also several major drawbacks:

- You simply shift the burden of tokenizing the HTML to the tool/author generating the FS attributes.

- You have to validate which FS attributes can go with the element in the fs value.

- We now have potentially dozens of attributes in many places in the XLIFF document, as deep as the inline codes, and in places
where there are no extension points, just for a simple HTML preview output. And we have to preserve them through round-trips, etc.

- But worst of all, as Tom noted: "Ultimately it would still be up to the processing agent to ensure the attributes correspond to the HTML
elements and that the values are appropriate in the HTML context".

In the end you still rely on the author to put the right HTML data where it needs to go; you just make her/him do it in little
pieces to facilitate your processing. In some ways it gives more opportunities for mistakes and invalid data.


-- conclusion --

Bryan: I think I'll never be really happy with FS other than a simple raw HTML value associated with an XLIFF element. Most of what
we try to do around that seems (to me) to be because of XSLT limitations with that solution.

My initial comment was just:

a) can we put several attribute/value pairs?
b) what delimiter to use in that case?

My question "Overall I think it would be a lot simpler to have only one fs attribute that hold the full element to use. Is there a
reason why not?" was just a question and has now been answered: It's mainly because XSLT can't really deal with it.

So to go back to my initial concerns (can we write multiple attributes, and how do we separate them?): your answer is yes, and your
solution is to use ',' to separate the pairs and '\' to separate the attribute name from its value (with a '\' prefix to
escape those two characters when they are used literally).
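
To make sure I read the rule the same way you do, here is a small sketch of a parser for such a value. It is Python, the function name parse_subfs is mine, and it rests on one assumption that is not stated in the thread: attribute names never contain ',' or '\', so the first '\' of each pair is always the name/value separator and any later '\' is an escape prefix.

```python
def parse_subfs(value):
    """Split 'name\\value,name\\value' into a list of (name, value) pairs.

    Assumption (not from the specification): names contain no ',' or '\\',
    so the first backslash in a pair is the separator and a backslash
    inside a value escapes the character that follows it.
    """
    pairs = []
    name, buf = None, []
    i = 0
    while i < len(value):
        ch = value[i]
        if name is None:
            if ch == "\\":            # end of name, start of value
                name, buf = "".join(buf), []
            else:
                buf.append(ch)
        elif ch == "\\" and i + 1 < len(value):
            buf.append(value[i + 1])  # escaped literal (',' or '\')
            i += 1
        elif ch == ",":               # unescaped comma ends the pair
            pairs.append((name, "".join(buf)))
            name, buf = None, []
        else:
            buf.append(ch)
        i += 1
    if name is not None:
        pairs.append((name, "".join(buf)))
    elif buf:
        raise ValueError("pair without a name/value separator")
    return pairs

print(parse_subfs(r"src\smileface.png,alt\My Happy Smile"))
print(parse_subfs(r"title\a\, b\\c"))
```

The sample values are only there to exercise the two separators and the escape; they are not taken from the draft.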

If the specification is changed to reflect this, that would resolve my comment 142.

Cheers,
-yves



