
cgmo-webcgm message



Subject: RE: Re[4]: [cgmo-webcgm] implications of URI vs. IRI


Hi Benoit,

good, and agreed.

One more comment:
Spaces in "name" attributes were allowed long before any linkURI and/or XML
rules existed, so nobody ever thought about this detail. Everything was stored
in the CGM as the rules for non-graphical strings mandated.
One could say that this could have been clarified in WebCGM 1.0; however, I
find it quite useful to have both forms available.

Dieter 
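
As a rough illustration of why both forms can coexist: the sketch below
escapes a link destination only at the point where it would be handed to a
URI resolver, so the stored string may be in either form. The function name
escape_for_resolver and the set of characters left unescaped are assumptions
made for this example, not wording taken from WebCGM or SVG.

```python
from urllib.parse import quote


def escape_for_resolver(link: str) -> str:
    """Escape the fragment of a link destination just before resolving it.

    Hypothetical helper: '%' is kept as-is so that a fragment that is
    already escaped is not escaped a second time.
    """
    base, sep, fragment = link.partition("#")
    # quote() encodes non-ASCII text as UTF-8 and writes each byte as %HH;
    # the characters in `safe` (an assumption for this sketch) pass through.
    return base + sep + quote(fragment, safe="%(),")


unescaped = "http://www.cgmopen.org/abc.cgm#name(my name with blank)"
escaped = "http://www.cgmopen.org/abc.cgm#name(my%20name%20with%20blank)"

# Both stored forms yield the same string for the resolver.
print(escape_for_resolver(unescaped))
print(escape_for_resolver(escaped))
```

Under these assumptions a file written with plain spaces and a file written
with %20 lead to the same resolved reference, which is the practical sense in
which allowing both forms does not hurt.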

> -----Original Message-----
> From: Benoit Bezaire [mailto:benoit@itedo.com] 
> Sent: Tuesday, October 11, 2005 5:07 PM
> To: cgmo-webcgm@lists.oasis-open.org
> Subject: Re[4]: [cgmo-webcgm] implications of URI vs. IRI
> 
> Hi Dieter,
> 
> Thanks for the example, we are talking about the same thing.
> 
> I understand that ATA and WebCGM have allowed spaces in URI 
> fragments for the last 10 years, but by my interpretation 
> of RFC2396, those linkuris are illegal. Here is a quote from 
> Section 4.1 of http://www.ietf.org/rfc/rfc2396.txt:
> "The character restrictions described in Section 2 for URI 
> also apply to the fragment in a URI-reference."
> 
> And reading Section 2, you find that spaces are not allowed.
> 
> That being said, your interpretation of the SVG wording 
> sounds acceptable. The sentence 'or must result in a URI 
> reference after the escaping procedure' seems to be saving 
> us! I'm in favor of adding wording to the spec to clarify 
> this issue (the 3 bullet wording would be good also).
> 
> I no longer have a preference as to whether we should deprecate or not. 
> On one side, I think that this is a can of worms and forcing 
> escaping simplifies things; on the other, I agree that long 
> %HH strings for Asian names are not ideal.
> 
> Allowing both is probably the least painful approach for users 
> and implementers at this time.
> 
> Regards,
> 
> -- 
>  Benoit   mailto:benoit@itedo.com
> 
> 
> Tuesday, October 11, 2005, 10:15:06 AM, Dieter wrote:
> 
> DW> Hi Benoit,
> 
> DW> see inline
> 
> >> -----Original Message-----
> >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> >> Sent: Tuesday, October 11, 2005 3:48 PM
> >> To: cgmo-webcgm@lists.oasis-open.org
> >> Subject: Re[2]: [cgmo-webcgm] implications of URI vs. IRI
> >> 
> >> Hi Dieter,
> >> 
> >> You said:
> >> NOTE: If we required an escaped string inside the CGM now, this will
> >> make almost all existing files invalid ones as soon as a simple space
> >> is in a name attribute.
> >> 
> >> You are talking about the 'name' attribute within a URI only, correct?
> >> Or, let me rephrase...
> >> Files which have a name attribute (containing a space) that is used
> >> in a URI become invalid, right?
> DW> I am referring to the link destination parameter of a linkuri attribute.
> DW> Yes, something like (pseudo-code)
> 
> DW> linkuri "http://www.cgmopen.org/abc.cgm#name(my name with blank)" 
> DW> "some title" "_blank"
> 
> DW> would become illegal, and this is the form (without escaping) that 
> DW> has been used forever in the ATA and WebCGM environment 
> DW> (almost 10 years now).
>  
> >> 
> >> I would be in favor of deprecating (i.e., authors should stop 
> >> creating such files) the old behavior (no escaping) and adding 'a la' 
> >> SVG wording to the spec. Like Dieter says, but with an emphasis on 
> >> deprecating the old behavior.
> DW> The way I understand the SVG wording is that both forms would be legal:
> 
> DW> http://www.cgmopen.org/abc.cgm#name(my name with blank) 
> DW> http://www.cgmopen.org/abc.cgm#name(my%20name%20with%20blank)
> 
> DW> I would NOT deprecate the first form, because it would force us to 
> DW> build long strings for Japanese or similar characters, following the 
> DW> rules as described below.
> 
> DW> Do you read the SVG spec the same way, or am I wrong?
> 
> DW> Regards,
> DW> Dieter
> 
> >> 
> >> -- 
> >>  Benoit   mailto:benoit@itedo.com
> >> 
> >>  
> >> Thursday, October 6, 2005, 7:52:43 AM, Dieter wrote:
> >> 
> >> DW> All,
> >> 
> >> DW> I am not yet convinced that we are heading in the right
> >> DW> direction here.
> >> 
> >> DW> Example:
> >> DW> Let's assume we have the string "nihon" inside a linkUri: "id(日本)"
> >> 
> >> DW> using UTF-16 (big endian) this is: 65 e5 67 2c (4 Bytes)
> >> DW> converted to UTF-8: EF BB BF E6 97 A5 E6 9C AC (9 Bytes)
> >> 
> >> DW> and then you can apply escaping for all non-ascii chars
> >> 
> >> DW> %EF%BB%BF%E6%97%A5%E6%9C%AC (27 Bytes)
> >> 
> >> DW> and now we store it into the linkURI attribute, however, since 
> >> DW> somewhere else in the file we have this string in Japanese 
> >> DW> characters as an ID, all non-graphical strings will be stored as
> >> DW> UTF-16 (could be UTF-8 as well):
> >> 
> >> DW> I will skip writing it out; you end up with 54 bytes.
> >> 
> >> DW> So we are moving from 4 bytes to 54 bytes.
> >> 
> >> DW> I hope that this accurately describes the procedure that has been 
> >> DW> discussed over the past couple of days.
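
The byte counts in this example can be reproduced with a short sketch. It
assumes the two characters are U+65E5 and U+672C (the UTF-16 values given
above); the helper name percent_escape and its with_bom switch are
hypothetical. The 9-byte UTF-8 figure above includes the three-byte mark
EF BB BF, so the sketch shows the count both with and without it.

```python
from urllib.parse import quote


def percent_escape(text: str, with_bom: bool = False) -> str:
    """Encode text as UTF-8, then write every byte as %HH."""
    data = text.encode("utf-8")
    if with_bom:
        # Prepend EF BB BF, as in the byte sequence quoted in the example.
        data = b"\xef\xbb\xbf" + data
    return "".join(f"%{byte:02X}" for byte in data)


name = "\u65e5\u672c"  # the two characters 65E5 672C

utf16_len = len(name.encode("utf-16-be"))
escaped_bom = percent_escape(name, with_bom=True)
escaped = percent_escape(name)

# The standard library produces the same string as the variant without mark.
assert quote(name) == escaped

# UTF-16 length, escaped length, escaped string stored again as UTF-16
print(utf16_len, len(escaped_bom), len(escaped_bom.encode("utf-16-be")))
print(utf16_len, len(escaped), len(escaped.encode("utf-16-be")))
```

With the mark included this gives 4, 27 and 54, matching the figures above;
without it the escaped string is 18 characters, or 36 bytes when stored as
UTF-16. Either way the escaped form is many times longer than the original
4 bytes.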
> >> 
> >> DW> Comparison to SVG:
> >> DW> In 5.3.2. [1], SVG says the following:
> >> 
> >> DW> "The value of the href attribute must be a URI reference as defined
> >> DW> in [RFC2396], or must result in a URI reference after the escaping
> >> DW> procedure described below is applied. The procedure is applied when
> >> DW> passing the URI reference to a URI resolver."
> >> 
> >> DW> Interesting to see the last sentence here. IMO this means it is 
> >> DW> perfectly legal to store the URI reference using any encoding, as 
> >> DW> long as it will be transcoded to UTF-8 and escaped before
> >> DW> passing it on to a URI resolver.
> >> 
> >> DW> This has always been my understanding, and this is how all of our 
> >> DW> products have been handling references.
> >> 
> >> DW> NOTE:
> >> DW> If we required an escaped string inside the CGM now, this will make
> >> DW> almost all existing files invalid ones as soon as a simple space is
> >> DW> in a name attribute.
> >> 
> >> DW> RECOMMENDATION:
> >> DW> Amend wording slightly to match what SVG is doing and allow for 
> >> DW> both styles, escaped and not escaped.
> >> 
> >> DW> Comments?
> >> 
> >> DW> Regards,
> >> DW> Dieter
> >> 
> >> 
> >> DW> [1] http://www.w3.org/TR/SVG11/struct.html#xlinkRefAttrs
> >> 
> >> 
> >> >> -----Original Message-----
> >> >> From: Lofton Henderson [mailto:lofton@rockynet.com]
> >> >> Sent: Wednesday, October 05, 2005 1:06 AM
> >> >> To: Benoit Bezaire; cgmo-webcgm@lists.oasis-open.org
> >> >> Subject: Re: [cgmo-webcgm] implications of URI vs. IRI
> >> >> 
> >> >> At 05:09 PM 10/4/2005 -0400, Benoit Bezaire wrote:
> >> >> >Hi Lofton,
> >> >> >
> >> >> >I just did a quick search... I think that URI is only restricting 
> >> >> >characters to US-ASCII; it has no control on the encoding (utf-8,
> >> >> >utf-16 etc...).
> >> >> >
> >> >> >In XML syntax such as XHTML and SVG, files can have just about any
> >> >> >encoding; I'm not aware of any special processing for the xlink:href
> >> >> >attribute (i.e., this is a URI, change the encoding to _blah_). It
> >> >> >wouldn't make any sense. The scope of the encoding is for the complete
> >> >> >document.
> >> >> >
> >> >> >The above is not a fact, only my understanding.
> >> >> 
> >> >> It matches my understanding.  And it is clear that XML and/or URI
> >> >> (rfc3986) require "URI escaping" for non-ASCII characters in URIs, 
> >> >> i.e., for characters that are outside of the ASCII repertoire.  And 
> >> >> this is independent of the character-set encoding of the URI.
> >> >> 
> >> >> So finally, a URI from HTML into CGM containing a reference-by-name
> >> >> to "my object group" would be written like this:
> >> >> 
> >> >> <a href="http://example.org/myCGM.cgm#name(my%20object%20group)">blah</a>
> >> >> 
> >> >> and a WebCGM 'linkuri' first parameter would be this:
> >> >> 
> >> >> http://example.org/myCGM.cgm#name(my%20object%20group)
> >> >> 
> >> >> -Lofton.
> >> >> 
> >> >> 
> >> >> >Tuesday, September 20, 2005, 2:45:48 PM, Lofton wrote:
> >> >> >
> >> >> >LH> All --
> >> >> >
> >> >> >LH> When I was putting together the first Unicode tests, Dieter also 
> >> >> >LH> supplied me with this nifty "advanced" test.  It gets into Japanese
> >> >> >LH> text for SF text like APS ids and names.
> >> >> >
> >> >> >LH> It highlights an interesting implication of our decision to stick
> >> >> >LH> with URI instead of switching to IRI.  URI encoding requires that
> >> >> >LH> any non-ASCII characters are included by the "URI escaping 
> >> >> >LH> mechanism", see WebCGM 3.1.1.4 [1], and the more detailed XML
> >> >> >LH> description [2].  Basically, get the **UTF8** representation of the
> >> >> >LH> characters, and replace each byte in that representation by the
> >> >> >LH> 3-character string %HH, where HH is the hex representation of the
> >> >> >LH> byte.
> >> >> >
> >> >> >LH> So consider, for example, the 2-character id of the object in
> >> >> >LH> the upper-left box, and its use in a link from the object in the
> >> >> >LH> upper-right box.
> >> >> >
> >> >> >LH> If that id were the two characters c1c2, let's suppose that it could
> >> >> >LH> be represented by the 4 utf8 bytes b1b2b3b4 (I'm just guessing
> >> >> >LH> about "4", since UTF8 is variable length, it could be more).  Then
> >> >> >LH> to put that id into a URI string, it would have to be the
> >> >> >LH> 12-character string:
> >> >> >
> >> >> >LH> %hh%hh%hh%hh
> >> >> >
> >> >> >LH> where the hh are the 4 pairs of hex digits that represent
> >> >> >LH> the 4 utf8 bytes. I.e., the CGM URI for the link would be:
> >> >> >
> >> >> >LH> #id(%hh%hh%hh%hh, view_context)
> >> >> >
> >> >> >LH> Side question.  Does URI (rfc3986 [3]) restrict only the character
> >> >> >LH> repertoire of the URI, or does it restrict also the encoding? I.e.,
> >> >> >LH> can a URI be encoded in ascii, isoLatin1, or utf8, or utf16, or
> >> >> >LH> whatever, as long as it restricts its repertoire to the URI
> >> >> >LH> repertoire?  I suspect "yes", but I don't know the answer.  It would
> >> >> >LH> be interesting for someone to research it.
> >> >> >
> >> >> >LH> Thoughts?
> >> >> >
> >> >> >LH> Regards,
> >> >> >LH> -Lofton.
> >> >> >
> >> >> >LH> [1] http://docs.oasis-open.org/webcgm/v2.0/WebCGM20-IC.html#webcgm_3_1_1_4
> >> >> >LH> [2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
> >> >> >LH> [3] URI:  http://www.ietf.org/rfc/rfc3986.txt
> >> >> >LH> [4] IRI:  http://www.ietf.org/rfc/rfc3987.txt
> 
> 
> 
> 


