Subject: RE: Re[4]: [cgmo-webcgm] implications of URI vs. IRI
Hi Benoit,

good, and agreed.

One more comment: Spaces in "name" attributes were allowed long before any linkURI and/or XML rules existed, so nobody ever thought about this detail. Everything was stored in the CGM as the rules for non-graphical strings mandated. One could say that this could have been clarified in WebCGM 1.0; however, I find it quite useful to have both forms available.

Dieter

> -----Original Message-----
> From: Benoit Bezaire [mailto:benoit@itedo.com]
> Sent: Tuesday, October 11, 2005 5:07 PM
> To: cgmo-webcgm@lists.oasis-open.org
> Subject: Re[4]: [cgmo-webcgm] implications of URI vs. IRI
>
> Hi Dieter,
>
> Thanks for the example, we are talking about the same thing.
>
> I understand that ATA and WebCGM have allowed spaces in URI
> fragments for the last 10 years, but from my interpretation
> of RFC2396, those linkuris are illegal. Here is a quote from
> Section 4.1 of http://www.ietf.org/rfc/rfc2396.txt
> "The character restrictions described in Section 2 for URI
> also apply to the fragment in a URI-reference."
>
> And by reading Section 2, you end up reading that spaces are
> not allowed.
>
> That being said, your interpretation of the SVG wording
> sounds acceptable. The sentence 'or must result in a URI
> reference after the escaping procedure' seems to be saving
> us! I'm in favor of adding wording to the spec to clarify
> this issue (the 3 bullet wording would be good also).
>
> I no longer have a preference if we should deprecate or not.
> On one side, I think that this is a can of worms and forcing
> escaping simplifies things; on the other, I agree that long
> %HH for Asian names is not ideal.
>
> Allowing both is probably the least painful approach for users
> and implementers at this time.
>
> Regards,
>
> --
> Benoit mailto:benoit@itedo.com
>
>
> Tuesday, October 11, 2005, 10:15:06 AM, Dieter wrote:
>
> DW> Hi Benoit,
>
> DW> see inline
>
> >> -----Original Message-----
> >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> >> Sent: Tuesday, October 11, 2005 3:48 PM
> >> To: cgmo-webcgm@lists.oasis-open.org
> >> Subject: Re[2]: [cgmo-webcgm] implications of URI vs. IRI
> >>
> >> Hi Dieter,
> >>
> >> You said:
> >> NOTE: If we required an escaped string inside the CGM now, this will
> >> make almost all existing files invalid ones as soon as a simple space
> >> is in a name attribute.
> >>
> >> You are talking about the 'name' attribute within a URI only, correct?
> >> Or, let me rephrase...
> >> Files which have a name attribute (containing a space) that is used
> >> in a URI become invalid, right?
> DW> I am referring to the link destination parameter of a linkuri attribute.
> DW> Yes, something like (pseudo-code)
>
> DW> linkuri "http://www.cgmopen.org/abc.cgm#name(my name with blank)"
> DW> "some title" "_blank"
>
> DW> would become illegal, and this is the form (without escaping) that
> DW> has been used forever in the ATA and WebCGM environment
> DW> (almost 10 years now).
>
> >>
> >> I would be in favor of deprecating (i.e., authors should stop
> >> creating such files) the old behavior (no escaping) and adding 'a la'
> >> SVG wording to the spec. Like Dieter says, but with an emphasis on
> >> deprecating the old behavior.
> DW> The way I understand the SVG wording is that both forms would be legal:
>
> DW> http://www.cgmopen.org/abc.cgm#name(my name with blank)
> DW> http://www.cgmopen.org/abc.cgm#name(my%20name%20with%20blank)
>
> DW> I would NOT deprecate the first form, because it would force us to
> DW> build long strings for japanese or similar characters, following the
> DW> rules as described below.
>
> DW> Do you read the SVG spec the same way, or am I wrong?
>
> DW> Regards,
> DW> Dieter
>
> >>
> >> --
> >> Benoit mailto:benoit@itedo.com
> >>
> >>
> >> Thursday, October 6, 2005, 7:52:43 AM, Dieter wrote:
> >>
> >> DW> All,
> >>
> >> DW> I am not yet convinced that we are heading in the right direction here.
> >>
> >> DW> Example:
> >> DW> Let's assume we have the string "nihon" inside a linkUri: "id(日本)"
> >>
> >> DW> using UTF-16 (big endian) this is: 65 e5 67 2c (4 Bytes)
> >> DW> converted to UTF-8: EF BB BF E6 97 A5 E6 9C AC (9 Bytes)
> >>
> >> DW> and then you can apply escaping for all non-ascii chars
> >>
> >> DW> %EF%BB%BF%E6%97%A5%E6%9C%AC (27 Bytes)
> >>
> >> DW> and now we store it into the linkURI attribute, however, since
> >> DW> somewhere else in the file we have this string in japanese
> >> DW> characters as an ID, all non-graphical strings will be stored as
> >> DW> UTF-16 (could be UTF-8 as well):
> >>
> >> DW> I save the writing, you end up with 54 bytes.
> >>
> >> DW> So we are moving from 4 bytes to 54 bytes.
> >>
> >> DW> I hope that this accurately describes the procedure that has been
> >> DW> discussed over the past couple of days.
> >>
> >> DW> Comparison to SVG:
> >> DW> In 5.3.2. [1], SVG says the following:
> >>
> >> DW> "The value of the href attribute must be a URI reference as defined
> >> DW> in [RFC2396], or must result in a URI reference after the escaping
> >> DW> procedure described below is applied. The procedure is applied when
> >> DW> passing the URI reference to a URI resolver."
> >>
> >> DW> Interesting to see the last sentence here. IMO this means, it is
> >> DW> perfectly legal to store the URI reference using any encoding, as
> >> DW> long as it will be transcoded to UTF-8 and escaped before
> >> DW> passing it on to a URI resolver.
> >>
> >> DW> This has always been my understanding, and this is how all of our
> >> DW> products have been handling references.
> >>
> >> DW> NOTE:
> >> DW> If we required an escaped string inside the CGM now, this will make
> >> DW> almost all existing files invalid ones as soon as a simple space is
> >> DW> in a name attribute.
> >>
> >> DW> RECOMMENDATION:
> >> DW> Amend wording slightly to match what SVG is doing and allow for
> >> DW> both styles, escaped and not escaped.
> >>
> >> DW> Comments?
> >>
> >> DW> Regards,
> >> DW> Dieter
> >>
> >>
> >> DW> [1] http://www.w3.org/TR/SVG11/struct.html#xlinkRefAttrs
> >>
> >>
> >> >> -----Original Message-----
> >> >> From: Lofton Henderson [mailto:lofton@rockynet.com]
> >> >> Sent: Wednesday, October 05, 2005 1:06 AM
> >> >> To: Benoit Bezaire; cgmo-webcgm@lists.oasis-open.org
> >> >> Subject: Re: [cgmo-webcgm] implications of URI vs. IRI
> >> >>
> >> >> At 05:09 PM 10/4/2005 -0400, Benoit Bezaire wrote:
> >> >> >Hi Lofton,
> >> >> >
> >> >> >I just did a quick search... I think that URI is only restricting
> >> >> >characters to US-ASCII; it has no control on the encoding (utf-8,
> >> >> >utf-16 etc...).
> >> >> >
> >> >> >In XML syntax such as XHTML and SVG, files can have just about any
> >> >> >encoding; I'm not aware of any special processing for the xlink:href
> >> >> >attribute (i.e., this is a URI, change the encoding to _blah_). It
> >> >> >wouldn't make any sense. The scope of the encoding is for the complete
> >> >> >document.
> >> >> >
> >> >> >The above is not a fact, only my understanding.
> >> >>
> >> >> It matches my understanding. And it is clear that XML and/or URI
> >> >> (rfc3986) require "URI escaping" for non-ASCII characters in URIs,
> >> >> i.e., for characters that are outside of the ASCII repertoire. And
> >> >> this is independent of the character-set encoding of the URI.
> >> >>
> >> >> So finally, a URI from HTML into CGM containing a reference-by-name
> >> >> to "my object group" would be written like this:
> >> >>
> >> >> <a href="http://example.org/myCGM.cgm#name(my%20object%20group)">blah</a>
> >> >>
> >> >> and a WebCGM 'linkuri' first parameter would be this:
> >> >>
> >> >> http://example.org/myCGM.cgm#name(my%20object%20group)
> >> >>
> >> >> -Lofton.
> >> >>
> >> >>
> >> >> >Tuesday, September 20, 2005, 2:45:48 PM, Lofton wrote:
> >> >> >
> >> >> >LH> All --
> >> >> >
> >> >> >LH> When I was putting together the first unicode tests, Dieter also
> >> >> >LH> supplied me with this nifty "advanced" test. It gets into Japanese
> >> >> >LH> text for SF text like APS ids and names.
> >> >> >
> >> >> >LH> It highlights an interesting implication of our decision to stick
> >> >> >LH> with URI instead of switching to IRI. URI encoding requires that
> >> >> >LH> any non-ASCII characters are included by the "URI escaping
> >> >> >LH> mechanism", see WebCGM 3.1.1.4
> >> >> >LH> [1], and the more detailed XML description [2]. Basically, get the
> >> >> >LH> **UTF8** representation of the characters, and replace each byte in
> >> >> >LH> that representation by the 3-character string %HH, where HH is the
> >> >> >LH> hex representation of the byte.
> >> >> >
> >> >> >LH> So consider, for example, the 2-character id of the object in
> >> >> >LH> the upper-left box, and its use in a link from the object in the
> >> >> >LH> upper-right box.
> >> >> >
> >> >> >LH> If that id were the two characters c1c2, let's suppose that it could
> >> >> >LH> be represented by the 4 utf8 bytes b1b2b3b4 (I'm just guessing
> >> >> >LH> about "4", since UTF8 is variable length, it could be more).
> >> >> >LH> Then to put that id into
> >> >> >LH> a URI string, it would have to be the 12-character string:
> >> >> >
> >> >> >LH> %hh%hh%hh%hh
> >> >> >
> >> >> >LH> where the hh are the 4 pairs of hex digits that represent
> >> >> >LH> the 4 utf8 bytes. I.e., the CGM URI for the link would be:
> >> >> >
> >> >> >LH> #id(%hh%hh%hh%hh, view_context)
> >> >> >
> >> >> >LH> Side question. Does URI (rfc3986 [3]) restrict only the character
> >> >> >LH> repertoire of the URI, or does it restrict also the encoding? I.e.,
> >> >> >LH> can a URI be encoded in ascii, isoLatin1, or utf8, or utf16, or
> >> >> >LH> whatever, as long as it restricts its repertoire to the URI
> >> >> >LH> repertoire? I suspect "yes", but I don't know the answer. It would
> >> >> >LH> be interesting for someone to research it.
> >> >> >
> >> >> >LH> Thoughts?
> >> >> >
> >> >> >LH> Regards,
> >> >> >LH> -Lofton.
> >> >> >
> >> >> >LH> [1] http://docs.oasis-open.org/webcgm/v2.0/WebCGM20-IC.html#webcgm_3_1_1_4
> >> >> >LH> [2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
> >> >> >LH> [3] URI: http://www.ietf.org/rfc/rfc3986.txt
> >> >> >LH> [4] IRI: http://www.ietf.org/rfc/rfc3987.txt
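
The escaping procedure discussed in this thread (take the UTF-8 bytes, then replace each disallowed byte with %HH) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the behaviour of any WebCGM viewer: the helper names `escape_fragment`, `to_resolver_form` and `byte_counts` are hypothetical, parentheses are left unescaped because RFC 2396 treats them as unreserved "mark" characters, and the byte counts include the UTF-8 BOM (EF BB BF) only because Dieter's example does.

```python
from urllib.parse import quote, unquote

# "nihon" written with \u escapes (U+65E5 U+672C) so the source stays ASCII.
NIHON = "\u65e5\u672c"


def escape_fragment(fragment: str) -> str:
    """Hypothetical helper: encode as UTF-8, then %HH-escape every byte that is
    not an ASCII letter, digit, one of "_.-~", or a parenthesis."""
    return quote(fragment, safe="()", encoding="utf-8")


def to_resolver_form(stored: str) -> str:
    """Hypothetical helper: accept both the escaped and the unescaped stored
    form and return the escaped form to hand to a URI resolver.
    Assumption: a literal '%' never occurs in an object name, otherwise
    unescaping first would be ambiguous."""
    return escape_fragment(unquote(stored, encoding="utf-8"))


def byte_counts(text: str) -> tuple:
    """Reproduce the size progression from Dieter's example."""
    utf16 = text.encode("utf-16-be")         # raw UTF-16, big endian
    utf8_bom = text.encode("utf-8-sig")      # UTF-8 with a leading BOM
    escaped = quote(utf8_bom, safe="")       # one %HH triple per byte
    stored = escaped.encode("utf-16-be")     # escaped string stored as UTF-16
    return len(utf16), len(utf8_bom), len(escaped), len(stored)


if __name__ == "__main__":
    print(escape_fragment("name(my name with blank)"))
    print(escape_fragment("id(" + NIHON + ")"))
    print(to_resolver_form("name(my%20name%20with%20blank)"))
    print(byte_counts(NIHON))
```

With these assumptions, both stored forms of the "my name with blank" link produce the same string for the resolver, and `byte_counts` yields the 4, 9, 27 and 54 figures quoted above. Without the BOM, a plain UTF-8 encode of the two characters would give 6 bytes and 18 escaped characters instead.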