Subject: Attention Implementors -- Rob, Don, Forrest, Ulrich
Attention: Rob, Don, Forrest, Ulrich

This is on Thursday's agenda. Please be prepared to discuss what your
implementation does about it. (Dave, is the question relevant for Boeing
CGM tools?)

>From: Dieter Weidenbrück <dieter@itedo.com>
>To: "'Benoit Bezaire'" <benoit@itedo.com>, <cgmo-webcgm@lists.oasis-open.org>
>Date: Wed, 12 Oct 2005 07:52:57 +0200
>Subject: RE: Re[6]: [cgmo-webcgm] implications of URI vs. IRI
>
>All,
>
>Benoit is right, this is important.
>
>Consequences:
>- If we go for "escaped only", most likely every file from the past will
>  be invalid if it had a space or similar character in it.
>- If we go for "non-escaped only", there will be no change compared to
>  WebCGM 1.0; however, we need to double-check whether this is in line
>  with the RFC.
>
>Questions:
>- How did other authoring tools do this in the past?
>- What do other viewer tools expect when they read an existing WebCGM file?
>
>I think this information is urgently needed to understand the situation
>a bit better.
>
>Regards,
>Dieter
>
> > -----Original Message-----
> > From: Benoit Bezaire [mailto:benoit@itedo.com]
> > Sent: Wednesday, October 12, 2005 5:05 AM
> > To: cgmo-webcgm@lists.oasis-open.org
> > Subject: Re[6]: [cgmo-webcgm] implications of URI vs. IRI
> >
> > Hi Lofton,
> >
> > I think that some of your questions are answered in Section 2.4.2 of
> > RFC 2396:
> >
> >   2.4.2. When to Escape and Unescape
> >
> >   A URI is always in an "escaped" form, since escaping or unescaping a
> >   completed URI might change its semantics.
> >   [...]
> >   Because the percent "%" character always has the reserved purpose of
> >   being the escape indicator, it must be escaped as "%25" in order to
> >   be used as data within a URI.  Implementers should be careful not to
> >   escape or unescape the same string more than once, since unescaping
> >   an already unescaped string might lead to misinterpreting a percent
> >   data character as another escaped character, or vice versa in the
> >   case of escaping an already escaped string.
> >
> > One last comment: this is _again_ a three-way conversation
> > (Lofton, Dieter and myself). Everyone should be involved in
> > this conversation (users and implementers: what do you want?);
> > you are all affected by this. We want a 'valid' solution that
> > will cause little disruption to WebCGM 1.0 content; let's try
> > to work towards that goal.
> >
> > Regards,
> >
> > --
> > Benoit mailto:benoit@itedo.com
> >
> > Tuesday, October 11, 2005, 7:05:19 PM, Lofton wrote:
> >
> > LH> More...
> >
> > LH> I am giving some more thought to the ambiguity problem with
> > LH> "both" (i.e., both forms allowed in the fragment, linkuri,
> > LH> etc., à la SVG).
> >
> > LH> Firstly, a possible solution.
> > LH> One could always add a rule for CGM interpreters that any %hh
> > LH> 3-tuple in a fragment (or linkuri 1st parameter, or ...) will be
> > LH> taken by the CGM interpreter as a URI escape sequence. So, a caveat
> > LH> to WebCGM generators: although the 'name' ApsAttr might allow
> > LH> something like that as part of the 'name' value, you had better not
> > LH> do it, because you will create an ambiguity when you use that 'name'
> > LH> value in a fragment (or linkuri, DOM, XCF) and will NOT get the
> > LH> result you want.
> >
> > LH> Secondly...
> >
> > LH> There is still something about the SVG sentence that bothers me:
> > LH> "...must be a URI reference as defined in [RFC2396], or must result
> > LH> in a URI reference after the escaping procedure described below is
> > LH> applied". Specifically, was the *first* phrase ("must be a URI
> > LH> reference as defined in [RFC2396]") meant to include the case(s):
> >
> > LH> 1.) it is all safe ASCII in its original data form, with no URI
> > LH>     escaping needed or present?
> > LH> 2.) or was it maybe unsafe, but is already URI-escaped?
> > LH> 3.) or both?
> >
> > LH> e1 illustrates #1 (all safe, no problem characters, no escaping
> > LH> needed or done). e2 illustrates #2 (already escaped).
> >
> > LH> e1) <image href="rasterImage.png" .../>
> > LH> e2) <image href="raster%20image.png" .../>
> >
> > LH> Are both valid in SVG?
> >
> > LH> I'm going to reread RFC 2396. Chapter 2 talks about all this stuff
> > LH> (as well as questions like local encoding), but it is not light
> > LH> reading. I'm also thinking of asking Chris about his memory of the
> > LH> sentence, particularly the intent of its first phrase.
> >
> > LH> -Lofton.
> >
> > LH> At 01:10 PM 10/11/2005 -0600, Lofton Henderson wrote:
> > LH> At 05:20 PM 10/11/2005 +0200, Dieter Weidenbrück wrote:
> > LH> [...]
> > LH> good, and agreed.
> >
> > LH> Not so fast!
> >
> > LH> Actually, I do agree that we should use the SVG interpretation, if
> > LH> possible. I'm not sure how we ended up differently, since Chris was
> > LH> consulting on and helping with this detail (it might be the time
> > LH> difference, 1999 for WebCGM 1.0 versus 2001 for SVG; Chris and SVG
> > LH> might have figured it out properly in those two years).
> >
> > LH> My problem is: exactly how to do it. One logical method might be an
> > LH> erratum on 1.0, logical because we ended up diverging from SVG 1.0
> > LH> on that detail and didn't intend to. (It would require some action
> > LH> within W3C to update the errata file that is linked from the Status
> > LH> section of the WebCGM 1.0 Recommendation.) An erratum (in the "both"
> > LH> direction) would mean that both forms are valid 1.0 content, from
> > LH> the very beginning.
> >
> > LH> Another possibility: fix the language for 2.0, so that "both" are
> > LH> allowed from 2.0 on. (This makes 1.0 content problematic, if both
> > LH> forms have been used.)
> >
> > LH> About the question of "both"...
> >
> > >> The sentence "...must be a URI reference as defined in [RFC2396], or
> > >> must result in a URI reference after the escaping procedure described
> > >> below is applied"
> > >>
> > >> DW> The way I understand the SVG wording is that both forms would be
> > >> DW> legal:
> > >>
> > >> DW> http://www.cgmopen.org/abc.cgm#name(my name with blank)
> > >> DW> http://www.cgmopen.org/abc.cgm#name(my%20name%20with%20blank)
> >
> > LH> RFC 2396 makes it clear (sections 2.3 and 2.4) that the presence of
> > LH> % should tell a URI resolver that URI escaping is in effect: % is
> > LH> neither a valid reserved (delimiter or subdelimiter) character nor a
> > LH> valid unreserved character for the URI.
> >
> > LH> However, % is a valid character in the repertoire of the 'name'
> > LH> ApsAttr, right? So "%myFunnyName%" is a valid 'name' ApsAttr value in
> > LH> a WebCGM instance, right?
> > LH> And the 3-character "%20" is a valid 'name' ApsAttr value, right?
> >
> > LH> So if WebCGM allowed "both", and you encountered the fragment:
> >
> > LH>    #name(a%20b)
> >
> > LH> what would you give to the URI resolver? Two choices:
> >
> > LH>    a%20b    [assumes that the generator already applied URI escaping]
> > LH>    a%2520b  [assumes that the generator did NOT URI-escape already]
> >
> > LH> [BTW, the hex code for % is 0x25, so % as an actual data character
> > LH> is given to the URI resolver as %25.]
> >
> > LH> Thoughts? (This gives me a headache!)
> >
> > LH> -Lofton.
> >
> > LH> One more comment:
> > LH> Spaces in "name" attributes were allowed long before any linkURI
> > LH> and/or XML rules existed, thus nobody ever thought about this
> > LH> detail. Everything was stored in the CGM as the rules for
> > LH> non-graphical strings mandated.
> > LH> One could say that this could have been clarified in WebCGM 1.0;
> > LH> however, I find it quite useful to have both forms available.
> >
> > LH> Dieter
> >
> > >> -----Original Message-----
> > >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> > >> Sent: Tuesday, October 11, 2005 5:07 PM
> > >> To: cgmo-webcgm@lists.oasis-open.org
> > >> Subject: Re[4]: [cgmo-webcgm] implications of URI vs. IRI
> > >>
> > >> Hi Dieter,
> > >>
> > >> Thanks for the example; we are talking about the same thing.
> > >>
> > >> I understand that ATA and WebCGM have allowed spaces in URI fragments
> > >> for the last 10 years, but by my interpretation of RFC 2396, those
> > >> linkuris are illegal. Here is a quote from Section 4.1 of
> > >> http://www.ietf.org/rfc/rfc2396.txt:
> > >> "The character restrictions described in Section 2 for URI also apply
> > >> to the fragment in a URI-reference."
> > >>
> > >> And if you read Section 2, you find that spaces are not allowed.
> > >>
> > >> That being said, your interpretation of the SVG wording sounds
> > >> acceptable.
> > >> The sentence "or must result in a URI reference after the escaping
> > >> procedure" seems to be saving us! I'm in favor of adding wording to
> > >> the spec to clarify this issue (the 3-bullet wording would be good
> > >> also).
> > >>
> > >> I no longer have a preference on whether we should deprecate or not.
> > >> On one side, I think that this is a can of worms and forcing escaping
> > >> simplifies things; on the other, I agree that long %HH strings for
> > >> Asian names are not ideal.
> > >>
> > >> Allowing both is probably the least painful approach for users and
> > >> implementers at this time.
> > >>
> > >> Regards,
> > >>
> > >> --
> > >> Benoit mailto:benoit@itedo.com
> > >>
> > >> Tuesday, October 11, 2005, 10:15:06 AM, Dieter wrote:
> > >>
> > >> DW> Hi Benoit,
> > >>
> > >> DW> see inline
> > >>
> > >> >> -----Original Message-----
> > >> >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> > >> >> Sent: Tuesday, October 11, 2005 3:48 PM
> > >> >> To: cgmo-webcgm@lists.oasis-open.org
> > >> >> Subject: Re[2]: [cgmo-webcgm] implications of URI vs. IRI
> > >> >>
> > >> >> Hi Dieter,
> > >> >>
> > >> >> You said:
> > >> >> NOTE: If we required an escaped string inside the CGM now, this
> > >> >> would make almost all existing files invalid as soon as a simple
> > >> >> space is in a name attribute.
> > >> >>
> > >> >> You are talking about the 'name' attribute within a URI only,
> > >> >> correct? Or, let me rephrase...
> > >> >> Files which have a name attribute (containing a space) that is
> > >> >> used in a URI become invalid, right?
> > >> DW> I am referring to the link destination parameter of a linkuri
> > >> DW> attribute.
> > >> DW> Yes, something like (pseudo-code)
> > >>
> > >> DW> linkuri "http://www.cgmopen.org/abc.cgm#name(my name with blank)"
> > >> DW>         "some title" "_blank"
> > >>
> > >> DW> would become illegal, and this is the form (without escaping) that
> > >> DW> has been used forever in the ATA and WebCGM environment (almost 10
> > >> DW> years now).
> > >>
> > >> >>
> > >> >> I would be in favor of deprecating (i.e., authors should stop
> > >> >> creating such files) the old behavior (no escaping) and adding
> > >> >> SVG-style wording to the spec. Like Dieter says, but with an
> > >> >> emphasis on deprecating the old behavior.
> > >> DW> The way I understand the SVG wording is that both forms would be
> > >> DW> legal:
> > >>
> > >> DW> http://www.cgmopen.org/abc.cgm#name(my name with blank)
> > >> DW> http://www.cgmopen.org/abc.cgm#name(my%20name%20with%20blank)
> > >>
> > >> DW> I would NOT deprecate the first form, because it would force us to
> > >> DW> build long strings for Japanese or similar characters, following
> > >> DW> the rules as described below.
> > >>
> > >> DW> Do you read the SVG spec the same way, or am I wrong?
> > >>
> > >> DW> Regards,
> > >> DW> Dieter
> > >>
> > >> >>
> > >> >> --
> > >> >> Benoit mailto:benoit@itedo.com
> > >> >>
> > >> >> Thursday, October 6, 2005, 7:52:43 AM, Dieter wrote:
> > >> >>
> > >> >> DW> All,
> > >> >>
> > >> >> DW> I am not yet convinced that we are heading in the right
> > >> >> DW> direction here.
> > >> >>
> > >> >> DW> Example:
> > >> >> DW> Let's assume we have the string "nihon" inside a linkUri:
> > >> >> DW> "id(日本)"
> > >> >>
> > >> >> DW> using UTF-16 (big endian) this is: 65 E5 67 2C (4 bytes)
> > >> >> DW> converted to UTF-8: EF BB BF E6 97 A5 E6 9C AC (9 bytes)
> > >> >>
> > >> >> DW> and then you can apply escaping for all non-ASCII chars:
> > >> >>
> > >> >> DW> %EF%BB%BF%E6%97%A5%E6%9C%AC (27 bytes)
> > >> >>
> > >> >> DW> and now we store it in the linkURI attribute; however, since
> > >> >> DW> somewhere else in the file we have this string in Japanese
> > >> >> DW> characters as an ID, all non-graphical strings will be stored
> > >> >> DW> as UTF-16 (could be UTF-8 as well):
> > >> >>
> > >> >> DW> I'll spare you writing it out; you end up with 54 bytes.
> > >> >>
> > >> >> DW> So we are moving from 4 bytes to 54 bytes.
> > >> >>
> > >> >> DW> I hope that this accurately describes the procedure that has
> > >> >> DW> been discussed over the past couple of days.
> > >> >>
> > >> >> DW> Comparison to SVG:
> > >> >> DW> In 5.3.2 [1], SVG says the following:
> > >> >>
> > >> >> DW> "The value of the href attribute must be a URI reference as
> > >> >> DW> defined in [RFC2396], or must result in a URI reference after
> > >> >> DW> the escaping procedure described below is applied. The
> > >> >> DW> procedure is applied when passing the URI reference to a URI
> > >> >> DW> resolver."
> > >> >>
> > >> >> DW> Interesting to see the last sentence here. IMO this means it is
> > >> >> DW> perfectly legal to store the URI reference using any encoding,
> > >> >> DW> as long as it will be transcoded to UTF-8 and escaped before
> > >> >> DW> passing it on to a URI resolver.
> > >> >>
> > >> >> DW> This has always been my understanding, and this is how all of
> > >> >> DW> our products have been handling references.
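Dieter's byte counts can be checked mechanically. The following is a minimal Python sketch, assuming (as his listing does) that the UTF-8 form carries the 3-byte byte-order mark EF BB BF and that the escaped string is then stored as UTF-16 without a BOM. The helper name `percent_escape_non_ascii` is hypothetical. The sketch also computes the escaped form without the BOM, since the mark is not one of the two characters being escaped.

```python
NIHON = "\u65e5\u672c"  # the two characters of "nihon" (日本) in "id(日本)"


def percent_escape_non_ascii(data: bytes) -> str:
    """Hypothetical helper: replace each non-ASCII byte with %HH, keep ASCII bytes."""
    return "".join(f"%{b:02X}" if b > 0x7F else chr(b) for b in data)


utf16 = NIHON.encode("utf-16-be")      # 65 E5 67 2C
utf8_bom = NIHON.encode("utf-8-sig")   # EF BB BF (BOM) + E6 97 A5 E6 9C AC
escaped = percent_escape_non_ascii(utf8_bom)
stored = escaped.encode("utf-16-be")   # escaped text re-encoded as UTF-16, no BOM

# Same procedure without the byte-order mark: only the characters' UTF-8 bytes.
no_bom = percent_escape_non_ascii(NIHON.encode("utf-8"))

print(len(utf16), len(utf8_bom), len(escaped), len(stored))  # 4 9 27 54
print(escaped)  # %EF%BB%BF%E6%97%A5%E6%9C%AC
print(no_bom)   # %E6%97%A5%E6%9C%AC
```

Without the BOM the escaped id is 18 characters (36 bytes as UTF-16): still nine times the 4-byte UTF-16 original, so the growth Dieter points out remains substantial.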
> > >> >>
> > >> >> DW> NOTE:
> > >> >> DW> If we required an escaped string inside the CGM now, this would
> > >> >> DW> make almost all existing files invalid as soon as a simple space
> > >> >> DW> is in a name attribute.
> > >> >>
> > >> >> DW> RECOMMENDATION:
> > >> >> DW> Amend the wording slightly to match what SVG is doing and allow
> > >> >> DW> for both styles, escaped and not escaped.
> > >> >>
> > >> >> DW> Comments?
> > >> >>
> > >> >> DW> Regards,
> > >> >> DW> Dieter
> > >> >>
> > >> >> DW> [1] http://www.w3.org/TR/SVG11/struct.html#xlinkRefAttrs
> > >> >>
> > >> >> >> -----Original Message-----
> > >> >> >> From: Lofton Henderson [mailto:lofton@rockynet.com]
> > >> >> >> Sent: Wednesday, October 05, 2005 1:06 AM
> > >> >> >> To: Benoit Bezaire; cgmo-webcgm@lists.oasis-open.org
> > >> >> >> Subject: Re: [cgmo-webcgm] implications of URI vs. IRI
> > >> >> >>
> > >> >> >> At 05:09 PM 10/4/2005 -0400, Benoit Bezaire wrote:
> > >> >> >> >Hi Lofton,
> > >> >> >> >
> > >> >> >> >I just did a quick search... I think that URI only restricts
> > >> >> >> >characters to US-ASCII; it has no control over the encoding
> > >> >> >> >(UTF-8, UTF-16, etc.).
> > >> >> >> >
> > >> >> >> >In XML syntaxes such as XHTML and SVG, files can have just about
> > >> >> >> >any encoding; I'm not aware of any special processing for the
> > >> >> >> >xlink:href attribute (i.e., "this is a URI, change the encoding
> > >> >> >> >to _blah_"). It wouldn't make any sense. The scope of the
> > >> >> >> >encoding is the complete document.
> > >> >> >> >
> > >> >> >> >The above is not a fact, only my understanding.
> > >> >> >>
> > >> >> >> It matches my understanding. And it is clear that XML and/or URI
> > >> >> >> (RFC 3986) require "URI escaping" for non-ASCII characters in
> > >> >> >> URIs, i.e., for characters that are outside of the ASCII
> > >> >> >> repertoire.
> > >> >> >> And this is independent of the character-set encoding of the URI.
> > >> >> >>
> > >> >> >> So finally, a URI from HTML into CGM containing a reference-by-name
> > >> >> >> to "my object group" would be written like this:
> > >> >> >>
> > >> >> >> <a href="http://example.org/myCGM.cgm#name(my%20object%20group)">blah</a>
> > >> >> >>
> > >> >> >> and a WebCGM 'linkuri' first parameter would be this:
> > >> >> >>
> > >> >> >> http://example.org/myCGM.cgm#name(my%20object%20group)
> > >> >> >>
> > >> >> >> -Lofton.
> > >> >> >>
> > >> >> >> >Tuesday, September 20, 2005, 2:45:48 PM, Lofton wrote:
> > >> >> >> >
> > >> >> >> >LH> All --
> > >> >> >> >
> > >> >> >> >LH> When I was putting together the first Unicode tests, Dieter
> > >> >> >> >LH> also supplied me with this nifty "advanced" test. It gets into
> > >> >> >> >LH> Japanese text for SF text like APS ids and names.
> > >> >> >> >
> > >> >> >> >LH> It highlights an interesting implication of our decision to
> > >> >> >> >LH> stick with URI instead of switching to IRI. URI encoding
> > >> >> >> >LH> requires that any non-ASCII characters be included via the
> > >> >> >> >LH> "URI escaping mechanism"; see WebCGM 3.1.1.4 [1] and the more
> > >> >> >> >LH> detailed XML description [2]. Basically, get the **UTF-8**
> > >> >> >> >LH> representation of the characters, and replace each byte in
> > >> >> >> >LH> that representation by the 3-character string %HH, where HH
> > >> >> >> >LH> is the hex representation of the byte.
> > >> >> >> >
> > >> >> >> >LH> So consider, for example, the 2-character id of the object in
> > >> >> >> >LH> the upper-left box, and its use in a link from the object in
> > >> >> >> >LH> the upper-right box.
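The escaping procedure Lofton describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the normative WebCGM or XML algorithm. It escapes every non-ASCII character plus the space, which is the character at issue in this thread, and emits one %HH triplet per UTF-8 byte. All other ASCII characters, including '%', are copied unchanged. The function names `uri_escape` and `name_link` are hypothetical.

```python
# Assumption for this sketch: escape non-ASCII characters plus the space only.
ESCAPE_ASCII = {" "}


def uri_escape(value: str) -> str:
    """Hypothetical helper: %HH-escape each UTF-8 byte of the selected characters."""
    out = []
    for ch in value:
        if ord(ch) > 0x7F or ch in ESCAPE_ASCII:
            # One %HH triplet per byte of the character's UTF-8 form.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            # All other ASCII characters, including '%', are copied unchanged.
            out.append(ch)
    return "".join(out)


def name_link(base: str, name: str) -> str:
    """Hypothetical helper: build a linkuri target that addresses an object by name."""
    return f"{base}#name({uri_escape(name)})"


print(name_link("http://example.org/myCGM.cgm", "my object group"))
# http://example.org/myCGM.cgm#name(my%20object%20group)
print(uri_escape("\u65e5\u672c"))  # %E6%97%A5%E6%9C%AC
```

Because '%' passes through untouched, a 'name' value that already contains "%20" comes out unchanged. That is exactly the ambiguity debated in the October messages. For the two characters 日本, the UTF-8 form is 6 bytes, so the escaped id is 18 characters long.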
> > >> >> >> >
> > >> >> >> >LH> If that id were the two characters c1c2, let's suppose that
> > >> >> >> >LH> it could be represented by the 4 UTF-8 bytes b1b2b3b4 (I'm
> > >> >> >> >LH> just guessing about "4"; since UTF-8 is variable length, it
> > >> >> >> >LH> could be more). Then to put that id into a URI string, it
> > >> >> >> >LH> would have to be the 12-character string:
> > >> >> >> >
> > >> >> >> >LH>    %hh%hh%hh%hh
> > >> >> >> >
> > >> >> >> >LH> where the hh are the 4 pairs of hex digits that represent the
> > >> >> >> >LH> 4 UTF-8 bytes. I.e., the CGM URI for the link would be:
> > >> >> >> >
> > >> >> >> >LH>    #id(%hh%hh%hh%hh, view_context)
> > >> >> >> >
> > >> >> >> >LH> Side question: does URI (RFC 3986 [3]) restrict only the
> > >> >> >> >LH> character repertoire of the URI, or does it also restrict the
> > >> >> >> >LH> encoding? I.e., can a URI be encoded in ASCII, ISO Latin-1,
> > >> >> >> >LH> UTF-8, UTF-16, or whatever, as long as it restricts its
> > >> >> >> >LH> repertoire to the URI repertoire? I suspect "yes", but I don't
> > >> >> >> >LH> know the answer. It would be interesting for someone to
> > >> >> >> >LH> research it.
> > >> >> >> >
> > >> >> >> >LH> Thoughts?
> > >> >> >> >
> > >> >> >> >LH> Regards,
> > >> >> >> >LH> -Lofton.
> > >> >> >> >
> > >> >> >> >LH> [1] WebCGM 3.1.1.4: http://docs.oasis-open.org/webcgm/v2.0/WebCGM20-IC.html#webcgm_3_1_1_4
> > >> >> >> >LH> [2] XML: http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
> > >> >> >> >LH> [3] URI: http://www.ietf.org/rfc/rfc3986.txt
> > >> >> >> >LH> [4] IRI: http://www.ietf.org/rfc/rfc3987.txt
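Lofton's #name(a%20b) example, from earlier in the thread, can be made concrete. The following is a minimal Python sketch, assuming the name contains no characters that need escaping other than '%'. `escape_percent_as_data` is a hypothetical helper that performs only the '%'-to-%25 step of RFC 2396 escaping. The standard-library `urllib.parse.unquote` stands in for the URI resolver's unescaping.

```python
from urllib.parse import unquote


def escape_percent_as_data(raw: str) -> str:
    """Hypothetical helper: escape a literal '%' as %25 (RFC 2396, 2.4.2).

    Only the '%' step is shown; a full escaper would also handle spaces,
    non-ASCII characters, and so on.
    """
    return raw.replace("%", "%25")


fragment_value = "a%20b"  # taken from the fragment #name(a%20b)

# Choice 1: the generator already applied URI escaping; pass the value through.
as_escaped = fragment_value
# Choice 2: the generator did NOT escape, so '%' is data and must become %25.
as_raw = escape_percent_as_data(fragment_value)

# After the resolver unescapes, the two choices name different objects.
print(as_escaped, "->", unquote(as_escaped))  # a%20b -> a b
print(as_raw, "->", unquote(as_raw))          # a%2520b -> a%20b

# RFC 2396's warning: escaping twice and unescaping once loses the original name.
double = escape_percent_as_data(as_raw)       # a%252520b
print(unquote(double) == fragment_value)      # False
```

Lofton's proposed rule, that an interpreter always treats %hh as an escape sequence, amounts to always taking choice 1. A 'name' value that genuinely contains "%20" must then always be written with its '%' escaped (a%2520b), which is why he cautions generators against such names.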