OASIS Mailing List Archives

cgmo-webcgm message



Subject: Attention Implementors -- Rob, Don, Forrest, Ulrich


Attention:  Rob, Don, Forrest, Ulrich

This is on Thursday's agenda.  Please be prepared to discuss what your
implementation does about it.  (Dave, is the question relevant for Boeing 
CGM tools?)

>From: Dieter  Weidenbrück <dieter@itedo.com>
>To: "'Benoit Bezaire'" <benoit@itedo.com>,
>         <cgmo-webcgm@lists.oasis-open.org>
>Date: Wed, 12 Oct 2005 07:52:57 +0200
>Subject: RE: Re[6]: [cgmo-webcgm] implications of URI vs. IRI
>
>All,
>
>Benoit is right, this is important.
>
>Consequences:
>- if we go for "escaped only", most likely every file from the past will
>   be invalid if it had a space or similar in it.
>- if we go for "non-escaped only" we will have no change compared to
>   WebCGM 1.0, however, we need to double-check whether this is in line
>   with the RFC.
>
>Questions:
>- How did other authoring tools do this in the past?
>- What do other viewer tools expect if they read an existing WebCGM file?
>
>I think this information is urgently needed to understand the situation
>a bit better.
>
>Regards,
>Dieter
>
> > -----Original Message-----
> > From: Benoit Bezaire [mailto:benoit@itedo.com]
> > Sent: Wednesday, October 12, 2005 5:05 AM
> > To: cgmo-webcgm@lists.oasis-open.org
> > Subject: Re[6]: [cgmo-webcgm] implications of URI vs. IRI
> >
> > Hi Lofton,
> >
> > I think that some of your questions are answered in 2.4.2:
> >
> > 2.4.2. When to Escape and Unescape
> >
> >   A URI is always in an "escaped" form, since escaping or unescaping a
> >   completed URI might change its semantics.
> >   [...]
> >   Because the percent "%" character always has the reserved purpose of
> >   being the escape indicator, it must be escaped as "%25" in order to
> >   be used as data within a URI.  Implementers should be careful not to
> >   escape or unescape the same string more than once, since unescaping
> >   an already unescaped string might lead to misinterpreting a percent
> >   data character as another escaped character, or vice versa in the
> >   case of escaping an already escaped string.
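
The double-escaping hazard described in the quoted section 2.4.2 can be illustrated with a minimal Python sketch. It uses the standard library's `urllib.parse.quote` and `unquote`; the sample value `a%20b` is a hypothetical 'name' whose data really contains a percent sign, not a value from any actual WebCGM file.

```python
from urllib.parse import quote, unquote

# Hypothetical 'name' value whose data really contains the characters "%20".
raw_name = "a%20b"

# Escaping once turns the data "%" into "%25".
escaped_once = quote(raw_name, safe="")
# Escaping an already escaped string changes it again.
escaped_twice = quote(escaped_once, safe="")

# Unescaping once recovers the original data.
unescaped_once = unquote(escaped_once)
# Unescaping a second time misreads the data "%20" as an escaped space.
unescaped_twice = unquote(unescaped_once)

print(escaped_once)
print(escaped_twice)
print(repr(unescaped_once))
print(repr(unescaped_twice))
```

Each extra pass silently changes the string, which is why a tool needs to know whether a stored value has already been escaped.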
> >
> > One last comment; this is _again_ a three way conversation
> > (Lofton, Dieter and myself)... everyone should be involved in
> > this conversation (users and implementers, what do you want),
> > you are all affected by this. We want a 'valid' solution that
> > will have little disruption on WebCGM 1.0 content; let's try
> > to work towards that goal.
> >
> > Regards,
> >
> > --
> >  Benoit   mailto:benoit@itedo.com
> >
> >
> > Tuesday, October 11, 2005, 7:05:19 PM, Lofton wrote:
> >
> > LH> More...
> >
> > LH> I am giving some more thought to the ambiguity problem about
> > LH> "both" (i.e., both forms allowed in the fragment, linkuri, etc.,
> > LH> a la SVG).
> >
> > LH> Firstly, a possible solution.  One could always add a rule for
> > LH> CGM interpreters, that any %hh 3-tuple in a fragment (or linkuri
> > LH> 1st parameter, or ...) will be taken by the CGM interpreter as a
> > LH> URI escaping sequence.  So caveat to WebCGM generators ...
> > LH> although the 'name' ApsAttr might allow something like that as
> > LH> part of the 'name' value, you had better not do it, because you
> > LH> will create an ambiguity when you use that 'name' value in a
> > LH> fragment (or linkuri, DOM, XCF) and will NOT get the result you
> > LH> want.
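
The interpreter rule proposed above (treat every %hh 3-tuple as an escape sequence and escape everything else that is unsafe) might look like the following sketch. The function name `escape_name` is hypothetical. Two choices here are assumptions rather than anything the thread settles: only the RFC 2396 "unreserved" characters are left alone, and a stray `%` that is not followed by two hex digits is escaped as `%25`.

```python
import re
import string

# A %hh 3-tuple: percent sign followed by exactly two hex digits.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

# RFC 2396 "unreserved" characters: alphanumerics plus the mark characters.
_UNRESERVED = frozenset(string.ascii_letters + string.digits + "-_.!~*'()")


def escape_name(name: str) -> str:
    """Escape a name for use in a fragment, keeping existing %hh tuples."""
    out = []
    i = 0
    while i < len(name):
        match = _ESCAPE.match(name, i)
        if match:
            # Assumed to be an escape sequence already: copy it unchanged.
            out.append(match.group(0))
            i = match.end()
            continue
        ch = name[i]
        if ch in _UNRESERVED:
            out.append(ch)
        else:
            # Replace each UTF-8 byte of the character by %HH.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        i += 1
    return "".join(out)


print(escape_name("my name with blank"))
print(escape_name("my%20name"))
print(escape_name("%myFunnyName%"))
```

Under this rule a name whose data really is the three characters `%20` cannot be addressed unambiguously, which is exactly the caveat to generators stated above.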
> >
> > LH> Secondly...
> >
> > LH> There is still something about the SVG sentence that bothers
> > LH> me, "...must be a URI reference as defined in [RFC2396], or must
> > LH> result in a URI reference after the escaping procedure described
> > LH> below is applied".  Specifically, was the *first* phrase ("must
> > LH> be a URI reference as defined in [RFC2396]") meant to include
> > LH> the case(s):
> >
> > LH> 1.) it is all safe ASCII in its original data form, with no URI
> > LH>     escaping needed or present?
> > LH> 2.) or was it maybe unsafe, but is already URI escaped?
> > LH> 3.) or both?
> >
> > LH> e1 illustrates #1 (all safe, no problem characters, no escaping
> > LH> needed or done).  e2 illustrates #2 (already escaped).
> >
> > LH> e1)  <image href="rasterImage.png" .../>
> > LH> e2)  <image href="raster%20image.png" .../>
> >
> > LH> Are both valid in SVG?
> >
> > LH> I'm going to reread 2396 again.  Chapter 2 talks about all this
> > LH> stuff (as well as questions like local encoding), but it is not
> > LH> light reading.  I'm also thinking to ask Chris about his memory
> > LH> of the sentence, particularly the intent of its first phrase.
> >
> > LH> -Lofton.
> >
> > LH> At 01:10 PM 10/11/2005 -0600, Lofton Henderson wrote:
> > LH> At 05:20 PM 10/11/2005 +0200, Dieter Weidenbrück wrote:
> > LH> [...]
> > LH> good, and agreed.
> >
> >
> > LH> Not so fast!
> >
> > LH> Actually, I do agree that we should use the SVG interpretation,
> > LH> if possible.  I'm not sure how we ended up differently, since
> > LH> Chris was consulting on and helping with this detail (it might
> > LH> be the time difference -- 1999 for WebCGM 1.0 versus 2001 for
> > LH> SVG -- Chris and SVG might have figured it out properly in those
> > LH> two years).
> >
> > LH> My problem is:  exactly how to do it.  One logical method might
> > LH> be an erratum on 1.0 -- logical because we ended up diverging
> > LH> from SVG 1.0 on that detail, and didn't intend to.  (Would
> > LH> require some action within W3C, to update the errata file that
> > LH> is linked from the Status section of the WebCGM 1.0
> > LH> Recommendation.)  An erratum (in the "both" direction) would
> > LH> mean that both forms are valid 1.0 content, from the very
> > LH> beginning.
> >
> > LH> Another possibility:  fix the language for 2.0, so that "both"
> > LH> are allowed from 2.0 on.  (This makes 1.0 content problematic,
> > LH> if both forms have been used.)
> >
> > LH> About the question of "both"...
> >
> > >> The sentence "...must be a URI reference as defined in [RFC2396],
> > >> or must result in a URI reference after the escaping procedure
> > >> described below is applied"
> > >>
> > >> DW> The way I understand the SVG wording is that both forms would
> > >> DW> be legal:
> > >>
> > >> DW> http://www.cgmopen.org/abc.cgm#name(my name with blank)
> > >> DW> http://www.cgmopen.org/abc.cgm#name(my%20name%20with%20blank)
> >
> >
> > LH> Rfc2396 makes it clear (section 2.3 and 2.4) that the presence
> > LH> of % should tell a URI resolver that URI escaping is in effect
> > LH> -- % isn't a valid reserved (delimiter or subdelimiter)
> > LH> character, nor a valid unreserved character, for the URI.
> >
> > LH> However, % is a valid character in the repertoire of the 'name'
> > LH> ApsAttr, right?  So "%myFunnyName%" is a valid 'name' ApsAttr in
> > LH> a WebCGM instance, right?  And the 3-character "%20" is a valid
> > LH> 'name' ApsAttr, right?
> >
> > LH> So if WebCGM allowed "both", and you encountered a fragment:
> >
> > LH> #name(a%20b) ,
> >
> > LH> what would you give to the URI resolver?  Two choices:
> >
> > LH> a%20b  [assumes that the generator already applied uri-escaping]
> > LH> a%2520b  [assumes that generator did NOT uri-escape already]
> >
> > LH> [btw, hex for % is 0x25, so % as an actual URI character is
> > LH> given to URI resolver as %25]
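
The two readings of `#name(a%20b)` can be made concrete with a small Python sketch. The dictionary `objects_by_name` and its entries are hypothetical: it stands in for a picture that happens to contain one object named `a b` and another whose name is literally `a%20b`.

```python
from urllib.parse import quote, unquote

# Hypothetical picture containing two objects with distinct 'name' values.
objects_by_name = {"a b": "object-1", "a%20b": "object-2"}

# Text found between "#name(" and ")" in the fragment.
fragment_value = "a%20b"

# Reading 1: the generator already escaped, so the text goes to the
# resolver unchanged and the name is recovered by unescaping once.
sent_if_escaped = fragment_value
target_if_escaped = objects_by_name[unquote(sent_if_escaped)]

# Reading 2: the generator stored raw data, so the viewer escapes it
# (turning "%" into "%25") before handing it to the resolver.
sent_if_raw = quote(fragment_value, safe="")
target_if_raw = objects_by_name[unquote(sent_if_raw)]

print(sent_if_escaped, "->", target_if_escaped)
print(sent_if_raw, "->", target_if_raw)
```

The same stored text selects a different object depending on which reading the viewer assumes, so allowing "both" needs a tie-breaking rule.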
> >
> > LH> Thoughts?  (This gives me a headache!)
> >
> > LH> -Lofton.
> >
> >
> > LH> One more comment:
> > LH> Spaces in "name" attributes have been allowed long before any
> > LH> linkURI and/or XML rules existed, thus nobody ever thought about
> > LH> this detail. Everything was stored in the CGM as the rules for
> > LH> non-graphical strings mandated.
> > LH> One could say that this could have been clarified in WebCGM 1.0,
> > LH> however, I find it quite useful to have both forms available.
> >
> > LH> Dieter
> >
> > >> -----Original Message-----
> > >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> > >> Sent: Tuesday, October 11, 2005 5:07 PM
> > >> To: cgmo-webcgm@lists.oasis-open.org
> > >> Subject: Re[4]: [cgmo-webcgm] implications of URI vs. IRI
> > >>
> > >> Hi Dieter,
> > >>
> > >> Thanks for the example, we are talking about the same thing.
> > >>
> > >> I understand that ATA and WebCGM have allowed spaces in URI
> > >> fragments for the last 10 years, but from my interpretation of
> > >> RFC2396, those linkuris are illegal. Here is a quote from
> > >> Section 4.1 of
> > >> http://www.ietf.org/rfc/rfc2396.txt
> > >> "The character restrictions described in Section 2 for URI also
> > >> apply to the fragment in a URI-reference."
> > >>
> > >> And by reading Section 2, you end up reading that spaces are not
> > >> allowed.
> > >>
> > >> That being said, your interpretation of the SVG wording sounds
> > >> acceptable. The sentence 'or must result in a URI reference after
> > >> the escaping procedure' seems to be saving us! I'm in favor of
> > >> adding wording to the spec to clarify this issue (the 3 bullet
> > >> wording would be good also).
> > >>
> > >> I no longer have a preference on whether we should deprecate or
> > >> not. On one side, I think that this is a can of worms and forcing
> > >> escaping simplifies things; on the other, I agree that long %HH
> > >> for Asian names is not ideal.
> > >>
> > >> Allowing both is probably the least painful approach for users and
> > >> implementers at this time.
> > >>
> > >> Regards,
> > >>
> > >> --
> > >>  Benoit   mailto:benoit@itedo.com
> > >>
> > >>
> > >> Tuesday, October 11, 2005, 10:15:06 AM, Dieter wrote:
> > >>
> > >> DW> Hi Benoit,
> > >>
> > >> DW> see inline
> > >>
> > >> >> -----Original Message-----
> > >> >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> > >> >> Sent: Tuesday, October 11, 2005 3:48 PM
> > >> >> To: cgmo-webcgm@lists.oasis-open.org
> > >> >> Subject: Re[2]: [cgmo-webcgm] implications of URI vs. IRI
> > >> >>
> > >> >> Hi Dieter,
> > >> >>
> > >> >> You said:
> > >> >> NOTE: If we required an escaped string inside the CGM now, this
> > >> >> will make almost all existing files invalid ones as soon as a
> > >> >> simple space is in a name attribute.
> > >> >>
> > >> >> You are talking about the 'name' attribute within a URI only,
> > >> >> correct? Or, let me rephrase...
> > >> >> Files which have a name attribute (containing a space) that is
> > >> >> used in a URI become invalid, right?
> > >> DW> I am referring to the link destination parameter of a linkuri
> > >> DW> attribute. Yes, something like (pseudo-code)
> > >>
> > >> DW> linkuri "http://www.cgmopen.org/abc.cgm#name(my name with blank)"
> > >> DW> "some title" "_blank"
> > >>
> > >> DW> would become illegal, and this is the form (without escaping)
> > >> DW> that has been used forever in the ATA and WebCGM environment
> > >> DW> (almost 10 years now).
> > >>
> > >> >>
> > >> >> I would be in favor of deprecating (i.e., authors should stop
> > >> >> creating such files) the old behavior (no escaping) and adding
> > >> >> 'a la' SVG wording to the spec. Like Dieter says, but with an
> > >> >> emphasis on deprecating the old behavior.
> > >> DW> The way I understand the SVG wording is that both forms would
> > >> DW> be legal:
> > >>
> > >> DW> http://www.cgmopen.org/abc.cgm#name(my name with blank)
> > >> DW> http://www.cgmopen.org/abc.cgm#name(my%20name%20with%20blank)
> > >>
> > >> DW> I would NOT deprecate the first form, because it would force
> > >> DW> us to build long strings for Japanese or similar characters,
> > >> DW> following the rules as described below.
> > >>
> > >> DW> Do you read the SVG spec the same way, or am I wrong?
> > >>
> > >> DW> Regards,
> > >> DW> Dieter
> > >>
> > >> >>
> > >> >> --
> > >> >>  Benoit   mailto:benoit@itedo.com
> > >> >>
> > >> >>
> > >> >> Thursday, October 6, 2005, 7:52:43 AM, Dieter wrote:
> > >> >>
> > >> >> DW> All,
> > >> >>
> > >> >> DW> I am not yet convinced that we are heading in the right
> > >> >> direction here.
> > >> >>
> > >> >> DW> Example:
> > >> >> DW> Let's assume we have the string "nihon" inside a linkUri:
> > >> >> DW> "id(日本)"
> > >> >>
> > >> >> DW> using UTF-16 (big endian) this is: 65 e5 67 2c (4 Bytes)
> > >> >> DW> converted to UTF-8: EF BB BF E6 97 A5 E6 9C AC (9 Bytes)
> > >> >>
> > >> >> DW> and then you can apply escaping for all non-ascii chars
> > >> >>
> > >> >> DW> %EF%BB%BF%E6%97%A5%E6%9C%AC (27 Bytes)
> > >> >>
> > >> >> DW> and now we store it into the linkURI attribute, however, since
> > >> >> DW> somewhere else in the file we have this string in Japanese
> > >> >> DW> characters as an ID, all non-graphical strings will be stored
> > >> >> DW> as UTF-16 (could be UTF-8 as well):
> > >> >>
> > >> >> DW> I will spare you the full listing; you end up with 54 bytes.
> > >> >>
> > >> >> DW> So we are moving from 4 bytes to 54 bytes.
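
The byte counts in this example can be reproduced with a few lines of Python. This is only a check of the arithmetic under the message's own assumptions; in particular, the 9-byte UTF-8 figure above includes the three-byte byte order mark EF BB BF, so the sketch computes the sizes both with and without it. All variable names are illustrative.

```python
# The two-character string "nihon", written with escapes to stay ASCII-safe.
text = "\u65e5\u672c"

utf16 = text.encode("utf-16-be")      # 65 E5 67 2C
utf8 = text.encode("utf-8")           # E6 97 A5 E6 9C AC
utf8_with_bom = b"\xef\xbb\xbf" + utf8


def percent_escape(data: bytes) -> str:
    """Replace every byte by the 3-character string %HH."""
    return "".join(f"%{b:02X}" for b in data)


escaped_with_bom = percent_escape(utf8_with_bom)
escaped_no_bom = percent_escape(utf8)

# Size when the escaped ASCII string is itself stored as UTF-16.
stored_with_bom = len(escaped_with_bom.encode("utf-16-be"))
stored_no_bom = len(escaped_no_bom.encode("utf-16-be"))

print(len(utf16), len(utf8_with_bom), len(escaped_with_bom), stored_with_bom)
print(len(utf16), len(utf8), len(escaped_no_bom), stored_no_bom)
```

With the byte order mark the progression is 4, 9, 27 and 54 as stated in the message; without it the same steps give 4, 6, 18 and 36, which is smaller but still a ninefold growth over the original 4 bytes.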
> > >> >>
> > >> >> DW> I hope that this accurately describes the procedure
> > >> that has been
> > >> >> DW> discussed over the past couple of days.
> > >> >>
> > >> >> DW> Comparison to SVG:
> > >> >> DW> In 5.3.2. [1], SVG says the following:
> > >> >>
> > >> >> DW> "The value of the href attribute must be a URI reference as
> > >> >> DW> defined in [RFC2396], or must result in a URI reference after
> > >> >> DW> the escaping procedure described below is applied. The
> > >> >> DW> procedure is applied when passing the URI reference to a URI
> > >> >> DW> resolver."
> > >> >>
> > >> >> DW> Interesting to see the last sentence here. IMO this means it
> > >> >> DW> is perfectly legal to store the URI reference using any
> > >> >> DW> encoding, as long as it will be transcoded to UTF-8 and
> > >> >> DW> escaped before passing it on to a URI resolver.
> > >> >>
> > >> >> DW> This has always been my understanding, and this is how all of
> > >> >> DW> our products have been handling references.
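
The "store in any encoding, escape only when handing over to the resolver" reading described above could be sketched as follows. This is an illustration under stated assumptions, not a description of how any product mentioned in the thread works: the function name `prepare_for_resolver` is hypothetical, the stored string is assumed to be unescaped, and the set of characters left untouched (`:/#()`) is chosen only to suit these examples.

```python
from urllib.parse import quote


def prepare_for_resolver(stored: bytes, storage_encoding: str) -> str:
    """Decode a stored link string, then UTF-8 percent-escape it."""
    # Step 1: undo whatever encoding the file used for the string.
    text = stored.decode(storage_encoding)
    # Step 2: transcode to UTF-8 and replace unsafe bytes by %HH.
    return quote(text, safe=":/#()", encoding="utf-8")


# A link stored as UTF-16 with a Japanese id.
stored_utf16 = "http://example.org/myCGM.cgm#id(\u65e5\u672c)".encode("utf-16-be")
# A link stored as Latin-1 with spaces in the name.
stored_latin1 = "http://www.cgmopen.org/abc.cgm#name(my name with blank)".encode("latin-1")

print(prepare_for_resolver(stored_utf16, "utf-16-be"))
print(prepare_for_resolver(stored_latin1, "latin-1"))
```

Because `%` is not in the safe set, an input that was already escaped would be escaped a second time, so a real viewer would still need a rule for telling the two forms apart.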
> > >> >>
> > >> >> DW> NOTE:
> > >> >> DW> If we required an escaped string inside the CGM now, this will
> > >> >> DW> make almost all existing files invalid ones as soon as a
> > >> >> DW> simple space is in a name attribute.
> > >> >>
> > >> >> DW> RECOMMENDATION:
> > >> >> DW> Amend wording slightly to match what SVG is doing and allow
> > >> >> DW> for both styles, escaped and not escaped.
> > >> >>
> > >> >> DW> Comments?
> > >> >>
> > >> >> DW> Regards,
> > >> >> DW> Dieter
> > >> >>
> > >> >>
> > >> >> DW> [1] http://www.w3.org/TR/SVG11/struct.html#xlinkRefAttrs
> > >> >>
> > >> >>
> > >> >> >> -----Original Message-----
> > >> >> >> From: Lofton Henderson [mailto:lofton@rockynet.com]
> > >> >> >> Sent: Wednesday, October 05, 2005 1:06 AM
> > >> >> >> To: Benoit Bezaire; cgmo-webcgm@lists.oasis-open.org
> > >> >> >> Subject: Re: [cgmo-webcgm] implications of URI vs. IRI
> > >> >> >>
> > >> >> >> At 05:09 PM 10/4/2005 -0400, Benoit Bezaire wrote:
> > >> >> >> >Hi Lofton,
> > >> >> >> >
> > >> >> >> >I just did a quick search... I think that URI is only
> > >> >> >> >restricting characters to US-ASCII; it has no control on the
> > >> >> >> >encoding (utf-8, utf-16 etc...).
> > >> >> >> >
> > >> >> >> >In XML syntax such as XHTML and SVG, files can have just about
> > >> >> >> >any encoding; I'm not aware of any special processing for the
> > >> >> >> >xlink:href attribute (i.e., this is a URI, change the encoding
> > >> >> >> >to _blah_). It wouldn't make any sense. The scope of the
> > >> >> >> >encoding is for the complete document.
> > >> >> >> >
> > >> >> >> >The above is not a fact, only my understanding.
> > >> >> >>
> > >> >> >> It matches my understanding.  And it is clear that XML and/or
> > >> >> >> URI (rfc3986) require "URI escaping" for non-ASCII characters
> > >> >> >> in URIs, i.e., for characters that are outside of the ASCII
> > >> >> >> repertoire.  And this is independent of the character-set
> > >> >> >> encoding of the URI.
> > >> >> >>
> > >> >> >> So finally, a URI from HTML into CGM containing a
> > >> >> >> reference-by-name to "my object group" would be written like
> > >> >> >> this:
> > >> >> >>
> > >> >> >> <a href="http://example.org/myCGM.cgm#name(my%20object%20group)">blah</a>
> > >> >> >>
> > >> >> >> and a WebCGM 'linkuri' first parameter would be this:
> > >> >> >>
> > >> >> >> http://example.org/myCGM.cgm#name(my%20object%20group)
> > >> >> >>
> > >> >> >> -Lofton.
> > >> >> >>
> > >> >> >>
> > >> >> >> >Tuesday, September 20, 2005, 2:45:48 PM, Lofton wrote:
> > >> >> >> >
> > >> >> >> >LH> All --
> > >> >> >> >
> > >> >> >> >LH> When I was putting together first unicode tests, Dieter
> > >> >> >> >LH> also supplied me with this nifty "advanced" test.  It gets
> > >> >> >> >LH> into Japanese text for SF text like APS ids and names.
> > >> >> >> >
> > >> >> >> >LH> It highlights an interesting implication of our decision to
> > >> >> >> >LH> stick with URI instead of switching to IRI.  URI encoding
> > >> >> >> >LH> requires that any non-ASCII characters are included by the
> > >> >> >> >LH> "URI escaping mechanism", see WebCGM 3.1.1.4 [1], and the
> > >> >> >> >LH> more detailed XML description [2].  Basically, get the
> > >> >> >> >LH> **UTF8** representation of the characters, and replace each
> > >> >> >> >LH> byte in that representation by the 3-character string %HH,
> > >> >> >> >LH> where HH is the hex representation of the byte.
> > >> >> >> >
> > >> >> >> >LH> So consider, for example, the 2-character id of the object
> > >> >> >> >LH> in the upper-left box, and its use in a link from the
> > >> >> >> >LH> object in the upper-right box.
> > >> >> >> >
> > >> >> >> >LH> If that id were the two characters c1c2, let's suppose that
> > >> >> >> >LH> it could be represented by the 4 utf8 bytes b1b2b3b4 (I'm
> > >> >> >> >LH> just guessing about "4", since UTF8 is variable length, it
> > >> >> >> >LH> could be more).  Then to put that id into a URI string, it
> > >> >> >> >LH> would have to be the 12-character string:
> > >> >> >> >
> > >> >> >> >LH> %hh%hh%hh%hh
> > >> >> >> >
> > >> >> >> >LH> where the hh are the 4 pairs of hex digits that represent
> > >> >> >> >LH> the 4 utf8 bytes. I.e., the CGM URI for the link would be:
> > >> >> >> >
> > >> >> >> >LH> #id(%hh%hh%hh%hh, view_context)
> > >> >> >> >
> > >> >> >> >LH> Side question.  Does URI (rfc3986 [3]) restrict only the
> > >> >> >> >LH> character repertoire of the URI, or does it restrict also
> > >> >> >> >LH> the encoding? I.e., can a URI be encoded in ascii,
> > >> >> >> >LH> isoLatin1, or utf8, or utf16, or whatever, as long as it
> > >> >> >> >LH> restricts its repertoire to the URI repertoire?  I suspect
> > >> >> >> >LH> "yes", but I don't know the answer.  It would be
> > >> >> >> >LH> interesting for someone to research it.
> > >> >> >> >
> > >> >> >> >LH> Thoughts?
> > >> >> >> >
> > >> >> >> >LH> Regards,
> > >> >> >> >LH> -Lofton.
> > >> >> >> >
> > >> >> >> >LH> [1] http://docs.oasis-open.org/webcgm/v2.0/WebCGM20-IC.html#webcgm_3_1_1_4
> > >> >> >> >LH> [2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
> > >> >> >> >LH> [3] URI:  http://www.ietf.org/rfc/rfc3986.txt
> > >> >> >> >LH> [4] IRI:  http://www.ietf.org/rfc/rfc3987.txt
> >
> >
> >
> >



