Subject: Re: [relax-ng] Encoding declaration, MIME type




--On 15 April 2002 06:29 -0400 John Cowan <jcowan@reutershealth.com> wrote:

> James Clark scripsit:
>
>> However, "include" and
>> "externalRef" seem to me to present a problem for this approach.  They
>> take URIs as arguments and dereferencing a URI gives me an entity
>> containing a sequence of bytes.
>
> In fact dereferencing a URI is generally understood (though I cannot
> lay my hands on specific textual authority at the moment) to return a
> MIME entity-body; that is, a sequence of bytes plus a media type.
> Thus "http://www.w3.org" when dereferenced returns not only
> 21540 bytes (at present) but also the media type "text/html;
> charset=us-ascii".

I certainly agree that we should use the media type when one is provided, 
but what about something like a "file:" or "ftp:" URL, where there is no 
media type?

Here's a strawman proposal:

1. If you get the RNC as a MIME entity including information about the 
charset, then use that charset.  Note that text/plain without a charset 
parameter is equivalent to "text/plain; charset=us-ascii".

2. Otherwise, the RNC is in UTF-8 or UTF-16.  If it has a UTF-16 BOM, it's 
UTF-16.  Otherwise it's UTF-8.

3. A system may provide a way to allow a user to specify an alternative 
encoding for local files.

4. After converting the sequence of bytes to a sequence of characters, any 
initial BOM is discarded.

(After this, the next stage is newline normalization, then \x 
interpretation.)
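
As a rough illustration, steps 1 to 4 of the strawman could be sketched in Python along these lines. The function name decode_rnc and its parameters (media_type, charset, user_encoding) are hypothetical, invented here for the example. The proposal does not say how the user override in step 3 ranks against the BOM check in step 2; this sketch assumes the override is consulted only when no MIME information is available, and that it then takes precedence over BOM sniffing.

```python
import codecs
from typing import Optional

BOM = "\ufeff"  # U+FEFF as it appears after decoding


def decode_rnc(data: bytes,
               media_type: Optional[str] = None,
               charset: Optional[str] = None,
               user_encoding: Optional[str] = None) -> str:
    """Turn the bytes of an RNC schema into characters (hypothetical helper)."""
    if charset is not None:
        # Step 1: the MIME entity carried an explicit charset parameter.
        encoding = charset
    elif media_type is not None and media_type.lower() == "text/plain":
        # Step 1: text/plain without a charset parameter means us-ascii.
        encoding = "us-ascii"
    elif user_encoding is not None:
        # Step 3 (assumed precedence): user-specified encoding, e.g. for a local file.
        encoding = user_encoding
    elif data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        # Step 2: a UTF-16 BOM selects UTF-16; the codec reads byte order from it.
        encoding = "utf-16"
    else:
        # Step 2: otherwise the schema is UTF-8.
        encoding = "utf-8"

    text = data.decode(encoding)

    # Step 4: discard any initial BOM that survived decoding.
    if text.startswith(BOM):
        text = text[1:]
    return text
```

For a "file:" or "ftp:" URL, a caller would pass only the bytes, or the bytes plus user_encoding. Newline normalization and \x interpretation would then run on the returned string and are not shown here.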

James

