Subject: Re: [relax-ng] Encoding declaration, MIME type
--On 15 April 2002 06:29 -0400 John Cowan <jcowan@reutershealth.com> wrote:

> James Clark scripsit:
>
>> However, "include" and
>> "externalRef" seem to me to present a problem for this approach. They
>> take URIs as arguments and dereferencing a URI gives me an entity
>> containing a sequence of bytes.
>
> In fact dereferencing a URI is generally understood (though I cannot
> lay my hands on specific textual authority at the moment) to return a
> MIME entity-body; that is, a sequence of bytes plus a media type.
> Thus "http://www.w3.org" when dereferenced returns not only
> 21540 bytes (at present) but also the media type "text/html;
> charset=us-ascii".

I certainly agree that we should use the media type when one is provided, but what about something like a "file:" or "ftp:" URL, where there is no media type?

Here's a strawman proposal:

1. If you get the RNC as a MIME entity including information about the charset, then use that charset. Note that text/plain without a charset parameter is equivalent to "text/plain; charset=us-ascii".

2. Otherwise, the RNC is in UTF-8 or UTF-16. If it has a UTF-16 BOM, it's UTF-16. Otherwise it's UTF-8.

3. A system may provide a way to allow a user to specify an alternative encoding for local files.

4. After converting the sequence of bytes to a sequence of characters, any initial BOM is discarded.

(After this, the next stage is newline normalization, then \x interpretation.)

James
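
For illustration, the four steps of the strawman proposal could be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not part of the proposal: the function name `decode_rnc` and its parameters `media_type` and `user_encoding` are hypothetical, and letting a user-specified encoding take precedence over BOM sniffing is one possible reading of step 3.

```python
import codecs
from email.message import Message
from typing import Optional


def decode_rnc(data: bytes,
               media_type: Optional[str] = None,
               user_encoding: Optional[str] = None) -> str:
    """Decode the bytes of a compact-syntax schema (hypothetical helper)."""
    encoding = None

    # Step 1: a MIME entity that carries charset information wins.
    if media_type is not None:
        msg = Message()
        msg["Content-Type"] = media_type
        encoding = msg.get_content_charset()
        # text/plain with no charset parameter is treated as us-ascii.
        if encoding is None and msg.get_content_type() == "text/plain":
            encoding = "us-ascii"

    # Step 3 (assumed precedence): an explicit user override, e.g. for
    # local files, is consulted before falling back to BOM sniffing.
    if encoding is None and user_encoding is not None:
        encoding = user_encoding

    # Step 2: otherwise UTF-16 if a UTF-16 BOM is present, else UTF-8.
    if encoding is None:
        if data.startswith(codecs.BOM_UTF16_LE):
            encoding = "utf-16-le"
        elif data.startswith(codecs.BOM_UTF16_BE):
            encoding = "utf-16-be"
        else:
            encoding = "utf-8"

    text = data.decode(encoding)

    # Step 4: discard an initial BOM once we have characters.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text
```

The explicit `utf-16-le` and `utf-16-be` codecs are used here so that the BOM survives decoding as U+FEFF and is removed in one place, by step 4, regardless of which encoding was chosen. Newline normalization and \x interpretation would follow as separate stages and are not shown.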