Subject: Re: [relax-ng] Encoding declaration, MIME type
>> I certainly agree that we should use the media type when one is
>> provided, but what about something like a "file:" or "ftp:" URL, where
>> there is no media type?
>
> Some people believe that existing OSs should be revised so that they can
> provide the charset parameter.

That would be nice, but I don't think it's going to happen soon. The
closest thing to this is to default to using the same encoding-detection
approach as the system text editor for local files, e.g. on Windows 2000:

- if there's a UTF-8 BOM, then it's UTF-8
- if there's a UTF-16 BOM, then it's UTF-16
- otherwise, it's the platform default encoding (windows-1252 or
  shift-jis or whatever)

>> Here's a strawman proposal:
>>
>> 1. If you get the RNC as a MIME entity including information about the
>> charset, then use that charset. Note that text/plain without a charset
>> parameter is equivalent to "text/plain; charset=us-ascii".
>
> I am happy with this. By the way, if we stick to the HTTP RFC, the
> default is ISO-8859-1. I certainly think that this default is
> ridiculous.

Doesn't the XML media type RFC use the standard default of US-ASCII
(which I agree is ridiculous)? I guess the only advantage of US-ASCII is
that it makes it a little bit easier to reliably detect a missing charset
parameter: the presence of any 8-bit byte tells you that something is
wrong.

>> 2. Otherwise, the RNC is in UTF-8 or UTF-16. If it has a UTF-16 BOM,
>> it's UTF-16. Otherwise it's UTF-8.
>
> I can live with this. By the way, which UTF-8? With or without the
> Unicode signature? Or, both? (Probably, both.)

With or without.

>> 3. A system may provide a way to allow a user to specify an alternative
>> encoding for local files.
>
> Again, I can live with this.
>
>> 4. After converting the sequence of bytes to a sequence of characters,
>> any initial BOM is discarded.
>
> Including the Unicode signature for UTF-8? Probably, yes.

Including. Notepad on Windows 2000 puts a UTF-8 BOM automatically, and I
don't want that to cause an error for RNC.

> Non-ascii users will probably say that we should provide some in-band
> encoding declarations. But I'm reluctant to do so.

Me too. Imagine if every single programming language provided its own
different in-band encoding declaration. The result would be chaotic, and
we would never see any better solution.

> If we need a specialized media type for our compact syntax, I think that
> application/vnd.oasis-open.rng with the charset parameter is probably
> acceptable.

Wouldn't "vnd.oasis-open.rnc" be preferable? Or maybe
"vnd.oasis-open.relax-ng.rnc" (following the OASIS organizational
structure)? It seems like there ought to be a standard OASIS convention
for MIME types in the vnd.oasis-open tree. What's the procedure for
registration in the "vnd" tree?

James
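
For illustration only, the four-step strawman proposal quoted above could be
sketched roughly as follows in Python. The function name decode_rnc and its
parameters are hypothetical and not part of any RELAX NG tool. Letting the
user-specified encoding (step 3) take precedence over BOM detection (step 2)
is an assumption of this sketch; the proposal does not fix that ordering.
The caller is assumed to pass "us-ascii" when a text/plain entity carries no
charset parameter.

```python
import codecs
from typing import Optional


def decode_rnc(data: bytes,
               mime_charset: Optional[str] = None,
               user_encoding: Optional[str] = None) -> str:
    """Decode the bytes of a compact-syntax schema (hypothetical helper).

    mime_charset:  charset from the MIME entity, if there is one (step 1).
    user_encoding: encoding chosen by the user for a local file (step 3).
    """
    if mime_charset is not None:
        # Step 1: the transport told us the charset, so use it.
        encoding = mime_charset
    elif user_encoding is not None:
        # Step 3: user override (its priority here is an assumption).
        encoding = user_encoding
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # Step 2: a UTF-16 BOM means UTF-16; Python's "utf-16" codec
        # reads the BOM to pick the byte order and consumes it.
        encoding = "utf-16"
    else:
        # Step 2: otherwise UTF-8, with or without the signature.
        encoding = "utf-8"

    text = data.decode(encoding)

    # Step 4: discard any initial BOM, including the UTF-8 signature
    # that editors such as Notepad add automatically.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text
```

With this sketch, a us-ascii charset combined with any 8-bit byte raises
UnicodeDecodeError, which matches the observation that such a byte signals
a missing or wrong charset parameter.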