Subject: Re: DOCBOOK-APPS: Bad Continuation of Multi-Byte UTF-8 Sequence
First of all, apologies for my misunderstanding. But in a way I'm glad, because it let you expand your ideas to state:

> [...] I can see the day when the encoding will need to change
> within a single file.

I have such a file. Its name is mbox. It's not XML, but the biggest problem I can see with having a file in multiple encodings is not being able to grep and/or edit it easily. If such a day comes as you suggest, tools will have to be revised to deal with it better. (Yes, mail clients do deal with this particular issue very well.)

I've considered using DocBook for multiple languages, where one document contains various languages. I can easily see this as a case where multiple encodings would be necessary. (No, I haven't gotten it to work, as getting the right combination of tags with lang="xx" and what the DTD allows for children isn't easy. SmartDoc was designed to handle this case better.)

Nonetheless, isn't this where Unicode comes in to save the day? (I know about the faults in Unicode, as some friends have to use more common glyphs for their names when registering with Unicode-based software.) If one has the glyphs, typing in multiple languages (each normally with multiple encodings of its own) becomes possible in a single file.

I'm curious as to why you would prefer multiple encodings in a single file over UTF-8. Or am I misinterpreting your statement again?

By the way, before I started to "get" Unicode, I also wanted to be able to specify multiple encodings in a given file. I don't like that some friends have glyphless names, but when everything is converted to and processed in Unicode anyway, why fight it on the input file side? (A small sketch of that kind of conversion follows below the signature.)

>> [Case of Shift_JIS encoded XML with EUC-JP encoded XSL(T) snipped]
>
> Fair judgement, with the case you state. I'm presuming that multiple
> encoding fragments will become a norm rather than an exception. I guess
> processors will gradually align as code becomes more available.

Actually, while it is reasonable to have, for example, a Japanese-based XSL set for dealing with the DocBook DTD in one of the major encodings, it makes more sense to decide on one encoding for a given project and use it throughout. And I think that for projects developed in environments with a single language and multiple possible encodings, settling on a single encoding is more the norm.

(That reminds me, I need to file a bug report asking for a font specification on the bullets. TM, Circle-R, Circle-C, and a few others cause errors with FOP when using a Japanese font-family, since the glyphs don't exist in those fonts. I'll try to do that today.)

Where fragmentation is more likely to take place is in database storage. Accessing multiple data sources may very well produce XML trees in different encodings. But there, too, a standard (UTF-8?) will most likely become the standard encoding. (Gee, I say "most likely" a lot. Am I that unsure? ;-)

Thank you for the interesting ideas for a Monday morning to get the grey cells working.

--
Michael Westbay
 Work:       Beacon-IT    http://www.beacon-it.co.jp/
 Home:       http://www.seaple.icc.ne.jp/~westbay
 Commentary: http://www.japanesebaseball.com/
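As a minimal sketch of the "decode everything to Unicode on input" idea above (file names are hypothetical; it assumes Python with its standard shift_jis and euc_jp codecs):

    # Merge fragments stored in different legacy encodings into one UTF-8 file.
    # The file names below are placeholders for illustration only.
    sources = [
        ("intro_sjis.txt", "shift_jis"),   # Shift_JIS fragment
        ("body_euc.txt",   "euc_jp"),      # EUC-JP fragment
    ]

    with open("combined_utf8.txt", "w", encoding="utf-8") as out:
        for path, enc in sources:
            # Decode from the fragment's own encoding to Unicode,
            # then write it back out as UTF-8.
            with open(path, "r", encoding=enc) as src:
                out.write(src.read())

Once everything is in UTF-8, a single grep or editor session covers all of it, which is the problem noted above with mbox-style files that mix encodings.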