Subject: Re: [docbook] invalid characters for ISO-8859-1 response
I see...I assumed the entity reference &nbsp; was meant to be read by the
browser in the XHTML output, not by the internal XSLT processor. I'll look
into Saxon, but for now I think I'm going to have to customize en.xml to
just use spaces instead of entity references.

If I *did* want to use a reference for the browser only, would &#160; work?

xhtml output => &#160;

On 10/31/07, Bob Stayton <bobs@sagehill.net> wrote:
> ----- Original Message -----
> From: "Anthony Ettinger" <anthony@chovy.com>
> To: "Bob Stayton" <bobs@sagehill.net>
> Cc: "Dave Pawson" <davep@dpawson.co.uk>; <docbook@lists.oasis-open.org>
> Sent: Wednesday, October 31, 2007 1:09 PM
> Subject: Re: [docbook] invalid characters for ISO-8859-1 response
>
> > Sure, unicode makes sense...I could be missing something but I
> > would've left entity references alone...I still don't see what is
> > gained by converting &OElig; vs. just leaving it as &OElig; in the
> > output...or simply leaving it as a space.
>
> Ah, now I think I see what you are getting at. If you type &nbsp; for a
> non-breaking space, why doesn't it preserve that character as the string
> "&nbsp;" in the output? The answer is that the input representation has no
> direct connection to the output representation.
>
> When an input XML document is parsed into memory, all characters are
> converted to Unicode internally, regardless of their initial
> representation. There is no record in the loaded memory that the input was
> "&nbsp;"; it is all Unicode in memory. After processing in memory, the XML
> is output using a serializer whose job is to convert the Unicode strings
> into an output string in some encoding. An encoding has to be chosen, and
> it is not selected based on the input encoding (which is no longer known to
> the processor). The default output encoding is UTF-8, but you can specify
> any of several different encodings for the serializer to use.
>
> That said, one option you might look at is using Saxon instead of libxml2,
> and using a Saxon extension to control how characters are represented in
> the output. After all, even if your output encoding is UTF-8, you could
> still output the six-character string "&nbsp;" for a non-breaking space
> instead of the raw UTF-8 encoded character, and it would still be
> interpreted as a non-breaking space. Saxon provides that choice. See:
>
> http://www.sagehill.net/docbookxsl/OutputEncoding.html#SaxonCharacter
>
> Bob Stayton
> Sagehill Enterprises
> DocBook Consulting
> bobs@sagehill.net

--
Anthony Ettinger
Ph: 408-656-2473
var (bonita, farley) = new Dog;
farley.barks("very loud");
bonita.barks("at strangers");
http://chovy.dyndns.org/resume/
http://utuxia.com/consulting
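
The parse-then-serialize behaviour described in the quoted message can be tried outside any XSLT processor. The following is only a rough Python sketch using the standard-library xml.etree.ElementTree module, not the libxml2 or Saxon code path itself. It uses &#160; rather than &nbsp; because &nbsp; is not defined without a DTD, and the helper with_decimal_refs is a hypothetical name that only loosely imitates the kind of choice the Saxon extension offers.

```python
import xml.etree.ElementTree as ET

# Two input spellings of the same character, U+00A0 (non-breaking space):
# a numeric character reference and the literal character.
as_reference = ET.fromstring("<p>a&#160;b</p>")
as_literal = ET.fromstring("<p>a\u00a0b</p>")

# After parsing, both trees hold the identical Unicode string;
# nothing records which spelling the input used.
print(as_reference.text == as_literal.text)  # True

# The serializer decides the output form from the requested encoding alone.
utf8_bytes = ET.tostring(as_reference, encoding="utf-8")
ascii_bytes = ET.tostring(as_literal, encoding="us-ascii")
print(utf8_bytes)   # b'<p>a\xc2\xa0b</p>'  (raw UTF-8 bytes)
print(ascii_bytes)  # b'<p>a&#160;b</p>'    (reference, since ASCII lacks U+00A0)


def with_decimal_refs(text: str) -> str:
    """Hypothetical helper: spell every non-ASCII character as &#N;.

    An assumption for illustration only; this is not Saxon's API.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(with_decimal_refs("a\u00a0b"))  # a&#160;b
```

Either output form is read back as the same non-breaking space by an XML parser or a browser, which is why the choice between them belongs to the serializer rather than to the input document.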