docbook message


Subject: Re: [docbook] invalid characters for ISO-8859-1 response


I see...I assumed the entity reference &#160; was meant to be read by
the browser in the xhtml output, not the internal xslt processor.

I'll look into Saxon, but for now I think I'm going to have to
customize en.xml to just use spaces instead of entity references.

If I *did* want to use a reference for the browser only, would
  work? xhtml output =>  






On 10/31/07, Bob Stayton <bobs@sagehill.net> wrote:
> ----- Original Message -----
> From: "Anthony Ettinger" <anthony@chovy.com>
> To: "Bob Stayton" <bobs@sagehill.net>
> Cc: "Dave Pawson" <davep@dpawson.co.uk>; <docbook@lists.oasis-open.org>
> Sent: Wednesday, October 31, 2007 1:09 PM
> Subject: Re: [docbook] invalid characters for ISO-8859-1 response
>
>
> >
> > Sure, unicode makes sense...I could be missing something but I
> > would've left entity references alone...I still don't see what is
> > gained by converting &#140; vs. just leaving it as &#140; in the
> > output...or simply leaving it as a space.
>
>
> Ah, now I think I see what you are getting at.  If you type &#160; for a
> non-breaking space, why doesn't it preserve that character as the string
> "&#160;" in the output?  The answer is that the input representation has no
> direct connection to the output representation.
>
> When an input XML document is parsed into memory, all characters are
> converted to Unicode internally, regardless of their initial
> representation.  There is no record in the loaded memory that the input was
> "&#160;", it is all Unicode in memory.  After processing in memory, the XML
> is output using a serializer whose job is to convert the Unicode strings
> into an output string in some encoding.  An encoding has to be chosen, and
> it is not selected based on the input encoding (which is no longer known to
> the processor).  The default output encoding is UTF-8, but you can specify
> any of several different encodings for the serializer to use.
>
> That said, one option you might look at is using Saxon instead of libxml2,
> and use a Saxon extension to control how characters are represented in the
> output.  After all, even if your output encoding is UTF-8, you could still
> output the six-character string "&#160;" for a non-breaking space instead
> of the raw UTF-8 character (the two bytes 0xC2 0xA0), and it would still be interpreted as a
> non-breaking space.  Saxon provides that choice.  See:
>
> http://www.sagehill.net/docbookxsl/OutputEncoding.html#SaxonCharacter
>
> Bob Stayton
> Sagehill Enterprises
> DocBook Consulting
> bobs@sagehill.net
>
>
>
>

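The parse-then-serialize behavior described in the quoted message can be reproduced with a small sketch. It uses Python's standard xml.etree.ElementTree rather than libxml2 or Saxon, so it only illustrates the general principle. The helper with_char_refs is a hypothetical name and a rough stand-in for what the Saxon character representation extension offers, not Saxon's actual mechanism.

```python
import xml.etree.ElementTree as ET

# Three input spellings of the same character, U+00A0 (no-break space).
sources = ["<p>a&#160;b</p>", "<p>a&#xA0;b</p>", "<p>a\u00a0b</p>"]
texts = [ET.fromstring(src).text for src in sources]

# After parsing, all three are the same Unicode string; the original
# spelling is not recorded anywhere in the tree.
same_after_parse = len(set(texts)) == 1

elem = ET.fromstring(sources[0])

# The serializer chooses the byte representation from the output encoding,
# not from how the character was written in the input.
as_utf8 = ET.tostring(elem, encoding="utf-8")         # raw bytes 0xC2 0xA0
as_latin1 = ET.tostring(elem, encoding="iso-8859-1")  # raw byte 0xA0
as_ascii = ET.tostring(elem, encoding="us-ascii")     # numeric reference &#160;


def with_char_refs(element):
    """Hypothetical helper: emit UTF-8 bytes, but spell every non-ASCII
    character as a decimal character reference."""
    # ASCII serialization forces character references; pure ASCII is
    # already valid UTF-8, so re-encoding changes nothing byte-wise.
    ascii_bytes = ET.tostring(element, encoding="us-ascii")
    return ascii_bytes.decode("ascii").encode("utf-8")


if __name__ == "__main__":
    print(same_after_parse)
    print(as_utf8)
    print(as_latin1)
    print(as_ascii)
    print(with_char_refs(elem))
```

Whichever spelling is used in the source, a browser reading any of these outputs with the matching encoding declaration sees the same non-breaking space.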

-- 
Anthony Ettinger
Ph: 408-656-2473
var (bonita, farley) = new Dog;
farley.barks("very loud");
bonita.barks("at strangers");

http://chovy.dyndns.org/resume/
http://utuxia.com/consulting

