docbook-apps message



Subject: Re: DOCBOOK-APPS: xmllint and &


Daniel Veillard writes:
> On Tue, Dec 17, 2002 at 10:58:04AM -0500, Jeff Beal wrote:
> > I'm getting the following error when parsing my documentation with xmllint:
> [...]
> > When I edit my local copy of the DocBook DTD and remove the following line
> > from the iso-num.ent file, everything works:
> > <!ENTITY amp "&#x0026;"> <!-- AMPERSAND -->
> >  
> > Any comments or suggestions on how to fix this without messing with the DTD?
> > I have, by the way, verified that xmllint is reading the other character
> > entities just fine.  It seems only to be a problem with the &amp; entity.
> 
>   And I don't understand what's happening, no such problem on
> a smaller testcase:
> 
> paphio:~/XML -> cat tst.xml
> <?xml version="1.0" ?>
> <!DOCTYPE foobar SYSTEM "tst.dtd">
> <foobar></foobar>
> paphio:~/XML -> cat tst.dtd
> <!ENTITY amp "&#x0026;"> <!-- AMPERSAND -->
> paphio:~/XML -> xmllint --loaddtd --noout tst.xml
> paphio:~/XML ->
> 
>  and it's the first time I heard of such a problem.
> however I note that the DTDs installed on my system for DocBook have
> <!ENTITY amp    "&#38;#38;"> <!-- AMPERSAND -->
> instead in docbook/xml-dtd-4.2-1.0-14/ent/iso-num.ent
> but older version had the old style declaration but commented:
> 3.1.7/ent/iso-num.ent:
>   <!-- predeclared in XML <!ENTITY amp   "&#x0026;"--> <!-- AMPERSAND -->
> 
>   strange,
> 
There's nothing strange here.
It's just one of the reasons why you don't want to mix SGML and XML
applications on Unix.

The reason you don't see a problem in your test is that it never uses
the entity. If you add an '&amp;' to the document and run xmllint --loaddtd,
you will get the error. So your test case is a bit too small.

XML *requires* amp to be declared as 
<!ENTITY amp    "&#38;#38;"> 
(or &#x26;#x26; if one prefers hex codes)
See section 4.6 of the XML spec.

I think the reason is that reading the entity declaration turns
&#38;#38; into &#38;, which is parsed again when the entity is used, giving
&. If you declare it as just &#38;, reading the entity declaration already
gives &, and when the entity is used the parser finds a single bare '&',
which is not well-formed on its own.
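
The two passes can be imitated with a few lines of Python. This is only a toy
model under stated assumptions, not how libxml or any other parser is
implemented: it handles numeric character references only, and it treats any
leftover '&' or '<' in the replacement text as an error, which is a
simplification because real replacement text may contain well-formed markup or
entity references. The names expand_char_refs and resolve are made up for this
sketch.

```python
import re

# Numeric character reference: &#38; (decimal) or &#x26; (hexadecimal).
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_char_refs(text):
    """Replace each character reference by its character, in a single pass."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
    return CHAR_REF.sub(repl, text)


def resolve(literal_value):
    """Return (replacement_text, text_seen_when_the_entity_is_used)."""
    # Pass 1: character references are expanded while the declaration is read.
    replacement = expand_char_refs(literal_value)
    # Pass 2: the replacement text is parsed again where the entity is used.
    leftover = CHAR_REF.sub("", replacement)
    if "&" in leftover or "<" in leftover:
        raise ValueError("bare markup character in replacement text %r" % replacement)
    return replacement, expand_char_refs(replacement)


print(resolve("&#38;#38;"))   # double escape, as the XML declaration for amp
try:
    resolve("&#x0026;")       # single escape, as in the SGML-style iso-num.ent
except ValueError as exc:
    print("error:", exc)
```

With the double escape the second pass still has a character reference to
work on; with the single escape it is left with a lone '&'.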

Similar arguments apply to &lt;, which must be declared as "&#38;#60;"
and not just &#60;.

For SGML, &#38; or &#x0026; for amp is OK. But SGML even accepts 'abc & def'
in PCDATA.

So the answer to the initial question is no: this cannot be fixed without
changing the DTD, since it's the DTD that is broken.

The only thing one might consider in libxml is a warning whenever a
predefined entity is declared in a way that differs from what the XML spec
requires.
The spec says (again section 4.6):
 
... If the entities in question are declared, they must be declared as 
internal entities whose replacement text is the single character being 
escaped or a character reference to that character, as shown below. ...
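
As a rough illustration of what such a warning could look for, here is a
standalone Python sketch. It is not libxml code and not a proposal for its
API. It assumes entity declarations written with double-quoted literals, does
not skip comments properly, and applies the rule quoted above with one extra
restriction that follows from the double-escaping argument: for amp and lt the
replacement text has to be a character reference, because the bare character
would not survive being parsed again. The name check_predefined is
hypothetical.

```python
import re

# Very small subset of DTD syntax: <!ENTITY name "literal">
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

# The five predefined entities and the character each one stands for.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


def _expand(text):
    """Expand numeric character references once."""
    return CHAR_REF.sub(
        lambda m: chr(int(m.group(1), 16) if m.group(1) else int(m.group(2))),
        text,
    )


def check_predefined(dtd_text):
    """Return one warning string per suspicious predefined-entity declaration."""
    warnings = []
    for name, literal in ENTITY_DECL.findall(dtd_text):
        char = PREDEFINED.get(name)
        if char is None:
            continue  # not one of the predefined entities
        replacement = _expand(literal)
        # Acceptable: replacement text is a character reference to the character...
        is_ref = CHAR_REF.fullmatch(replacement) is not None and _expand(replacement) == char
        # ...or the character itself, except for '&' and '<'.
        is_char = replacement == char and char not in "&<"
        if not (is_ref or is_char):
            warnings.append(
                "entity %s declared as %r, replacement text %r" % (name, literal, replacement)
            )
    return warnings


print(check_predefined('<!ENTITY amp "&#x0026;"> <!-- AMPERSAND -->'))
print(check_predefined('<!ENTITY amp    "&#38;#38;"> <!-- AMPERSAND -->'))
```

The first call reports the SGML-style declaration, the second returns an
empty list.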

greetings
	Morus

