Subject: Re: [docbook] dbl1mn.ent: "1058" is not a character ...

On 02/25/2016 08:04 PM, Matthias Apitz wrote:
> Hello,
> I'm new to this list and have zero knowledge about DocBook at all. 

Welcome aboard.

> I subscribed because I'm a maintainer of a port in FreeBSD which compiles
> some piece of software using DocBook for the manuals. See:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207299
> for more background.

From Comment #9 on that page:
>> The files for the doc/manpages (*.pod) and for the doc/manual
>> (.sgml) are broken and should be fixed better "upstream", i.e. in
>> the software itself.

This is worrying. It's been a VERY long time since I touched any of the
Linux documentation. I did warn them at that time that they would need
to migrate from SGML to XML, but that made me _persona non grata_ :-)

> The make process spills out a lot of error messages, like:
> docbook2ps -d ../stylesheet.dsl manual-en-sed.sgml
> Using catalogs: /usr/local/share/sgml/catalog
> Using stylesheet:
> /usr/ports/print/muttprint/work/muttprint-0.73/doc/manual/en/../stylesheet.dsl
> Working on:
> /usr/ports/print/muttprint/work/muttprint-0.73/doc/manual/en/manual-en-sed.sgml
> jade:/usr/local/share/sgml/docbook/dsssl/modular/print/../common/../common/dbl1mn.ent:8:28:E:
> "1058" is not a character number in the document character set

If this reflects what I think it does, it's still in SGML, not XML.

> If one looks into the referenced file dbl1mn.ent it looks like this:
> <?xml version="1.0" encoding="US-ASCII"?>
> <!-- This file is generated automatically. -->
> <!-- Do not edit this file by hand! -->
> <!-- See http://docbook.sourceforge.net/ -->
> <!-- To update this file: edit the corresponding document at -->
> <!-- http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/docbook/gentext/locale/ -->
> <!ENTITY Abstract        "&#1058;&#1086;&#1074;&#1095; &#1072;&#1075;&#1091;&#1091;&#1083;&#1075;&#1072;">
> <!ENTITY abstract        "&#1090;&#1086;&#1074;&#1095; &#1072;&#1075;&#1091;&#1091;&#1083;&#1075;&#1072;">
> ...
> the Codepoints seems to be Russian language.

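The characters are certainly Cyrillic. The entity declarations are for
Товч агуулга and товч агуулга, the localized text for an Abstract, though
the language looks to be Mongolian rather than Russian ("mn" in the
filename is the ISO 639 code for Mongolian).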
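
To see what the numeric character references in that file spell out, here
is a minimal Python sketch. The helper name decode_refs is my own, not part
of any DocBook toolkit, and it only handles decimal references of the form
shown in the quoted file.

```python
import re

def decode_refs(value):
    """Replace decimal character references such as &#1058; with the character."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), value)

# Entity value copied from the quoted dbl1mn.ent excerpt.
abstract = (
    "&#1058;&#1086;&#1074;&#1095; "
    "&#1072;&#1075;&#1091;&#1091;&#1083;&#1075;&#1072;"
)
print(decode_refs(abstract))
```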

But they won't work in SGML without some very deep surgery on the SGML
Declaration (for DocBook, presumably). Maybe someone has done this and
forgotten to ship it with the toolkit.

> Questions:
> 1) What is this file good for?

Right now, nothing, unless you process the whole job with XML, not SGML.

> 2) What is the reason of the errors, a conflict between
>    encoding="US-ASCII" and the Unicode Codepoints?

SGML only supports the character ranges specified in the SGML
Declaration for the application you are using (here, DocBook).
The reference concrete declaration (in effect, a kind of default) only
defines ASCII, if I remember right, and others may extend this to
ISO-8859*. In any event, there is no real support for characters beyond
those ranges, which is what you have here: &#1058; asks for character
number 1058, and the declaration in use does not define it, so SGML
simply cannot recognise it.
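
As a rough illustration of why the parser complains, this Python sketch
lists the character numbers in a declaration that fall above a given limit.
The limits 127 and 255 are assumptions standing in for an ASCII-only and an
8-bit ISO-8859-style character set; the real limit is whatever the SGML
Declaration in use defines. The function name refs_outside is hypothetical.

```python
import re

def refs_outside(value, limit):
    """Return the referenced character numbers that are greater than limit."""
    return [int(n) for n in re.findall(r"&#(\d+);", value) if int(n) > limit]

# Shortened form of the first declaration in the quoted file.
line = '<!ENTITY Abstract "&#1058;&#1086;&#1074;&#1095;">'

print(refs_outside(line, 127))  # assumed ASCII-only character set
print(refs_outside(line, 255))  # assumed 8-bit character set
```

Every reference in that line is above both limits, which would match jade
rejecting the very first one it meets.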

XML has no problem with them. And this is an XML entity file, but it
appears that you are using an SGML processor.
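
By way of contrast, a sketch of the same declaration going through an XML
parser, here the one in Python's standard library. The wrapper document and
its element name d are made up for the example; only the entity value comes
from the quoted file.

```python
import xml.etree.ElementTree as ET

# A tiny XML document that declares the entity in its internal subset
# and then uses it once.
doc = (
    "<!DOCTYPE d ["
    '<!ENTITY Abstract "&#1058;&#1086;&#1074;&#1095; '
    '&#1072;&#1075;&#1091;&#1091;&#1083;&#1075;&#1072;">'
    "]>"
    "<d>&Abstract;</d>"
)

root = ET.fromstring(doc)
print(root.text)  # the expanded entity text
```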

> 3) How this should be fixed or at least suppressed, because the
>    generated PDF (...) is looking fine?

If the documentation is not in the language this file localizes, just omit this file. That means
finding out where in the master document (manual-en-sed.sgml?) the file
dbl1mn.ent is referred to, and deleting that line, or commenting it out.
Or just edit it and put comment markup around the entity declarations.
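
Finding the reference can be scripted. This is a minimal Python sketch, not
a tested tool: the function name find_references is hypothetical, and it
simply reports every line under a directory that mentions the file name.

```python
from pathlib import Path

def find_references(root, needle="dbl1mn.ent"):
    """Return (path, line number, line) for each line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

You would point it at the source tree of the manual, and probably also at
the stylesheet directory named in the error message, since the reference
may live there rather than in the master document.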

Can you get the people running the show to move everything to XML?

