
docbook-apps message


Subject: Re: [docbook-apps] docbook sgml Java parsers ?

On 7 Feb 07, at 17:10, Markus Hoenicka wrote:

> Jean-Christophe Helary <fusion@mx6.tiki.ne.jp> was heard to say:
>> Are there any DocBook SGML parsers in Java, possibly GPL compatible?
> If you're looking for a general-purpose SGML parser implemented in
> Java, the answer is probably no. The Cover Pages
> (http://xml.coverpages.org/publicSW.html#parsers) do not list one.
> The page was last updated in 2002, but I consider it unlikely that
> someone wrote an SGML parser at a time when everyone was busy doing
> XML.
> However, if you're looking for a way to parse DocBook documents in
> Java one way or another, consider transforming the SGML documents to
> XML (e.g. by means of osx). You could then use any Java XML parser.
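
For reference, the route suggested in the quoted reply (convert the SGML to XML first, then hand the result to a Java XML parser) might look roughly like the sketch below. It assumes the conversion has already been done and only shows the parsing step with the JDK's built-in DOM parser. The class name, the method name and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses DocBook that has ALREADY been converted to XML.
public class ConvertedDocBookSketch {

    // Returns "<root element name>: <text of the first title element>".
    public static String rootAndTitle(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            String root = doc.getDocumentElement().getTagName();
            // Assumes the document contains at least one <title> element.
            String title = doc.getElementsByTagName("title").item(0).getTextContent();
            return root + ": " + title;
        } catch (Exception e) {
            // Malformed input (for example raw SGML with omitted end tags) ends up here.
            throw new IllegalStateException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<article><title>Manual</title><para>Text</para></article>";
        System.out.println(rootAndTitle(xml));
    }
}
```

An XML parser of this kind rejects SGML that relies on features such as omitted end tags, which is why the conversion step is needed first.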


Thank you very much for your reply.

The application that would use this parser is OmegaT, a Java
computer-aided translation tool. It basically extracts the
translatable contents from the source texts (in the supported
formats, which include DocBook XML) and displays them for
translation. The translator types the translation and the application
builds the translated text by filling the file "skeleton" with the
new contents.
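
To make the "skeleton" idea concrete, here is a minimal sketch of the extract-then-fill cycle described above. It is not OmegaT's actual code: the class name, the `{{n}}` placeholder syntax and the regular expression that picks out text between tags are all assumptions made for illustration, and the toy tokenizer is not a real XML or SGML parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a toy skeleton extractor, not a real markup parser.
public class SkeletonSketch {

    // A run of non-blank text sitting between a '>' and the next '<'.
    private static final Pattern TEXT =
            Pattern.compile("(?<=>)[^<>]*[^<>\\s][^<>]*(?=<)");
    // Hypothetical placeholder syntax: {{0}}, {{1}}, ...
    private static final Pattern SLOT = Pattern.compile("\\{\\{(\\d+)\\}\\}");

    public static final class Extraction {
        public final String skeleton;       // markup with text replaced by slots
        public final List<String> segments; // the translatable text, in order

        Extraction(String skeleton, List<String> segments) {
            this.skeleton = skeleton;
            this.segments = segments;
        }
    }

    // Copies the markup untouched and swaps each text run for a numbered slot.
    public static Extraction extract(String source) {
        Matcher m = TEXT.matcher(source);
        StringBuilder skeleton = new StringBuilder();
        List<String> segments = new ArrayList<>();
        int last = 0;
        while (m.find()) {
            skeleton.append(source, last, m.start());
            skeleton.append("{{").append(segments.size()).append("}}");
            segments.add(m.group());
            last = m.end();
        }
        skeleton.append(source.substring(last));
        return new Extraction(skeleton.toString(), segments);
    }

    // Puts translations[i] back where slot {{i}} sits in the skeleton.
    public static String fill(String skeleton, List<String> translations) {
        Matcher m = SLOT.matcher(skeleton);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(translations.get(index)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Extraction e = extract("<para>Hello <emphasis>world</emphasis>.</para>");
        System.out.println(e.skeleton);
        System.out.println(fill(e.skeleton, Arrays.asList("Bonjour ", "monde", ".")));
    }
}
```

Because everything outside the text runs is copied byte for byte, the target file keeps the source markup exactly, which is the property needed for the output to remain valid in the original format.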

We used to have a very loose parser for HTML, but when we decided to
support XML formats like DocBook, ODF, XHTML, etc., we had to use a
proper parser. So now we have support for a variety of XML-based
formats, and that is not a problem.

But when DocBook SGML is used as the source file, the XML parser does
not accept it.

Also, it is important that the file skeleton be stored correctly, so
that the target file is also correct DocBook SGML. Converting SGML to
XML is therefore not an option: we need to be able to output the
original parsed SGML.

Not being a programmer myself (I am only on this list because I am
writing the user manual in DocBook), what I am writing may not
exactly reflect the technical issues we are facing, except for the
fact that we currently can't parse DocBook SGML.

Any pointer would be appreciated.

Jean-Christophe Helary
