
office-comment message



Subject: referencing entities in office.dtd


Dear All,

Since last year we have been working on an application for layout-driven
document structure extraction. We start from PostScript input, which we
import using a much-refined version of the method described by Craig G.
Nevill-Manning, Todd Reed and Ian Witten in Software: Practice and
Experience, 28(5), 481-491. We already get very consistent, parsable
output (a machine-readable document image) from nearly any source, and
plan to further improve our extraction stage by tapping Adobe Font
Metric files and writing a custom PostScript Printer Definition.

After preclassifying the output, we parse it using a library of layout
grammars for various logical elements. In other words, we are mapping
layout instances to XML elements. We are trying to define a set of
common document classes.

Since we plan to make our code available on CPAN (the Comprehensive Perl
Archive Network), interoperability is an issue for us. We therefore took
a look at the Office.DTD, which is incredibly detailed and complete but
just as overwhelming.

I would like to define the document class set as a DTD by directly
referencing entities in the Office.DTD. This seems trivial, but since I
have hardly ever worked with DTDs (I have some experience with schemas),
this is taking more time than I actually have available.
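
To make the idea concrete, here is a rough sketch of what "directly
referencing entities" might look like in an external DTD file. It is
only an illustration under assumptions: the system identifier
"office.dtd", the parameter entity %placeholder-inline; and the element
name article-abstract are made-up placeholders and have not been checked
against the actual Office.DTD.

```dtd
<!-- Doc_Class.DTD: hypothetical sketch, all names are placeholders -->

<!-- Declare the Office.DTD as an external parameter entity and pull
     in its declarations by reference (path is an assumption) -->
<!ENTITY % office-dtd SYSTEM "office.dtd">
%office-dtd;

<!-- Reuse a parameter entity assumed to be declared in the included
     DTD inside the content model of a new document-class element -->
<!ELEMENT article-abstract (%placeholder-inline;)*>
```

Parameter entity references inside a declaration, as in the last line,
are only permitted in the external DTD subset, so this would have to
live in its own file rather than in a document's internal subset.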

Has anybody on this list done something like this already? Does it
really make sense to pursue a strictly modular approach? Or should I
just pick out the entities I need and make my own Doc_Class.DTD?

I'll be glad for any hints,
best regards from Cologne,

-- 
Gustav Vella

Institut für Sprachliche Informationsverarbeitung, Universität zu Köln
(Department of Linguistic Data Processing, University of Cologne, Germany)


