opendocument-users message

Subject: Re: [opendocument-users] extracting the text from an opendocumentfile

From: Chris Puttick <chris.puttick@thehumanjourney.net>
To: Vincenzo Morgante <enzom83@yahoo.it>
Date: Sat, 9 Jan 2010 07:49:43 +0000 (GMT)

Hi Vincenzo

AFAIK all text is contained within the <office:text> element in the "content.xml" file inside the ODF container, regardless of ODF version. If you take everything between > and < (i.e. the character data) inside <office:text>, I think you'd get all the displayed content. Depending on your needs you might also want the metadata, which I think should all be in meta.xml.
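
A rough Java sketch of that approach, in case it helps. It assumes Java 9 or later (for readAllBytes), and the class and method names (OdtTextExtractor, extractText, textOfContentXml) are made up for illustration. Rather than literally scanning between > and <, it does a namespace-aware DOM parse and collects the character data under <office:text>, adding a newline after each text:p and text:h. It ignores text:s, text:tab and text:line-break, so treat it as a starting point only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OdtTextExtractor {
    // Namespace URIs bound to the office: and text: prefixes in content.xml.
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Finds content.xml inside the ODF zip container and returns its text. */
    public static String extractText(InputStream odt) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    // readAllBytes stops at the end of the current zip entry.
                    return textOfContentXml(zip.readAllBytes());
                }
            }
        }
        throw new IOException("content.xml not found in container");
    }

    /** Parses content.xml and collects the character data under office:text. */
    public static String textOfContentXml(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        NodeList bodies = doc.getElementsByTagNameNS(OFFICE_NS, "text");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bodies.getLength(); i++) {
            collect(bodies.item(i), out);
        }
        return out.toString();
    }

    // Depth-first walk: append text nodes, end paragraphs and headings with '\n'.
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
        if (TEXT_NS.equals(node.getNamespaceURI())
                && ("p".equals(node.getLocalName()) || "h".equals(node.getLocalName()))) {
            out.append('\n');
        }
    }
}
```

You would call it along the lines of OdtTextExtractor.extractText(new FileInputStream("some.odt")), where the file name is just a placeholder. Matching on the namespace URI rather than the literal "office:" prefix keeps it working even if a producer binds a different prefix.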

HTH

Chris

----- "Vincenzo Morgante" <enzom83@yahoo.it> wrote:

> Hi,
> I'm developing a Java class that needs to read an OpenDocument text
> file (.odt extension) and extract all the text it contains.
> Some years ago I wrote a VB.NET library following the OpenDocument 1.0
> specification. The library still works fine, but I'd like to be sure
> there are no substantial changes in the newer versions of the
> standard (1.1 and 1.2).
> Can I keep following the old OpenDocument 1.0 specification without
> problems, or would it be better to follow the newer specifications?
> In other words, if I follow the OpenDocument 1.0 specification, could
> I run into problems extracting text from files in the newer versions?
> 
> Thanks a lot!
> 
> Vincenzo


------
Files attached to this email may be in ISO 26300 format (OASIS Open Document Format). If you have difficulty opening them, please visit http://iso26300.info for more information.