opendocument-users message

Subject: Re: [opendocument-users] extracting the text from an opendocument file

From: robert_weir@us.ibm.com
To: opendocument-users@lists.oasis-open.org
Date: Mon, 11 Jan 2010 10:00:06 -0500

Vincenzo Morgante <enzom83@yahoo.it> wrote on 01/08/2010 06:39:15 PM:


> 
> Hi,
> I'm developing a java class which have to be able in reading an 
> OpenDocument text file (with odt extension) in order to extract all 
> the text contained in it.
> Some years ago I made a VB.NET library in following OpenDocument 1.0
> specifications. Now this library works still fine, but I'd like to 
> be sure that not be substantial changes in the newer versions of the
> standard (1.1 and 1.2).
> Could I follow the old OpenDocument 1.0 specifications without any 
> problems or would it be expedient to follow the newer specifications?
> In other words, if I follow the old OpenDocument 1.0 specifications,
> could I fall into problems in reading a file of the newer versions 
> with regard to the text extraction?
> 


It depends on what you mean by "the text" in a document.  Although the 
general text model remains the same through ODF 1.1 and 1.2, there are 
some enhancements.  For example, ODF 1.1 gives the ability to add an 
alternative text description to an OLE embedding.  Similarly, ODF 1.2 
enhances the metadata model.  I'm not sure if these are relevant to your 
text extraction problem, but those are the kinds of changes I would 
expect. 

In any case, if you look at the back of the ODF 1.1 spec, and the end of 
the draft of ODF 1.2 part I, you'll see a summary list of changes in each 
revision.  You can decide based on that whether any are substantial for 
your purposes. 
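
Since the general text model is stable across versions, a reader that 
keys on the text namespace and simply descends through elements it does 
not recognize should cope with 1.0, 1.1 and 1.2 files alike.  Here is a 
minimal Java sketch of that idea, not a definitive implementation.  The 
class and method names (OdtTextExtractor, extractFromOdt, 
extractFromContentXml) are made up for illustration, and it assumes that 
"the text" means the character content of paragraphs and headings in 
content.xml.  Note that it also picks up text nested inside a paragraph, 
such as a title or description on an embedded object, which you may or 
may not want.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; the names are illustrative, not part of any ODF library.
final class OdtTextExtractor {
    // The ODF text namespace URI.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // An .odt file is a ZIP package; the document body lives in content.xml.
    public static String extractFromOdt(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("content.xml not found in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractFromContentXml(in);
            }
        }
    }

    // Parses content.xml and returns one line per paragraph or heading.
    public static String extractFromContentXml(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on namespace URI
        Document doc = factory.newDocumentBuilder().parse(in);
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), false, out);
        return out.toString();
    }

    // Depth-first walk. Character data is kept only inside text:p / text:h.
    // Unrecognized elements are descended into, never rejected.
    private static void walk(Node node, boolean inParagraph, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                if (inParagraph) {
                    out.append(c.getNodeValue());
                }
                continue;
            }
            if (c.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            boolean isTextNs = TEXT_NS.equals(c.getNamespaceURI());
            String name = c.getLocalName();
            if (isTextNs && "s".equals(name)) {
                // text:s stands for text:c spaces (one if the attribute is absent).
                String count = ((Element) c).getAttributeNS(TEXT_NS, "c");
                int n = count.isEmpty() ? 1 : Integer.parseInt(count);
                for (int i = 0; i < n; i++) {
                    out.append(' ');
                }
            } else if (isTextNs && "tab".equals(name)) {
                out.append('\t');
            } else if (isTextNs && "line-break".equals(name)) {
                out.append('\n');
            } else {
                boolean startsParagraph = isTextNs && ("p".equals(name) || "h".equals(name));
                walk(c, inParagraph || startsParagraph, out);
                if (startsParagraph) {
                    out.append('\n'); // end each paragraph or heading with a newline
                }
            }
        }
    }
}
```

Calling OdtTextExtractor.extractFromOdt("letter.odt") (the file name is 
just an example) would return the body text with one line per paragraph.  
Text kept in styles.xml, such as headers and footers, and the metadata in 
meta.xml are not covered by this sketch.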

Regards,

-Rob

References:
- extracting the text from an opendocument file
  - From: Vincenzo Morgante <enzom83@yahoo.it>