OASIS Mailing List Archives

opendocument-users message



Subject: Re: [opendocument-users] extracting the text from an opendocumentfile


Hi Vincenzo

AFAIK all of the document text is contained within the <office:text> element in the "content.xml" file inside the ODF container, regardless of the ODF version, so if you collect the character data (everything between > and <) inside <office:text> I think you'd get all of the displayed content. Depending on your needs, you might also want the document metadata, which I think should all be in meta.xml.
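
Since you mentioned Java, here is a rough sketch of that approach, not a definitive implementation. It assumes the .odt is an ordinary, unencrypted ZIP container, and the class and method names (OdtTextExtractor, extractText, parseContent) are just made up for illustration. It matches elements by namespace URI rather than by the "office:" prefix, and adds a newline after each text:p and text:h so paragraphs don't run together.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are illustrative only.
final class OdtTextExtractor {
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Finds content.xml in the ODF (ZIP) container and returns the text under office:text. */
    static String extractText(InputStream odt) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("content.xml")) {
                    return parseContent(zip);
                }
            }
        }
        throw new IOException("content.xml not found in ODF container");
    }

    /** Collects the character data of every node inside the office:text element. */
    static String parseContent(InputStream xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        StringBuilder out = new StringBuilder();
        factory.newSAXParser().parse(xml, new DefaultHandler() {
            int depth = 0; // greater than 0 while we are inside office:text

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (depth > 0) {
                    depth++;
                } else if (OFFICE_NS.equals(uri) && "text".equals(local)) {
                    depth = 1;
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (depth > 0) {
                    depth--;
                    // Assumption: one output line per paragraph or heading.
                    if (TEXT_NS.equals(uri) && ("p".equals(local) || "h".equals(local))) {
                        out.append('\n');
                    }
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth > 0) {
                    out.append(ch, start, length);
                }
            }
        });
        return out.toString();
    }
}
```

You would call it with something like OdtTextExtractor.extractText(new java.io.FileInputStream("file.odt")). Note that elements such as text:tab, text:line-break and text:s carry no character data, so a plain "everything between > and <" pass drops them unless you handle them explicitly.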

HTH

Chris

----- "Vincenzo Morgante" <enzom83@yahoo.it> wrote:

> Hi,
> I'm developing a Java class that has to be able to read an
> OpenDocument text file (with the .odt extension) in order to extract all
> the text contained in it.
> Some years ago I wrote a VB.NET library following the OpenDocument 1.0
> specification. The library still works fine, but I'd like to be
> sure that there are no substantial changes in the newer versions of the
> standard (1.1 and 1.2).
> Could I keep following the old OpenDocument 1.0 specification without any
> problems, or would it be advisable to follow the newer specifications?
> In other words, if I follow the old OpenDocument 1.0 specification,
> could I run into problems with text extraction when reading a file
> written to a newer version?
> 
> Thanks a lot!
> 
> Vincenzo


------
Files attached to this email may be in ISO 26300 format (OASIS Open Document Format). If you have difficulty opening them, please visit http://iso26300.info for more information.


