Subject: Re: [opendocument-users] extracting the text from an opendocumentfile
Hi Vincenzo,

AFAIK all text is contained within the <office:text> element in the "content.xml" file inside the ODF container, regardless of the ODF version. If you parse everything between > and < within the <office:text> tags, I think you'd get all displayed content.

You might also want the metadata, depending on your needs, which I think should all be in meta.xml.

HTH
Chris

----- "Vincenzo Morgante" <enzom83@yahoo.it> wrote:

> Hi,
> I'm developing a java class which have to be able in reading an
> OpenDocument text file (with odt extension) in order to extract all
> the text contained in it.
> Some years ago I made a VB.NET library in following OpenDocument 1.0
> specifications. Now this library works still fine, but I'd like to be
> sure that not be substantial changes in the newer versions of the
> standard (1.1 and 1.2).
> Could I follow the old OpenDocument 1.0 specifications without any
> problems or would it be expedient to follow the newer specifications?
> In other words, if I follow the old OpenDocument 1.0 specifications,
> could I fall into problems in reading a file of the newer versions
> with regard to the text extraction?
>
> Thanks a lot!
>
> Vincenzo

------
Files attached to this email may be in ISO 26300 format (OASIS Open Document Format). If you have difficulty opening them, please visit http://iso26300.info for more information.
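The approach described in the reply could be sketched in Java roughly as follows. This is a minimal illustration, not a complete extractor. It opens the .odt file as a ZIP archive, parses content.xml with a namespace-aware DOM parser, and collects the character data under <office:text>. The class and method names (OdtTextExtractor, extractFromOdt, extractFromContentXml) are made up for this example. It assumes the office and text namespace URIs shown in the code and that a newline after each <text:p> and <text:h> is an acceptable way to separate paragraphs. Whitespace elements such as <text:s>, <text:tab> and <text:line-break> are not handled, and neither are headers and footers stored outside content.xml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the names are illustrative only.
class OdtTextExtractor {
    // Namespace URIs assumed for the office: and text: prefixes.
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Opens the .odt container (a ZIP archive) and extracts text from content.xml.
    static String extractFromOdt(String odtPath) throws Exception {
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("content.xml not found in " + odtPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractFromContentXml(in);
            }
        }
    }

    // Parses a content.xml stream and returns the text found under <office:text>.
    static String extractFromContentXml(InputStream contentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match elements by namespace URI
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(contentXml);

        NodeList bodies = doc.getElementsByTagNameNS(OFFICE_NS, "text");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bodies.getLength(); i++) {
            collect(bodies.item(i), out);
        }
        return out.toString();
    }

    // Depth-first walk that appends character data in document order.
    private static void collect(Node node, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                collect(child, out);
                // Assumption: end each paragraph or heading with a newline.
                boolean isTextNs = TEXT_NS.equals(child.getNamespaceURI());
                String name = child.getLocalName();
                if (isTextNs && ("p".equals(name) || "h".equals(name))) {
                    out.append('\n');
                }
            }
        }
    }
}
```

Calling OdtTextExtractor.extractFromOdt("example.odt"), where example.odt is a placeholder path, would return the body text as a single string. Using an XML parser rather than scanning for > and < by hand means entities such as &amp;amp; are decoded and comments are skipped.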