Subject: Proof of Concept -- We can suck structured data out from ODF comment list archives
When we discussed the quantity of recommendations that might come from our Call for Proposals for ODF-Next, one consideration was how we would handle all the incoming data. Manually transcribing data into a spreadsheet, as we do now, did not sound fun. And setting up our own web form for comment submissions wasn't feasible, because it would not accord with OASIS IPR rules, which require public comments to come through the list.

So, I wrote a Python script that goes through list archives and dumps out the URL of each post, the author, the subject of the post, and the date/time of the post. I've tested it against the office and office-comment lists, though it should work for any OASIS list archive. The only complication was a slight change in page structure that occurred back in January 2003, but I was able to conditionalize some of the logic to handle it both ways.

Here is an example dump for the office-comment list:

Although the output format here is less than inspiring, it would be easy to make the script output to an ODF spreadsheet file directly, or to a CSV file suitable for importing into JIRA. So I think we're good to go in that department.

Regards,

-Rob
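
For readers who want a feel for the approach described in the post, here is a minimal sketch of this kind of extraction. It is not the script mentioned above, which was not included in the message. The index markup in `SAMPLE` (one `<li>` per post, with the subject in `<a>`, the author in `<em>`, and the date in `<tt>`), the `example.org` base URL, and the names `IndexParser`, `extract_posts`, and `to_csv` are all assumptions made for illustration; a real archive page would likely need different tag handling.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin

FIELDS = ["url", "author", "subject", "date"]

# Hypothetical index markup; real list archive pages may differ.
SAMPLE = """
<ul>
<li><a href="msg00001.html">Example subject</a>, <em>A. Author</em> <tt>2009-01-01 12:00</tt></li>
<li><a href="msg00002.html">Re: Example subject</a>, <em>B. Author</em> <tt>2009-01-02 08:30</tt></li>
</ul>
"""
# Placeholder base URL, not a real archive location.
BASE = "https://example.org/archives/office-comment/200901/"


class IndexParser(HTMLParser):
    """Collect one record per <li> in the assumed index page layout."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.posts = []      # finished records
        self._post = None    # record being built for the current <li>
        self._field = None   # field that the current text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._post = dict.fromkeys(FIELDS, "")
        elif self._post is None:
            return  # ignore everything outside a list item
        elif tag == "a" and "href" in attrs:
            # Resolve the relative link against the index page URL.
            self._post["url"] = urljoin(self.base_url, attrs["href"])
            self._field = "subject"
        elif tag == "em":
            self._field = "author"
        elif tag == "tt":
            self._field = "date"

    def handle_endtag(self, tag):
        if tag in ("a", "em", "tt"):
            self._field = None
        elif tag == "li" and self._post is not None:
            if self._post["url"]:  # keep only items that link to a post
                self.posts.append(self._post)
            self._post = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._post is not None and self._field:
            self._post[self._field] += data


def extract_posts(html, base_url):
    """Return a list of dicts with url, author, subject, and date."""
    parser = IndexParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.posts


def to_csv(posts):
    """Serialize the records as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(posts)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_posts(SAMPLE, BASE)), end="")
```

Handling a change in page structure, like the January 2003 one mentioned in the post, would presumably mean branching on the tags seen or on the archive month; this sketch covers only one assumed layout.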