
oiic-formation-discuss message


Subject: Re: [oiic-formation-discuss] Summary and Focus?

--- On Fri, 6/20/08, Dave Pawson <dave.pawson@gmail.com> wrote:

> > There may be some others (I need to go through the list traffic again), but
> > does that list give you a better idea?
> If I make time today I'll trawl the 350 emails to do that too.
> Pity the archives aren't retrievable as a text file (are they?)
> http://lists.oasis-open.org/archives/oiic-formation-discuss/

Hey, I had sort of figured I might do something like that eventually, but now that someone else has asked for it, I wrote a script that more or less gives you your text archive. It should run on most Linux systems (I use PCLOS2007). You'll need perl and wget.

Copy/paste the following at the command line. When it is done, go all the way down into the newly created directory tree (the top directory is named after the date) and open the file named [date].allmsg.txt. For example, if you run this tonight, you will end up with 20080620.allmsg.txt at approx 1.3 MB.

Everything is inside a subshell, "(" ... ")", so that you don't mess up your environment and you end up back in the original directory.
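To make the subshell point concrete, here is a minimal sketch, separate from the archive script. The variable names and the `cd /` target are made up for the illustration.

```shell
# Remember where we start and set a marker variable.
before_dir=$(pwd)
marker=outer

# Inside the parentheses: change directory and overwrite the variable.
(
  cd / || exit 1
  marker=inner
  echo "inside:  $(pwd) marker=$marker"
)

# Back outside: neither change leaked out of the subshell.
after_dir=$(pwd)
echo "outside: $after_dir marker=$marker"
```

The "outside" line should report the starting directory and the original marker value, which is why the script can cd and set variables freely.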

(
# Date stamp (e.g. 20080620) used for the temp directory and the output file.
day3424532=`date "+%Y%m%d"`
rm -rf tempoiic-"$day3424532"
# Download the message pages, mark the header lines, then keep headers and <pre> text.
mkdir tempoiic-"$day3424532" && cd tempoiic-"$day3424532" &&
wget -r -l 1 -A "msg*" http://lists.oasis-open.org/archives/oiic-formation-discuss/200806/maillist.html &&
cd lists.oasis-open.org/archives/oiic-formation-discuss/200806 &&
cat msg*.html |
perl -e '$/=undef; while (<>) {s,<p><em>Subject</em>: <b>(.+?)</b></p>,3457345634457Subject: $1,g; print}' |
perl -e '$/=undef; while (<>) {s,<li><em>From</em>: <b>(.+?)</b></li>,3457345634457From: $1,g; print}' |
perl -e '$/=undef; while (<>) {s,<li><em>To</em>:(.+?)</li>,3457345634457To: $1,g; print}' |
perl -e '$/=undef; while (<>) {s,<li><em>Date</em>:(.+?)</li>,3457345634457Date: $1,g; print}' |
perl -e '$pre=0; $prenew=0;
  print "\n***********************************************\n***********************************************\n";
  while (<>) {
    # $pre: inside a <pre> block; $prenew: this is the <pre> line itself.
    if (/(<pre>)|(3457345634457)/) {
      if ($1) {$pre=1; $prenew=1; print "****************\n"}
    }
    # Marked header line: strip the marker and print it.
    if ($2) {s,3457345634457,,; print; next}
    if (m,</pre>,) {$pre=0; print "\n***********************************************\n***********************************************\n"}
    if ($pre and !$prenew) {print};
    if ($prenew) {$prenew=0}
  }' > "$day3424532".allmsg.txt
)

I don't have time to finish it off tonight, but it needs a few little things done. For example, "wget -c" might be much better, so that you can update whenever you want without doing a full download; in that case, you may not care about the date.
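One possible shape for that incremental update, as a rough sketch only. It swaps in wget's -nc (--no-clobber) option, which skips files that already exist locally, where the text suggests -c (which resumes partially downloaded files). The function name update_archive and its directory argument are hypothetical, and the sketch assumes the "rm -rf" step is dropped so earlier downloads survive between runs.

```shell
# Index page used by the script above.
ARCHIVE_URL=http://lists.oasis-open.org/archives/oiic-formation-discuss/200806/maillist.html

# Hypothetical helper: fetch only the message pages not already on disk.
# $1 is a directory that is kept between runs (an assumption of this sketch).
update_archive() {
  # Subshell, so the caller's working directory is left alone.
  (
    mkdir -p "$1" && cd "$1" &&
    # -nc: do not re-download files that already exist locally.
    wget -r -l 1 -nc -A "msg*" "$ARCHIVE_URL"
  )
}
```

It would be called as, say, "update_archive oiic-archive", after which the perl filters can be rerun over the msg*.html files in that tree.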

Overview: the script makes a tmp dir, wgets all the emails from the website, and pipes them through several perl scripts. (The emails should be in order by date, but that depends on how the shell expands "msg*.html" for "cat"; if this is not right, it can be fixed later. Also, they would be from most recent to least.) The first perl scripts just clean up a few lines to get Subject: ... From: ... To: ... Date: ... without the html markup. The last perl script removes everything except the header lines just mentioned and the actual email text (found within "pre" tags). The output, which is basically text, goes into [date].allmsg.txt.

Despite looking a bit ugly, it's fairly straightforward once you look up the man pages (if you want to) and assuming you know some perl (regexps). The logic inside the perl is just a bit of basic bookkeeping.

Ideally, I'd use an xml stream processor, but I have no experience using any, so it would have taken me longer.

Yes, it can be improved, but I don't have any more time tonight.

Thanks for the request, since now I can use it too. ;-) .. just needed the right motivation.

Also, an IRC channel (#iic or whatever) may be a good idea. I don't have much experience, but it seemed I was able to create the channel/chatroom just by typing in the name (on freenode, using xchat, I think).

time for bed.

