Subject: Re: [oiic-formation-discuss] Summary and Focus?
--- On Fri, 6/20/08, jose lorenzo <hozelda@yahoo.com> wrote:

> From: jose lorenzo <hozelda@yahoo.com>
> Subject: Re: [oiic-formation-discuss] Summary and Focus?
> To: oiic-formation-discuss@lists.oasis-open.org, "Dave Pawson" <dave.pawson@gmail.com>
> Date: Friday, June 20, 2008, 10:07 PM
>
> --- On Fri, 6/20/08, Dave Pawson <dave.pawson@gmail.com> wrote:
>
> > > There may be some others (I need to go through the list traffic again), but
> > > does that list give you a better idea?
> >
> > If I make time today I'll trawl the 350 emails to do that too.
> > Pity the archives aren't retrievable as a text file (are they?)
> > http://lists.oasis-open.org/archives/oiic-formation-discuss/
>
> Hey, I sort of figured I might do something like that eventually, but since now someone else requests it.. I wrote a script to more or less give you your text archive. It should run on most Linux. [I use PCLOS2007]. You'll need perl and wget.
>
> At the command line copy/paste the following, and when done, go all the way into the newly created directory tree (named after the date) and open the file named [date].allmsg.txt. For example, if you run this tonight, you will end up with 20080620.allmsg.txt at approx 1.3 MB.
>
> Everything is inside subshell "(" ")" so that you don't mess up the environment and end up in the orig dir.
>
> (export day3424532; day3424532=`date "+%F"`; rm -rf tempoiic-"$day3424532"; mkdir tempoiic-"$day3424532" && cd tempoiic-"$day3424532" &&
> wget -r -l 1 -A "msg*" http://lists.oasis-open.org/archives/oiic-formation-discuss/200806/maillist.html &&
> cd lists.oasis-open.org/archives/oiic-formation-discuss/200806 &&
> cat msg*.html |
> perl -e '$/=undef ;while (<>) {s,<p><em>Subject</em>: <b>(.+?)</b></p>,3457345634457Subject: $1,g; print}' |
> perl -e '$/=undef ;while (<>) { s,<li><em>From</em>: <b>(.+?)</b></li>,3457345634457From: $1,g; print}' |
> perl -e '$/=undef ;while (<>) { s,<li><em>To</em>:(.+?)</li>,3457345634457To: $1,g; print}' |
> perl -e '$/=undef ;while (<>) { s,<li><em>Date</em>:(.+?)</li>,3457345634457Date: $1,g; print}' |
> perl -e '$pre=0;$prenew=0;print "\n***********************************************\n***********************************************\n"; while (<>) {if (/(<pre>)|(3457345634457)/) {if ($1) {$pre=1; $prenew=1; print"****************\n"} } if ($2) {s,3457345634457,,; print; next} if (m,</pre>,) {$pre=0; print "\n***********************************************\n***********************************************\n"} if ($pre and !$prenew) {print}; if ($prenew) {$prenew=0}}' > 20080620.allmsg.txt)
>
> ...
> .. also, they would be from most recent to least);

Oops. The order is from *oldest* to *most recent*, so the first message at the top of the text file will be Mary's test msg00001.

Actually, there is a small bug: that first message (as per the HTML on the website) has no "pre" section from which to capture the email's body text, so it appears empty within [date].allmsg.txt. In general, the script above works such that any email without a "pre" section will appear empty (its header info will blend into the header of the next email in line).

There are probably a bunch more mistakes, and it can definitely be cleaned up more. It should still be at least a bit useful.
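
For anyone who wants to work around the missing-"pre" case, here is a rough sketch of the same extraction for a single archive page, written in Python rather than the Perl used above. The header patterns are copied from the script's substitutions. The function name `message_to_text`, the separator width, and the placeholder text are made up for illustration, and decoding HTML entities is an extra step the original script does not perform. It has not been run against the live archive.

```python
import html
import re

# Header patterns taken from the Perl substitutions in the script above.
HEADER_PATTERNS = {
    "Subject": re.compile(r"<p><em>Subject</em>: <b>(.+?)</b></p>", re.S),
    "From": re.compile(r"<li><em>From</em>: <b>(.+?)</b></li>", re.S),
    "To": re.compile(r"<li><em>To</em>:(.+?)</li>", re.S),
    "Date": re.compile(r"<li><em>Date</em>:(.+?)</li>", re.S),
}
PRE_BLOCK = re.compile(r"<pre>(.*?)</pre>", re.S)

# Hypothetical values, chosen only for this sketch.
SEPARATOR = "*" * 40
NO_BODY = "[no <pre> section found; body not captured]"


def message_to_text(page: str) -> str:
    """Turn one msgNNNNN.html page into a plain-text block."""
    lines = []
    for name, pattern in HEADER_PATTERNS.items():
        match = pattern.search(page)
        if match:
            # html.unescape is an addition; the Perl leaves entities as-is.
            value = html.unescape(match.group(1).strip())
            lines.append(f"{name}: {value}")

    bodies = PRE_BLOCK.findall(page)
    if bodies:
        body = "\n".join(html.unescape(b).strip("\n") for b in bodies)
    else:
        # Unlike the one-liner, mark the gap explicitly so the headers
        # do not run into the next message.
        body = NO_BODY

    return "\n".join(lines + ["*" * 16, body, SEPARATOR])
```

Calling `message_to_text` on the contents of each msg*.html file in sorted filename order and concatenating the results should give roughly the same oldest-to-newest text file, with every message closed by its own separator even when the body is missing.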