oiic-formation-discuss message



Subject: Re: [oiic-formation-discuss] Summary and Focus?





--- On Fri, 6/20/08, jose lorenzo <hozelda@yahoo.com> wrote:

> From: jose lorenzo <hozelda@yahoo.com>
> Subject: Re: [oiic-formation-discuss] Summary and Focus?
> To: oiic-formation-discuss@lists.oasis-open.org, "Dave Pawson" <dave.pawson@gmail.com>
> Date: Friday, June 20, 2008, 10:07 PM
> --- On Fri, 6/20/08, Dave Pawson <dave.pawson@gmail.com> wrote:
> 
> > > There may be some others (I need to go through the list traffic again), but
> > > does that list give you a better idea?
> > 
> > If I make time today I'll trawl the 350 emails to do that too.
> > Pity the archives aren't retrievable as a text file (are they?)
> > http://lists.oasis-open.org/archives/oiic-formation-discuss/
> > 
> 
> Hey, I sort of figured I might do something like that
> eventually, but since now someone else requests it.. I
> wrote a script to more or less give you your text archive.
> It should run on most Linux. [I use PCLOS2007]. You'll
> need perl and wget.
> 
> At the command line copy/paste the following, and when
> done, go all the way into the newly created directory tree
> (named after the date) and open the file named
> [date].allmsg.txt. For example, if you run this tonight,
> you will end up with 20080620.allmsg.txt at approx 1.3 MB.
> 
> Everything is inside subshell "(" ")"
> so that you don't mess up the environment and end up in
> the orig dir.
> 
> 
(export day3424532; day3424532=`date "+%Y%m%d"`
rm -rf tempoiic-"$day3424532"
mkdir tempoiic-"$day3424532" &&
cd tempoiic-"$day3424532" &&
wget -r -l 1 -A "msg*" http://lists.oasis-open.org/archives/oiic-formation-discuss/200806/maillist.html &&
cd lists.oasis-open.org/archives/oiic-formation-discuss/200806 &&
cat msg*.html |
perl -e '$/=undef ;while (<>) {s,<p><em>Subject</em>: <b>(.+?)</b></p>,3457345634457Subject: $1,g; print}' |
perl -e '$/=undef ;while (<>) { s,<li><em>From</em>: <b>(.+?)</b></li>,3457345634457From: $1,g; print}' |
perl -e '$/=undef ;while (<>) { s,<li><em>To</em>:(.+?)</li>,3457345634457To: $1,g; print}' |
perl -e '$/=undef ;while (<>) { s,<li><em>Date</em>:(.+?)</li>,3457345634457Date: $1,g; print}' |
perl -e '$pre=0; $prenew=0;
print "\n***********************************************\n***********************************************\n";
while (<>) {
  # a line holding <pre> opens a message body; the <pre> line itself is skipped
  if (/(<pre>)|(3457345634457)/) {
    if ($1) {$pre=1; $prenew=1; print "****************\n"}
  }
  # header lines were tagged with the marker number by the earlier passes
  if ($2) {s,3457345634457,,; print; next}
  if (m,</pre>,) {$pre=0; print "\n***********************************************\n***********************************************\n"}
  if ($pre and !$prenew) {print}
  if ($prenew) {$prenew=0}
}' > "$day3424532".allmsg.txt)
>
...
> .. also, they would be from most recent to least);

Oops. The order is from *oldest* to *most recent*, so the first message at the top of the text file will be Mary's test msg00001. Actually there is a little bug there: that first message (as per the html on the website) has no "pre" section from which to capture the email's body text, so it appears empty within [date].allmsg.txt. In general, the script above works such that any email without a "pre" section will appear empty (its header info will blend into the header of the next email in line).
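
If you want to see which messages are affected, something along these lines should list them. This is only a rough sketch: the function name list_msgs_without_pre is made up here, and it assumes a grep that supports -L (GNU and BSD grep both do). Give it the directory holding the downloaded msg*.html files.

```shell
# Sketch only: list the downloaded messages that contain no <pre> block,
# i.e. the ones whose body would come out empty in the combined text file.
# list_msgs_without_pre is a hypothetical helper, not part of the script above.
list_msgs_without_pre() {
    # Directory holding the msg*.html files; defaults to the current one.
    dir=${1:-.}
    # grep -L prints the names of the files in which the pattern never matches.
    grep -L '<pre>' "$dir"/msg*.html
}
```

For example, after running the big script, "list_msgs_without_pre ." from inside the 200806 directory would print the file names to check by hand on the website.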

There are probably a bunch more mistakes.. also, it can definitely be cleaned up more.

It should still be at least a bit useful.



      

