OASIS Mailing List Archives

 



topicmaps-comment message



Subject: [xtm-wg] Sketch of a Possible Algorithm for Fragment Grabbing


Hi,

Here is a very rough sketch of a possible algorithm for grabbing fragments
of topic maps (TMs) and related material over the web. It's bound to be
missing things; I don't claim to be any sort of expert on this.

Assume that there is a client machine that makes an initial topic name
request to a search server.

1. Client issues a request to the search server, e.g. the string "Flying Boats".
2. Two cases:
   a) The search server uses a topic map, in which case the string is
      appended to an XPath-style query, and the retrieval crawler starts with
      this query on the server's TM.
   b) The search server uses a traditional compressed inverted index:
      b.1) Take each href found in the traditional index and crawl for TMs by
           reading MIME headers and looking for clues in XML docs.
      b.2) Meanwhile, construct the XPath-style query and hold it in memory.
      b.3) For each TM found, set the crawler off with the XPath query.

3. The crawler locates a "Flying Boats" topic.
4. The <topic>...</topic> string is determined and copied, and the assigned
   characteristics of the "Flying Boats" topic are analysed.
5. Topic analysis stages:
   a) Get the list of scopes, then for each scope, the names and occurrences
      in that scope.
   b) (Hoping these are at the top or bottom of the doc, not scattered.) Get
      any addthms and store them.
   c) For each href in the occurrences within each scope, request the
      page/object at that location.
   d) Read the MIME type in the HTTP header.
   e) If it is text and XML, scan for clues to TM-ness.
   f) If it is a TM, store it in TMList.
   g) Store anything else in a bag under the relevant scope.
   h) Assuming a pre-specified crawl depth/distance:
      h.1) Foreach TM in TMList:
               Foreach topic in TM:
                   Return to (3) and repeat the steps to here.
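
To make steps 3 to 5 a bit more concrete, here is a minimal Python sketch of
the depth-limited crawl. It is only an illustration under stated assumptions,
not a definitive implementation: the names grab_fragments,
looks_like_topic_map and fetch are made up for this example, and the markup it
reads (<topicmap>, <topic name="...">, <occurs scope="..." href="...">) is a
deliberately simplified stand-in for real topic map syntax. It ignores names,
addthms and the XPath-style query, and fetch is passed in by the caller so
that no particular HTTP library is assumed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical fetch result: (MIME type from the HTTP header, body text).
Fetched = Tuple[str, str]


def looks_like_topic_map(mime_type: str, body: str) -> bool:
    """Steps 5d-5e: read the MIME type, then scan XML for a clue to TM-ness.

    Assumption: the only "clue" checked here is a <topicmap ...> tag.
    """
    base = mime_type.split(";")[0].strip().lower()
    if base not in ("text/xml", "application/xml"):
        return False
    return "<topicmap" in body.lower()


def grab_fragments(
    start_href: str,
    topic_name: str,
    fetch: Callable[[str], Fetched],
    max_depth: int = 2,
) -> Dict[str, List[str]]:
    """Return a bag of non-TM hrefs per scope (step 5g).

    The starting TM is searched for topic_name only; every TM reached through
    an occurrence has all of its topics analysed (step 5h), down to max_depth.
    """
    bags: Dict[str, List[str]] = defaultdict(list)
    seen = {start_href}  # avoid fetching the same href twice

    mime, body = fetch(start_href)
    if not looks_like_topic_map(mime, body):
        return {}

    # This queue plays the role of TMList: (TM body, wanted topic, depth).
    tm_list: deque = deque([(body, topic_name, 0)])
    while tm_list:
        tm_body, wanted, depth = tm_list.popleft()
        root = ET.fromstring(tm_body)
        for topic in root.iter("topic"):
            name: Optional[str] = topic.get("name")
            if wanted is not None and name != wanted:
                continue
            for occ in topic.iter("occurs"):
                href = occ.get("href")
                if href is None or href in seen:
                    continue
                seen.add(href)
                scope = occ.get("scope", "unscoped")
                occ_mime, occ_body = fetch(href)  # steps 5c-5d
                if looks_like_topic_map(occ_mime, occ_body):
                    # Step 5f: keep the TM, unless the crawl depth is used up.
                    if depth < max_depth:
                        tm_list.append((occ_body, None, depth + 1))
                else:
                    bags[scope].append(href)  # step 5g
    return dict(bags)
```

With max_depth=0 only the starting TM is analysed; each increment lets the
crawler follow one more hop of TM-to-TM occurrences. A TM found beyond the
depth limit is simply dropped here, which is one of several reasonable
choices.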

cheers
Peter




