regrep-query message

Subject: Re: ContentBasedQuery questions/ideas.

From: Matthew MacKenzie <matt@xmlglobal.com>
To: Len Gallagher <LGallagher@nist.gov>
Date: Thu, 06 Sep 2001 14:28:12 -0700

On Thursday 06 September 2001 12:59, Len Gallagher wrote:
>
> QUESTION-1
>
> If the <IndexCreationRequest> only has two input parameters, then the
> Client doesn't have very much control over how heavily the individual input
> repository item gets indexed. The Repository would have to provide choices
> for how much indexing is done. In your Word example, there might be several
> ContentHandlers for Word available:
>
>    HeavyWordHandler   -- indexes all headings level 1 to 4, indexes size,
>                           indexes all words in text paragraphs, etc.
>
>    LightweightWordHandler --  indexes level 1 headings only.
>
>    MiddleweightWordHandler -- indexes level 1 headings and words in Index.
>
> Would the <IndexCreationRequest> be attached to every submission of a Word
> document, or would a SubmittingOrganization make one <IndexCreationRequest>
> that applied to all subsequent submissions by that SO? Or could some
> ResponsibleOrganization make one <IndexCreationRequest> that applied to all
> submissions where that organization was identified as the RO?

My thinking was that the MIME type of the entry content could determine 
which indexing handler is used; for example, one of my Word documents might 
have the MIME type application/msword.heavyindex.
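To make the MIME-type idea concrete, here is a minimal Python sketch. It assumes a hypothetical lookup table that maps MIME types, including suffixed variants such as application/msword.heavyindex, to the handler names from Len's example. If a suffix is unknown, the sketch falls back to the base type. Nothing here is defined by ebRS. The table, the suffix convention and the choice of default handler are all illustrative assumptions.

```python
from typing import Dict, Optional

# Hypothetical table mapping MIME types to indexing handler names.
# The ".heavyindex" / ".lightindex" subtype suffixes are an illustrative
# convention only; they are not registered MIME types.
HANDLERS_BY_MIME: Dict[str, str] = {
    "application/msword": "MiddleweightWordHandler",  # assumed default for Word
    "application/msword.heavyindex": "HeavyWordHandler",
    "application/msword.lightindex": "LightweightWordHandler",
}


def select_handler(mime_type: str) -> Optional[str]:
    """Pick an indexing handler from a (possibly suffixed) MIME type."""
    mime_type = mime_type.strip().lower()
    if mime_type in HANDLERS_BY_MIME:
        return HANDLERS_BY_MIME[mime_type]
    # Unknown suffix: drop the last ".xxx" from the subtype and retry the base type.
    type_, _, subtype = mime_type.partition("/")
    if "." in subtype:
        base = type_ + "/" + subtype.rsplit(".", 1)[0]
        return HANDLERS_BY_MIME.get(base)
    return None  # in this sketch, None means "do not content-index this item"


print(select_handler("application/msword.heavyindex"))  # HeavyWordHandler
print(select_handler("application/msword.unknown"))     # MiddleweightWordHandler
```

The sketch also shows the drawback: the indexing choice is folded into the content type itself, so every new indexing level needs a new pseudo-MIME type.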

Admittedly, this doesn't seem like a very elegant solution, so perhaps some 
index attributes could be included at submission time to override the default 
handler, e.g.

<ContentIndexParameters>
	<UseHandler>LightweightWordHandler</UseHandler>
	<ArgumentList>
		<Argument name="remove_smart_quotes" value="true" />
	</ArgumentList>
</ContentIndexParameters> 
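Here is a rough Python sketch of how a repository might resolve that override. It uses the standard xml.etree.ElementTree module to read the element names proposed above. Those names are only a proposal in this message and are not part of the ebRS schema. The sketch assumes that an explicit UseHandler wins over whatever default the repository would otherwise pick, such as one chosen from the MIME type. If UseHandler is missing or empty, the default is kept. Argument values are returned as plain strings, and interpreting them (e.g. "true" as a boolean) is left to the handler.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# The fragment from this message, used as sample input.
SAMPLE = """<ContentIndexParameters>
\t<UseHandler>LightweightWordHandler</UseHandler>
\t<ArgumentList>
\t\t<Argument name="remove_smart_quotes" value="true" />
\t</ArgumentList>
</ContentIndexParameters>"""


def resolve_index_parameters(
    params_xml: Optional[str], default_handler: str
) -> Tuple[str, Dict[str, str]]:
    """Return (handler_name, arguments) for one submission.

    A non-empty <UseHandler> overrides default_handler; otherwise the
    default is kept. Arguments without a name attribute are ignored.
    """
    if not params_xml:
        return default_handler, {}
    root = ET.fromstring(params_xml)
    if root.tag != "ContentIndexParameters":
        raise ValueError(f"unexpected root element: {root.tag}")
    use = (root.findtext("UseHandler") or "").strip()
    handler = use if use else default_handler
    args = {
        arg.get("name"): arg.get("value", "")
        for arg in root.findall("ArgumentList/Argument")
        if arg.get("name")
    }
    return handler, args


print(resolve_index_parameters(SAMPLE, "MiddleweightWordHandler"))
# ('LightweightWordHandler', {'remove_smart_quotes': 'true'})
```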

> QUESTION-2
>
> Would it make sense to register the ContentHandlers in the Registry? If so,
> then a Client could issue a query to the Registry to find out what "Slot
> names" and "Slot name data types" to use for retrieving documents indexed
> by that handler. One advantage of what's in Appendix D of ebRS is that we
> already have a mechanism for defining and handling classifications. If we
> use Slots for storing the index fields we'll have to invent a way to convey
> that same kind of information to users.
>

Could you explain what you mean in a little more detail?  I would think that 
one could use a SlotFilter to "prequalify" entries before testing them 
against the content query.  I humbly defer to you guys on this, as I am 
slightly behind the curve on Filter Query.
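Here is a rough Python sketch of the "prequalify" idea, under loose assumptions. Entries are modelled by a hypothetical Entry class that holds a dictionary of slots and the item's content as text. The slot test stands in for a SlotFilter, and the content test is a naive case-insensitive substring search. This is not the ebRS Filter Query syntax. It only illustrates the evaluation order: the cheap metadata test runs first, and only the entries that pass it reach the content query.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Predicate = Callable[["Entry"], bool]


@dataclass
class Entry:
    """Hypothetical, simplified registry entry: metadata slots plus content."""
    id: str
    slots: Dict[str, List[str]] = field(default_factory=dict)
    content: str = ""


def slot_filter(name: str, value: str) -> Predicate:
    """Cheap metadata test: does slot `name` hold `value`?"""
    return lambda e: value in e.slots.get(name, [])


def content_query(term: str) -> Predicate:
    """Costlier test against the item itself (naive substring search here)."""
    needle = term.lower()
    return lambda e: needle in e.content.lower()


def prequalified_search(
    entries: Iterable[Entry], prefilter: Predicate, content_test: Predicate
) -> List[str]:
    """Apply content_test only to entries that pass prefilter; return ids."""
    candidates = (e for e in entries if prefilter(e))
    return [e.id for e in candidates if content_test(e)]


ENTRIES = [
    Entry("doc1", {"language": ["en"]}, "Registry Services specification"),
    Entry("doc2", {"language": ["fr"]}, "Services du registre"),
    Entry("doc3", {"language": ["en"]}, "Meeting minutes"),
]

print(prequalified_search(ENTRIES, slot_filter("language", "en"),
                          content_query("registry")))  # ['doc1']
```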

-- 
Matthew MacKenzie
XML Global

<quote>
I used to be an agnostic, but now I'm not so sure.
</quote>

References:
- Re: ContentBasedQuery questions/ideas.
  - From: Len Gallagher <LGallagher@nist.gov>