OASIS Mailing List Archives

regrep-query message


Subject: Re: ContentBasedQuery questions/ideas.



Matt,

Below I retain just two portions of your previous messages. After that I 
ask another question.


> >   a) An interface for a Client to tell a Registry how to "index" a
> > document. Let's call this an Index Creation Request.
>
>Yes, although I would like to clarify: the index creation request happens
>once, and required parameters would be mime type and handler.
>
>e.g.
>
><IndexCreationRequest
>         mimeType="application/x-pdf"
>         handler="vendorRecognizedString" />
>
>Ideally, the IndexCreationRequest would just signal the registry to begin
>creating the index, and not give any instructions on how to do so.  The
>handlers would hopefully be defined through server configuration so that we
>don't start having specific language issues poke their ugly heads (for
>instance, handler="com.xmlglobal.ebxml.registry.handlers.Pdf")  I suggest
>having the handler attribute just in case the vendor wants to offer
>alternative indexing methods for any given mime type.

-------------- Break to previous Word example ---------------


> > >
> > ><ContentBasedQuery>
> > >         <Type mime="application/msword" name="MS Word">
> > >                 <Mapping occurence="1" path="/Document/Wordcount"
> > > label="Word Count" />
> > >                 <Mapping occurence="*" path="/Document/h1" label="Level
> > > one heading" />
> > >                 ...
> > >         </Type>
> > >         <Type />
> > >         <Type />
> > ></ContentBasedQuery>
> > >
> > >The neat thing about this is that users could specify an arbitrary mime
> > > type when submitting an object to the registry, and have a custom handler
> > > deal with its indexing and queries for content based queries.

-----------------------------------------------------------

QUESTION-1

If the <IndexCreationRequest> has only two input parameters, then the 
Client has little control over how heavily an individual repository item 
gets indexed. The Repository would have to offer choices for how much 
indexing is done. In your Word example, there might be several 
ContentHandlers for Word available:

   HeavyWordHandler   -- indexes all headings level 1 to 4, indexes size,
                          indexes all words in text paragraphs, etc.

   LightweightWordHandler --  indexes level 1 headings only.

   MiddleweightWordHandler -- indexes level 1 headings and words in Index.
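
To make that concrete, here is a rough sketch of how a Client might pick
among those handlers using your two-attribute request. The three handler
names are only my illustrative labels from the list above; nothing in ebRS
defines them, and a vendor would presumably publish its own recognized
strings.

```xml
<!-- Hypothetical handler names; only mimeType and handler come from Matt's example -->

<!-- Alternative 1: heaviest indexing -->
<IndexCreationRequest
        mimeType="application/msword"
        handler="HeavyWordHandler" />

<!-- Alternative 2: level 1 headings only -->
<IndexCreationRequest
        mimeType="application/msword"
        handler="LightweightWordHandler" />

<!-- Alternative 3: level 1 headings plus words in the Index -->
<IndexCreationRequest
        mimeType="application/msword"
        handler="MiddleweightWordHandler" />
```

The Client would send only one of these. The point is that the handler
attribute becomes the only knob for indexing depth.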

Would the <IndexCreationRequest> be attached to every submission of a Word 
document, or would a SubmittingOrganization make one <IndexCreationRequest> 
that applied to all subsequent submissions by that SO? Or could some 
ResponsibleOrganization make one <IndexCreationRequest> that applied to all 
submissions where that organization was identified as the RO?
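
The three possibilities could be told apart with something like the sketch
below. Note that the "scope" attribute and its values are purely my
invention to illustrate the question; they are not part of your proposal
or of ebRS.

```xml
<!-- Hypothetical "scope" attribute; values are assumptions for illustration only -->

<!-- (1) applies only to the submission it is attached to -->
<IndexCreationRequest
        mimeType="application/msword"
        handler="LightweightWordHandler"
        scope="submission" />

<!-- (2) applies to all later submissions by this SubmittingOrganization -->
<IndexCreationRequest
        mimeType="application/msword"
        handler="LightweightWordHandler"
        scope="submittingOrganization" />

<!-- (3) applies to all submissions naming this ResponsibleOrganization -->
<IndexCreationRequest
        mimeType="application/msword"
        handler="LightweightWordHandler"
        scope="responsibleOrganization" />
```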


QUESTION-2

Would it make sense to register the ContentHandlers in the Registry? If so, 
then a Client could issue a query to the Registry to find out what "Slot 
names" and "Slot name data types" to use for retrieving documents indexed 
by that handler. One advantage of what's in Appendix D of ebRS is that we 
already have a mechanism for defining and handling classifications. If we 
use Slots for storing the index fields we'll have to invent a way to convey 
that same kind of information to users.
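
As a rough idea of what a registered handler description might look like,
consider the sketch below. Every element and attribute name here is
hypothetical, as is the "string" data type; the slot name reuses the
"Level one heading" label from your Word mapping. A Client could query for
this entry to learn which Slot names and data types the handler produces.

```xml
<!-- Hypothetical registry entry describing one ContentHandler -->
<ContentHandler
        name="LightweightWordHandler"
        mimeType="application/msword">
        <!-- One entry per index field the handler stores as a Slot -->
        <IndexSlot
                name="Level one heading"
                dataType="string"
                occurrence="*" />
</ContentHandler>
```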

-- Len



