OASIS Mailing List Archives

regrep-query message


Subject: Re: ContentBasedQuery questions/ideas.


Dan,

I've responded inline as well.

Cheers,

Matt

On Wednesday 05 September 2001 14:47, Dan Chang wrote:
<snipped />
>
> Team,
>
> I have started looking more closely at ContentBasedQuery, and have a few
> questions for those of you who may have been more closely involved with
> the registry specifications. Please excuse my ignorance if I am off in
> left field. Here are my observations and questions:
>
> 1.  Does ContentBasedQuery need to fall under the FilterQuery umbrella? (I
> think not, just checking.)
> ==> I think it should. Our focus is on the registry, not the repository.
> ==> That is, a user is expected to query through the registry, not
> ==> directly on the repository. Therefore, content-based query should
> ==> supplement and be part of filter query, not be used independently.

Fair enough.
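To make the agreed layering concrete, here is a minimal Python sketch in which the content clause only narrows the result of a registry (metadata) filter and is never evaluated against the repository on its own. `RegistryEntry`, `filter_query` and the sample entries are hypothetical names for illustration, not part of any registry specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RegistryEntry:
    """Hypothetical registry metadata plus the repository item it points to."""
    object_id: str
    name: str
    mime_type: str
    content: bytes


def filter_query(entries: List[RegistryEntry],
                 metadata_filter: Callable[[RegistryEntry], bool],
                 content_filter: Optional[Callable[[bytes], bool]] = None
                 ) -> List[RegistryEntry]:
    """Run a filter query over registry metadata. The optional content
    clause is applied only to entries that already matched the metadata
    clause, so content is never searched independently of the registry."""
    results = []
    for entry in entries:
        if not metadata_filter(entry):
            continue
        if content_filter is not None and not content_filter(entry.content):
            continue
        results.append(entry)
    return results


entries = [
    RegistryEntry("urn:1", "PO schema", "text/xml", b"<PurchaseOrder/>"),
    RegistryEntry("urn:2", "Invoice schema", "text/xml", b"<Invoice/>"),
    RegistryEntry("urn:3", "PO notes", "text/plain", b"PurchaseOrder notes"),
]
hits = filter_query(entries,
                    metadata_filter=lambda e: e.mime_type == "text/xml",
                    content_filter=lambda c: b"PurchaseOrder" in c)
print([e.object_id for e in hits])  # ['urn:1']
```

Note that `urn:3` contains the search term but is excluded because it fails the metadata clause first, which is the "supplement, not independent" behaviour described above.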

>
> 2.  In the RS spec, appendix D talks about a syntax for defining
> "Classification Indexes". I read this over and over and don't really see
> how these details relate to content-based queries; they seem to relate
> more to defining how a registry implementer might build side tables in
> their RDBMS so that a keyed query could take place (e.g. id-1 LIKE 'fo%'
> AND id-2 LIKE '%ar'). Does the ContentBasedQuery spec have to address the
> needs of the SQL implementer, or should it remain technology neutral, with
> appendixes for implementation-specific issues (if available)?
> ==> Our last discussion/agreement was that, to make things simpler and
> ==> more uniform, content-based query will be based on content indexes
> ==> expressed in XPath.

And that is a great decision if you can guarantee that the content will 
always be XML, or that there will always be an XML mapping for content. I 
would like to strike this requirement in favour of a content-type-neutral 
approach, which I alluded to as being specified via a content handler 
interface. Of course, XPath syntax can be extended to cover index expressions 
for other formats, provided that those formats are structured in some way 
(e.g. images could use paths to refer to embedded metadata, such as 
/Image/Metadata/DateTaken, and word processing files could use paths to 
represent sections, paragraphs, headings, metadata, etc.). The question 
is whether we would want to use XPath for addressing data that is not XML.
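The two concerns in item 2 can be kept apart: the spec defines indexes as XPath expressions, and an SQL implementer is free to materialize them however it likes. The minimal Python sketch below (standard library only) shows one such choice, loading XPath-defined index values into a side table and then running the keyed LIKE query from Dan's example. The index names `id-1`/`id-2`, the element paths, the `content_index` table layout and the function names are all hypothetical, and `xml.etree.ElementTree` supports only a subset of XPath, with paths evaluated relative to the document root.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical content index definitions: index name -> ElementTree path
# (a limited XPath subset, relative to the document root element).
INDEX_DEFS = {
    "id-1": "Header/Id",
    "id-2": "Party/Name",
}


def build_side_table(conn, documents):
    """Populate a side table with one row per (object, index, value)."""
    conn.execute(
        "CREATE TABLE content_index (object_id TEXT, index_name TEXT, value TEXT)")
    for object_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for index_name, path in INDEX_DEFS.items():
            for node in root.findall(path):
                if node.text is not None:
                    conn.execute("INSERT INTO content_index VALUES (?, ?, ?)",
                                 (object_id, index_name, node.text.strip()))


def keyed_query(conn, pattern1, pattern2):
    """Return objects where id-1 LIKE pattern1 AND id-2 LIKE pattern2."""
    rows = conn.execute(
        """SELECT DISTINCT a.object_id
           FROM content_index AS a
           JOIN content_index AS b ON a.object_id = b.object_id
           WHERE a.index_name = 'id-1' AND a.value LIKE ?
             AND b.index_name = 'id-2' AND b.value LIKE ?
           ORDER BY a.object_id""",
        (pattern1, pattern2))
    return [row[0] for row in rows]


docs = {
    "urn:1": "<PurchaseOrder><Header><Id>foo-1</Id></Header>"
             "<Party><Name>Acme Bar</Name></Party></PurchaseOrder>",
    "urn:2": "<PurchaseOrder><Header><Id>foo-2</Id></Header>"
             "<Party><Name>Acme Ltd</Name></Party></PurchaseOrder>",
    "urn:3": "<PurchaseOrder><Header><Id>qux-3</Id></Header>"
             "<Party><Name>Star</Name></Party></PurchaseOrder>",
}
conn = sqlite3.connect(":memory:")
build_side_table(conn, docs)
print(keyed_query(conn, "fo%", "%ar"))  # ['urn:1']
```

A generic (object, index, value) table avoids one column per index at the cost of a self-join per extra predicate; SQLite's LIKE is also case-insensitive for ASCII by default, which a real implementation would have to pin down in the spec.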


>
> 3.  How does everyone feel about a "Content Handler" architecture that
> would allow a content-based query to span more than just XML documents?
> The registry's self-describing CPP could contain a list of MIME types that
> can be content-searched, and the implementation of each content type other
> than XML/HTML and plain text could be left up to registry vendors.
> ==> Content handler sounds good.
>

In that case, I guess the major decision is how we should express indexes 
for arbitrary content types. I have no problem with using XPath and having 
the available paths for non-XML content exposed via an entry in the 
registry CPP, e.g.

<ContentBasedQuery>
	<Type mime="application/msword" name="MS Word">
		<Mapping occurrence="1" path="/Document/Wordcount" label="Word Count" />
		<Mapping occurrence="*" path="/Document/h1" label="Level one heading" />
		...
	</Type>
	<Type />
	<Type />
</ContentBasedQuery>	

The neat thing about this is that users could specify an arbitrary MIME type 
when submitting an object to the registry, and have a custom handler deal 
with both its indexing and content-based queries against it.
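A minimal Python sketch of the content handler idea: the registry reads a CPP entry like the one above to learn which paths each MIME type advertises, then dispatches each query to the handler registered for that type. So that it runs without a Word parser, the sketch attaches the same two mappings to `text/plain` and uses a hypothetical handler that exposes plain text as a synthetic `/Document` tree (lines starting with `# ` become `h1`). `ContentQueryDispatcher`, `PlainTextHandler`, `XmlHandler` and `eval_path` are illustrative names, and `eval_path` handles only simple absolute child paths, not full XPath.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Protocol

# Well-formed variant of the CPP entry above ("..." and empty <Type />
# omitted), declared for text/plain so the example needs no Word parser.
CPP_FRAGMENT = """
<ContentBasedQuery>
  <Type mime="text/plain" name="Plain text">
    <Mapping occurrence="1" path="/Document/Wordcount" label="Word Count"/>
    <Mapping occurrence="*" path="/Document/h1" label="Level one heading"/>
  </Type>
</ContentBasedQuery>
"""


def eval_path(root: ET.Element, path: str) -> List[str]:
    """Evaluate a simple absolute path such as /Document/h1 against root."""
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    relative = "/".join(steps[1:]) or "."
    return [(node.text or "").strip() for node in root.findall(relative)]


class ContentHandler(Protocol):
    def evaluate(self, content: bytes, path: str) -> List[str]: ...


class XmlHandler:
    """XML content is addressed directly by path."""
    def evaluate(self, content: bytes, path: str) -> List[str]:
        return eval_path(ET.fromstring(content), path)


class PlainTextHandler:
    """Hypothetical handler mapping plain text onto a synthetic /Document tree."""
    def evaluate(self, content: bytes, path: str) -> List[str]:
        text = content.decode("utf-8")
        doc = ET.Element("Document")
        ET.SubElement(doc, "Wordcount").text = str(len(text.split()))
        for line in text.splitlines():
            if line.startswith("# "):
                ET.SubElement(doc, "h1").text = line[2:]
        return eval_path(doc, path)


def load_mappings(cpp_xml: str) -> Dict[str, List[str]]:
    """Map each MIME type declared in the CPP to the paths it advertises."""
    root = ET.fromstring(cpp_xml.strip())
    return {t.get("mime"): [m.get("path") for m in t.findall("Mapping")]
            for t in root.findall("Type")}


class ContentQueryDispatcher:
    def __init__(self, cpp_xml: str, handlers: Dict[str, ContentHandler]):
        self.mappings = load_mappings(cpp_xml)
        self.handlers = handlers

    def query(self, mime: str, content: bytes, path: str) -> List[str]:
        handler = self.handlers.get(mime)
        if handler is None:
            raise ValueError(f"no content handler registered for {mime}")
        allowed = self.mappings.get(mime)
        # Types with no CPP entry (here text/xml) accept any path.
        if allowed is not None and path not in allowed:
            raise ValueError(f"{path} is not an advertised path for {mime}")
        return handler.evaluate(content, path)


dispatcher = ContentQueryDispatcher(CPP_FRAGMENT, {
    "text/plain": PlainTextHandler(),
    "text/xml": XmlHandler(),
})
note = b"# Intro\nhello world\n# Usage\nrun it"
print(dispatcher.query("text/plain", note, "/Document/h1"))         # ['Intro', 'Usage']
print(dispatcher.query("text/plain", note, "/Document/Wordcount"))  # ['8']
print(dispatcher.query("text/xml", b"<Order><Id>42</Id></Order>", "/Order/Id"))  # ['42']
```

Letting undeclared types accept any path is only one possible policy; a registry could equally require every content-searchable type to be declared in its CPP.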

Are non-XML content-types in the RE content in scope? 

-- 
Matthew MacKenzie
XML Global

<quote>
Canada Bill Jones's Motto:
  It's morally wrong to allow suckers to keep their money.
Supplement:
  A .44 magnum beats four aces.
</quote>

