OASIS Mailing List Archives

office-metadata message


Subject: Enhanced Searching Proposal


Greetings!

I am not really sure where this should go on the Metadata wiki, so I am 
posting it to the list and we can discuss where it should appear on the 
wiki.

Enhanced Searching Proposal

There are several issues with regard to enhanced searching and I have 
simply grouped them together for discussion purposes:

1. Referencing Metadata:

Documents generated in a particular context or enterprise are likely to 
share the same set of metadata to enhance searching, so it would be 
wasteful to replicate that metadata within each document package. 
Rather than replicating it, we will need to provide a mechanism for 
pointing to metadata that can be used to enhance searching.

But, at the same time, we need to allow for users to record custom 
metadata to be used to enhance searching, whether that metadata is 
external to the document package or recorded as metadata inside that 
package.

2. What Metadata Applies?

Even though a user may have chosen one or more sets of metadata to be 
used to enhance searching, it is very possible that some term or phrase 
should not be governed by a particular metadata set, or that the user 
wants to provide custom metadata for a particular term or phrase. The 
user may also want to indicate that a particular set of metadata 
(whether inside or outside the document package) should govern a 
particular term or phrase.

In general terms, I think a particular metadata set should be said to 
govern all the terms or phrases in a document, without the need to point 
to specific terms or phrases in the document. In other words, the only 
pointing into or out of a document should be in the particular cases 
where a user wishes to escape from the application of that general rule.

The reasoning is that it is far less burdensome to require a pointing 
mechanism (in or out of document content) only when it is necessary to 
escape from the general metadata that is being used to enhance the 
searching for a particular term or phrase.
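
To make the general rule and its escapes concrete, here is a minimal 
sketch in Python. Everything in it is an assumption for illustration: 
the function name, the override table, and the urn:example identifiers 
are hypothetical and are not part of ODF or of this proposal.

```python
from typing import Optional


def governing_vocabulary(term: str,
                         default_vocabulary: str,
                         overrides: dict[str, Optional[str]]) -> Optional[str]:
    """Return the identifier of the vocabulary that governs `term`.

    `overrides` lists only the exceptional terms. A value of None means
    the term escapes every vocabulary (for example, because the author
    supplies custom metadata for it instead).
    """
    if term in overrides:
        return overrides[term]
    # General rule: the document-level vocabulary governs every term.
    return default_vocabulary


# Hypothetical document: one default vocabulary, two exceptions.
DEFAULT = "urn:example:chemistry"
overrides = {
    "mercury": "urn:example:astronomy",  # governed by a different vocabulary
    "Java": None,                        # escapes the general rule entirely
}

for term in ("oxygen", "mercury", "Java"):
    print(term, "->", governing_vocabulary(term, DEFAULT, overrides))
```

Only the two exceptional terms need any pointing; every other term in 
the document falls under the default without being marked up.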

3. What Data Model for Metadata?

First, I don't think we should constrain the metadata that is being used 
to enhance searching, just as we should not dictate how that metadata 
enhances searching. Different metadata schemes will offer different 
enhancements for searching of the underlying data, just as search 
engines may use that metadata differently, and I don't think either the 
Metadata SC or the ODF TC possesses the knowledge to prescribe how that 
should be done. It is sufficient that we provide the ability to enhance 
searching by the use of metadata and no more.

Second, given that terms or phrases will occur in multiple locations in 
a document, it seems that requiring a URI/IRI based model to identify 
those terms or phrases would be overly restrictive. It should be 
sufficient to identify the term or phrase (when there is no pointing out 
of document content) by use of a string value. Any properties associated 
with that particular string value should adhere to the current ODF model 
of name/value pairs.
This need not be seen as a departure from the general use of an RDF data 
model, as any metadata wishing to use that model can simply include a 
URI/IRI as one of the properties of the term or phrase, and can then be 
viewed by RDF based applications as RDF.
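
As a rough illustration of that model, the sketch below keys each term 
by its string value and attaches name/value pairs to it. The property 
names ("sense", "uri"), the "label" predicate, and the example.org 
address are all assumptions made up for this example; they are not ODF 
or RDF vocabulary.

```python
# Each term or phrase is identified by its string value and carries
# plain name/value pairs. A "uri" property is optional.
term_metadata = {
    "mercury": {
        "sense": "planet",
        "uri": "http://example.org/terms/mercury-planet",
    },
    "folio": {
        "sense": "sheet of a manuscript",  # no URI: still usable metadata
    },
}


def as_triples(metadata):
    """Return (subject, property, value) triples for terms that have a 'uri'.

    Terms without a 'uri' are skipped: they remain ordinary name/value
    metadata and are simply invisible to this RDF-style view.
    """
    triples = []
    for term, properties in metadata.items():
        subject = properties.get("uri")
        if subject is None:
            continue
        triples.append((subject, "label", term))
        for name, value in properties.items():
            if name != "uri":
                triples.append((subject, name, value))
    return triples


for triple in as_triples(term_metadata):
    print(triple)
```

A search engine that knows nothing about RDF can still use both entries 
as name/value pairs, while an RDF based application sees only the entry 
that opted in by carrying a URI.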

Note that I have no beef with anyone who wants to use RDF as a search 
enhancement mechanism but, on the other hand, I don't want to limit ODF 
to RDF as the only data model that can be used to enhance searches.

4. Premise of Enhanced Searching:

Not sure how much space this should get in our final product but thought 
I should say a word or two about how metadata can be used to enhance 
searching.

The difficulty with searching any large resource collection is that the 
broader the collection, the more difficult it is to tell whether the 
same terms or phrases mean the same thing and to find where different 
terms or phrases mean the same thing.

Search engines and services have done a valiant job of providing 
relatively useful search results, but mostly based on the premise that 
authors cannot be asked what they meant.

The enhanced search metadata of ODF voids that premise and enables 
authors to declare what they meant. This starts with the ability to 
declare a vocabulary of terms that carry additional information. Search 
engines can use that information both to distinguish identical terms or 
phrases that mean different things and to determine when different terms 
or phrases mean the same thing. The ODF committee lacks the expertise to 
draft such vocabularies or to provide guidance on their best use, but it 
has provided the means for users to supply that information for use by 
search engines.
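
A toy sketch of how a search engine might use such declared 
vocabularies follows. The vocabularies, the "concept:" identifiers, and 
both function names are hypothetical; real vocabularies and real search 
engines would be far richer, and nothing here is prescribed by 
OpenDocument.

```python
# Two hypothetical author-declared vocabularies. Each maps a term or
# phrase to an identifier for the concept the author meant.
motoring = {
    "automobile": "concept:car",
    "car": "concept:car",
    "jaguar": "concept:car-brand",
}
zoology = {
    "jaguar": "concept:big-cat",
    "panthera onca": "concept:big-cat",
}


def expand_query(term, vocabulary):
    """Return, sorted, every term that shares a concept with `term`."""
    concept = vocabulary.get(term)
    if concept is None:
        return [term]  # unknown term: nothing to expand
    return sorted(t for t, c in vocabulary.items() if c == concept)


def same_sense(term, vocab_a, vocab_b):
    """True only if both vocabularies map `term` to the same concept."""
    concept = vocab_a.get(term)
    return concept is not None and concept == vocab_b.get(term)


print(expand_query("car", motoring))            # different terms, same meaning
print(same_sense("jaguar", motoring, zoology))  # same term, different meanings
```

The first call covers different terms that mean the same thing; the 
second covers the same term meaning different things in two documents.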

In addition to providing the ability to declare such vocabularies, 
OpenDocument has also provided a standard mechanism for users to declare 
that particular terms or phrases belong to particular vocabularies, for 
use when multiple vocabularies have been associated with a particular 
document or simply when a term or phrase has some meaning that is 
different from that of one or more vocabularies associated with a document.

OpenDocument does not mandate a particular data model for metadata that 
is to be used to enhance searching, other than the means by which 
metadata vocabularies are identified and how particular terms or phrases 
can be identified with particular vocabularies or with custom metadata. 
Metadata vocabularies for OpenDocument documents can range from the most 
formal ontological languages to the most informal folksonomies and can 
be interpreted by an equally broad range of search engines. The emphasis 
in OpenDocument is on enabling the use of evolving metadata by authors 
to enhance the searching of the documents they author.

*********

Sorry for the late posting!

Patrick


-- 
Patrick Durusau
Patrick@Durusau.net
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005

Topic Maps: Human, not artificial, intelligence at work! 



