Subject: Enhanced Searching Proposal
Greetings!

I am not really sure where this should go on the Metadata wiki, so I am posting it to the list and we can discuss where it should appear on the wiki.

Enhanced Searching Proposal

There are several issues with regard to enhanced searching, and I have simply grouped them together for discussion purposes:

1. Referencing Metadata: Obviously, documents generated in a particular context or enterprise are likely to share the same set of metadata to enhance searching, so it would be wasteful to replicate that metadata within each document package. Rather than replicating such metadata, we will need to provide a mechanism that allows pointing to metadata that can be used to enhance searching. At the same time, we need to allow users to record custom metadata to enhance searching, whether that metadata is external to the document package or recorded inside that package.

2. What Metadata Applies? Even though a user may have chosen one or more sets of metadata to enhance searching, it is quite possible that some term or phrase should not be governed by a particular metadata set, that the user wants to provide custom metadata for a particular term or phrase, or that the user wants to indicate that a particular set of metadata, whether inside or outside the document package, should govern a particular term or phrase. In general terms, I think a metadata set should be said to govern all the terms or phrases in a document, without the need to point to specific terms or phrases in the document. In other words, the only pointing into or out of a document should be in the particular cases where a user wishes to escape from that general rule. The reasoning is that it is far less burdensome to require a pointing mechanism (into or out of document content) only when it is necessary to escape from the general metadata being used to enhance searching for a particular term or phrase.

3. What Data Model for Metadata? First, I don't think we should constrain the metadata used to enhance searching, just as we should not determine the nature of the enhancement that metadata provides. Different metadata schemes will offer different enhancements for searching the underlying data, just as search engines may use that metadata differently, and I don't think either the Metadata SC or the ODF TC possesses the knowledge to prescribe how that should be done. It is sufficient that we provide the ability to enhance searching through metadata, and no more.

Second, since terms or phrases will occur in multiple locations in a document, it seems that using any URI/IRI-based model to identify them would be overly restrictive. It should be sufficient to identify a term or phrase (when there is no pointing out of document content) by a string value. Any properties associated with that string value should follow the current ODF model of name/value pairs. This need not be seen as a departure from the general use of an RDF data model: any metadata wishing to use that model can simply include the term or phrase, plus a URI/IRI as one of its properties, and can then be viewed by RDF-based applications as RDF. Note that I have no beef with anyone who wants to use RDF as a search enhancement mechanism, but on the other hand, I don't want to limit ODF to that being the only data model that can be used to enhance searches.

4. Premise of Enhanced Searching: I am not sure how much space this should get in our final product, but I thought I should say a word or two about how metadata can be used to enhance searching. The difficulty with searching any large resource collection is that the broader the collection, the harder it is to distinguish when the same terms or phrases mean different things and to find when different terms or phrases mean the same thing.
Search engines and services have done a valiant job of providing relatively useful search results, but mostly on the premise that authors cannot be asked what they meant. The enhanced search metadata of ODF voids that premise and enables authors to declare what they meant. That starts with the ability to declare a vocabulary of terms that carry additional information, which search engines can use both to distinguish the same terms or phrases that mean different things and to determine when different terms or phrases mean the same thing. The ODF committee lacks the expertise to draft such vocabularies or to provide guidance on their best use, but it has provided the means for users to supply that information for use by search engines. In addition to the ability to declare such vocabularies, OpenDocument has also provided a standard mechanism for users to declare that particular terms or phrases belong to particular vocabularies, for use when multiple vocabularies have been associated with a document, or simply when a term or phrase has a meaning different from that of one or more vocabularies associated with a document. OpenDocument does not mandate a particular data model for metadata used to enhance searching, other than the means by which metadata vocabularies are identified and how particular terms or phrases can be associated with particular vocabularies or with custom metadata. Metadata vocabularies for OpenDocument documents can range from the most formal ontological languages to the most informal folksonomies, and can be interpreted by an equally broad range of search engines. The emphasis in OpenDocument is on enabling authors to use evolving metadata to enhance the searching of the documents they author.

*********

Sorry for the late posting!
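As a rough illustration of points 2 and 3, here is a minimal Python sketch. Nothing in it comes from ODF or any existing metadata API: the `MetadataSet` and `Document` classes, the `uri` property key, the first-set-wins rule for overlapping sets, and keying escapes by term string are all assumptions made for the example. It shows a document-wide set governing every term by default, an explicit escape that redirects one term to another vocabulary, and how a term whose properties include a URI can be viewed as RDF-style triples.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# ODF-style name/value pairs attached to a term or phrase (point 3).
Properties = dict[str, str]


@dataclass
class MetadataSet:
    """A hypothetical vocabulary of terms; it may live inside or outside the package."""
    name: str
    terms: dict[str, Properties] = field(default_factory=dict)


@dataclass
class Document:
    # Sets declared once for the document; they govern every term (point 2).
    governing_sets: list[MetadataSet] = field(default_factory=list)
    # Escapes from that general rule, keyed by the term's string value:
    # either a different set or custom name/value pairs for that term only.
    escapes: dict[str, MetadataSet | Properties] = field(default_factory=dict)

    def resolve(self, term: str) -> Properties:
        """Return the properties that govern `term` in this document."""
        escape = self.escapes.get(term)
        if isinstance(escape, MetadataSet):
            return escape.terms.get(term, {})
        if escape is not None:
            return escape  # custom metadata recorded for this term alone
        # Assumption: if several governing sets define the term, the first wins.
        for mset in self.governing_sets:
            if term in mset.terms:
                return mset.terms[term]
        return {}


def as_triples(term: str, props: Properties, uri_key: str = "uri") -> list[tuple[str, str, str]]:
    """View a term's properties as (subject, predicate, object) triples.

    This only works when the properties carry a URI/IRI (point 3); without
    one, the term remains plain name/value metadata. Predicates here are bare
    property names, not the IRIs a real RDF mapping would need.
    """
    subject = props.get(uri_key)
    if subject is None:
        return []
    triples = [(subject, "term", term)]
    triples += [(subject, k, v) for k, v in props.items() if k != uri_key]
    return triples


if __name__ == "__main__":
    finance = MetadataSet("finance", {
        "bank": {"uri": "http://example.org/finance#bank", "sense": "financial institution"},
    })
    geography = MetadataSet("geography", {"bank": {"sense": "edge of a river"}})

    doc = Document(governing_sets=[finance])
    print(doc.resolve("bank")["sense"])   # financial institution
    doc.escapes["bank"] = geography         # escape from the general rule
    print(doc.resolve("bank")["sense"])   # edge of a river
    print(as_triples("bank", finance.terms["bank"]))
```

Keying escapes by string keeps the sketch close to point 3, but it means every occurrence of "bank" is escaped at once; pointing at a single occurrence would need some position or fragment identifier, which this sketch does not attempt.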
Patrick

--
Patrick Durusau
Patrick@Durusau.net
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005

Topic Maps: Human, not artificial, intelligence at work!