cmis message


Subject: [OASIS Issue Tracker] Commented: (CMIS-86) Provide a new service that will allow search crawlers to efficiently navigate a CMIS repository.



    [ http://tools.oasis-open.org/issues/browse/CMIS-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=10140#action_10140 ] 

David Caruana commented on CMIS-86:
-----------------------------------

From reading the proposal, it seems the 'get' API only allows a client to query for changes across the whole repository.

Has there been any consideration for allowing a client to scope the returned change set to a specific folder (including subfolders)? I think there are use cases where the search engine may not be required to crawl all content, or where multiple search engines may be set up to crawl a single repository (each indexing a separate part).

This might also support other use cases outside of search indexing, e.g. implementation of a poor man's (pull-based) change event queue.
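
To illustrate the pull-based queue idea, here is a minimal Python sketch. It assumes the 'get' API can be wrapped in a function that takes the last token a client has seen and returns the changes since then plus a new token. The names ChangeEvent, PullChangeQueue, make_fake_repository and the token format are all hypothetical and are not taken from the proposal; the in-memory log only stands in for a repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a change set (hypothetical shape, not from the proposal)."""
    object_id: str
    change_type: str  # e.g. "created", "updated", "deleted"


# Assumed wrapper around the 'get' API: last seen token in, (changes, new token) out.
FetchChanges = Callable[[Optional[str]], Tuple[List[ChangeEvent], Optional[str]]]


class PullChangeQueue:
    """Client-side queue that polls for changes and remembers its position."""

    def __init__(self, fetch: FetchChanges, token: Optional[str] = None) -> None:
        self._fetch = fetch
        self.token = token  # None means "start from the beginning"

    def poll(self) -> List[ChangeEvent]:
        """Return the changes since the previous poll and advance the token."""
        events, next_token = self._fetch(self.token)
        if next_token is not None:
            self.token = next_token
        return events


def make_fake_repository(log: List[ChangeEvent]) -> FetchChanges:
    """In-memory stand-in for a repository; the token is an index into the log."""
    def fetch(token: Optional[str]) -> Tuple[List[ChangeEvent], Optional[str]]:
        start = 0 if token is None else int(token)
        return log[start:], str(len(log))
    return fetch
```

A client would call poll() on a timer and process whatever comes back; persisting the token between runs is what would let it resume after a restart.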

Does each document (and folder) in the returned change set include its folder path? This can at least allow a client to filter the change set by folder.
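
If the path were included, client-side filtering could look roughly like the Python sketch below. It assumes each change entry carries a single folder_path string using "/" as the separator; the ChangeEntry type and both function names are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ChangeEntry:
    """One entry in a change set (hypothetical shape)."""
    object_id: str
    folder_path: str  # assumed: path of the containing folder, "/"-separated


def is_within(folder_path: str, scope: str) -> bool:
    """True if folder_path is the scope folder itself or one of its subfolders."""
    scope = scope.rstrip("/")
    path = folder_path.rstrip("/")
    if scope == "":
        # The scope is the root folder, so everything is in scope.
        return True
    # Compare on a path-segment boundary so "/sales" does not match "/salesforce".
    return path == scope or path.startswith(scope + "/")


def filter_by_folder(changes: Iterable[ChangeEntry], scope: str) -> List[ChangeEntry]:
    """Keep only the changes located under the given folder."""
    return [change for change in changes if is_within(change.folder_path, scope)]
```

Filtering on the client still means the whole change set is transferred, so this would only be a fallback if the service itself cannot scope by folder.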

> Provide a new service that will allow search crawlers to efficiently navigate a CMIS repository.
> ------------------------------------------------------------------------------------------------
>
>                 Key: CMIS-86
>                 URL: http://tools.oasis-open.org/issues/browse/CMIS-86
>             Project: OASIS Content Management Interoperability Services TC
>          Issue Type: New Feature
>          Components: Domain Model, REST/AtomPub Binding, Schema, Web Services Binding
>    Affects Versions: Draft 0.50
>            Reporter: Gregory Melahn
>            Assignee: Ethan Gur-esh
>             Fix For: Draft 0.6
>
>
> CMIS needs to allow repositories to expose what information inside the repository has changed in an efficient manner for applications of interest, like search crawlers, to facilitate incremental indexing of a repository.
> In theory, a search crawler could index the content of a CMIS repository by using the navigation mechanisms already defined as part of the proposed specification. For example, a crawler engine could start at the root collection and, using the REST bindings, progressively navigate through the folders, get the document content and metadata, and index that content. It could use the CMIS date/time stamps to more efficiently do this by querying for documents modified since the last crawl.
> But there are problems with this approach. First, there is no mechanism for knowing what has been deleted from the repository, so the indexed content would contain 'dead' references. Second, there is no standard way to get the access control information needed to filter the search results so the search consumer only sees the content (s)he is supposed to see. Third, each indexer would solve the crawling of the repository in a different way (for example, one could use query and one could use navigation), causing different performance and scalability characteristics that would be hard to control in such a system. Finally, the cost of indexing an entire repository can be prohibitive for large content, or content that changes often, requiring support for incremental crawling and paging results.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://tools.oasis-open.org/issues/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

