Subject: Re: [chairs] need your comments on DocMgmt system requirements
From: Norman Walsh <ndw@nwalsh.com>
To: karl.best@oasis-open.org
Date: Tue, 17 Feb 2004 17:22:29 -0500
/ "Karl F. Best" <karl.best@oasis-open.org> was heard to say:
| I've put together a draft functional requirements document for this
| doc mgmt system and would like to get your feedback. It is very
| important that we have the requirements correct and complete before we
| start development of the project -- many of you are developers so I'm
| sure that you understand the importance of this.

High level comments:

- I don't think these requirements adequately address the distinction
  between a development system (where TCs actively revise documents,
  schemas, etc.) and a publication system (where TCs post working
  drafts, standards, and other "finished" work products).

  Is the proposal to develop one or the other, or both? If it's one or
  the other, then I think some of these requirements are completely
  inappropriate. If it's both, I think it might be useful to specify
  them separately (and to say whether you imagine having the resources
  to do them in sequence or at the same time).

- There are several places where the requirements seem to be
  self-contradictory.

- I think meeting all of the requirements listed below will be a
  significant challenge. A more detailed roadmap, showing staged
  progress with realistic time estimates, would be very helpful.

- A number of the features that you describe would seem to be at least
  partially addressed by open source efforts like GForge (an open
  source version of SourceForge). Are you considering a system like
  that, or are you expecting to "roll your own" from scratch?

> OASIS DocMgmt Functional Requirements
> 
> (17 February 2004)
> 
> General Description: A repository providing storage/management of
> files created by TCs, SCs, and other OASIS groups

Technical committees need to be able to store and manage a collection
of resources. Principal among these resources are documents, but it's
reasonable to consider other, related resources as well, including
issue lists, archives, news items, and syndicated content.

>  o Probably based on CVS

The requirements for a "development tree" are likely to be somewhat
different than the requirements for a "publishing tree". In
particular, I would expect published standards to be more-or-less
immutable, to have persistent URIs, etc. In a development tree, those
constraints might be quite stifling.

CVS supports a development system very well. It's not immediately
clear to me if it supports a publication system equally well.

>  o A separate area in the repository for each TC/SC/group; both
>    default and definable hierarchy within each TC area

Can you elaborate on what you mean by "both default and definable"?
What do you have in mind for "default"?

>  o All documents are permanently archived (only Admin has delete
>    rights)

In CVS terms, you can delete a document, but you can always recover
it. In a development tree, it's not uncommon to reorganize some code
or a document and want to remove modules from the current "head" of
the development tree. This goes back to my earlier comment that the
requirements for publication and development are somewhat different.

>  o All documents are publicly viewable, downloadable
>
>  o Repository has a web interface for uploading and tree browsing,
>    searching, and retrieval
> 
>      + Support for all major browsers
> 
>      + Listing of single files includes filename, title, description,
>        date, creator, and language; listing of packages includes the
>        list of single files in the package
> 
>      + Search by filename, title, date, creator, and language; and
>        full-text search of description and contents.

Does it have other interfaces? Are you describing a front-end for CVS
here, or something else? Does it support WebDAV?

I think it would make sense to address searching as its own top-level
item. In particular, the description above suggests that every item
will have a set of metadata that can be searched. Where/when is this
metadata created? Can I add my own? Is it expressed in an open format,
an XML vocabulary or RDF or a topic map, or is it proprietary? How
does this metadata evolve as documents change in CVS?
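
To make the "open format" question concrete, here is a minimal sketch
of how the per-file fields listed in the requirement (filename, title,
description, date, creator, language) might be written out as XML. The
element names and the sample values are assumptions made up for
illustration; nothing in the requirements specifies a vocabulary.

```python
# Hypothetical sketch: the "document" element and its child element
# names are illustrative assumptions, not a proposed OASIS vocabulary.
import xml.etree.ElementTree as ET
from typing import Dict


def metadata_to_xml(fields: Dict[str, str]) -> str:
    """Serialize one file's metadata as a flat XML record."""
    record = ET.Element("document")
    for name, value in fields.items():
        ET.SubElement(record, name).text = value
    return ET.tostring(record, encoding="unicode")


# Sample values are invented for the example.
example = {
    "filename": "example-spec-wd-01.html",
    "title": "Example Specification",
    "description": "First working draft",
    "date": "2004-02-17",
    "creator": "A. N. Editor",
    "language": "en",
}

print(metadata_to_xml(example))
```

A record like this could live next to the file in the repository, which
would at least make it clear how the metadata is versioned along with
the document.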

As for searching the content, that's clearly going to depend on the
type of content. What types will the system support?

> Persistent URLs 
> 
>  o At file creation the document is assigned a URL according to the
>    OASIS file naming scheme. The URL will always resolve to the latest
>    version of the document, regardless of the documents (versioned)
>    filename; a URL will identify a specification throughout its entire
>    lifetime from working draft to OASIS Standard. Previous versions of
>    the document will be accessible via a variant of the URL containing
>    the version number.

This is fine for storing standards, but it's in conflict with the use
of CVS and with the reference above to a "definable hierarchy".

I think this should apply to published standards and work products,
but I don't think it can practically be applied to a development
space.

This suggests that the interface to the published standards space
might require more constraints. I hope that these constraints can be
imposed without requiring me to interact with the system only through
a web interface.
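
A minimal sketch of the behavior the requirement seems to describe for
a publication space: a persistent name resolves to the latest version,
and earlier versions stay reachable by version number. The class,
method, and file names are hypothetical and do not reflect the actual
OASIS file naming scheme.

```python
# Hypothetical sketch: names and layout are illustrative assumptions,
# not the OASIS file naming scheme.
from typing import Dict, List, Optional


class PublishedSpecs:
    """Maps a persistent spec name to its ordered, immutable versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def publish(self, name: str, stored_path: str) -> int:
        """Append a new version; earlier versions are never overwritten."""
        self._versions.setdefault(name, []).append(stored_path)
        return len(self._versions[name])  # 1-based version number

    def resolve(self, name: str, version: Optional[int] = None) -> str:
        """No version means 'latest'; a number selects that version."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


specs = PublishedSpecs()
specs.publish("example-spec", "example-spec-wd-01.html")
specs.publish("example-spec", "example-spec-wd-02.html")

print(specs.resolve("example-spec"))     # latest version
print(specs.resolve("example-spec", 1))  # earlier version by number
```

Note that this only works because published versions are append-only,
which is exactly the constraint a development tree does not want.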

> Multiple file types supported 
> 
>  o TCs will store both source (e.g. MSWord or HTML) and compiled (e.g.
>    PDF) versions of each file; i.e. the repository should not allow a
>    PDF to be checked in without a matching .doc or .html file

Uhm, what about documents whose source is neither a proprietary format
nor HTML?

Imposing the requirement that the system check for classes of
dependencies between files of different types is going to be tricky,
especially as the specs evolve. Suppose I rebuild the PDF, can I check
it in without checking in a new source document? What if I only
corrected a formatting bug? If I check in a new source, what happens
to the PDF?

I think a lot more detail is required in this part of the
requirements.
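
To show where the difficulty starts, here is a minimal sketch of the
check as literally stated: reject a PDF unless a .doc or .html file
with the same base name is part of the same check-in. The function
name, the extension lists, and the sample file names are assumptions
for illustration only.

```python
# Hypothetical check; the extension sets come from the examples in the
# requirement (MSWord or HTML source, PDF output), nothing more.
from pathlib import PurePosixPath
from typing import Iterable, List

SOURCE_EXTENSIONS = {".doc", ".html"}
COMPILED_EXTENSIONS = {".pdf"}


def unmatched_compiled(paths: Iterable[str]) -> List[str]:
    """Return compiled files that have no same-named source file."""
    files = [PurePosixPath(p) for p in paths]
    # Base names (path without extension) of every recognized source.
    sources = {f.with_suffix("") for f in files
               if f.suffix.lower() in SOURCE_EXTENSIONS}
    return sorted(str(f) for f in files
                  if f.suffix.lower() in COMPILED_EXTENSIONS
                  and f.with_suffix("") not in sources)


checkin = [
    "spec/draft.pdf", "spec/draft.xml",   # XML source: not recognized
    "spec/notes.pdf", "spec/notes.html",  # HTML source: accepted
]
print(unmatched_compiled(checkin))
```

In this sketch the PDF built from an XML source is rejected, and a
rebuilt PDF with no accompanying source change would be rejected too.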

>  o HTML files may include graphics which will be stored with the file
>    (use relative URLs?)

What about other cross-document links? What about XML files that refer
to both HTML and PDF presentations? What about document trees that
consist of multiple chapters in a hierarchy with a common set of
figures?

More detail, please.

>  o use MIME types
> 
> Packages
> 
>  o A specification may be composed of multiple documents. The entire
>    package may be uploaded or downloaded in a single operation.
>    Individual documents in the package may also be uploaded or
>    downloaded.

I don't understand what you mean here. Are you suggesting that I might
upload a package (as a ZIP file? as a MIME multi-part related stream?)
and then, several days later, upload a new version of one component in
that package? Having done so, what "version" does the package have?
Can I still download the original? Can I download the revised version?
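
One possible answer, sketched under the assumption that every upload of
a component produces a new package revision while older revisions stay
retrievable. This is only an illustration of the question; the class
and method names are hypothetical and the requirements do not say
which model is intended.

```python
# Hypothetical model: a package revision is a snapshot mapping each
# component name to that component's version number.
from typing import Dict, List, Optional


class Package:
    def __init__(self, components: Dict[str, int]) -> None:
        # Revision 1 is the initial upload of the whole package.
        self._revisions: List[Dict[str, int]] = [dict(components)]

    def upload_component(self, name: str) -> int:
        """Record a new version of one component as a new revision."""
        snapshot = dict(self._revisions[-1])
        snapshot[name] = snapshot.get(name, 0) + 1
        self._revisions.append(snapshot)
        return len(self._revisions)  # new package revision number

    def contents(self, revision: Optional[int] = None) -> Dict[str, int]:
        """Return one revision (1-based); None means the latest."""
        if revision is None:
            return dict(self._revisions[-1])
        if not 1 <= revision <= len(self._revisions):
            raise KeyError(f"no package revision {revision}")
        return dict(self._revisions[revision - 1])


pkg = Package({"spec.html": 1, "schema.xsd": 1})
pkg.upload_component("schema.xsd")

print(pkg.contents(1))  # the original package
print(pkg.contents())   # the package after the component upload
```

Under this model both the original and the revised package remain
downloadable, at the cost of the package having its own revision
number distinct from any component's.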

>  o Support for chapters or parts of a multi-part document (with links
>    between parts); a package could have a ToC with links to the
>    individual files

I think any attempt to describe the size and shape of a package ("it
will have a ToC and chapters" or "it will have a starting page and
parts") will be problematic. Best just to accept that a multi-part
document is a directed graph (a web).
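
A minimal sketch of what treating a package as a directed graph could
look like: parts are nodes, links are edges, and the package contents
are whatever is reachable from a starting part. The function name and
the file names are made up for the example.

```python
# Hypothetical sketch: a package as a link graph, with no assumption
# about ToCs, chapters, or any other fixed shape.
from collections import deque
from typing import Dict, List


def reachable_parts(links: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first walk over the link graph, tolerating cycles."""
    seen = [start]
    queue = deque([start])
    while queue:
        part = queue.popleft()
        for target in links.get(part, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


# File names are invented; note the cycle between index and intro.
links = {
    "index.html": ["intro.html", "schema.xsd"],
    "intro.html": ["index.html", "figures/overview.png"],
    "schema.xsd": ["types.xsd"],
    "types.xsd": [],
}

print(reachable_parts(links, "index.html"))
```

The same walk handles chapters, modular schemas, and shared figures
without the system needing to know which of those it is looking at.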

>  o Support for modular DTDs (e.g. DocBook)

What does this requirement mean? Do you also mean modular W3C XML
Schemas and RELAX NG grammars? Does this requirement differ from the
preceding one in a particular way?

>  o The entire package is addressable via a single URL, as are the
>    individual documents. The package URL will link to an HTML page
>    listing the package contents.

Is that an HTML page constructed by the author of the package, or
automatically from the content of the package? If it's the latter,
what constraints, if any, does that impose on the contents of the
package?

> Security
> 
>  o Check-in/out based on Kavi user authentication; different
>    permissions for public, TC members, chair/secretary, etc.
> 
>  o TC members have ??? rights (TBD)
> 
>  o TC Chair and Secretary have create, edit rights for folders and
>    checkin/out rights for documents in their respective TC area
> 
>  o Admin has admin rights (create, checkin/out, delete of all folders and files)
> 
>  o Public has read rights for all documents

How does "admin" differ from chair/secretary?

> Kavi integration
> 
>  o Kavi user acct/pswd used for authentication in doc mgmt system
> 
>  o Notification to the Kavi group when a document is uploaded (same as
>    current Kavi notification)
> 
>  o The current Kavi doc repository is disabled; links within Kavi will
>    go to this doc mgmt system instead (i.e. Kavi doc repository is
>    hidden, this one drops in to replace it).
>
>  o Docs currently in the Kavi repository will continue to be
>    addressable and viewable by their Kavi URL (allow for migration over
>    time)

This requirement and the previous requirement seem to be in conflict.
Can you explain how "links within Kavi will go to this doc mgmt system
instead" supports the goal that docs currently in the Kavi repository
"will continue to be addressable and viewable by their Kavi URL (allow
for migration over time)"?

>  o When new Kavi group (TC/SC) is created, a doc mgmt area for that
>    group and default folders are automatically created

This goes back to my earlier question about defaults. What hierarchy do
you have in mind, and what are your motivations for creating it? I
think it'll be easier in the long run to simply create an empty
hierarchy and let the TCs populate it.

If you have in mind that minutes should go in /minutes and press
clippings should go in /press, etc., then I think a detailed
description of the default hierarchy is required.

> File naming (automation of this done in a later phase; just do this manually at first?)
> 
>  o Naming and versioning of documents follows OASIS file naming scheme
> 
>  o When a new document is created it will be named according to the
>    scheme; automated helps to create/assign a name

This seems to duplicate the requirements expressed under "Persistent
URLs". Is it intended to be different? I believe my comments there
apply here as well.

> Localizable interface, with localization to occur in a later phase
> 
> Later phase: Count/traffic report of downloads (how many people have
> downloaded a particular doc?)

Other later phase items?

  - Issue tracking?
  - Automatic generation of PDF/HTML from source formats?
  - Validation?
  - Interactive forms (e.g., the ability to support an interface that
    asks a number of questions and then builds an appropriate schema
    customization layer)?
  - Syndication of announcements?
  - An informal "journal" space (or blog, if you will) for TC members
    to outline their thoughts and ideas?

                                        Be seeing you,
                                          norm

P.S. I'm happy to report that your requirements document can be nicely
presented in an open format (plain text, in this case) instead of a
proprietary format. I hope that its greater accessibility in this
format (and the fact that it's six times smaller) can be used to
demonstrate once again the value of open standards.

(For even more thoughts on this topic, see
 http://www.gnu.org/philosophy/no-word-attachments.html)

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.