In my experience, CVS doesn't handle MS Word documents well. It is designed
for plain-text source code, and MS Word's file format doesn't allow it to
produce an economical diff between one version and the next. This means that it
wastes considerable space when versioning Word. I cannot comment on its ability
to version HTML, but I suspect it would do much better on that. Perhaps we
should all be using TeX, because that can be versioned more readily (ah, that
was a joke...)
Is there a tool that would be able to version MS Word more effectively? I
certainly don't know. Does that mean we shouldn't use Word? I hope not - our TC
has found Word's change tracking rather useful when working
co-chair UDDI TC
From: Karl F. Best
Sent: Thu 19-Feb-04 2:04
To: Norman Walsh
Cc: Chairs OASIS; Jeff Lomas
Subject: Re: [chairs] need your comments on DocMgmt system
Norman Walsh wrote:
> / "Karl F. Best" <firstname.lastname@example.org> was heard to say:
> | I've put together a draft functional requirements document for this
> | doc mgmt system and would like to get your feedback. It is very
> | important that we have the requirements correct and complete before we
> | start development of the project -- many of you are developers so I'm
> | sure that you understand the importance of this.
> High level
> - I don't think these requirements adequately address the distinction
> between a development system (where TCs actively revise documents,
> schemas, etc.) and a publication system (where TCs post working
> standards, and other "finished" work
> Is the proposal to develop one or the other, or both. If it's one or
> the other, then I think some of these requirements are
> inappropriate. If it's both, I think it might be useful to specify
> them separately. (And whether you imagine having resources to do
> them in sequence, or at the same time?)
I've previously thought of having a two-phase system, the first of which
would provide a "sandbox" for the TC members to collaborate in
developing a document. Then once the doc reached a certain stage it
would then go into a more controlled environment with
and edited only by the TC. I've gotten the impression that
would only use the second phase, but I could be
Chairs: would you prefer having both of these phases built into
mgmt system (open collaboration, followed by more rigorous
would you only use the second?
> - There are several places where the requirements seem to be
Specifics? This is obviously a draft so needs polishing, so suggestions
> - I think meeting all of the requirements listed below will be a
> challenge. A more detailed roadmap, showing staged progress with
> realistic time estimates would be very helpful.
That's the next step. But right now I'm just gathering
can't very well write a development schedule until I know what it is
that we're trying to build.
I'd also like suggestions on which parts of this are most important. I'm
debating whether we should try a phased development
provide base functionality now then add more functionality
Looking through the requirements that I have now, though, I'm
which ones we could put off until later.
> - A number of the features that you describe would seem to be at least
> partially addressed by open source efforts like G-Forge (an open
> source version of SourceForge). Are you considering a system like
> that, or are you expecting to "roll your own" from scratch.
I'm intending for us to build on top of an existing system. That's why I
said "probably CVS". We'd be silly to build something from scratch when
the engine already exists. We'll build some sort of customized web
interface on top of the engine. Once we have the requirements we'll know
what it is that we need to build. I'd also like suggestions for the
engine; is CVS the way to go, or do people recommend something else?
>>OASIS DocMgmt Functional
>>General Description: A repository providing
>>files created by TCs, SCs, and other OASIS
> Technical committees need to be able to store and manage a collection
> of resources. Principal among these resources are documents, but it's
> reasonable to consider other, related resources as
> issue lists, archives, news items, and syndicated
The doc mgmt system would store any type of file. Not just
also the other doc types you mention.
Would some of these
stored objects be links and not files?
>> o Probably based on
> The requirements for a "development tree" are likely to be
> different than the requirements for a "publishing tree".
> particular, I would expect published standards to be
> immutable, to have persistent URIs, etc. In a development
> constraints might be quite stifling.
supports a development system very well. It's not immediately
> clear to me if it supports a publication system equally well.
I'm certainly not
a CVS expert, though I'm aware that it was built for
than documents. So it may not be ideal for what we want.
have suggestions for a better engine, better suited for doc
publishing, upon which to build our system?
>> o A separate area in the repository for each TC/SC/group; both
>> default and definable hierarchy within each TC area
> Can you elaborate on what you mean by "both default and definable"?
> What do you have in mind for "default"?
When we create a new TC we would define hierarchy branches for such
things as "drafts", "minutes", "contributions" etc. (TBD). Then the
TC chair could define additional branches as required. We'd want to keep
the hierarchy as flat as possible to keep the URLs short, and
some consistency, but I want to give the TCs some control over
>> o All documents are permanently archived (only Admin has delete
> In CVS terms, you can delete a document, but you can always recover
> it. In a development tree, it's not uncommon to reorganize some code
> or a document and want to remove modules from the current "head" of
development tree. This goes back to my comment before that the
requirements for publication and development are somewhat
Maybe this is where the "sandbox" (above) comes in. I don't see the need
of permanently archiving early drafts, but once a doc is
the permanent repository it should be
>> o All documents are publicly viewable,
>> o Repository has a web interface for uploading and tree browsing,
>> searching, and
>> + Support for all
>> + Listing of single files includes filename, title,
>> date, creator, and language; listing of packages includes
>> list of single files in
>> + Search by filename, title, date, creator, and language;
>> full-text search of description and contents.
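
For illustration only, the field-based search in the requirement above could be sketched as follows in Python. The `DocRecord` class, the `search` function, and the exact-match behaviour are assumptions made for this sketch; the requirements do not specify an engine, a storage format, or matching rules, and full-text search is left out.

```python
import datetime
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DocRecord:
    # Hypothetical record; field names mirror the ones listed in the requirement.
    filename: str
    title: str
    date: datetime.date
    creator: str
    language: str


def search(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    known = {f.name for f in fields(DocRecord)}
    unknown = set(criteria) - known
    if unknown:
        raise ValueError(f"unknown search fields: {sorted(unknown)}")
    return [
        record
        for record in records
        if all(getattr(record, name) == value for name, value in criteria.items())
    ]
```

A call such as `search(records, creator="someone", language="en")` would return only the records matching both fields. Partial matches, date ranges, and user-defined metadata would need a richer design than this.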
> Does it have other interfaces? Are you describing a front-end for CVS
> here, or something else? Does it
I would expect that most people would want to use a web interface, but I
suppose that power users may want to deal more directly with the engine.
But there are also certain safeguards (permissions, restrictions on
naming, etc.) that may require that we use an interface. I don't know
yet; this may depend on the engine.
the benefits of Web-DAV? (I'm not an expert on this.)
> I think it would make sense to address searching as its own top-level
> item. In particular, the description above suggests that every item
> will have a set of metadata that can be searched. Where/when is this
created? Can I add my own? Is it expressed in an open format,
> an XML vocabulary or RDF or a topic map, or is it proprietary? How
> does this metadata evolve as documents change in CVS?
I see the metadata as comprised of the fields listed above. TBD. I don't
know yet how this would be expressed because we haven't selected an
this matter? Yes, we should use XML on principle, but I don't
see it as a
> As for searching the content, that's clearly going to depend on the
> type of content. What types will the system
Obviously not all content will be searchable. If somebody uploads a blob
there's not much we'll be able to do with it besides just
We will store whatever types of files the TCs need to
>> o At file creation the document is assigned a URL according to
>> OASIS file naming scheme. The URL will always resolve to the latest
>> version of the document, regardless of the document's (versioned)
>> filename; a URL will identify a specification throughout its
>> lifetime from working draft to OASIS Standard. Previous versions of
>> the document will be accessible via a variant of the URL containing
>> the version
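
As a rough sketch of the behaviour described in this requirement, the Python below keeps every checked-in revision under one stable path, resolves the bare path to the latest revision, and resolves a numbered variant to an earlier one. The `VersionedStore` class, its method names, and the integer version numbers are all hypothetical; the OASIS file naming scheme and the actual form of the version-bearing URL are not given here.

```python
class VersionedStore:
    """Maps one stable document path to all of its stored revisions."""

    def __init__(self):
        # path -> list of stored filenames, oldest first (assumed layout)
        self._revisions = {}

    def check_in(self, path, filename):
        """Store a new revision and return its 1-based version number."""
        self._revisions.setdefault(path, []).append(filename)
        return len(self._revisions[path])

    def resolve(self, path, version=None):
        """Return the latest revision, or a specific one if version is given."""
        revisions = self._revisions[path]  # raises KeyError for unknown paths
        if version is None:
            return revisions[-1]
        if not 1 <= version <= len(revisions):
            raise LookupError(f"{path} has no version {version}")
        return revisions[version - 1]
```

Nothing is ever removed from the revision list in this sketch, which matches the "permanently archived" requirement but not the looser rules a development tree might want.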
> This is fine for storing standards but it's in conflict with the use
> of CVS and the reference above to a "definable
Again, I'm not an expert on what you can and can't do with
> I think this should apply to published standards and work products,
> but I don't think it can practically be applied to a development
If we have a "sandbox" phase then we wouldn't expect a persistent URL
for those items. Only once a doc is checked into the permanent
repository would we do
> This suggests that the interface to the published standards
> might require more constraints. I hope that these constraints
> imposed without requiring me to interact with the system only
> a web interface.
As above, power users like yourself may wish to talk directly to the
engine, but there will be some constraints for security and consistency.
If it is practical to enforce those constraints via both a web interface
and a native interface then we will. But if it's not practical
then we'll have to do everything through a
>>Multiple file types supported
>> o TCs will store both source (e.g. MSWord or HTML) and compiled
>> (e.g. PDF) versions of each file; i.e. the repository should not allow a
>> PDF to be checked in without a matching .doc or .html file
> Uhm, what about documents that have a source which is neither a
> proprietary tool or HTML?
above is not an exhaustive list. I'm just suggesting that both
compiled versions should be in the repository. Any
should agree with this philosophy.
> Imposing the requirement that the system check for classes of
> dependencies between files of different types is going to be tricky,
> especially as the specs evolve. Suppose I rebuild the PDF, can I check
> it in without checking in a new source document? What if I only
> corrected a formatting bug? If I check in a new source, what happens
> to the PDF?
Yeah, we'll have to figure this out. How do you do it when you write code?
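
One possible reading of the source/compiled pairing rule discussed above, sketched in Python. The `missing_sources` function and the `SOURCE_SUFFIXES` set are assumptions for illustration; the thread itself says the list of source formats is not exhaustive, and matching by base filename is only one way to define a "matching" source.

```python
from pathlib import PurePosixPath

# Assumption: only the two source formats named in the requirement.
SOURCE_SUFFIXES = {".doc", ".html"}


def missing_sources(checkin, repository=()):
    """Return the PDFs in `checkin` that have no same-named source file.

    A source counts if it is part of this check-in or already stored
    in the repository under the same directory and base name.
    """
    known = [PurePosixPath(p) for p in (*checkin, *repository)]
    sources = {
        p.with_suffix("") for p in known if p.suffix.lower() in SOURCE_SUFFIXES
    }
    return sorted(
        str(p)
        for p in map(PurePosixPath, checkin)
        if p.suffix.lower() == ".pdf" and p.with_suffix("") not in sources
    )
```

Under this reading, a rebuilt PDF can be checked in alone as long as its source already exists in the repository, which covers the formatting-fix case raised above. It does not address what should happen to a stored PDF when only the source changes.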
> I think a lot more detail is required in this part of the
That's why I'm asking for input.
>> o HTML files may include graphics which will be stored with the
>> (use relative URLs?)
> What about other cross-document links? What about XML files that refer
> to both HTML and PDF presentations? What about document trees that
> consist of multiple chapters in a hierarchy with a common set of
> More detail, please.
>> o use MIME
>> o A specification may be composed of multiple documents. The
>> package may be uploaded or downloaded in a
>> Individual documents in the package may also be uploaded or
don't understand what you mean here. Are you suggesting that I might
upload a package (as a ZIP file? as a MIME multi-part related stream?)
and then several days later upload a new version of one component in
that package. Having done so, what "version" does the package have?
Can I still download the original? Can I download the revised
Probably the package will just be an HTML file with links to all of the
components. In that case the package is updated by editing the
the package file. Each of the components are maintained by editing
individually. Each component, as well as the package file, could
its own version number or date, but the entire set would
have to be versioned. Would this work?
Support for chapters or parts of a multi-part document (with
>> between parts); a package could have a ToC with
links to the
>> individual files
> I think any attempt to describe the size and shape of a package ("it
> will have a ToC and chapters" or "it will have a starting page and
> parts") will be problematic. Best just to accept that a multi-part
> document is a graph (a web).
Would my description (above) of a package work for this? The TC can
decide how it wants to structure the
>> o Support for modular DTDs (e.g.
> What does this requirement mean? Do you also mean modular W3C XML
> Schemas and RELAX NG grammars? Does this requirement differ from the
> preceding one in a particular way?
the same, I think, but I'd be happy to hear other
requirements not met by
>> o The entire package is addressable via a single URL, as are the
>> individual documents. The package URL will link to an HTML page
>> listing the package
> Is that an HTML page constructed by the author of the package, or
> automatically from the content of the package? If it's
> what constraints, if any, does that impose on the contents
Check-in/out based on Kavi user authentication;
>> permissions for public, TC members,
>> o TC members have ??? rights
>> o TC Chair and Secretary have create, edit rights for folders and
>> checkin/out rights for documents in their respective TC area
>> o Admin has admin rights (create, checkin/out, delete of all folders and
>> o Public has read rights for all
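
The rights listed above could be modelled roughly as in the following Python sketch. The `Right` flag names, the `ROLE_RIGHTS` table, and the `is_allowed` function are hypothetical. TC member rights are left out because the requirement marks them as undecided ("???"), and the sketch ignores the restriction of chairs and secretaries to their own TC area. Treating unlisted roles as public is an assumption of this sketch.

```python
from enum import Flag, auto


class Right(Flag):
    NONE = 0
    READ = auto()
    CHECKIN_OUT = auto()
    MANAGE_FOLDERS = auto()  # create and edit folders
    DELETE = auto()


# Assumed mapping; TC members are deliberately absent (rights still "???").
_OFFICER = Right.READ | Right.CHECKIN_OUT | Right.MANAGE_FOLDERS
ROLE_RIGHTS = {
    "public": Right.READ,
    "chair": _OFFICER,
    "secretary": _OFFICER,
    "admin": _OFFICER | Right.DELETE,
}


def is_allowed(role, right):
    """True if the role holds the right; unlisted roles get public rights."""
    granted = ROLE_RIGHTS.get(role, ROLE_RIGHTS["public"])
    return right in granted
```

Keeping the table in one place makes it easy to see that only the admin role can delete, which is the point of the "only Admin has delete" archiving requirement.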
> How does "admin" differ from
"Admin" is the OASIS staff administrator of the doc
>> o Kavi user acct/pswd used for authentication in doc mgmt
>> o Notification to the Kavi group when a document is uploaded (same as
>> current Kavi
>> o The current Kavi doc repository is disabled; links within Kavi will
>> go to this doc mgmt system instead (i.e. Kavi doc repository is
this one drops in to replace it).
>> o Docs currently in the Kavi repository will continue to be
and viewable by their Kavi URL (allow for migration
> This requirement and the previous requirement seem to be in conflict.
> Can you explain how "the links within Kavi will go to this doc mgmt
> system instead" supports the goal that "the Kavi repository will
> continue to be addressable and viewable by their Kavi URL (allow for
> migration over
Right now when you're in Kavi you can click on a link for
repository" and it will take you to that page in Kavi. I'd like it to
to the new doc mgmt system instead. But we should allow current docs
the Kavi repository to stay where they are until the TC wants to
them, so these docs need to remain addressable by the current
We'll have to keep the Kavi search/browse accessible, but the
would go to the new doc mgmt system.
>> o When new Kavi group (TC/SC) is created, a doc mgmt area for
>> group and default folders are automatically
> This goes back to the question of defaults before. What hierarchy do
> you have in mind, and what are your motivations for creating it? I
> think it'll be easier in the long run to simply create
> hierarchy and let the TCs populate it.
> If you have in mind that minutes should go in /minutes and press
should go in /press, etc., then I think a detailed
> description of the default hierarchy is required.
See above. Still TBD, but we need both consistency and flexibility.
>>File naming (automation of this done in a later phase; just do this
>>manually at
>> o Naming and versioning of documents follows OASIS file naming scheme
>> o When a new document is created it will be named according to the
automated helps to create/assign a name
> This seems to duplicate the requirements expressed under "Persistent
> URLs". Is it intended to be different? I believe my comments there
> apply here as
The intent is to provide (eventually, maybe a bit later) a GUI to
name new files conformant with the OASIS doc naming scheme. I
pull-downs to select each of the components of the name. But this
probably be later; the file creator would have to manually name the
>>Localizable interface, with localization to occur in a later phase
>>Later phase: Count/traffic report of downloads (how many people have
>>downloaded a particular
> Other later phase
> - Issue tracking?
Sounds like a separate tool. Yes, we need this. Suggestions?
automatic generation of PDF/HTML from source formats?
Yeah, we could add this, but is there a need? Can't people do this
> - validation?
Ditto. Can't you do this
But, yes, I see the utility of having validation on checkin,
publishing, as part of a doc mgmt system.
interactive forms (e.g., the ability to support an interface
> asks a number of questions and then builds an appropriate schema
That's the sort of interface I had in mind for the file naming
But I see this as a separate tool for
> - Syndication of
> - An informal "journal" space (or blog, if you will) for TC members
> to outline their thoughts and ideas?
Both of those are separate tools. Not sure how those would be part of a
doc mgmt system.
Thanks for the feedback.
> P.S. I'm happy to report that your requirements document can be nicely
> presented in an open format (plain text, in this case) instead of a
> proprietary format. I hope that its greater accessibility
> format (and the fact that it's six times smaller) can be used
> demonstrate once again the value of open standards.
(For even more thoughts on this topic, see
Vice President, OASIS
office +1 978.667.5115
x206 mobile +1