In my experience, CVS doesn't handle MS Word documents well. It is designed
for plain-text source code, and MS Word's file format doesn't allow it to
produce an economical diff between one version and the next. This means that it
wastes considerable space when versioning Word. I cannot comment on its ability
to version html, but I suspect it would do much better on that. Perhaps we
should all be using TeX, because that can be versioned more readily (ah, that
was a joke...)
Is there a tool that would be able to version MS Word more effectively? I
certainly don't know. Does that mean we shouldn't use Word? I hope not - our TC
has found Word's change tracking rather useful when working
collaboratively.
Tony Rogers
co-chair UDDI TC
-----Original Message-----
From: Karl F. Best [mailto:karl.best@oasis-open.org]
Sent: Thu 19-Feb-04 2:04
To: Norman Walsh
Cc: Chairs OASIS; Jeff Lomas
Subject: Re: [chairs] need your comments on DocMgmt system requirements
Norman Walsh wrote:
> / "Karl F. Best" <karl.best@oasis-open.org> was heard to say:
> | I've put together a draft functional requirements document for this
> | doc mgmt system and would like to get your feedback. It is very
> | important that we have the requirements correct and complete before we
> | start development of the project -- many of you are developers so I'm
> | sure that you understand the importance of this.
>
> High level comments:
>
> - I don't think these requirements adequately address the distinction
>   between a development system (where TCs actively revise documents,
>   schemas, etc.) and a publication system (where TCs post working
>   drafts, standards, and other "finished" work products).
>
>   Is the proposal to develop one or the other, or both. If it's one or
>   the other, then I think some of these requirements are completely
>   inappropriate. If it's both, I think it might be useful to specify
>   them separately. (And whether you imagine having resources to do
>   them in sequence, or at the same time?)
I've previously thought of having a
two-phase system, the first of which would provide a "sandbox" for the TC
members to collaborate in developing a document. Once the doc reached
a certain stage it would go into a more controlled environment with
e.g. versioning, where it is edited only by the TC. I've gotten the impression that
most TCs would only use the second phase, but I could be
wrong.
Chairs: would you prefer having both of these phases built into
the doc mgmt system (open collaboration, followed by more rigorous
control)? or would you only use the second?
> - There are several places where the requirements seem to be
>   self-contradictory.
Specifics? This is obviously a draft so needs
polishing, so suggestions are welcome.
> - I think meeting all of the requirements listed below will be a
>   significant challenge. A more detailed roadmap, showing staged
>   progress with realistic time estimates would be very helpful.
Yeah.
That's the next step. But right now I'm just gathering requirements. I
can't very well write a development schedule until I know what it is that
we're trying to build.
I'd also like suggestions on which parts of this
are most important. I'm debating whether we should try a phased development
approach (i.e. provide base functionality now then add more functionality
over time). Looking through the requirements that I have now, though, I'm
not sure which ones we could put off until later.
Chairs:
suggestions please.
> - A number of the features that you describe would seem to be at least
>   partially addressed by open source efforts like G-Forge (an open
>   source version of SourceForge). Are you considering a system like
>   that, or are you expecting to "roll your own" from scratch.
I'm intending for us
to build on top of an existing system. That's why I said "probably CVS".
We'd be silly to build something from scratch when the engine already
exists. We'll build some sort of customized web interface on top of the
engine. Once we have the requirements we'll know what it is that we need to
build. I'd also like suggestions for the engine; is CVS the way to go, or
do people recommend something else?
>>OASIS DocMgmt Functional Requirements
>>
>>(17 February 2004)
>>
>>General Description: A repository providing storage/management of
>>files created by TCs, SCs, and other OASIS groups
>
> Technical committees need to be able to store and manage a collection
> of resources. Principal among these resources are documents, but it's
> reasonable to consider other, related resources as well, including
> issue lists, archives, news items, and syndicated content.
The doc mgmt system would store any type of file. Not just
specs, but also the other doc types you mention.
Would some of these
stored objects be links and not files?
>>   o Probably based on CVS
>
> The requirements for a "development tree" are likely to be somewhat
> different than the requirements for a "publishing tree". In
> particular, I would expect published standards to be more-or-less
> immutable, to have persistent URIs, etc. In a development tree, those
> constraints might be quite stifling.
>
> CVS supports a development system very well. It's not immediately
> clear to me if it supports a publication system equally well.
I'm certainly not
a CVS expert, though I'm aware that it was built for development rather
than documents. So it may not be ideal for what we want.
Does anyone
have suggestions for a better engine, better suited for doc development and
publishing, upon which to build our system?
>>   o A separate area in the repository for each TC/SC/group; both
>>     default and definable hierarchy within each TC area
>
> Can you elaborate on what you mean by "both default and definable"?
> What do you have in mind for "default"?
When we create a new TC we would define hierarchy
branches for such things as e.g. "drafts", "minutes", "contributions" etc.
(TBD). Then the TC chair could define additional branches as required. We'd
want to keep the hierarchy as flat as possible to keep the URLs short, and
we'd want some consistency, but I want to give the TCs some control over
their space.
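A minimal sketch of that "default plus definable" layout, in Python. The three default names come from the paragraph above (the final list is still TBD); the function name `tc_area`, the TC name, and the rule that rejects nested folder names to keep URLs short are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

# Default branches named in the message above; the real list is still TBD.
DEFAULT_FOLDERS = ("drafts", "minutes", "contributions")


def tc_area(tc_name, extra_folders=()):
    """Return the folder paths for a new TC area.

    Default folders come first, then any chair-defined additions.
    Nested names are rejected (an assumed rule) to keep the tree flat.
    """
    root = PurePosixPath("/") / tc_name
    folders = list(DEFAULT_FOLDERS)
    for name in extra_folders:
        if "/" in name:
            raise ValueError(f"nested folder not allowed: {name!r}")
        if name not in folders:
            folders.append(name)
    return [str(root / name) for name in folders]


# Hypothetical TC with one chair-defined branch.
print(tc_area("uddi", ["schemas"]))
```

Whether the flatness rule should be enforced or merely recommended is exactly the consistency-versus-control question raised above.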
>>   o All documents are permanently archived (only Admin has delete
>>     rights)
>
> In CVS terms, you can delete a document, but you can always recover
> it. In a development tree, it's not uncommon to reorganize some code
> or a document and want to remove modules from the current "head" of
> the development tree. This goes back to my comment before that the
> requirements for publication and development are somewhat different.
Maybe this is where the "sandbox" (above) comes in. I don't
see the need of permanently archiving early drafts, but once a doc is
checked into the permanent repository it should be
permanent.
>>   o All documents are publicly viewable, downloadable
>>
>>   o Repository has a web interface for uploading and tree browsing,
>>     searching, and retrieval
>>
>>        + Support for all major browsers
>>
>>        + Listing of single files includes filename, title, description,
>>          date, creator, and language; listing of packages includes the
>>          list of single files in the package
>>
>>        + Search by filename, title, date, creator, and language; and
>>          full-text search of description and contents.
>
> Does it have other interfaces? Are you describing a front-end for CVS
> here, or something else? Does it support Web-DAV?
I would expect that most people would want to use a
web interface, but I suppose that power users may want to deal more
directly with the engine. But there are also certain safeguards
(permissions, restrictions on naming, etc.) that may require that we use an
interface. I don't know yet; this may depend on the engine.
What are
the benefits of Web-DAV? (I'm not an expert on this.)
> I think it would make sense to address searching as its own top-level
> item. In particular, the description above suggests that every item
> will have a set of metadata that can be searched. Where/when is this
> metadata created? Can I add my own? Is it expressed in an open format,
> an XML vocabulary or RDF or a topic map, or is it proprietary? How
> does this metadata evolve as documents change in CVS?
I see the metadata as
consisting of the fields listed above. TBD. I don't know yet how this would
be expressed because we haven't selected an engine yet.
How does
this matter? Yes, we should use XML on principle, but I don't see it as a
requirement.
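To make the metadata discussion concrete, here is a rough Python sketch of a record carrying the six fields listed in the draft, with a search over exact field values plus a text match on the description. The class name `Metadata`, the function `search`, and the sample records are hypothetical; nothing here says how the eventual engine would store or express this data.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Per-file fields named in the draft's listing requirement."""
    filename: str
    title: str
    description: str
    date: datetime.date
    creator: str
    language: str


def search(records, text="", **fields):
    """Return records whose named fields match exactly and whose
    description contains `text` (case-insensitive)."""
    needle = text.lower()
    return [
        r for r in records
        if all(getattr(r, name) == value for name, value in fields.items())
        and needle in r.description.lower()
    ]


# Hypothetical sample data.
records = [
    Metadata("uddi-wd-01.doc", "UDDI draft", "First working draft",
             datetime.date(2004, 2, 17), "tony", "en"),
    Metadata("docbook-min.html", "Minutes", "Meeting minutes",
             datetime.date(2004, 2, 19), "norm", "en"),
]
print(search(records, language="en", text="minutes"))
```

Full-text search of file contents is left out because, as noted below, it depends on the content type.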
> As for searching the content, that's clearly going to depend on the
> type of content. What types will the system support?
Obviously not all content will be searchable. If somebody
uploads a blob there's not much we'll be able to do with it besides just
store it.
We will store whatever types of files the TCs need to
store.
>>Persistent URLs
>>
>>   o At file creation the document is assigned a URL according to the
>>     OASIS file naming scheme. The URL will always resolve to the latest
>>     version of the document, regardless of the documents (versioned)
>>     filename; a URL will identify a specification throughout its entire
>>     lifetime from working draft to OASIS Standard. Previous versions of
>>     the document will be accessible via a variant of the URL containing
>>     the version number.
>
> This is fine for storing standards but it's in conflict with the use
> of CVS and the reference above to a "definable hierarchy".
Again, I'm not an expert on what you can and can't do with
CVS. Suggestions welcome.
> I think this should apply to published standards and work products,
> but I don't think it can practically be applied to a development
> space.
If we have a
"sandbox" phase then we wouldn't expect a persistent URL for those items.
Only once a doc is checked into the permanent repository would we do
this.
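One way to picture the persistent-URL requirement for the permanent repository is the Python sketch below: the bare URL resolves to the latest check-in, and a variant carrying a version number resolves to an earlier one. The `/vN` suffix, the class and function names, and the filenames are all assumptions for illustration; the OASIS file naming scheme itself is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One specification with its history of versioned filenames."""
    versions: dict = field(default_factory=dict)

    def check_in(self, filename):
        """Store a new version and return its 1-based version number."""
        number = len(self.versions) + 1
        self.versions[number] = filename
        return number


def resolve(repo, url_path):
    """Map a persistent URL path to a stored filename.

    "/tc/spec"    -> latest version
    "/tc/spec/v2" -> version 2 (hypothetical variant syntax)
    """
    base, _, last = url_path.rpartition("/")
    if last.startswith("v") and last[1:].isdigit() and base in repo:
        return repo[base].versions[int(last[1:])]
    doc = repo[url_path]
    return doc.versions[max(doc.versions)]


# Hypothetical document with two check-ins.
repo = {"/uddi/spec": Document()}
repo["/uddi/spec"].check_in("uddi-spec-wd-01.doc")
repo["/uddi/spec"].check_in("uddi-spec-wd-02.doc")
print(resolve(repo, "/uddi/spec"))     # latest check-in
print(resolve(repo, "/uddi/spec/v1"))  # first check-in
```

Sandbox items would simply never be registered in `repo`, which matches the idea that only permanent check-ins get a persistent URL.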
> This suggests that the interface to the published standards space
> might require more constraints. I hope that these constraints can be
> imposed without requiring me to interact with the system only through
> a web interface.
As above, power users like yourself
may wish to talk directly to the engine, but there will be some constraints
for security and consistency. If it is practical to enforce those
constraints via both a web interface as well as a native interface then we
will. But if it's not practical then we'll have to do everything through a
browser.
>>Multiple file types supported
>>
>>   o TCs will store both source (e.g. MSWord or HTML) and compiled (e.g.
>>     PDF) versions of each file; i.e. the repository should not allow a
>>     PDF to be checked in without a matching .doc or .html file
>
> Uhm, what about documents that have a source which is neither a
> proprietary tool or HTML?
The
above is not an exhaustive list. I'm just suggesting that both source and
compiled versions should be in the repository. Any responsible developer
should agree with this philosophy.
> Imposing the requirement that the system check for classes of
> dependencies between files of different types is going to be tricky,
> especially as the specs evolve. Suppose I rebuild the PDF, can I check
> it in without checking in a new source document? What if I only
> corrected a formatting bug? If I check in a new source, what happens
> to the PDF?
Yeah, we'll have to
figure this out. How do you do it when you write code?
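As a starting point for that discussion, the simplest form of the rule (no compiled file without a same-named source in the same check-in) could look like the Python sketch below. The suffix sets are assumptions: the draft gives .doc and .html only as examples, and .xml is added here purely to illustrate a non-proprietary, non-HTML source. The sketch does not answer the harder questions above, such as re-checking in a rebuilt PDF on its own.

```python
from pathlib import PurePosixPath

# Assumed suffix sets; the draft lists these only as examples ("e.g.").
SOURCE_SUFFIXES = {".doc", ".html", ".xml"}
COMPILED_SUFFIXES = {".pdf"}


def missing_sources(filenames):
    """Return compiled files in a check-in that lack a same-named source.

    "Same-named" means same folder and same name minus the suffix.
    """
    paths = [PurePosixPath(name) for name in filenames]
    source_stems = {
        p.with_suffix("") for p in paths
        if p.suffix.lower() in SOURCE_SUFFIXES
    }
    return [
        str(p) for p in paths
        if p.suffix.lower() in COMPILED_SUFFIXES
        and p.with_suffix("") not in source_stems
    ]


# Hypothetical check-ins: the first passes, the second would be rejected.
print(missing_sources(["drafts/spec.pdf", "drafts/spec.doc"]))
print(missing_sources(["drafts/spec.pdf"]))
```

A variant could also look for a source already in the repository, which would let a formatting-only PDF rebuild through.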
> I think a lot more detail is required in this part of the
> requirements.
That's why I'm asking for input.
>>   o HTML files may include graphics which will be stored with the file
>>     (use relative URLs?)
>
> What about other cross-document links? What about XML files that refer
> to both HTML and PDF presentations? What about document trees that
> consist of multiple chapters in a hierarchy with a common set of
> figures?
>
> More detail, please.
More input,
please.
>>   o use MIME types
>>
>>Packages
>>
>>   o A specification may be composed of multiple documents. The entire
>>     package may be uploaded or downloaded in a single operation.
>>     Individual documents in the package may also be uploaded or
>>     downloaded.
>
> I don't understand what you mean here. Are you suggesting that I might
> upload a package (as a ZIP file? as a MIME multi-part related stream?)
> and then several days later upload a new version of one component in
> that package. Having done so, what "version" does the package have?
> Can I still download the original? Can I download the revised
> version?
Probably the package will just be an HTML file with links to
all of the components. In that case the package is updated by editing the
links in the package file. Each of the components is maintained by editing
them individually. Each component, as well as the package file, could
have its own version number or date, but the entire set would
collectively have to be versioned. Would this work?
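One possible reading of "each component has its own version, but the set is versioned collectively" is sketched below in Python: every component check-in bumps that component's version and records a snapshot of the whole set, so both the original and the revised package remain retrievable. The `Package` class, its method names, and the filenames are hypothetical, and this is only one of several schemes that would fit the description.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Component versions plus a snapshot per package version."""
    components: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, component):
        """Check in a new version of one component.

        Returns the new package version (1-based).
        """
        self.components[component] = self.components.get(component, 0) + 1
        self.history.append(dict(self.components))
        return len(self.history)

    def snapshot(self, package_version):
        """Return the component versions that made up a package version."""
        return self.history[package_version - 1]


# Hypothetical package: two components, then a revision of one of them.
pkg = Package()
pkg.update("spec.html")
pkg.update("schema.xsd")
latest = pkg.update("spec.html")
print(latest, pkg.snapshot(latest))
print(2, pkg.snapshot(2))  # the earlier set is still recoverable
```

Under this scheme the answer to "what version does the package have after one component changes" is simply the next package version.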
>>   o Support for chapters or parts of a multi-part document (with links
>>     between parts); a package could have a ToC with links to the
>>     individual files
>
> I think any attempt to describe the size and shape of a package ("it
> will have a ToC and chapters" or "it will have a starting page and
> parts") will be problematic. Best just to accept that a multi-part
> document is a directed graph (a web).
Would my description (above) of a
package work for this? The TC can decide how it wants to structure the
multi-part spec.
>>   o Support for modular DTDs (e.g. DocBook)
>
> What does this requirement mean? Do you also mean modular W3C XML
> Schemas and RELAX NG grammars? Does this requirement differ from the
> preceding one in a particular way?
Pretty much
the same, I think, but I'd be happy to hear other requirements not met by
the above.
>>   o The entire package is addressable via a single URL, as are the
>>     individual documents. The package URL will link to an HTML page
>>     listing the package contents.
>
> Is that an HTML page constructed by the author of the package, or
> automatically from the content of the package? If it's the latter,
> what constraints, if any, does that impose on the contents of the
> package?
>
>
>>Security
>>
>>   o Check-in/out based on Kavi user authentication; different
>>     permissions for public, TC members, chair/secretary, etc.
>>
>>   o TC members have ??? rights (TBD)
>>
>>   o TC Chair and Secretary have create, edit rights for folders and
>>     checkin/out rights for documents in their respective TC area
>>
>>   o Admin has admin rights (create, checkin/out, delete of all folders
>>     and files)
>>
>>   o Public has read rights for all documents
>
>
> How does "admin" differ from chair/secretary?
"Admin" is the OASIS staff administrator of the doc
mgmt system.
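The difference between the roles can be laid out as a small permission table. The Python sketch below follows the rights listed in the draft; the role keys, the `Right` names, and the function `is_allowed` are made up for illustration, and TC member rights are shown as read-only purely as a placeholder because the draft marks them TBD.

```python
from enum import Flag, auto
from typing import Optional


class Right(Flag):
    READ = auto()
    CHECK_IN_OUT = auto()
    MANAGE_FOLDERS = auto()  # create/edit folders
    DELETE = auto()


OFFICER = Right.READ | Right.CHECK_IN_OUT | Right.MANAGE_FOLDERS

# Rights per role as listed in the draft. "tc_member" is a placeholder
# assumption: the draft says "??? rights (TBD)".
ROLE_RIGHTS = {
    "public": Right.READ,
    "tc_member": Right.READ,
    "chair": OFFICER,
    "secretary": OFFICER,
    "admin": OFFICER | Right.DELETE,
}


def is_allowed(role: str, right: Right, target_tc: str,
               user_tc: Optional[str] = None) -> bool:
    """Check one right for one role against one TC area."""
    granted = ROLE_RIGHTS.get(role, Right(0))
    if right not in granted:
        return False
    # Reading is public and the admin is global; every other right
    # applies only inside the user's own TC area.
    if right == Right.READ or role == "admin":
        return True
    return user_tc == target_tc


# Hypothetical TC names.
print(is_allowed("chair", Right.CHECK_IN_OUT, "uddi", user_tc="uddi"))
print(is_allowed("chair", Right.DELETE, "uddi", user_tc="uddi"))
```

In this table the admin differs from a chair or secretary in two ways: delete rights, and scope across every TC area rather than one.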
>>Kavi integration
>>
>>   o Kavi user acct/pswd used for authentication in doc mgmt system
>>
>>   o Notification to the Kavi group when a document is uploaded (same as
>>     current Kavi notification)
>>
>>   o The current Kavi doc repository is disabled; links within Kavi will
>>     go to this doc mgmt system instead (i.e. Kavi doc repository is
>>     hidden, this one drops in to replace it).
>>
>>   o Docs currently in the Kavi repository will continue to be
>>     addressable and viewable by their Kavi URL (allow for migration over
>>     time)
>
> This requirement and the previous requirement seem to be in conflict.
> Can you explain how "the links within Kavi will go to this doc mgmt
> system instead" supports the goal that "the Kavi repository will
> continue to be addressable and viewable by their Kavi URL (allow for
> migration over time)"?
Right now when you're in Kavi you can click on a link for
"doc repository" and it will take you to that page in Kavi. I'd like it to
go to the new doc mgmt system instead. But we should allow current docs
in the Kavi repository to stay where they're at until the TC wants to
move them, so these docs need to remain addressable by the current
URLs. We'll have to keep the Kavi search/browse accessible, but the
default would go to the new doc mgmt system.
>>   o When new Kavi group (TC/SC) is created, a doc mgmt area for that
>>     group and default folders are automatically created
>
> This goes back to the question of defaults before. What hierarchy do
> you have in mind, and what are your motivations for creating it? I
> think it'll be easier in the long run to simply create an empty
> hierarchy and let the TCs populate it.
>
> If you have in mind that minutes should go in /minutes and press
> clippings should go in /press, etc., then I think a detailed
> description of the default hierarchy is required.
See above. Still TBD, but we need both
consistency as well as flexibility.
>>File naming (automation of this done in a later phase; just do this
>>manually at first?)
>>
>>   o Naming and versioning of documents follows OASIS file naming scheme
>>
>>   o When a new document is created it will be named according to the
>>     scheme; automated helps to create/assign a name
>
> This seems to duplicate the requirements expressed under "Persistent
> URLs". Is it intended to be different? I believe my comments there
> apply here as well.
The intent is to provide (eventually, maybe a bit later) a GUI to
help name new files conformant with the OASIS doc naming scheme. I
envision pull-downs to select each of the components of the name. But this
will probably be later; the file creator would have to manually name the
file for now.
>>Localizable interface, with localization to occur in a later phase
>>
>>Later phase: Count/traffic report of downloads (how many people have
>>downloaded a particular doc?)
>
>
> Other later phase items?
>
>   - Issue tracking?
Sounds like a
separate tool. Yes, we need this. Suggestions?
>   - automatic generation of PDF/HTML from source formats?
Yeah, we could
add this, but is there a need? Can't people do this
already?
> - validation?
Ditto. Can't you do this
already?
But, yes, I see the utility of having validation on checkin,
and publishing, as part of a doc mgmt system.
>   - interactive forms (e.g., the ability to support an interface that
>     asks a number of questions and then builds an appropriate schema
>     customization layer)?
That's the sort of interface I had in mind for the file naming
(above). But I see this as a separate tool for
later.
>   - Syndication of announcements
>   - An informal "journal" space (or blog, if you will) for TC members
>     to outline their thoughts and ideas?
Both of those are separate tools. Not sure how
those would be part of a doc mgmt system.
Thanks for the feedback.
Much
appreciated.
-Karl
>
> Be seeing you,
>   norm
>
> P.S. I'm happy to report that your requirements document can be nicely
> presented in an open format (plain text, in this case) instead of a
> proprietary format. I hope that its greater accessibility in this
> format (and the fact that it's six times smaller) can be used to
> demonstrate once again the value of open standards.
>
> (For even more thoughts on this topic, see
> http://www.gnu.org/philosophy/no-word-attachments.html)
>
--
=================================================================
Karl F. Best
Vice President, OASIS
office +1 978.667.5115 x206
mobile +1 978.761.1648
karl.best@oasis-open.org
http://www.oasis-open.org