Subject: RE: [chairs] need your comments on DocMgmt system requirements
IMO, CVS is no more inefficient than Kavi for storing versions of things
like Word files.

Cheers,

Christopher Ferris
STSM, Emerging e-business Industry Architecture
email: chrisfer@us.ibm.com
blog: http://webpages.charter.net/chrisfer/blog.html
phone: +1 508 377 9295

"Rogers, Tony" <Tony.Rogers@ca.com> wrote on 02/18/2004 04:09:29 PM:

> Make a couple of changes in a Word document and store it in CVS (yeah,
> binary is probably how you'd do it) - it stores the entire file again,
> rather than the changes - that's what I'm calling inefficient, meaning
> "using more storage than is necessary to store the changes". I didn't
> say "bad". I didn't say "unusable". I said "inefficient", and I
> cordially disagree with your statement that "It is not.".
>
> I have used CVS for over 10 years, and it's a useful place to store
> source code. It's a lot less useful when storing non-text files,
> however.
>
> One of the features of CVS and ancillary programs that I use
> frequently is a display of the differences between two versions of a
> file. I don't get that facility when CVS is storing Word documents -
> all I can do is retrieve the two Word documents and look at them
> (anyone got a good Diff for Word?). That's a big loss, especially in
> this environment, where we have multiple authors. To make matters
> worse, it encourages use of the "track changes" feature of Word, and
> that produces much larger Word documents...
>
> So what I am asking is: is there a system which will give us this
> valuable feature for Word documents? (Ideally, also for PDF files)
> Something that will allow us to see things like "this paragraph was
> added by Mr Slowsteady on 13 July, and modified by Ms Quicksmart on
> 14 August". If the answer to that is an expensive document management
> system, then let's consider it. If it can work on top of CVS so we can
> use native CVS facilities for text and html files, then that's a bonus.
>
> Tony Rogers
> -----Original Message-----
> From: Matthew MacKenzie [mailto:mattm@adobe.com]
> Sent: Thu 19-Feb-04 7:34
> To: Christopher B Ferris
> Cc: Rogers, Tony; karl.best@oasis-open.org; Chairs OASIS
> Subject: Re: [chairs] need your comments on DocMgmt system requirements
>
>     cvs -z9 add -kb mydoc.doc
>
> You need to mark the document as "binary", and to not expand keywords.
> The -z flag tells the client the level of compression to use. I've been
> using CVS almost daily for 5 years, and there are several binary files
> in there (jars, zips, docs, pdfs, ps, exe, gz, ...).
>
> Is CVS inefficient in storing and versioning MS Word, or other binary
> documents? No. It is not. Does CVS integrate with MS Word to make the
> cvs diff command and conflict resolution work? No. If we want that,
> OASIS will probably want to pony up big bucks for a high end content
> management system.
>
> On Feb 18, 2004, at 4:23 PM, Christopher B Ferris wrote:
>
> Right, but you can store word docs in CVS... it's just inefficient. As
> for HTML/XML, it works just fine.
>
> Cheers,
>
> Christopher Ferris
> STSM, Emerging e-business Industry Architecture
> email: chrisfer@us.ibm.com
> blog: http://webpages.charter.net/chrisfer/blog.html
> phone: +1 508 377 9295
>
> "Rogers, Tony" <Tony.Rogers@ca.com> wrote on 02/18/2004 03:13:55 PM:
>
> In my experience, CVS doesn't handle MS Word documents well. It is
> designed for plain-text source code, and MS Word's file format doesn't
> allow it to produce an economical diff between one version and the
> next. This means that it wastes considerable space when versioning
> Word. I cannot comment on its ability to version html, but I suspect it
> would do much better on that. Perhaps we should all be using TeX,
> because that can be versioned more readily (ah, that was a joke...)
>
> Is there a tool that would be able to version MS Word more effectively?
> I certainly don't know.
> Does that mean we shouldn't use Word?
> I hope not - our TC has found Word's change tracking rather useful
> when working collaboratively.
>
> Tony Rogers
> tony.rogers@ca.com
> co-chair UDDI TC
> -----Original Message-----
> From: Karl F. Best [mailto:karl.best@oasis-open.org]
> Sent: Thu 19-Feb-04 2:04
> To: Norman Walsh
> Cc: Chairs OASIS; Jeff Lomas
> Subject: Re: [chairs] need your comments on DocMgmt system requirements
>
> Norman Walsh wrote:
> / "Karl F. Best" <karl.best@oasis-open.org> was heard to say:
> | I've put together a draft functional requirements document for this
> | doc mgmt system and would like to get your feedback. It is very
> | important that we have the requirements correct and complete before we
> | start development of the project -- many of you are developers so I'm
> | sure that you understand the importance of this.
>
> High level comments:
>
> - I don't think these requirements adequately address the distinction
>   between a development system (where TCs actively revise documents,
>   schemas, etc.) and a publication system (where TCs post working
>   drafts, standards, and other "finished" work products).
>
>   Is the proposal to develop one or the other, or both? If it's one or
>   the other, then I think some of these requirements are completely
>   inappropriate. If it's both, I think it might be useful to specify
>   them separately. (And whether you imagine having resources to do
>   them in sequence, or at the same time?)
>
> I've previously thought of having a two-phase system, the first of
> which would provide a "sandbox" for the TC members to collaborate in
> developing a document. Then once the doc reached a certain stage it
> would then go into a more controlled environment with e.g. versioning
> and edited only by the TC. I've gotten the impression that most TCs
> would only use the second phase, but I could be wrong.
>
> Chairs: would you prefer having both of these phases built into the doc
> mgmt system (open collaboration, followed by more rigorous control)?
> or would you only use the second?
>
> - There are several places where the requirements seem to be
>   self-contradictory.
>
> Specifics? This is obviously a draft so needs polishing, so suggestions
> are welcome.
>
> - I think meeting all of the requirements listed below will be a
>   significant challenge. A more detailed roadmap, showing staged
>   progress with realistic time estimates, would be very helpful.
>
> Yeah. That's the next step. But right now I'm just gathering
> requirements. I can't very well write a development schedule until I
> know what it is that we're trying to build.
>
> I'd also like suggestions on which parts of this are most important.
> I'm debating whether we should try a phased development approach (i.e.
> provide base functionality now then add more functionality over time).
> Looking through the requirements that I have now, though, I'm not sure
> which ones we could put off until later.
>
> Chairs: suggestions please.
>
> - A number of the features that you describe would seem to be at least
>   partially addressed by open source efforts like G-Forge (an open
>   source version of SourceForge). Are you considering a system like
>   that, or are you expecting to "roll your own" from scratch?
>
> I'm intending for us to build on top of an existing system. That's why
> I said "probably CVS". We'd be silly to build something from scratch
> when the engine already exists. We'll build some sort of customized web
> interface on top of the engine. Once we have the requirements we'll
> know what it is that we need to build. I'd also like suggestions for
> the engine; is CVS the way to go, or do people recommend something
> else?
>
> OASIS DocMgmt Functional Requirements
>
> (17 February 2004)
>
> General Description: A repository providing storage/management of
> files created by TCs, SCs, and other OASIS groups
>
> Technical committees need to be able to store and manage a collection
> of resources.
> Principal among these resources are documents, but it's reasonable to
> consider other, related resources as well, including issue lists,
> archives, news items, and syndicated content.
>
> The doc mgmt system would store any type of file. Not just specs, but
> also the other doc types you mention.
>
> Would some of these stored objects be links and not files?
>
> o Probably based on CVS
>
> The requirements for a "development tree" are likely to be somewhat
> different than the requirements for a "publishing tree". In
> particular, I would expect published standards to be more-or-less
> immutable, to have persistent URIs, etc. In a development tree, those
> constraints might be quite stifling.
>
> CVS supports a development system very well. It's not immediately
> clear to me if it supports a publication system equally well.
>
> I'm certainly not a CVS expert, though I'm aware that it was built for
> development rather than documents. So it may not be ideal for what we
> want.
>
> Does anyone have suggestions for a better engine, better suited for doc
> development and publishing, upon which to build our system?
>
> o A separate area in the repository for each TC/SC/group; both
>   default and definable hierarchy within each TC area
>
> Can you elaborate on what you mean by "both default and definable"?
> What do you have in mind for "default"?
>
> When we create a new TC we would define hierarchy branches for such
> things as e.g. "drafts", "minutes", "contributions" etc. (TBD). Then
> the TC chair could define additional branches as required. We'd want to
> keep the hierarchy as flat as possible to keep the URLs short, and we'd
> want some consistency, but I want to give the TCs some control over
> their space.
>
> o All documents are permanently archived (only Admin has delete
>   rights)
>
> In CVS terms, you can delete a document, but you can always recover it.
> In a development tree, it's not uncommon to reorganize some code or a
> document and want to remove modules from the current "head" of the
> development tree. This goes back to my comment before that the
> requirements for publication and development are somewhat different.
>
> Maybe this is where the "sandbox" (above) comes in. I don't see the
> need of permanently archiving early drafts, but once a doc is checked
> into the permanent repository it should be permanent.
>
> o All documents are publicly viewable, downloadable
>
> o Repository has a web interface for uploading and tree browsing,
>   searching, and retrieval
>
>   + Support for all major browsers
>
>   + Listing of single files includes filename, title, description,
>     date, creator, and language; listing of packages includes the
>     list of single files in the package
>
>   + Search by filename, title, date, creator, and language; and
>     full-text search of description and contents.
>
> Does it have other interfaces? Are you describing a front-end for CVS
> here, or something else? Does it support Web-DAV?
>
> I would expect that most people would want to use a web interface, but
> I suppose that power users may want to deal more directly with the
> engine. But there are also certain safeguards (permissions,
> restrictions on naming, etc.) that may require that we use an
> interface. I don't know yet; this may depend on the engine.
>
> What are the benefits of Web-DAV? (I'm not an expert on this.)
>
> I think it would make sense to address searching as its own top-level
> item. In particular, the description above suggests that every item
> will have a set of metadata that can be searched. Where/when is this
> metadata created? Can I add my own? Is it expressed in an open format,
> an XML vocabulary or RDF or a topic map, or is it proprietary? How
> does this metadata evolve as documents change in CVS?
>
> I see the metadata as comprised of the fields listed above. TBD.
> I don't know yet how this would be expressed because we haven't
> selected an engine yet.
>
> How does this matter? Yes, we should use XML on principle, but I don't
> see it as a requirement.
>
> As for searching the content, that's clearly going to depend on the
> type of content. What types will the system support?
>
> Obviously not all content will be searchable. If somebody uploads a
> blob there's not much we'll be able to do with it besides just store
> it.
>
> We will store whatever types of files the TCs need to store.
>
> Persistent URLs
>
> o At file creation the document is assigned a URL according to the
>   OASIS file naming scheme. The URL will always resolve to the latest
>   version of the document, regardless of the document's (versioned)
>   filename; a URL will identify a specification throughout its entire
>   lifetime from working draft to OASIS Standard. Previous versions of
>   the document will be accessible via a variant of the URL containing
>   the version number.
>
> This is fine for storing standards but it's in conflict with the use
> of CVS and the reference above to a "definable hierarchy".
>
> Again, I'm not an expert on what you can and can't do with CVS.
> Suggestions welcome.
>
> I think this should apply to published standards and work products,
> but I don't think it can practically be applied to a development
> space.
>
> If we have a "sandbox" phase then we wouldn't expect a persistent URL
> for those items. Only once a doc is checked into the permanent
> repository would we do this.
>
> This suggests that the interface to the published standards space
> might require more constraints. I hope that these constraints can be
> imposed without requiring me to interact with the system only through
> a web interface.
>
> As above, power users like yourself may wish to talk directly to the
> engine, but there will be some constraints for security and consistency.
> If it is practical to enforce those constraints via both a web
> interface as well as a native interface then we will. But if it's not
> practical then we'll have to do everything through a browser.
>
> Multiple file types supported
>
> o TCs will store both source (e.g. MSWord or HTML) and compiled (e.g.
>   PDF) versions of each file; i.e. the repository should not allow a
>   PDF to be checked in without a matching .doc or .html file
>
> Uhm, what about documents that have a source which is neither a
> proprietary tool nor HTML?
>
> The above is not an exhaustive list. I'm just suggesting that both
> source and compiled versions should be in the repository. Any
> responsible developer should agree with this philosophy.
>
> Imposing the requirement that the system check for classes of
> dependencies between files of different types is going to be tricky,
> especially as the specs evolve. Suppose I rebuild the PDF, can I check
> it in without checking in a new source document? What if I only
> corrected a formatting bug? If I check in a new source, what happens
> to the PDF?
>
> Yeah, we'll have to figure this out. How do you do it when you write
> code?
>
> I think a lot more detail is required in this part of the
> requirements.
>
> That's why I'm asking for input.
>
> o HTML files may include graphics which will be stored with the file
>   (use relative URLs?)
>
> What about other cross-document links? What about XML files that refer
> to both HTML and PDF presentations? What about document trees that
> consist of multiple chapters in a hierarchy with a common set of
> figures?
>
> More detail, please.
>
> More input, please.
>
> o use MIME types
>
> Packages
>
> o A specification may be composed of multiple documents. The entire
>   package may be uploaded or downloaded in a single operation.
>   Individual documents in the package may also be uploaded or
>   downloaded.
>
> I don't understand what you mean here.
> Are you suggesting that I might upload a package (as a ZIP file? as a
> MIME multi-part related stream?) and then several days later upload a
> new version of one component in that package? Having done so, what
> "version" does the package have? Can I still download the original?
> Can I download the revised version?
>
> Probably the package will just be an HTML file with links to all of the
> components. In that case the package is updated by editing the links in
> the package file. Each of the components is maintained by editing it
> individually. Each component, as well as the package file, could have
> its own version number or date, but the entire set would collectively
> have to be versioned. Would this work?
>
> o Support for chapters or parts of a multi-part document (with links
>   between parts); a package could have a ToC with links to the
>   individual files
>
> I think any attempt to describe the size and shape of a package ("it
> will have a ToC and chapters" or "it will have a starting page and
> parts") will be problematic. Best just to accept that a multi-part
> document is a directed graph (a web).
>
> Would my description (above) of a package work for this? The TC can
> decide how it wants to structure the multi-part spec.
>
> o Support for modular DTDs (e.g. DocBook)
>
> What does this requirement mean? Do you also mean modular W3C XML
> Schemas and RELAX NG grammars? Does this requirement differ from the
> preceding one in a particular way?
>
> Pretty much the same, I think, but I'd be happy to hear other
> requirements not met by the above.
>
> o The entire package is addressable via a single URL, as are the
>   individual documents. The package URL will link to an HTML page
>   listing the package contents.
>
> Is that an HTML page constructed by the author of the package, or
> automatically from the content of the package? If it's the latter,
> what constraints, if any, does that impose on the contents of the
> package?
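
[Editorial sketch, not part of the original message.] The pairing rule
from the "Multiple file types supported" requirement above (no PDF
checked in without a matching .doc or .html file) could be checked along
the following lines. This is only a minimal sketch under stated
assumptions: the function name, the suffix table, and the idea that
"matching" means "same folder and same base name" are all hypothetical
and do not come from the requirements draft or from CVS.

```python
from pathlib import PurePosixPath

# Hypothetical table: compiled suffix -> source suffixes that satisfy it.
# The draft only names PDF, .doc and .html; other types are an open question.
SOURCE_SUFFIXES = {".pdf": {".doc", ".html"}}


def unmatched_compiled(filenames):
    """Return the compiled files that have no source file with the
    same base name in the same folder (assumed meaning of "matching")."""
    paths = [PurePosixPath(name) for name in filenames]
    present = {(p.parent, p.stem, p.suffix.lower()) for p in paths}
    missing = []
    for p in paths:
        sources = SOURCE_SUFFIXES.get(p.suffix.lower())
        if sources is None:
            continue  # not a compiled type we know about
        if not any((p.parent, p.stem, s) in present for s in sources):
            missing.append(str(p))
    return missing


if __name__ == "__main__":
    checkin = [
        "drafts/spec-wd-01.doc",
        "drafts/spec-wd-01.pdf",
        "drafts/notes.pdf",
    ]
    print(unmatched_compiled(checkin))
```

A check like this only compares names. It says nothing about whether a
rebuilt PDF still corresponds to the current source, which is the harder
question raised in the thread.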
>
> Security
>
> o Check-in/out based on Kavi user authentication; different
>   permissions for public, TC members, chair/secretary, etc.
>
> o TC members have ??? rights (TBD)
>
> o TC Chair and Secretary have create, edit rights for folders and
>   checkin/out rights for documents in their respective TC area
>
> o Admin has admin rights (create, checkin/out, delete of all folders
>   and files)
>
> o Public has read rights for all documents
>
> How does "admin" differ from chair/secretary?
>
> "Admin" is the OASIS staff administrator of the doc mgmt system.
>
> Kavi integration
>
> o Kavi user acct/pswd used for authentication in doc mgmt system
>
> o Notification to the Kavi group when a document is uploaded (same as
>   current Kavi notification)
>
> o The current Kavi doc repository is disabled; links within Kavi will
>   go to this doc mgmt system instead (i.e. Kavi doc repository is
>   hidden, this one drops in to replace it).
>
> o Docs currently in the Kavi repository will continue to be
>   addressable and viewable by their Kavi URL (allow for migration over
>   time)
>
> This requirement and the previous requirement seem to be in conflict.
> Can you explain how "the links within Kavi will go to this doc mgmt
> system instead" supports the goal that "the Kavi repository will
> continue to be addressable and viewable by their Kavi URL (allow for
> migration over time)"?
>
> Right now when you're in Kavi you can click on a link for "doc
> repository" and it will take you to that page in Kavi. I'd like it to
> go to the new doc mgmt system instead. But we should allow current docs
> in the Kavi repository to stay where they are until the TC wants to
> move them, so these docs need to remain addressable by the current
> URLs. We'll have to keep the Kavi search/browse accessible, but the
> default would go to the new doc mgmt system.
>
> o When a new Kavi group (TC/SC) is created, a doc mgmt area for that
>   group and default folders are automatically created
>
> This goes back to the question of defaults before. What hierarchy do
> you have in mind, and what are your motivations for creating it? I
> think it'll be easier in the long run to simply create an empty
> hierarchy and let the TCs populate it.
>
> If you have in mind that minutes should go in /minutes and press
> clippings should go in /press, etc., then I think a detailed
> description of the default hierarchy is required.
>
> See above. Still TBD, but we need both consistency as well as
> flexibility.
>
> File naming (automation of this done in a later phase; just do this
> manually at first?)
>
> o Naming and versioning of documents follows OASIS file naming scheme
>
> o When a new document is created it will be named according to the
>   scheme; automated helps to create/assign a name
>
> This seems to duplicate the requirements expressed under "Persistent
> URLs". Is it intended to be different? I believe my comments there
> apply here as well.
>
> The intent is to provide (eventually, maybe a bit later) a GUI to help
> name new files conformant with the OASIS doc naming scheme. I envision
> pull-downs to select each of the components of the name. But this will
> probably be later; the file creator would have to manually name the
> file for now.
>
> Localizable interface, with localization to occur in a later phase
>
> Later phase: Count/traffic report of downloads (how many people have
> downloaded a particular doc?)
>
> Other later phase items?
>
> - Issue tracking?
>
> Sounds like a separate tool. Yes, we need this. Suggestions?
>
> - automatic generation of PDF/HTML from source formats?
>
> Yeah, we could add this, but is there a need? Can't people do this
> already?
>
> - validation?
>
> Ditto. Can't you do this already?
>
> But, yes, I see the utility of having validation on checkin, and
> publishing, as part of a doc mgmt system.
>
> - interactive forms (e.g., the ability to support an interface that
>   asks a number of questions and then builds an appropriate schema
>   customization layer)?
>
> That's the sort of interface I had in mind for the file naming (above).
> But I see this as a separate tool for later.
>
> - Syndication of announcements
> - An informal "journal" space (or blog, if you will) for TC members
>   to outline their thoughts and ideas?
>
> Both of those are separate tools. Not sure how those would be part of a
> doc mgmt system.
>
> Thanks for the feedback. Much appreciated.
>
> -Karl
>
> Be seeing you,
> norm
>
> P.S. I'm happy to report that your requirements document can be nicely
> presented in an open format (plain text, in this case) instead of a
> proprietary format. I hope that its greater accessibility in this
> format (and the fact that it's six times smaller) can be used to
> demonstrate once again the value of open standards.
>
> (For even more thoughts on this topic, see
> http://www.gnu.org/philosophy/no-word-attachments.html)
>
> --
> =================================================================
> Karl F. Best
> Vice President, OASIS
> office +1 978.667.5115 x206   mobile +1 978.761.1648
> karl.best@oasis-open.org   http://www.oasis-open.org
>
> ___________________________
> Matthew MacKenzie
> Senior Architect
> Intelligent Documents Business Unit
> Adobe Systems Canada Inc.
> http://www.adobe.com/
> 506 869.0949
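
[Editorial sketch, not part of the original message.] The "Persistent
URLs" requirement quoted in this thread says that one URL always resolves
to the latest version of a document, while earlier versions stay
reachable through a variant carrying the version number. A rough way to
picture that behaviour is an index from the persistent URL to the ordered
list of checked-in files. Everything named here (the class, its methods,
the example URL and filenames) is a hypothetical illustration, not the
OASIS naming scheme and not a feature of CVS or Kavi.

```python
class PersistentUrlIndex:
    """Toy index: one persistent URL -> every checked-in file, oldest first."""

    def __init__(self):
        self._versions = {}

    def check_in(self, url, filename):
        """Record a new version under the URL and return its version number."""
        files = self._versions.setdefault(url, [])
        files.append(filename)
        return len(files)

    def resolve(self, url, version=None):
        """Return the latest file for the URL, or a specific earlier version."""
        files = self._versions[url]  # KeyError if the URL was never assigned
        if version is None:
            return files[-1]
        if not 1 <= version <= len(files):
            raise KeyError(f"{url} has no version {version}")
        return files[version - 1]


if __name__ == "__main__":
    index = PersistentUrlIndex()
    # Hypothetical URL and filenames, for illustration only.
    index.check_in("/example-tc/spec", "spec-wd-01.doc")
    index.check_in("/example-tc/spec", "spec-wd-02.doc")
    print(index.resolve("/example-tc/spec"))      # latest version
    print(index.resolve("/example-tc/spec", 1))   # earlier version by number
```

Keeping the mapping outside the files themselves is what lets the stored
(versioned) filename change while the URL stays fixed. How that would sit
on top of a CVS tree with a TC-definable hierarchy is the open question
raised above.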