Subject: Re: [ubl-lcsc] Document instance UUID
On Fri, 7 Mar 2003, Chiusano Joseph wrote:

>> It sounds like there are 2 levels here:
>>
>> (1) Document (as an entity)
>> (2) Transmission of document

Right, we've identified these two. We've also found two more areas that seem to be independent of one another and that affect the "identity" of document instances:

(3) Storage naming
(4) Versioning

(3) has to do with naming individual UBL files (or, in our trial case, the XIP files) in the storage file system. In XIP, we chose FTP as the transport mechanism for convenience. FTP itself does not assign an ID to the content it transports (assuming the session number != content ID). At the transport level, every received file has to be kept distinct from every other one without looking at its content. For convenience, we also defined a simple naming scheme for file instances based on date/time. In practice, the means of transport could also be different, such as a purchasing manager carrying a diskette of UBL (or XIP) files to the factory.

So we could end up with 3 files that have different names in storage but the same content (if a UBL-processor looks inside them):

  File_20030301_101023_af81.ubl -- (A)
  File_20030301_101024_39FZ.ubl -- (B)
  PO_1.ubl                      -- (C)

(A) is sent first. The receiving FTP server names it using the date/time plus a 4-character random string, so that files received within 1 second of each other get different names. (B) arrives a second later with the same content as (A). (C) is carried by hand on a diskette to the receiving server. The UBL-processor cannot auto-sync and conclude that the 3 files are the same unless an equality function is defined (this relates to your (1) above). If all it has is out-of-band info, such as the filenames the transport level generated to tell files apart, a UBL-processor has to keep the 3 separate, as if those differences carried some important business meaning.
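The storage-naming scenario above can be sketched in a few lines of Python. This is a minimal illustration, not XIP's actual code. The helper `storage_name`, the character set for the random suffix and the seeded generator are assumptions made for the example. The sketch shows that transport-level names keep the three files apart but say nothing about whether their contents are the same.

```python
import random
import string
from datetime import datetime

def storage_name(received_at: datetime, rng: random.Random) -> str:
    """Hypothetical transport-level name: reception date/time plus a
    4-character random suffix to separate files received in the same second."""
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(4))
    return f"File_{received_at:%Y%m%d_%H%M%S}_{suffix}.ubl"

rng = random.Random(0)  # seeded only to make the sketch repeatable
content = b"<Order><ID>PO_1</ID></Order>"  # hypothetical, identical payload

store = {
    storage_name(datetime(2003, 3, 1, 10, 10, 23), rng): content,  # (A) via FTP
    storage_name(datetime(2003, 3, 1, 10, 10, 24), rng): content,  # (B) via FTP, 1 s later
    "PO_1.ubl": content,                                            # (C) via diskette
}

print(len(store))                # 3: three distinct storage names
print(len(set(store.values())))  # 1: but only one distinct content
```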
My position is that a UBL-processor can do better than that. Telling document instances apart looks like a need general enough to belong in UBL.

(4) has to do with the process of creating UBL (or XIP) documents. While working on XIP, we assumed that a document instance does not come into being just by copying a file. A paper purchase order on the manufacturing floor has to be routed through a few people before it is approved and final. Along this workflow, a UBL-processor (or an equivalent workflow system with a UBL-processing submodule) needs to pinpoint a particular document and pull it out to continue with it or pass it further downstream, perhaps also linking it to a database.

(2) and (4) point to a need for (1): an identification function that takes the content of a document instance and generates a convenient string, number or other easily handled value:

  docID = DOC_ID(document-instance)

(3) points to a need for an equality-test (or comparison) function that takes the IDs of 2 document instances and says whether the instances are equal or not:

  equalP = DOC_EQ(docID_1, docID_2)

The ideal case is that we can do

  equalP = DOC_EQ(DOC_ID(document-instance-1), DOC_ID(document-instance-2))

>> For (1), I believe that a set of "key" information from a document (much
>> like a relational database table) should be used to unique an "instance"
>> of a document as an entity.

Yes, that's finding DOC_ID(document-instance). But what counts as "key" is exactly what is under discussion. We want to avoid vendor-specific definitions of what is "key", so that we don't get interoperability problems later.

>> For example (speaking very generically
>> here), if we assume that a PO Number is unique through time, and a PO
>> can be modified, then the PO Number would (of course) not be sufficient
>> to uniquely identify the PO document. Rather, we would need an
>> additional field/element (please pick favorite word) that would signify
>> "iteration".
>> So the [PO Number + Iteration Number] would be unique over
>> time.

Yes, but as I explained in my previous emails, this document-instance ID needs to be unique across:

- platforms
- ERP/MRP/DP systems and their versions
- branches, which may run overlapping PO numbers in parallel

More generally, we want to avoid an arbitrary definition of DOC_ID(). Honestly, we don't know how many variables DOC_ID() would need so that a document instance can be compared uniquely with every other document in the world, in every industry, network, system, piece of software and country, at any time. But UUID has already done that work, so it's easy to just use it.

>> For (2), I think the question is: If the same document is transmitted
>> more than once (in the example above, the two transmitted documents
>> would have the same [PO Number + Iteration Number]), what are the
>> ramifications? Are there legal ramifications which would require one of
>> the transmissions to be identified as the "binding" transmission?

Good question. It seems to be the subject of much debate, with big implications. I should say up front that I don't really know the answer. To find out what the answers might look like, we can discuss it from all angles, take input from different parties like yourself, and carry out trials (which is what led to our XIP project). If we limit ourselves to DOC_ID() and DOC_EQ() and set aside other issues such as security, my take #1 is that the legal discussions probably lean more towards security and the related non-repudiation issues. Those indirectly depend on being able to reliably tell document instances apart. That leads us to think of hashing UBL documents to get an ID, which again comes down to choosing the right set of parameters to feed the hashing function.
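Here is a minimal Python sketch of three candidate DOC_ID() functions feeding one DOC_EQ(): a key-based ID (PO number + iteration), a content hash, and a UUID stamped into the instance when it is created. Everything in it is hypothetical. The element names `ID`, `Iteration`, `Buyer` and `UUID` are placeholders, not UBL schema elements. Hashing over the W3C C14N 2.0 canonical form (via `xml.etree.ElementTree.canonicalize`, Python 3.8+) is one possible choice of hash input; the thread does not specify one.

```python
import hashlib
import uuid
import xml.etree.ElementTree as ET

# Element names (ID, Iteration, Buyer, UUID) are hypothetical placeholders,
# not taken from the UBL schemas.

def doc_id_by_key(xml_text: str) -> str:
    """Key-based DOC_ID(): PO number + iteration number."""
    root = ET.fromstring(xml_text)
    return f"{root.findtext('ID')}#{root.findtext('Iteration')}"

def doc_id_by_hash(xml_text: str) -> str:
    """Content-based DOC_ID(): SHA-256 over the C14N 2.0 form, with whitespace
    around text stripped so that pretty-printing does not change the ID."""
    canonical = ET.canonicalize(xml_data=xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def doc_id_by_uuid(xml_text: str) -> str:
    """UUID-based DOC_ID(): read back the UUID stamped in at creation time."""
    return str(uuid.UUID(ET.fromstring(xml_text).findtext("UUID")))

def doc_eq(doc_id_1: str, doc_id_2: str) -> bool:
    """DOC_EQ(): two instances are equal iff their IDs are equal."""
    return doc_id_1 == doc_id_2

def new_order(po: str, iteration: int, buyer: str) -> str:
    """Create an order instance, stamping it with a fresh random (version 4) UUID."""
    return (f"<Order><UUID>{uuid.uuid4()}</UUID><ID>{po}</ID>"
            f"<Iteration>{iteration}</Iteration><Buyer>{buyer}</Buyer></Order>")

# Two branches happen to run the same PO number and iteration.
branch_a = new_order("1001", 2, "Branch A")
branch_b = new_order("1001", 2, "Branch B")
# A pretty-printed copy of branch_a's instance, e.g. received by another route.
copy_of_a = branch_a.replace("><", ">\n  <")

print(doc_eq(doc_id_by_key(branch_a), doc_id_by_key(branch_b)))     # True: keys collide
print(doc_eq(doc_id_by_hash(branch_a), doc_id_by_hash(branch_b)))   # False
print(doc_eq(doc_id_by_hash(branch_a), doc_id_by_hash(copy_of_a)))  # True
print(doc_eq(doc_id_by_uuid(branch_a), doc_id_by_uuid(branch_b)))   # False
print(doc_eq(doc_id_by_uuid(branch_a), doc_id_by_uuid(copy_of_a)))  # True
```

The key-based IDs of the two branches compare equal even though the orders differ. Both the hash and the UUID tell the orders apart and still match the pretty-printed copy. The sketch also shows one trade-off. A content hash cannot tell apart two independently created instances whose canonical content is identical. A UUID assigned at creation can, provided every copy carries it.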
I think there's already some work going on in signing XML documents in general. But its purpose is to make sure document instances are not changed, i.e. to freeze-frame the document as final. As I said earlier, the UUID suggestion doesn't do much for security or freeze-framing. It's the other issues, (1)-(4), that I think we're looking at.

>> I am honestly not sure of the answer to that. But if there are no
>> ramifications (except additional transmission time/money), then I wonder
>> whether it is indeed an issue or not.

Dunno. Perhaps it depends on whether we at UBL want to treat it as an issue. Either way, someone has to define a DOC_ID() function; the only question is whether that happens in UBL or somewhere else. As you mentioned, if document instances are saved in an ebXML registry, then your DOC_ID() is the same as the ebXML registry's definition. But my take #2 is this. Suppose all we need (assuming we need it) is a DOC_ID() function, and UUID() is guaranteed to serve that purpose everywhere, is cheap to compute, and guarantees uniqueness across space and time. Then why look elsewhere?

Best Regards,
Chin Chee-Kai
SoftML