sdo message

Subject: ChangeSummary and OrphanHolder properties

From: "Barack, Ron" <ron.barack@sap.com>
To: <sdo@lists.oasis-open.org>
Date: Tue, 12 May 2009 14:12:08 +0200

Hi Everyone,

I think it is time to move the discussion of the ChangeSummary from the DAS group to this TC. I see two major questions here: how do orphanHolder properties affect the change summary, and how does projection affect the change summary? Here is one approach to the orphanHolder question.

MOTIVATION AND BACKGROUND:

Containment is a central concept in SDO, corresponding to UML aggregation. However, in practice, containment is often used not to define the characteristics of the business model, but rather to control SDO functionalities such as XML serialization and ChangeSummary. In order to use these capabilities, a defined containment structure must be imposed on the data. Depending on the data source, this could be very unnatural and arbitrary. At least from our perspective, almost all data really comes from relational databases, and in this case, containment is very unnatural indeed. Even in cases where the data is structured, this is often only an (unwanted) by-product of the need to go over a WebService wire, and does not reflect the nature of the underlying data model.

SDO 3 provides two methods of serializing non-closed data graphs to XML. First, the transitive closure may be included by packing the data graph in an envelope object that has "orphanHolder" properties. During XML serialization, orphanHolders collect any referenced objects that are not otherwise contained in the XML document. This results in the transitive closure being included in the XML document, which may be what the user wants, but is potentially a serious performance killer. The other approach is to impose (or remove) a containment structure on the fly, using the "project" method.

Providing ChangeSummary is one of SDO's main talking points, but in SDO 2, ChangeSummaries are useful only if the data graph is hierarchically structured, because the set of data objects tracked by the change summary (its "scope") is defined using containment. Since we've decided that SDO 3 will solve the serialization of non-closed data graphs and remove the restriction that "closed" is the normative state of data graphs, it makes sense to similarly loosen the restrictions on containment with respect to ChangeSummary, that is, to find a definition of ChangeSummary that is meaningful when the graph does not have a containment structure.

REQUIREMENTS

The solution SHOULD provide a meaningful definition of the scope of the change summary that can be applied to models where no containment structure has been defined.

The solution MUST be backwards compatible, in regard to both functional and non-functional requirements. By non-functional requirements, I mean the performance hit implied by the solution should be minimal, at least in cases where only SDO 2 features are used.

APPROACH

The basic approach is to use the new SDO 3 structures, projection and orphanHolders, as the basis of a solution. This has the advantage of not unnecessarily complicating our model further, and also helps us achieve backwards compatibility: since the new behavior is defined only in regard to these SDO 3 constructs, applications that use only SDO 2 constructs will continue to behave as before.

The change summary is actually defined as a delta between the before-state and after-state of the data graph. As I formulate the approach, I will stick to this definition, and come back later to a discussion of more practical implementations. The spec talks about the "scope" of a change summary. I think this is confusing terminology, because it sounds as if "scope" is something that exists outside of operations (e.g., "beginLogging") on the change summary. We only have to calculate what is in scope of a ChangeSummary in order to determine the "before" image and the "after" image.

In SDO 2.1, the "before-image" of the change summary is the containment tree at the point in time that beginLogging() is called. Our proposal is that this be extended in SDO 3.0 to include the contents of orphanHolder properties in the containment tree. If any orphanHolder properties are found, any DataObjects that are referenced by, but not contained in, the containment tree are also part of the "before-image" of the change summary. If there are no orphanHolder properties, the behavior should be identical to SDO 2.1.
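To make the proposed rule concrete, here is a minimal sketch of how the "before-image" set could be computed. It uses a toy model, not the commonj.sdo API: the Node class, its hasOrphanHolder flag, and the beforeImage method are all hypothetical names introduced for illustration. I am also assuming that references are followed transitively, in line with the transitive-closure behavior of orphanHolders during serialization.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model for illustration only; not the commonj.sdo API.
final class BeforeImageSketch {

    static final class Node {
        final String name;
        final boolean hasOrphanHolder;                   // stands in for a type with an orphanHolder property
        final List<Node> contained = new ArrayList<>();  // containment properties
        final List<Node> referenced = new ArrayList<>(); // non-containment references

        Node(String name, boolean hasOrphanHolder) {
            this.name = name;
            this.hasOrphanHolder = hasOrphanHolder;
        }
    }

    /** Returns the nodes that would make up the "before-image" rooted at root. */
    static Set<Node> beforeImage(Node root) {
        // Step 1 (SDO 2.1 behavior): collect the containment tree.
        Set<Node> scope = new LinkedHashSet<>();
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        boolean orphanHolderFound = false;
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (!scope.add(n)) {
                continue; // already visited
            }
            orphanHolderFound |= n.hasOrphanHolder;
            for (Node child : n.contained) {
                work.push(child);
            }
        }
        // No orphanHolder anywhere in the tree: identical to SDO 2.1, no extra cost.
        if (!orphanHolderFound) {
            return scope;
        }
        // Step 2 (proposed extension): add everything reachable through
        // references, assumed here to be transitive.
        work.addAll(scope);
        while (!work.isEmpty()) {
            Node n = work.pop();
            for (Node next : n.referenced) {
                if (scope.add(next)) {
                    work.push(next);
                }
            }
            for (Node next : n.contained) {
                if (scope.add(next)) {
                    work.push(next);
                }
            }
        }
        return scope;
    }
}
```

Note that the second pass only runs when an orphanHolder was seen in the first pass, which is the property the backwards-compatibility requirement depends on.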

In cases where orphanHolder properties are present, it is clear that beginLogging can be an expensive operation. However, I believe this functionality can be implemented such that, for the 2.1 cases, where no orphanHolder properties are present, and also in cases where the graph itself is closed, performance of SDO 3 should be comparable with that of SDO 2. This is of course something that I would need a prototype to verify.

The expense involved in calculating the set of DataObjects to be included in the "after" image is a bit of a problem, because SDO 2.1 is pretty loose about the ChangeSummary lifecycle. In particular, at least as I read the spec, the user is not really required to call "endLogging". Clearly, if the user does call "endLogging", we have a concrete point at which to calculate the scope, and, in particular, to calculate the list to be returned by ChangeSummary.getChangedDataObjects() and the set of objects for which isModified, isCreated and isDeleted will return "true". If the user is not required to call endLogging(), then each of the ChangeSummary methods becomes potentially very expensive, which I think is bad design. I'm going to raise an issue in the SDO TC to discuss how implementations interpret the endLogging() call. I think it's actually reasonable to require it, and to define it as the time at which the ChangeSummary is calculated. If this is considered a breaking change, then we can always say that the list of orphan nodes is only calculated when endLogging is called.
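As a rough illustration of why a mandatory endLogging() helps, the sketch below does the expensive scope walk exactly twice, once in beginLogging() and once in endLogging(), so that the query methods are simple set lookups. Everything here is hypothetical: ChangeLog stands in for ChangeSummary, object identity is reduced to String ids, and the scopeWalk supplier stands in for whatever traversal an implementation would use. Returning "false" before endLogging() is just one possible choice for this sketch, not something the spec says.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch; ChangeLog is a stand-in, not commonj.sdo.ChangeSummary.
final class EndLoggingSketch {

    static final class ChangeLog {
        private final Supplier<Set<String>> scopeWalk; // the potentially expensive traversal
        private Set<String> before = new HashSet<>();
        private Set<String> created = new HashSet<>();
        private Set<String> deleted = new HashSet<>();
        private boolean ended;

        ChangeLog(Supplier<Set<String>> scopeWalk) {
            this.scopeWalk = scopeWalk;
        }

        /** Takes the "before" image: first expensive walk. */
        void beginLogging() {
            before = new HashSet<>(scopeWalk.get());
            created = new HashSet<>();
            deleted = new HashSet<>();
            ended = false;
        }

        /** Takes the "after" image and computes the delta: second expensive walk. */
        void endLogging() {
            Set<String> after = new HashSet<>(scopeWalk.get());
            created = new HashSet<>(after);
            created.removeAll(before);  // in "after" but not in "before"
            deleted = new HashSet<>(before);
            deleted.removeAll(after);   // in "before" but not in "after"
            ended = true;
        }

        // Queries never walk the graph; they only consult the precomputed sets.
        boolean isCreated(String id) {
            return ended && created.contains(id);
        }

        boolean isDeleted(String id) {
            return ended && deleted.contains(id);
        }
    }
}
```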

GETTING THE CHANGE SUMMARY

In SDO 2.1, the DataObject.getChangeSummary() method can simply walk up the containment tree looking for a DataObject with a ChangeSummary property. Under the approach I'm outlining here, this won't be possible for objects that are included via orphanHolder properties. There are two possible approaches here. First, when CS.beginLogging is called, an implementation could find all the orphans and call some (internal) "setChangeSummary()" method. This has a major drawback in that it will increase the memory footprint of the objects. Second, and this is what I would actually prefer, we could say that getChangeSummary should be unchanged from 2.1, meaning that orphan objects may return "null". I think this is not a significant limitation, since DASs will typically know where the ChangeSummary is (namely, on the DataGraph envelope), and use "getChangedDataObjects" to find the changes to process. In fact, I wonder if we should consider deprecating getChangeSummary, since the change summary should be found through calling a normal getter.
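The 2.1-style lookup, and why it yields "null" for orphans, can be sketched as follows. This is again a toy model with hypothetical names (Node, ChangeLog, the container field); it is not how any particular SDO implementation is written.

```java
// Hypothetical toy model; not the commonj.sdo API.
final class GetChangeSummarySketch {

    /** Stand-in for a ChangeSummary instance. */
    static final class ChangeLog { }

    static final class Node {
        Node container;          // null for a root object and for orphans
        ChangeLog changeSummary; // non-null only on the object that has the ChangeSummary property

        /** Walks up the containment chain until a ChangeSummary holder is found. */
        ChangeLog getChangeSummary() {
            for (Node n = this; n != null; n = n.container) {
                if (n.changeSummary != null) {
                    return n.changeSummary;
                }
            }
            // An orphan has no container leading to the envelope, so the walk
            // ends without finding anything, even if the orphan is in scope.
            return null;
        }
    }
}
```

An object reached only through an orphanHolder has no container chain to the envelope, so under the unchanged 2.1 definition the caller has to start from the envelope instead.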

XML

As I described above, I think the approach requires a slightly better defined ChangeSummary lifecycle, namely, it requires something like "endLogging" that tells the implementation when to walk the tree and calculate the nodes that are in the "after" image, used to calculate the ChangeSummary. I've defined everything so far in terms of the API only, that is, in-memory use cases. I think that when XMLHelper.save is called, the after-image should be recalculated and the created, modified and deleted lists updated. When an XML document that contains a CS is loaded, these lists are current, and all ChangeSummary methods should reflect the state of the change summary as read from the XML. It's as if the user has just called "endLogging".

IMPLEMENTATION IDEAS

Although the "snapshot" mode is useful for defining the behavior of ChangeSummary, I imagine that most implementations do not "make images" of the before state, but rather, when a setter is called, do some sort of calculation of whether the node is "in scope" of a change summary, and, if it is, somehow remember the old value. As with getChangeSummary(), we have a problem here when orphans are included in the scope. For such implementations, it will be necessary to traverse the scope of the CS, including orphans, and set a bit indicating that the object is "in scope" of a change summary. Even if this requires storing an additional boolean, this would in all likelihood not increase the memory footprint of the data graph, at least not in Java, since objects are aligned on word boundaries. And, of course, it is possible to do better by combining several such flags into a single byte. So I think the costs here are very much acceptable. In fact, there's also an upside to the approach: going up the containment tree to find out if an object is in scope will necessarily be slower than checking a bit.
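A minimal sketch of the flag idea, under the assumption that each object keeps a small flags byte and that the setter records the first old value it overwrites. The names (Node, IN_SCOPE, markInScope, oldValue) are hypothetical, and keeping the old values on the object itself is only for brevity; a real implementation would more likely store them in the change summary.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of setter-time change recording; not the commonj.sdo API.
final class InScopeFlagSketch {

    static final class Node {
        private static final byte IN_SCOPE = 0x01; // bit 0; other flags could share this byte
        private byte flags;
        private final Map<String, Object> values = new HashMap<>();
        private Map<String, Object> oldValues;     // allocated lazily, only once something changes

        /** Called for each object during the beginLogging traversal, orphans included. */
        void markInScope() {
            flags |= IN_SCOPE;
        }

        boolean isInScope() {
            return (flags & IN_SCOPE) != 0;        // a bit test, no walk up the containment tree
        }

        void set(String property, Object value) {
            if (isInScope()) {
                if (oldValues == null) {
                    oldValues = new HashMap<>();
                }
                // Remember only the value as it was when logging began.
                if (!oldValues.containsKey(property)) {
                    oldValues.put(property, values.get(property));
                }
            }
            values.put(property, value);
        }

        Object get(String property) {
            return values.get(property);
        }

        boolean hasOldValue(String property) {
            return oldValues != null && oldValues.containsKey(property);
        }

        Object oldValue(String property) {
            return oldValues == null ? null : oldValues.get(property);
        }
    }
}
```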

CONCLUSION AND FURTHER WORK

Again, the ideas here are intended to represent only a potential approach; prototyping the solution will definitely be necessary. However, I think the ideas are appealing, because they address the issue without breaking backwards compatibility. If these ideas find acceptance, I would like next week to issue a similar approach that uses projection.

Comments welcome!

Best Regards,
Ron

Follow-Ups:
- Re: [sdo] ChangeSummary and OrphanHolder properties
  - From: Radu Preotiuc <radu.preotiuc-pietro@oracle.com>