sdo message

Subject: Re: [sdo] ChangeSummary and OrphanHolder properties
From: Radu Preotiuc <radu.preotiuc-pietro@oracle.com>
To: "Barack, Ron" <ron.barack@sap.com>
Date: Mon, 18 May 2009 18:06:34 -0400
Hi Ron, thanks for the write-up. A couple of initial observations:

One thing about orphans is that because they are not (by definition)
contained anywhere, they can be referenced from two different containment
trees, each with its own ChangeSummary. Is the intention of the proposal
that the change be tracked in both places? That would seem necessary, in
case the orphan is "removed" from one of the trees.

The second observation is in regard to calling endLogging(). Do you
propose that methods on the ChangeSummary interface like
ChangeSummary.isModified/Created/Deleted and
ChangeSummary.getChangedDataObjects can only be called after
endLogging()? That seems rather limiting.

Radu

On Tue, 2009-05-12 at 14:12 +0200, Barack, Ron wrote:
> Hi Everyone, 
> 
> I think it is time to move the discussion of the ChangeSummary
> from the DAS group to this TC.  I see two major questions here:  how
> do orphanHolder properties affect the change summary, and how does
> projection affect the change summary.  Here is one approach to the
> orphanHolder question.
> 
> 
> MOTIVATION AND BACKGROUND: 
> 
> Containment is a central concept in SDO, corresponding to UML
> aggregation.  However, in practice, containment is often used not to
> define the characteristics of the business model, but rather to
> control SDO functionalities such as XML serialization and
> ChangeSummary.  In order to use these capabilities, a defined
> containment structure must be imposed on the data.  Depending on the data
> source, this could be very unnatural and arbitrary.  At least from our
> perspective, almost all data really comes from relational databases,
> and in this case, containment is very unnatural indeed.  Even in cases
> where the data is structured, this is often only an (unwanted)
> by-product of the need to go over a WebService wire, and does not
> reflect the nature of the underlying data model.  
> 
> SDO 3 provides two methods of serializing non-closed data graphs to
> XML.  First, the transitive closure may be included by packing the
> data graph in an envelope object that has "orphanHolder" properties.
> During XML serialization, orphanHolders collect any referenced objects
> that are not otherwise contained in the XML document.   This results
> in the transitive closure being included in the XML document, which may
> be what the user wants, but is potentially very much a performance
> killer.  The other approach is to impose (or remove) a containment
> structure on-the-fly, using the "project" method.
> 
> Providing ChangeSummary is one of SDO's main talking points, but in SDO
> 2, ChangeSummaries are useful only if the data graph is
> hierarchically structured, because the set of data objects tracked by
> the change summary (its "scope") is defined using containment.
> Since we've decided that SDO 3 will solve the serialization of
> non-closed data graphs and remove the restriction that "closed" is
> the normative state of data graphs, it makes sense to similarly loosen
> the restrictions on containment wrt ChangeSummary, that is, to find a
> definition of ChangeSummary that is meaningful when the graph does not
> have a containment structure.
> 
> REQUIREMENTS 
> 
> The solution SHOULD provide a meaningful definition of the scope of the
> change summary, that can be applied to models where no containment
> structure has been defined.
> 
> The solution MUST be backwards compatible, in regard to both
> functional and non-functional requirements.  By non-functional
> requirements, I mean the performance hit implied by the solution
> should be minimal, at least in cases where only SDO 2 features are
> used.
> 
> 
> APPROACH 
> 
> The basic approach is to use the new SDO 3 structures, projection and
> orphanHolders, as a basis of a solution.  This has the advantage of
> not unnecessarily further complicating our model, and also helps us
> achieve backwards compatibility:  since the new behavior is defined
> only in regard to these SDO 3 constructs, applications that use only
> SDO 2 constructs will continue to behave as before.
> 
> The change summary is actually defined as a delta between the
> before-state and after-state of the data graph.  As I formulate the
> approach, I will stick to this definition, and come back later to a
> discussion of more practical implementations.  The spec talks about
> the "scope" of a change summary; I think this is confusing
> terminology, because it sounds as if "scope" is something that exists
> outside of operations (eg, "beginLogging") on change summary.  We only
> have to calculate what is in-scope of a ChangeSummary in order to
> determine the "before" image and the "after" image.
> 
> In SDO 2.1, the "before-image" of the change summary is the
> containment tree at the point of time that beginLogging() is called.
> Our proposal is that this be extended in SDO 3.0 to include the
> contents of orphanHolder properties in the containment tree.  If any
> orphanHolder properties are found, any DataObjects that are referenced
> but not contained by the containment tree are also part of the
> "before-image" of the change summary.  If there are no orphanHolder
> properties, the behavior should be identical to SDO 2.1.
> 
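
The before-image rule described above can be illustrated with a minimal sketch in Java. It is not taken from the proposal or from any SDO implementation: the `Node` class, its fields, and `beforeImage` are hypothetical stand-ins for DataObjects, containment and non-containment properties, and an orphanHolder marker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BeforeImageSketch {
    /** Hypothetical stand-in for a DataObject; not the SDO API. */
    static final class Node {
        final String name;
        final List<Node> contained = new ArrayList<>();   // containment properties
        final List<Node> referenced = new ArrayList<>();  // non-containment references
        boolean hasOrphanHolder;                          // assumed marker for an orphanHolder property
        Node(String name) { this.name = name; }
    }

    /** Objects in the before-image at the moment logging would begin. */
    static Set<Node> beforeImage(Node root) {
        Set<Node> tree = new LinkedHashSet<>();
        boolean orphanHolderFound = false;
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        // Step 1: plain containment walk (the SDO 2.1 definition).
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (!tree.add(n)) continue;
            orphanHolderFound |= n.hasOrphanHolder;
            for (Node c : n.contained) work.push(c);
        }
        if (!orphanHolderFound) return tree;  // no orphanHolder: identical to 2.1

        // Step 2: an orphanHolder exists, so add everything reachable by reference.
        Set<Node> closure = new LinkedHashSet<>();
        work.addAll(tree);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (!closure.add(n)) continue;
            for (Node c : n.contained) work.push(c);
            for (Node r : n.referenced) work.push(r);
        }
        return closure;
    }
}
```

The second pass is the potentially expensive part, and it only runs when an orphanHolder marker was seen during the containment walk.
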
> In cases where orphanHolder properties are present, then it is clear
> that the beginLogging operation can be an expensive operation.
> However, I believe this functionality can be implemented such that, for
> the 2.1 cases, where no orphanHolder properties are present, and also
> in cases where the graph itself is closed, performance of SDO 3 should
> be comparable with that of SDO 2.  This is of course something that I
> would need a prototype to verify.
> 
> The expense involved in calculating the set of DataObjects to be
> included in the "after" image is a bit of a problem, because SDO 2.1
> is pretty loose about ChangeSummary lifecycle.  In particular, at
> least as I read the spec, the user is not really required to call
> "endLogging".  Clearly, if the user does call "endLogging" we have a
> concrete point at which to calculate the scope, and, in particular to
> calculate the list to be returned by
> ChangeSummary.getChangedDataObjects(), and the set of objects for
> which isModified, isCreated and isDeleted will return "true".   If the
> user is not required to call endLogging(), then each of the
> ChangeSummary methods becomes potentially very expensive, which I
> think is bad design.  I'm going to raise an issue in the SDO TC, to
> discuss how implementations interpret the endLogging() call.  I think
> it's actually reasonable to require it, and define it as the time at
> which ChangeSummary is calculated.  If this is considered a breaking
> change, then we can always say that the list of orphan nodes is only
> calculated when endLogging is called.
> 
> GETTING THE CHANGE SUMMARY 
> 
> In SDO 2.1, the DataObject.getChangeSummary() method can simply walk
> up the containment tree looking for a DataObject with a ChangeSummary
> property.  Under the approach I'm outlining here, this won't be
> possible for objects that are included via orphanHolder properties.
> There are two possible approaches here:  First, when CS.beginLogging
> is called, an implementation could find all the orphans and call some
> (internal) "setChangeSummary()" method.  This has a major drawback in
> that it will increase the memory footprint of the objects.  I would
> actually prefer to say that getChangeSummary should be unchanged from
> 2.1, meaning that orphan objects may return "null".  I think this is
> not a significant limitation, since DAS's will typically know where
> the ChangeSummary is (namely, on the DataGraph envelope), and use
> "getChangedDataObjects" to find the changes to process.  In fact, I wonder
> if we should consider deprecating getChangeSummary, since the change
> summary should be found through calling a normal getter.
> 
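
A rough sketch of the 2.1-style lookup, and of why orphans would see "null" under the preferred option, might look like the following. `Node`, `container`, `changeSummary` and `findChangeSummary` are hypothetical names used only for illustration; they are not the SDO API.

```java
class ChangeSummaryLookupSketch {
    /** Hypothetical stand-in for a DataObject; not the SDO API. */
    static final class Node {
        Node container;        // null for a root object and for an orphan
        Object changeSummary;  // non-null only on the object owning a ChangeSummary property
    }

    /** Walks up the containment chain, as a 2.1-style getChangeSummary() could. */
    static Object findChangeSummary(Node start) {
        for (Node n = start; n != null; n = n.container) {
            if (n.changeSummary != null) {
                return n.changeSummary;
            }
        }
        // An orphan has no container chain leading to the envelope, so the walk finds nothing.
        return null;
    }
}
```

Under this sketch nothing is stored per object, so the memory footprint is unchanged; the cost is that an orphan cannot locate the change summary that tracks it.
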
> XML 
> 
> As I described above, I think the approach requires a slightly better
> defined ChangeSummary lifecycle, namely, it requires something like
> "endLogging" that tells the implementation when to walk the tree and
> calculate the nodes that are in the "after" image, used to calculate
> the ChangeSummary.  I've defined everything so far in terms of the API
> only, that is, in-memory use cases.  I think that when XMLHelper.save
> is called, the after-image should be updated, and the created,
> modified and deleted lists updated.  When an XML document that
> contains a CS is loaded, these lists are current, and all
> changeSummary methods should reflect the state of the change summary
> as read from the XML.  It's as if the user has just called
> "endLogging".
> 
> IMPLEMENTATION IDEAS 
> 
> Although the "snapshot" mode is useful for defining the behavior of
> ChangeSummary, I imagine that most implementations do not "make
> images" of the before state, but rather, when a setter is called, do
> some sort of calculation of whether the node is "in scope" of a change
> summary, and, if it is, somehow remember the old value.  As with
> getChangeSummary(), we have a problem here when orphans are included
> in the scope.   For such implementations, it will be necessary to
> traverse the scope of the CS, including orphans, and set a bit
> indicating that the object is "in scope" of a change summary.  Even if
> this requires storing an additional boolean object, this would in all
> likelihood not increase the memory footprint of the data graph, at
> least not in Java, since objects are aligned on word boundaries.  And,
> of course, it is possible to do better, combining several such flags
> into a single byte.  So I think the costs here are very much
> acceptable.  In fact, there's also an upside to the approach:  going
> up the containment tree to find out if an object is in-scope will
> necessarily be slower than checking a bit.
> 
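
The "in scope" bit could be sketched roughly as below. This is only an illustration under stated assumptions: `Node`, `reachable`, `beginLogging` and the string-keyed property map are all hypothetical, and a real implementation would store old values in its own change summary structures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InScopeFlagSketch {
    /** Hypothetical stand-in for a DataObject; not the SDO API. */
    static final class Node {
        private static final byte IN_SCOPE = 0x01;  // one bit of a shared flags byte
        private byte flags;
        private final Map<String, Object> values = new HashMap<>();
        final Map<String, Object> oldValues = new HashMap<>();
        final List<Node> reachable = new ArrayList<>();  // contained and referenced objects

        boolean isInScope() { return (flags & IN_SCOPE) != 0; }

        void set(String property, Object value) {
            // Checking a bit replaces the walk up the containment tree.
            if (isInScope() && !oldValues.containsKey(property)) {
                oldValues.put(property, values.get(property));  // remember only the first old value
            }
            values.put(property, value);
        }
    }

    /** Marks every object reachable from root, orphans included, as in scope. */
    static void beginLogging(Node root) {
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (n.isInScope()) continue;  // already visited
            n.flags |= Node.IN_SCOPE;
            for (Node r : n.reachable) work.push(r);
        }
    }
}
```

The traversal cost is paid once when logging begins; each later setter call only tests the flag.
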
> CONCLUSION AND FURTHER WORK 
> 
> Again, the ideas here are intended to represent only a potential
> approach; prototyping the solution will definitely be necessary.
> However, I think the ideas are appealing, because they address the
> issue without breaking backwards compatibility.  If these ideas find
> acceptance, I would like next week to issue a similar approach that
> uses projection.
> 
> Comments welcome! 
> 
> Best Regards, 
> Ron 
>