sdo message

Subject: Re: AW: [sdo] ChangeSummary and OrphanHolder properties

From: Blaise Doughan <blaise.doughan@oracle.com>
To: "Barack, Ron" <ron.barack@sap.com>
Date: Tue, 19 May 2009 10:59:27 -0400

Hi Ron,

I interpret your initial description as: "Containment still represents the ChangeSummary scope, and orphan properties can be thought of as containment properties in terms of this scoping". This is consistent with the role of orphan properties wrt the XML representation.

An orphan DataObject could become part of a ChangeSummary as demonstrated in the following example: when an Address DataObject is set on a non-containment property of a Customer DataObject, the implementation looks at the Customer DataObject and recursively through its containers until it finds a DataObject with a suitable orphan property; once found, the Address takes the ChangeSummary referenced by the DataObject with the requisite orphan property. Also, an orphan DataObject can and should return the ChangeSummary monitoring its changes from its getChangeSummary method.
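A minimal sketch of this lookup, in plain Java rather than against the SDO API. The classes SketchObject and SketchSummary, and the fields hasOrphanProperty and trackedBy, are hypothetical names invented for illustration; what counts as a "suitable" orphan property is reduced to a boolean here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the SDO API types.
class SketchSummary {
    final String name;
    SketchSummary(String name) { this.name = name; }
}

class SketchObject {
    final String name;
    SketchObject container;        // null for a root object or an orphan
    boolean hasOrphanProperty;     // assumption: marks a "suitable" orphan property
    SketchSummary summary;         // ChangeSummary referenced by this object, if any
    SketchSummary trackedBy;       // summary adopted by an orphan, if any
    final List<SketchObject> references = new ArrayList<>();

    SketchObject(String name) { this.name = name; }

    // Sets a non-containment reference. An uncontained, not yet tracked target
    // is adopted by the nearest object, starting here and moving up through
    // the containers, that has an orphan property.
    void setReference(SketchObject target) {
        references.add(target);
        if (target.container != null || target.trackedBy != null) {
            return; // contained, or already in exactly one ChangeSummary
        }
        for (SketchObject o = this; o != null; o = o.container) {
            if (o.hasOrphanProperty) {
                target.trackedBy = o.summary;
                return;
            }
        }
    }

    // An orphan returns the summary monitoring it; a contained object falls
    // back to the walk up its containers.
    SketchSummary getChangeSummary() {
        if (trackedBy != null) {
            return trackedBy;
        }
        for (SketchObject o = this; o != null; o = o.container) {
            if (o.summary != null) {
                return o.summary;
            }
        }
        return null;
    }
}
```

In this sketch the extra trackedBy field is what lets an orphan answer getChangeSummary, which is the per-object memory cost raised in the original proposal.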

It seems unnecessary to call endLogging before the ChangeSummary can be interrogated. If calculations need to be done they could be triggered by the individual ChangeSummary calls.

I agree that DataObjects belong in only one ChangeSummary. I believe this also means that DataObjects are referenced by only one orphan property.

-Blaise

Barack, Ron wrote:


Hi Radu,

Thanks for the comments. 

Regarding endLogging(): my concern is that the consideration of orphanHolder properties would make the calculation of the ChangeSummary much more expensive. Certainly it is much harder to calculate ChangeSummary.isModified(orphanObject) than ChangeSummary.isModified(containedObject): to determine whether an object is in scope, for a contained object I can simply go up the containment tree, but for an orphan I have to search the entire tree, starting at the root. Rather than making each call to isModified expensive, it seems to me that the only way to get reasonable performance without increasing memory footprint is for the list of changed objects to be calculated once, so that isModified can simply check if the object is contained in the list. Obviously, when "beginLogging" is called, we need to traverse the containment tree, and if any orphanHolder properties are present, we need to include any matching orphans into the scope of the change summary. We are going to have to do this again when calculating the end-state of the change summary. But change summary is really poorly defined: it doesn't really define an "endState", only a "currentState". So the question is, when should the "end-state", and therefore the list of changed objects, be calculated? Since I'd anyway be happy to have some kind of reasonable meaning to associate with endLogging(), I thought, maybe, we could use endLogging for this purpose. Lately, however, I've been thinking about simply using getChangedObjects() for this purpose. After all, I don't think the normal use of isModified is that the client takes some arbitrary object and asks "is this object modified in reference to this change summary". I think the normal use is that the user (probably a DAS) first calls getChangedObjects, then iterates over the list, using isModified, isCreated and isDeleted to determine the nature of the changes. Since getChangedObjects has to traverse the tree anyway, it seems a natural place to consider orphanHolders in the traversal algorithm.
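A rough sketch of the "calculate once, then check membership" idea, in plain Java rather than against the SDO API. LogNode, LazyChangeLog, the dirty flag and the traversals counter are hypothetical names for illustration, and following every reference when an orphanHolder is present is a simplifying assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a DataObject; not an SDO type.
class LogNode {
    final List<LogNode> contained = new ArrayList<>();
    final List<LogNode> referenced = new ArrayList<>();
    boolean dirty; // assumption: a setter would set this in a real implementation
}

class LazyChangeLog {
    private final LogNode root;
    private final boolean orphanHolderPresent;
    private Set<LogNode> changed; // null until the first calculation
    int traversals;               // counts full traversals, for demonstration only

    LazyChangeLog(LogNode root, boolean orphanHolderPresent) {
        this.root = root;
        this.orphanHolderPresent = orphanHolderPresent;
    }

    // Traverses the graph once and caches the set of changed objects.
    Set<LogNode> getChangedObjects() {
        if (changed == null) {
            traversals++;
            Set<LogNode> seen =
                Collections.newSetFromMap(new IdentityHashMap<LogNode, Boolean>());
            Set<LogNode> result =
                Collections.newSetFromMap(new IdentityHashMap<LogNode, Boolean>());
            Deque<LogNode> todo = new ArrayDeque<>();
            todo.push(root);
            while (!todo.isEmpty()) {
                LogNode n = todo.pop();
                if (!seen.add(n)) {
                    continue; // already visited
                }
                if (n.dirty) {
                    result.add(n);
                }
                todo.addAll(n.contained);
                if (orphanHolderPresent) {
                    todo.addAll(n.referenced); // orphans are in scope too
                }
            }
            changed = result;
        }
        return changed;
    }

    // Cheap after the first calculation: a membership check, no tree search.
    boolean isModified(LogNode n) {
        return getChangedObjects().contains(n);
    }

    // To be called when the graph changes, so the next query recalculates.
    void invalidate() {
        changed = null;
    }
}
```

A caller that follows the usual pattern (getChangedObjects first, then isModified per object) pays for one traversal; the invalidate method is one possible answer to when the "current state" is recalculated, not something the thread settles.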

On the first question: no, I don't see the need to allow tracking of objects from multiple change summaries. This would add a lot of complexity, and I don't really see the use case. If we consider the standard SDO story, with a DAS providing disconnected data to a client who makes updates to the graphs, then I don't see orphan objects as floating between the graphs as some sort of shared objects. I don't think orphanHolders necessarily bring us into a world where the association of an object with the DAS call that retrieved it is weaker than in 2.1. I guess the point of the proposal is that orphaned objects are as much "owned" by the graph as contained objects; it's just that they are owned via references and not via containment properties. All orphanHolder properties allow us to do is avoid imposing a containment structure just because we want to serialize to XML or have a change summary. Even in 2.1, objects could potentially be in scope of multiple change summaries, but we say this is an error. I don't really see the motivation to relax this restriction.

Best Regards,
Ron 

-----Original Message-----
From: Radu Preotiuc [mailto:radu.preotiuc-pietro@oracle.com] 
Sent: Tuesday, 19 May 2009 00:07
To: Barack, Ron
Cc: sdo@lists.oasis-open.org
Subject: Re: [sdo] ChangeSummary and OrphanHolder properties

Hi Ron, thanks for the write-up. A couple of initial observations:

One thing about orphans is that because they are not (by definition)
contained anywhere, they can be referenced from two different containment
trees, each with its own ChangeSummary. Is the intention of the proposal
that the change be tracked in both places? That would seem necessary, in
case the orphan is "removed" from one of the trees.

The second observation is in regard to calling endLogging(). Do you
propose that methods on the ChangeSummary interface like
ChangeSummary.isModified/Created/Deleted and
ChangeSummary.getChangedDataObjects can only be called after
endLogging()? That seems rather limiting.

Radu

On Tue, 2009-05-12 at 14:12 +0200, Barack, Ron wrote:

Hi Everyone, 

I think it is slowly time to move the discussion of the ChangeSummary
from the DAS group to this TC.  I see two major questions here:  how
do orphanHolder properties affect the change summary, and how does
projection affect change summary.  Here is one approach to the
orphanHolder question.


MOTIVATION AND BACKGROUND: 

Containment is a central concept in SDO, corresponding to UML
aggregation.  However, in practice, containment is often used not to
define the characteristics of the business model, but rather to
control SDO functionalities such as XML serialization and
ChangeSummary.  In order to use these capabilities, a defined
containment structure must be imposed on the data.  Depending on the data
source, this could be very unnatural and arbitrary.  At least from our
perspective, almost all data really comes from relational databases,
and in this case, containment is very unnatural indeed.  Even in cases
where the data is structured, this is often only an (unwanted)
by-product of the need to go over a WebService wire, and does not
reflect the nature of the underlying data model.  

SDO 3 provides two methods of serializing non-closed data graphs to
XML.  First, the transitive closure may be included by packing the
data graph in an envelope object that has "orphanHolder" properties.
During XML serialization, orphanHolders collect any referenced objects
that are not otherwise contained in the XML document.   This results
in the transitive closure being included in the XML document...which may
be what the user wants, but is potentially very much a performance
killer.  The other approach is to impose (or remove) a containment
structure on-the-fly, using the "project" method.

Providing ChangeSummary is one of SDO's main talking points, but in SDO
2, ChangeSummaries are useful only if the data graph is
hierarchically structured, because the set of data objects tracked by
the change summary (its "scope") is defined using containment.
Since we've decided that SDO 3 will solve the serialization of
non-closed data graphs and remove the restrictions that "closed" is
the normative state of data graphs, it makes sense to similarly loosen
the restrictions on containment wrt ChangeSummary, that is, to find a
definition of ChangeSummary that is meaningful when the graph does not
have a containment structure.

REQUIREMENTS 

The solution SHOULD provide a meaningful definition of the scope of the
change summary, that can be applied to models where no containment
structure has been defined.

The solution MUST be backwards compatible, in regard to both
functional and non-functional requirements.  By non-functional
requirements, I mean the performance hit implied by the solution
should be minimal, at least in cases where only SDO 2 features are
used.


APPROACH 

The basic approach is to use the new SDO 3 structures, projection and
orphanHolders, as a basis of a solution.  This has the advantage of
not unnecessarily further complicating our model, and also helps us
achieve backwards compatibility:  since the new behavior is defined
only in regard to these SDO 3 constructs, applications that use only
SDO 2 constructs will continue to behave as before.

The change summary is actually defined as a delta between the
before-state and after-state of the data graph.  As I formulate the
approach, I will stick to this definition, and come back later to a
discussion of more practical implementations.  The spec talks about
the "scope" of a change summary; I think this is confusing
terminology, because it sounds as if "scope" is something that exists
outside of operations (eg, "beginLogging") on change summary.  We only
have to calculate what is in-scope of a ChangeSummary in order to
determine the "before" image and the "after" image.

In SDO 2.1, the "before-image" of the change summary is the
containment tree at the point of time that beginLogging() is called.
Our proposal is that this be extended in SDO 3.0 to include the
contents of orphanHolder properties in the containment tree.  If any
orphanHolder properties are found, any DataObjects that are referenced
but not contained by the containment tree are also part of the
"before-image" of the change summary.  If there are no orphanHolder
properties, the behavior should be identical to SDO 2.1.
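One way to picture the extended "before-image" is the following plain-Java sketch. ImageNode and BeforeImage are hypothetical names, not SDO types, and collecting orphans transitively (orphans referenced by other orphans) is an assumption the text above does not spell out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a DataObject; not an SDO type.
class ImageNode {
    final String name;
    final boolean hasOrphanHolder; // true if this object has an orphanHolder property
    final List<ImageNode> contained = new ArrayList<>();
    final List<ImageNode> referenced = new ArrayList<>();

    ImageNode(String name, boolean hasOrphanHolder) {
        this.name = name;
        this.hasOrphanHolder = hasOrphanHolder;
    }
}

class BeforeImage {
    // Returns the objects in the before-image as of the call, i.e. what a
    // beginLogging() implementation would have to snapshot.
    static Set<ImageNode> scopeAt(ImageNode root) {
        Set<ImageNode> scope = new LinkedHashSet<>();
        boolean orphanHolderFound = collectContained(root, scope);
        if (!orphanHolderFound) {
            return scope; // SDO 2.1 behavior: the containment tree only
        }
        // Add objects that are referenced but not contained by the tree.
        Deque<ImageNode> todo = new ArrayDeque<>(scope);
        while (!todo.isEmpty()) {
            ImageNode n = todo.pop();
            for (ImageNode r : n.referenced) {
                if (scope.add(r)) {
                    todo.push(r);
                }
            }
            for (ImageNode c : n.contained) {
                if (scope.add(c)) {
                    todo.push(c); // children of an orphan
                }
            }
        }
        return scope;
    }

    // Walks the containment tree; reports whether any orphanHolder was seen.
    private static boolean collectContained(ImageNode n, Set<ImageNode> scope) {
        scope.add(n);
        boolean found = n.hasOrphanHolder;
        for (ImageNode c : n.contained) {
            found |= collectContained(c, scope);
        }
        return found;
    }
}
```

The early return is where the backwards-compatibility argument lives: without an orphanHolder property the second, reference-following pass never runs.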

In cases where orphanHolder properties are present, then it is clear
that the beginLogging operation can be an expensive operation.
However, I believe this functionality can be implemented such that, for
the 2.1 cases, where no orphanHolder properties are present, and also
in cases where the graph itself is closed, performance of SDO 3 should
be comparable with that of SDO 2.  This is of course something that I
would need a prototype to verify.

The expense involved in calculating the set of DataObjects to be
included in the "after" image is a bit of a problem, because SDO 2.1
is pretty loose about ChangeSummary lifecycle.  In particular, at
least as I read the spec, the user is not really required to call
"endLogging".  Clearly, if the user does call "endLogging" we have a
concrete point at which to calculate the scope, and, in particular to
calculate the list to be returned by
ChangeSummary.getChangedDataObjects(), and the set of objects for
which isModified, isCreated and isDeleted will return "true".   If the
user is not required to call endLogging(), then each of the
ChangeSummary methods becomes potentially very expensive, which I
think is bad design.  I'm going to raise an issue in the SDO TC, to
discuss how implementations interpret the endLogging() call.  I think
it's actually reasonable to require it, and define it as the time at
which ChangeSummary is calculated.  If this is considered a breaking
change, then we can always say that the list of orphan nodes is only
calculated when endLogging is called.

GETTING THE CHANGE SUMMARY 

In SDO 2.1, the DataObject.getChangeSummary() method can simply walk
up the containment tree looking for a DataObject with a ChangeSummary
property.  Under the approach I'm outlining here, this won't be
possible for objects that are included via orphanHolder properties.
There are two possible approaches here:  First, when CS.beginLogging
is called, an implementation could find all the orphans and call some
(internal) "setChangeSummary()" method.  This has a major drawback in
that it will increase the memory footprint of the objects.  I would
actually prefer to say that getChangeSummary should be unchanged from
2.1, meaning that orphan objects may return "null".  I think this is
not a significant limitation, since DAS's will typically know where
the ChangeSummary is (namely, on the DataGraph envelope), and use
"getChangedObjects" to find the changes to process.  In fact, I wonder
if we should consider deprecating getChangeSummary, since the change
summary should be found through calling a normal getter.
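The "unchanged from 2.1" option can be sketched as follows, in plain Java rather than against the SDO API. TreeObject, CsHolder and the changeSummaryProperty field are hypothetical names for illustration.

```java
// Hypothetical stand-in for a ChangeSummary; not the SDO type.
class CsHolder {
}

// Hypothetical stand-in for a DataObject; not the SDO type.
class TreeObject {
    TreeObject container;           // null for a root object or an orphan
    CsHolder changeSummaryProperty; // non-null if this object has a ChangeSummary property

    // Walks up the containment tree looking for an object with a
    // ChangeSummary property. An orphan has no container, so unless it
    // carries a ChangeSummary property itself it yields null, even when
    // it is in scope of a summary via an orphanHolder.
    CsHolder getChangeSummary() {
        for (TreeObject o = this; o != null; o = o.container) {
            if (o.changeSummaryProperty != null) {
                return o.changeSummaryProperty;
            }
        }
        return null;
    }
}
```

No per-object field is added for orphans here, which is the memory argument for this option; the price is that a caller must already know where the ChangeSummary lives.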

XML 

As I described above, I think the approach requires a slightly better
defined ChangeSummary lifecycle, namely, it requires something like
"endLogging" that tells the implementation when to walk the tree and
calculate the nodes that are in the "after" image, used to calculate
the ChangeSummary.  I've defined everything so far in terms of the API
only, that is, in-memory use cases.  I think that when XMLHelper.save
is called, the after-image should be updated, and the created,
modified and deleted lists updated.  When an XML document that
contains a CS is loaded, these lists are current, and all
changeSummary methods should reflect the state of the change summary
as read from the XML.  It's as if the user has just called
"endLogging".

IMPLEMENTATION IDEAS 

Although the "snapshot" mode is useful for defining the behavior of
ChangeSummary, I imagine that most implementations do not "make
images" of the before state, but rather, when a setter is called, do
some sort of calculation of whether the node is "in scope" of a change
summary, and, if it is, somehow remember the old value.  As with
getChangeSummary(), we have a problem here when orphans are included
in the scope.   For such implementations, it will be necessary to
traverse the scope of the CS, including orphans, and set a bit
indicating that the object is "in scope" of a change summary.  Even if
this requires storing an additional boolean object, this would in all
likelihood not increase the memory footprint of the data graph, at
least not in Java, since objects are aligned on word boundaries.  And,
of course, it is possible to do better, combining several such flags
into a single byte.  So I think the costs here are very much
acceptable.  In fact, there's also an upside to the approach:  going
up the containment tree to find out if an object is in-scope will
necessarily be slower than checking a bit.
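As an illustration of the flag idea, here is a plain-Java sketch in which a traversal at beginLogging sets an "in scope" bit and setters consult it. FlaggedObject, its single value slot and the bit layout are hypothetical choices, not taken from any SDO implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a DataObject with one property; not an SDO type.
class FlaggedObject {
    private static final byte IN_SCOPE = 1;      // bit 0: tracked by a change summary
    private static final byte MODIFIED = 1 << 1; // bit 1: old value has been recorded

    private byte flags; // several boolean flags packed into one byte
    private Object value;
    private Object oldValue;
    final List<FlaggedObject> contained = new ArrayList<>();
    final List<FlaggedObject> referenced = new ArrayList<>();

    // Marks everything reachable from root, via containment or reference,
    // as in scope. The bit doubles as the "already visited" marker.
    static void beginLogging(FlaggedObject root) {
        Deque<FlaggedObject> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            FlaggedObject o = todo.pop();
            if (o.isInScope()) {
                continue;
            }
            o.flags |= IN_SCOPE;
            todo.addAll(o.contained);
            todo.addAll(o.referenced);
        }
    }

    boolean isInScope() {
        return (flags & IN_SCOPE) != 0;
    }

    boolean isModified() {
        return (flags & MODIFIED) != 0;
    }

    Object getOldValue() {
        return oldValue;
    }

    // The setter checks a bit instead of walking up the containment tree,
    // and remembers the old value on the first change only.
    void set(Object newValue) {
        if (isInScope() && !isModified()) {
            oldValue = value;
            flags |= MODIFIED;
        }
        value = newValue;
    }
}
```

The sketch assumes an orphanHolder is present, so references are always followed; objects that become reachable after beginLogging are not handled and would need the same treatment when they are attached.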

CONCLUSION AND FURTHER WORK 

Again, the ideas here are intended to represent only a potential
approach; prototyping the solution will definitely be necessary.
However, I think the ideas are appealing, because they address the
issue without breaking backwards compatibility.  If these ideas find
acceptance, I would like next week to issue a similar approach that
uses projection.

Comments welcome! 

Best Regards, 
Ron



