Subject: RE: [wsbpel] Issue - 226 - Clarification of lifecycle of compensation handler and its fault handling
I wanted to comment on a couple of aspects of this issue: forgive me if I am repeating some of your thoughts – I am going to explore a train of argument to see if I understand some of what you are raising, and to outline some ramifications.
There is a reference here to the idea of a “scope snapshot instance”.
The term “scope snapshot” seems to occur only in the section of the working draft spec that refers to isolated scopes (what used to be serializable scopes if I’m not mistaken).
I believe that this notion of a “scope snapshot” is ill-defined.
If an isolated scope runs to completion, then the changes it makes to ancestor-scope variables during its execution are revealed to the world (i.e. to sibling and cousin scopes) only at the end of the flow (after the terminal activity of the scope). Descendant scopes see the changes as they occur. A non-isolated scope reveals its changes as it goes along: it does nothing to block access to process variables by sibling and cousin scopes.
The implication of this is (I believe) that any non-isolated scope will stall on any attempt to read or modify a process variable being modified by an isolated scope (absent sophisticated non-contention detection; assuming serializable = locking). Or (and the example in the spec leaves this unclear): can a non-isolated scope see changes, therefore permitting dirty reading?
To the outer world (all blocked scopes, isolated or otherwise) the set of variables being modified has frozen, and may be released to view on completion of the isolated scope.
In the inner world of the scope and its descendants, the set members’ values are moving, and are always viewable.
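To make the outer-world/inner-world distinction concrete, here is a minimal Python sketch of one possible reading. It is not taken from the spec: the class names ProcessVariables and IsolatedScope are hypothetical, and it buffers writes privately rather than locking, purely to keep the example short. A locking implementation would stall outside readers instead of showing them the old value.

```python
class ProcessVariables:
    """Shared process variables visible to every scope (hypothetical model)."""

    def __init__(self, **initial):
        self.committed = dict(initial)


class IsolatedScope:
    """Buffers writes privately and publishes them only on completion.

    Assumption: buffering stands in for locking; this is a simplification.
    """

    def __init__(self, shared):
        self.shared = shared
        self.pending = {}  # changes visible only inside the scope

    def write(self, name, value):
        self.pending[name] = value

    def read(self, name):
        # The inner world sees its own moving values first.
        if name in self.pending:
            return self.pending[name]
        return self.shared.committed[name]

    def complete(self):
        # Completion acts like a commit: changes are revealed all at once.
        self.shared.committed.update(self.pending)
        self.pending = {}


pvs = ProcessVariables(foo=1)
scope = IsolatedScope(pvs)
scope.write("foo", 2)

inner_view = scope.read("foo")           # value as seen inside the scope
outer_view_before = pvs.committed["foo"]  # value as seen by siblings/cousins
scope.complete()
outer_view_after = pvs.committed["foo"]   # value after the scope completes
```

In this sketch the outer view stays frozen at the old value until complete() runs, which is the commit-like behaviour assumed in the next paragraph.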
We assume that scope completion is tantamount to an atomic transaction commit (that’s what makes its effects visible, in one common implementation scheme). Therefore a compensation handler is potentially a new atomic transaction. (If this option were desired, I presume this can be achieved by enclosing the handler’s activities in an isolated scope, meaning that the use of atomic units of work for normal scopes, and scopes-inside-handlers is a design choice by the user.)
On fault handlers: there seems to be no obvious way of saying (in effect): “rollback”. The fault handler of an isolated scope is stated to be within the isolation domain of the original scope, implying visibility of partial effects to it. It seems absurd to ask the designer to reverse all the known variable changes by hand when the underlying transactional resource is tooled up for rollback.
This lack seems ungenerous to implementers: the obvious way of achieving all this isolation behaviour is a combination of database transactions and atomic transaction managers (aka existing technology): but the spec seems to demand isolation without its usual transactional concomitants. (I agree that we might wish to have very long-lived behaviour, therefore implying other implementation schemes which allow recoverable resource-owned transactions with checkpoints to achieve connection-independent persistent isolation. But if we had such a thing, we would only be improving the QoS of the overall isolated group of related operations, which would still look awfully like a unit of work, and be subject to reversion.) So perhaps we should examine further the conceptualization of an isolated scope as the scope of an underlying atomic transaction. This might imply in a particular implementation, that failure of the process will cause rollback to the start of the isolated scope (which is indeed how existing atomic transaction support appears to work in several BPM managers), for example.
How can I get back to the original state, in the event of a failed attempt to complete the scope?
Is a scope snapshot the original state, or the final state? I find the definition unclear because absent.
I believe we should allow isolated scopes to “revert” before they are completed, i.e. allow an explicit rollback in their fault handlers, or indeed in their normal activity flow. It is also interesting to consider whether such a revert verb would be usable in termination handlers of isolated scopes. (Presumably the meaning of revert is: return all modified process variables (PVs) to their original state, which would be dangerous to seek generally without isolation.)
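As a rough illustration of what I mean by revert, here is a minimal Python sketch. Everything in it is hypothetical (RevertibleScope, run_scope and the sample variables are illustrative names, not anything in the draft), and it assumes the "scope snapshot" is the original state captured at scope start.

```python
import copy


class RevertibleScope:
    """Captures the process variables at scope start so they can be restored."""

    def __init__(self, variables):
        self.variables = variables
        # Assumption: the snapshot is the ORIGINAL state, taken on entry.
        self.original = copy.deepcopy(variables)

    def revert(self):
        # Return every modified variable to its value at scope start.
        self.variables.clear()
        self.variables.update(copy.deepcopy(self.original))


def run_scope(variables, activity):
    scope = RevertibleScope(variables)
    try:
        activity(variables)
        return "completed"
    except Exception:
        # Stand-in for a fault handler issuing the proposed "revert" verb.
        scope.revert()
        return "reverted"


def failing_activity(pvs):
    pvs["balance"] -= 30          # partial effect 1
    pvs["orders"].append("A-1")   # partial effect 2
    raise RuntimeError("fault inside the isolated scope")


pvs = {"balance": 100, "orders": []}
outcome = run_scope(pvs, failing_activity)
```

The point of the sketch is that the designer writes one revert call instead of undoing each partial effect by hand. It is only safe here because nothing else touches the variables while the scope runs, which is what isolation would guarantee.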
I cannot see any useful special meaning for the term scope snapshot in non-isolated scopes, and I cannot see a good use for the term unless the ability to revert to original state is present in isolated scopes.
Incidentally, I believe that the idea of keeping a frozen view of the whole world as it was at the point of completion of a scope was long ago abandoned: the compensation handler of scope A must be able to handle foo’’’ not foo’ (assuming that A creates foo’, B creates foo’’ and C creates foo’’’).
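A small Python sketch of the foo example, under my own assumptions: the dictionary of snapshots and the function names are hypothetical, and each scope is modelled as simply appending a prime to foo on completion.

```python
process_variables = {"foo": "foo"}
snapshots = {}  # hypothetical frozen views, one per completed scope


def complete_scope(name):
    # Each scope creates the next version of foo, then completes.
    process_variables["foo"] += "'"
    snapshots[name] = process_variables["foo"]


for scope_name in ("A", "B", "C"):
    complete_scope(scope_name)


def compensate_a():
    # The compensation handler of A runs later and works on live state.
    return process_variables["foo"]


frozen_view = snapshots["A"]  # the world as it was when A completed
live_view = compensate_a()    # what A's handler actually has to deal with
```

The frozen view holds foo with one prime while the handler sees three, which is why a whole-world snapshot taken at completion is of no use to it.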
Issue - 226 - Clarification of lifecycle of compensation handler and its fault handling
· When a fault happens within a compensationHandler, what should happen?
o Of course, one can put another scope with a faultHandler within a compensationHandler to catch the fault, either to fail the compensation silently or to enable certain retry logic.
· What if a fault has really propagated to the boundary of the compensationHandler?
o The fault will stop propagating further? (similar to the treatment of terminationHandler)
o The fault will propagate to the caller of the CH?
· In this situation, can we also call the state of that scope "faulted", or use another term such as "compensation-faulted"? (It is unlikely that the state of that scope remains "completed".)
o Do we uninstall the faulted compensationHandler instance and discard the scope snapshot instance? Or ... ?
o Do we keep the faulted compensationHandler instance, reuse the scope snapshot instance (without modification from logic in the CH?), and allow re-entry into a faulted compensationHandler from the parent scope?
A (maybe non-normative) diagram would be nice to help spec readers understand the lifecycle of a scope and its related handlers.
I tend to think we should resolve all these 3 issues together with one single proposal.
To comment on this issue (including whether it should be accepted), please follow-up to this announcement on the firstname.lastname@example.org list (replying to this message should automatically send your message to that list), or ensure the subject line as you send it starts "Issue - 226 - [anything]" or is a reply to such a message. If you want to formally propose a resolution to an open issue, please start the subject line "Issue - 226 - Proposed resolution", without any Re: or similar.
To add a new issue, see the issues procedures document (but the address for new issue submission is the sender of this announcement).