RE: [wsbpel] Issue 190 - BPEL Internal Faults (New Proposed Issue Announcement)

The only substantive difference between what Dieter originally suggested and my subsequent embellishment is that I suggested we allow “exit” (formerly terminate) to carry fault-like data and then treat all current “internal” faults (except joinFailure) as exits (as Dieter suggested) but with such data. This allows engines dealing with such exited aka frozen process instances to privately support notions such as “convert an exit to a fault and proceed” or “repair process and continue”.

From: Alex Yiu [mailto:alex.yiu@oracle.com]
Sent: Monday, February 14, 2005 12:58 PM
To: Alex Yiu
Cc: Satish Thatte; ygoland@bea.com; Francisco Curbera; Prasad Yendluri; Danny van der Rijn; wsbpel@lists.oasis-open.org; alex.yiu@oracle.com
Subject: Re: [wsbpel] Issue 190 - BPEL Internal Faults (New Proposed Issue Announcement)

Hi

Just want to make it clearer for our standardization process:

Since the current tone projected by Satish and Paco is quite different from the one first described by Dieter, I think we first need a restated issue description in writing for Issue 190. Then we can decide and vote on whether it is a feature or a bug in the spec.

After/if the issue is opened, we need to have a new proposal with exact wording to vote on.

Thanks.

Regards,
Alex Yiu

Alex Yiu wrote:

Great.
Then, it seems to me that we are converging.
What we need now is a new proposal with exact wording and a clear description of the new semantics.

Thanks.

Regards,
Alex Yiu

Satish Thatte wrote:

If I understand your first question correctly, that was my notion of the convert-terminate-to-fault-and-continue behavior.  And then yes, the failure could be capped to a scope, since the "modeling" fault at that point will be treated like any other ordinary fault.

________________________________

From: Alex Yiu [mailto:alex.yiu@oracle.com]

Sent: Fri 2/11/2005 11:49 AM

To: Satish Thatte

Cc: ygoland@bea.com; Francisco Curbera; Prasad Yendluri; Danny van der Rijn; wsbpel@lists.oasis-open.org; alex.yiu@oracle.com

Subject: Re: [wsbpel] Issue 190 - BPEL Internal Faults (New Proposed Issue Announcement)

Hi, Satish,

I guess I understand your point ...

More questions: Say:

If we don't call it "fault", after the process is "frozen" ...

and after user inspection, he/she considers that the fault should not affect

the compensation logic, can the user select an action to activate a

fault handler of a related scope, which does the compensation logic,

marks the scope as faulted, and continues the rest of the process?

It is important to cap the system failure to one of the child scopes, not

the whole process, for fault-tolerant design [ Oh my ..... the term

"fault" comes again ... do we really want to avoid that term? ]

Thinking out loud again: maybe we should still call them faults and

have a clear explanation of how a system failure will be handled

differently from an application fault?

Regards,

Alex Yiu

Satish Thatte wrote:

Alex,

I agree with what you say except I would rather not call it "fault" because a normal fault does not cause a process to freeze.  Our terminate semantics is as close to a freeze as possible already.  But if we want to rename terminate as something else (actually didn't we rename it exit already?) that captures the intent better I have no issues with that.

As for how the intention is expressed, that will clearly have to be platform specific.  We don't have any official notion of deployment descriptor, but it would have to be some sort of extension or external configuration parameter, which I think is what you intended to say.

Satish

________________________________

From: Alex Yiu [mailto:alex.yiu@oracle.com]

Sent: Thu 2/10/2005 8:45 PM

To: Satish Thatte

Cc: ygoland@bea.com; Francisco Curbera; Prasad Yendluri; Danny van der Rijn; wsbpel@lists.oasis-open.org; alex.yiu@oracle.com

Subject: Re: [wsbpel] Issue 190 - BPEL Internal Faults (New Proposed Issue Announcement)

Hi, Satish,

If I read Satish's comments correctly, then I would say it is more fair to say:

The semantics of how to handle a BPEL fault are no longer "exit"/"quit"/"terminate".

The process basically "freezes" / "suspends" before any further code execution. Then, it is up to the BPEL implementation / BPEL site admin / BPEL developer to decide what to do with this "frozen" or "suspended" process.

And I may add one more question: May their decision be just the plain old default "compensate and rethrow" semantics in BPEL 1.1? Can their decision be expressed by a deployment descriptor or an extension attribute in BPEL?

Regards,

Alex Yiu

Satish Thatte wrote:

      There are two points at issue here.

      1.  Are undefined-runtime-semantics "faults" really faults in the sense

      that one would write specific catch handlers for things like

      conflictingReceive, or correlationViolation in the same way as one would

      write catch handlers for approvalDenied?

      2.  Admitting that undefined-runtime-semantics "faults" will occur since

      we do not mandate pessimistic static analysis to prevent them, what

      exactly is a reasonable way to deal with these "faults"?

      I would hope that we have no disagreement that specific handlers for

      correlationViolation and such would be extremely rare.  CatchAll is the

      way these "faults" would be intercepted if at all.  And in that context

      there is very little one can do except suppress the fault, i.e., limit

      its impact, and possibly notify someone that it happened.  I have not

      seen anyone argue otherwise.

      The primary disagreement seems to be about the second question, and

      especially about the tradeoff between the approaches of

      A.  Explicitly define impact boundaries ("modularity" entered the

      discussion as an example for such boundaries) even for

      undefined-runtime-semantics "faults" and within those boundaries apply

      the usual unravel and compensate logic that gets applied by default.

      B.  There is no reasonable way to define the impact boundaries in most

      cases and in a lot of important processes the usual unravel and

      compensate logic would create unintended havoc and destroy years of work

      if blindly allowed to proceed by default and oversight.

      By the way, neither approach helps as far as letting a partner know what

      is going on in cases like missingReply.  For that we would have to go

      back to my suggestion of explicitly declaring MEP instances in scopes

      and then defining standard wire-faults in case an MEP instance went out

      of scope without completing.  To be clear, I am *not* suggesting we go

      down that road at this point.

      I don't think we can settle this with arguments based on examples

      because "allowing ordinary compensation to proceed" can be viewed as

      being either desirable or disastrous depending on the scenario you have

      in mind.

      I disagree with Yaron that his setting #1, which corresponds to my

      approach B, is possible today without preventing the BPEL engine from

      actually carrying out prescribed runtime semantics.  But I agree with

      him that the two approaches need to be made possible via some

      platform-specific switch, i.e., made compatible with BPEL normative

      semantics.  One way is to extend our notion of "terminate" to include

      optional fault data.  I would then argue that a BPEL engine is free to

      provide a (private) switch that chooses between

      terminate-then-optionally-repair-and-continue behavior as well as

      auto-convert-terminate-to-fault-and-continue behavior.
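The idea of a "terminate" that carries optional fault data, with an engine-private switch between the two behaviors, can be illustrated with a minimal sketch. Everything named here (ExitPolicy, ExitSignal, Fault, handle_exit) is hypothetical and invented for illustration; none of it comes from the BPEL drafts or from any engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple, Union


class ExitPolicy(Enum):
    # Hypothetical engine-private switch; not defined by any BPEL draft.
    FREEZE_FOR_REPAIR = "terminate-then-optionally-repair-and-continue"
    CONVERT_TO_FAULT = "auto-convert-terminate-to-fault-and-continue"


@dataclass
class ExitSignal:
    """A terminate/exit that optionally carries fault-like data."""
    scope: str
    fault_name: Optional[str] = None
    fault_data: Optional[dict] = None


@dataclass
class Fault:
    """An ordinary fault, handled by the usual fault handlers of its scope."""
    scope: str
    name: str
    data: Optional[dict] = None


def handle_exit(signal: ExitSignal,
                policy: ExitPolicy) -> Tuple[str, Union[ExitSignal, Fault]]:
    """Apply the private switch to an exit raised inside signal.scope."""
    if policy is ExitPolicy.CONVERT_TO_FAULT and signal.fault_name is not None:
        # The exit becomes an ordinary fault, so its impact can be capped
        # by the fault handlers of the scope in which it occurred.
        return "fault", Fault(signal.scope, signal.fault_name, signal.fault_data)
    # Otherwise the instance stays frozen until someone repairs or ends it.
    return "frozen", signal
```

An exit without fault data stays frozen under either policy in this sketch, since there is nothing to convert into a fault.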

      Satish

      -----Original Message-----

      From: Yaron Y. Goland [mailto:ygoland@bea.com]

      Sent: Monday, February 07, 2005 12:13 PM

      To: Francisco Curbera

      Cc: Prasad Yendluri; Danny van der Rijn; wsbpel@lists.oasis-open.org

      Subject: Re: [wsbpel] Issue 190 - BPEL Internal Faults (New Proposed Issue Announcement)

      I think the core of the problem is another part of our ever increasing

      elephant.

      Lots of systems are going to have a magic switch that I strongly

      encourage us not to attempt to specify in BPEL both because it's at

      least 80% out of scope and because it will take a long time to agree on

      the semantics.

      That switch will specify (either on a process level or perhaps a scope

      level) what to do if certain kinds of faults are thrown. A key category

      of faults this switch will focus on is system faults.

      This switch will typically have at least two settings.

      Setting #1 - If a system fault is thrown immediately freeze the process

      and call the admin for help who can then edit the process to fix things.

      Setting #2 - If a system fault is thrown then send a note to the admin

      but let the fault go through the normal fault handlers.

      Both the first and second settings are possible with the existing spec.

      The first behavior through an out of scope operational override and the

      second behavior is pretty much our default behavior.
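The two-setting switch described above, configurable at the process level or perhaps the scope level, might look roughly like the sketch below. The names SystemFaultSetting, resolve_setting, and on_system_fault are hypothetical; the thread deliberately leaves such a switch out of scope for BPEL, so this is only one way a platform could model it.

```python
from enum import Enum
from typing import Callable, Dict


class SystemFaultSetting(Enum):
    # Hypothetical names for the two settings described in the message.
    FREEZE_AND_CALL_ADMIN = 1   # Setting #1
    NOTIFY_AND_PROPAGATE = 2    # Setting #2


def resolve_setting(process_setting: SystemFaultSetting,
                    scope_settings: Dict[str, SystemFaultSetting],
                    scope: str) -> SystemFaultSetting:
    """A scope-level setting, if present, overrides the process-level one."""
    return scope_settings.get(scope, process_setting)


def on_system_fault(fault_name: str,
                    scope: str,
                    setting: SystemFaultSetting,
                    notify_admin: Callable[[str], None]) -> str:
    """Notify the admin in both settings, then freeze or propagate."""
    notify_admin(f"system fault {fault_name} in scope {scope}")
    if setting is SystemFaultSetting.FREEZE_AND_CALL_ADMIN:
        return "frozen"
    return "normal-fault-handling"
```

For example, a process could default to Setting #2 while a single sensitive scope is pinned to Setting #1 through the scope_settings mapping.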

      Issue 190 would make the second setting effectively impossible since it

      would be illegal to ever allow system faults to go through normal fault

      handling. But as Alex and others have convincingly argued there are many

      interesting cases in which it makes sense to allow system faults to go

      through normal fault handling.

      In terms of maximizing portability I think we should stick with our

      current behavior and leave the 190 style behavior to out of scope

      extensions.

              Yaron

      Francisco Curbera wrote:

              I guess one of the points of the immediate termination condition is that termination is essentially always invisible to partners of the process. The net effect of this change (and from my perspective the actual aim of this proposal) would be to allow engines the flexibility to decide how to deal with these situations, termination being an option. Any form of standard fault semantics limits that flexibility because the engine would be forced to follow the usual scope termination/fault propagation behavior, with the likely result of discarding many recoverable process instances - and possibly days or months of process work.

              Paco

              From: Prasad Yendluri <pyendluri@webmethods.com>

              To: Francisco Curbera/Watson/IBM@IBMUS

              cc: Danny van der Rijn <dannyv@tibco.com>, wsbpel@lists.oasis-open.org

              Subject: Re: [wsbpel] Issue 190 - BPEL Internal Faults (New Proposed Issue Announcement)

              Date: 02/04/2005 02:30 PM

Hi,

              1. Isn't this the same issue as the one raised by issue 187 where we ask if there are any constraints in handling of the standard faults? This is proposing a specific resolution where it is recommended that the process always terminates immediately.

              2. I tend to side with Danny on this. I don't think we should require that the process terminates immediately always. IMO in at least certain cases this may not be a fatal situation for the whole process (it could be confined to the scope) and other parts of the process may be able to continue by compensating for pertinent. Perhaps the impact could be limited to the immediately confining scope and the process could continue, perhaps the area where the fault occurred could be non-fatal to the whole process (e.g. related to look-up rather than modification of any information) or caused by some transient condition that could go away on a retry etc. I think the process (fault handler) should be given a chance to handle the situation rather than terminate always.

              3. If we do end up going the "terminate" always way, we must minimally *not* preclude logging the condition, which could be more intelligent if some "fault data" could be attached to the faults (ref issues 187 and 185).

              Regards, Prasad

              -------- Original Message --------

               Subject: Re: [wsbpel] Issue 190 - BPEL Internal Faults (New Proposed Issue Announcement)

                  Date: Fri, 4 Feb 2005 13:23:17 -0500

                  From: Francisco Curbera <curbera@us.ibm.com>

                    To: Danny van der Rijn <dannyv@tibco.com>

                    CC: wsbpel@lists.oasis-open.org

              Hi Danny,

              BPEL so far does not support any technique for modularizing process authoring, so the situation you describe is a bit out of scope right now. In any case, my view is that the idea that authors of business processes are going to be adding code to deal with things like unsupportedReference is just not realistic. I would even argue that those faults don't actually belong at the BP modeling level and need to be dealt with in a different way.

              Dieter's suggestion allows implementations to manage these situations in the best possible way.  This is especially important in the case of long running processes, where months or years of work can be thrown out the window when one of these faults is encountered (the current semantics require the complete unwinding of the execution stack if the fault is not caught and a generic catch all is essentially good for nothing). Typically you want to allow manual intervention to figure out whether the process can be repaired, and terminate it if not.

              Paco

               From: Danny van der Rijn

               To: wsbpel@lists.oasis-open.org

               cc:

               Subject: Re: [wsbpel] Issue 190 - BPEL Internal Faults (New Proposed Issue Announcement)

               Date: 02/03/2005 01:47 PM

              [Resending this with appropriate header to save Tony/Peter the trouble]

              -1

              As I pointed out in our last face to face, this kind of approach will make any kind of modularization extremely difficult.  It will give no way for a developer of a piece of BPEL code to protect against the "modelling error" (legacy term: "programming error") of another modeller whose attempt to model the real world failed in a tangible instance.

              Danny

              Tony Fletcher wrote:

                    This issue has been added to the wsbpel issue list with a status of "received". The status will be changed to "open" if the TC accepts it as identifying a bug in the spec or decides it should be accepted specially. Otherwise it will be closed without further consideration (but will be marked as "Revisitable")

                    The issues list is posted as a Technical Committee document to the OASIS WSBPEL TC pages on a regular basis. The current edition, as a TC document, is the most recent version of the document entitled in the "Issues" folder of the WSBPEL TC document list - the next posting as a TC document will include this issue. The list editor's working copy, which will normally include an issue when it is announced, is available at this constant URL.

                    Issue 190: BPEL Internal Faults

                    Status: received

                    Date added: 3 Feb 2005

                    Categories: Fault handling

                    Date submitted: 3 February 2005

                    Submitter: Dieter Koenig1

                    Document: WS-BPEL Working Draft, December, 2004

                    Related Issues: Issue 163 : languageExecutionFault, Issue 169 : Transition condition error handling clarification, and Issue 187 : Legality of Explicitly throwing or rethrowing Standard faults.

                    Description:

                    There are a number of cases in the current spec where the behavior of a process is described as *undefined*, in particular, after recognizing internal errors described as standard faults.

                    With the exception of "bpel:joinFailure", *all* of these situations represent modelling errors that cannot be dealt with by the business process itself in a meaningful way. This behavior becomes even more questionable for catchAll handlers that try to deal with multiple application faults and unexpectedly encounter a standard fault.

                    Submitter's proposal: Instead of allowing processes to catch these as standard faults, we propose that the process instance must *terminate* immediately when such a situation is encountered.

                    The behavior of terminate is well-defined in BPEL -- as far as BPEL is concerned the instance execution ends when terminate is encountered without any fault handling behavior. Any additional facilities for extended support for, e.g., repair and continue, are definitely out of scope.

                    This approach would also create a clear direction for dealing with any pathological situation within an inlined language (Issue 163) and therefore also for errors within transition conditions (Issue 169).
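The rule in the submitter's proposal can be sketched in a few lines, as an illustration only and not as normative text. The fault names are the ones mentioned in this thread; the constant CATCHABLE_STANDARD_FAULTS and the function dispatch are hypothetical and invented for this sketch.

```python
# Under the proposal, joinFailure is the only standard fault that a
# process may still catch. (Hypothetical constant, for illustration.)
CATCHABLE_STANDARD_FAULTS = {"joinFailure"}


def dispatch(fault_name: str, is_standard_fault: bool) -> str:
    """Decide what happens when a fault condition is recognized.

    Standard faults other than joinFailure terminate the instance
    immediately, with no fault handling; everything else, including
    application faults, goes through the usual fault handlers.
    """
    if is_standard_fault and fault_name not in CATCHABLE_STANDARD_FAULTS:
        return "terminate"
    return "fault-handling"
```

With this rule a catchAll handler would never see correlationViolation or conflictingReceive, while an application fault such as approvalDenied would still reach it.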

                    Changes: 3 Feb 2005 - new issue

                    Best Regards,

                    Tony

              To unsubscribe from this mailing list (and be removed from the roster of the OASIS TC), go to http://www.oasis-open.org/apps/org/workgroup/wsbpel/members/leave_workgroup.php.
