Preliminary Minutes WSRM Face To Face Meeting

Thursday May 29

Next F2F proposed for Sept 16, 17, 18 in Boston Area (Mitre host).

May vote on this Friday

Dock moved to approve minutes of last meeting. Seconded by Alan.

No opposition; minutes approved.

Peter Furniss joined the afternoon teleconference.

3 Wednesday WS-Rel Specification Discussions

We did a walk-through of the WS-Rel V1 document.

Page: 4

Item: 1

Dock: Crash tolerance is not mentioned in the Abstract. Issue is under what conditions the protocol may work.

The protocol could announce failure to deliver to the sender application.

Need to identify the fault cases of the protocol.

Guaranteed delivery is not a guarantee, since in some cases it announces failure.

Payrits: the crash tolerance is to determine how robust the system running the protocol has to be.

Magdonlna: Reliability levels can be indicated in Fault message.

One clarification to the abstract is that these are protocol features.

Optionality of Time to live is at issue.

Mandatory status of Timestamp is at issue.

Page: 7

Item: 1

Spec issue on how to determine the configuration parameters for time out interval and number of retries that the sender is using.

Mark: Retry interval cannot be relied on by receiver, since the sender might go down.

Time to live parameter would be better to use.

Persistence in receiver also may include how long it holds its Ack messages, for future transmission.

Issue: how are timeout, retry count, and time to live (when not present in header) made known between sender and receiver?
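To make the interplay of these parameters concrete, here is a minimal sketch of a sender loop. It is not taken from the WS-Rel draft: the function name, the parameter names (retry_interval, max_retries, time_to_live) and the injected send/ack callables are all assumptions made for illustration.

```python
import time

def send_reliably(send, ack_received, retry_interval, max_retries,
                  time_to_live, clock=time.monotonic, sleep=time.sleep):
    """Resend until acked, retries are exhausted, or time to live expires.

    Hypothetical helper: `send` transmits the message once and
    `ack_received` reports whether an Ack has arrived for it.
    """
    expires_at = clock() + time_to_live
    attempts = 0
    # One initial send plus at most `max_retries` resends.
    while attempts <= max_retries and clock() < expires_at:
        send()
        attempts += 1
        sleep(retry_interval)  # wait one retry interval for the Ack
        if ack_received():
            return "acked", attempts
    # No Ack seen: the sender cannot tell whether the message arrived.
    return "unknown", attempts
```

Note that only time_to_live could be checked by the receiver as well; retry_interval and max_retries are local to the sender in this sketch, which matches Mark's point that the receiver cannot rely on them.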

Question on why time to live parameter is optional in header.

Question: why is time stamp mandatory?

Some parameters may be included in our headers because we do not want to rely on another protocol to provide them.

Item: 2

1.1.1 bullet 2 - protocol does not state what the lack of an Ack means.

Item: 3

Asynchronous messaging should be changed to one-way MEP at WSRM user level.

Include architectural diagram of levels.

Now need to add Request Response MEP to TC spec.

Item: 4

Out of scope - App level synch messaging should now be Request/response MEP. We now have it in scope.

Page: 8

Item: 1

Issue: relation of routing and the from/to fields of protocol headers.

Item: 2

Need to be careful about using the term application level. Perhaps the WSRM user is what is meant by application.

The architecture section needs to clarify this.

Item: 3

1.1.1 Need to clarify in conformance clause which features of the protocol are mandatory to implement.

Use either negotiation of features, advertising capabilities out of band, or reliance on faults to indicate lack of implementation support.

Requirement issue is that each side needs to be able to determine the protocol features supported by the other side.

It is a spec issue on how to do it (e.g. use UDDI t-models for advertising each feature, or have an active negotiation mechanism).

We do not have a session for use of negotiation.

Item: 4

1.1.2 - spec issue - meaning of Guarantee needs to be clarified (i.e., it is ok to announce failure of meeting guarantee in a particular case).

If you get an ack, you know the message got there; however, if you do not get an ack, you are not sure.

This is related to the definition of Crash Tolerance.

Szabolcs stated we could add a protocol to get the state of the other side. There was pushback that this might be too much for the first spec.

Item: 5

1.3: Sunil asked if we are ready to move to making our stuff usable with soap 1.2 as well as soap 1.1.

This would affect our schema and our fault codes.

Add a requirements issue:

Soap version support. Could make support of soap 1.1 mandatory, but provide option to support soap 1.2 in the spec.

Page: 9

Item: 1

1.3 - Pete Wenzel would volunteer to be our liaison rep to WSS.

Need to make a motion to seek formal liaison with WSS TC at some time.

This will allow each TC to express concerns about incompatibilities between the two specs.

Page: 12

Item: 1

1.5 Terminology - we need to add the new terms from the requirements issue resolutions here.

Page: 13

Item: 1

Acknowledgment Message - What triggers the sending of the ack by the receiving side?

The layer between WSRM user and soap/wsrm layer is implementation dependent. Thus the trigger of the ack is also implementation dependent.

Sunil: Ack indicates that the message has been received by the soap/wsrm layer and the message is persisted by the soap/wsrm layer, or consumed by the WSRM user.

Szabolcs stated it is enough to trigger the Ack when the soap/wsrm layer receives the message. There were several concerns expressed that this is not enough for all soap/wsrm layer implementations.

Requirement issue: what are the semantics indicated by the sending of an ack by the receiver.

Szabolcs gave an example of a mobile phone which has no persistence capabilities. It should be able to ack when the phone software receives the message.

This is related to the question of the assumed crash tolerance of the endpoints participating in the protocol.

Paolo: ack is related to some level of qos.

The definition of consume is application dependent. For Szabolcs' case the phone manufacturer would have their own definition of consuming the message by the application. In this case, they could define consume as received by the software above the comm layer.

Pete suggested "when consumed or made available to the user".

Marc G suggested we could have different levels of acknowledgments.

One position: whatever the contract is between the soap/wsrm layer and the user for making the message available, that contract is what is used to trigger the ack.

Could have the ack message indicate what qos level is being assured by the soap/wsrm layer in making the message available to the user.

Szabolcs stated that the ack can be sent whenever the WSRM/soap layer has taken responsibility for delivering to its ultimate user.

The persistence definition is tied into the semantics of an acknowledgement.

These qos levels are not testable.

Page: 15

Item: 1

2.2.3 We need to refine this section, as a spec issue, after we have a definition for persistence.

Page: 17

Item: 1

Figure 4: the final grouping/containment and optionality of these elements into header types needs refinement.

Spec issue.

Orthogonality concerns. Some of these might be in future WS standards, e.g. WSS security.

Timestamps are used elsewhere.

Also the order of the headers might be important for encryption done by WSS security.

Spec issue: header processing order when encryption or signatures are involved.

Page: 19

Item: 1

Which elements we define for WSRM vs ones we use from other specs. This is a spec issue.

Page: 20

Item: 1

3.1.1, 3.1.2 Spec issue: the need for these optional from and to elements to be in the WSRM protocol.

Item: 2

3.1.3 - Spec issue: what is the need for the service element? Is it just because it is in EBMS?

Page: 21

Item: 1

3.1.4 - Spec issue on overlap of messageID and GroupID/Seq#

Another spec issue is the use of the RFC2822 form for MessageID. An alternative could be a URI.

Doug B stated the reason this exists is that software to generate unique ids per RFC2822 is very common. This guarantees uniqueness. A URI does not by itself guarantee uniqueness.

Pete W: why not just require a string with a guarantee of uniqueness?
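As an illustration of Doug B's point that RFC2822-style ID generators are widely available, the Python standard library ships one as email.utils.make_msgid. The sketch below is only an example; the wrapper name and the domain example.org are assumptions, and uniqueness here rests on timestamp, process id and random bits rather than on any formal guarantee.

```python
from email.utils import make_msgid

def new_message_id(domain="example.org"):
    """Return an RFC 2822 style identifier of the form <unique-part@domain>.

    Hypothetical wrapper; `domain` is a placeholder, not a value from the spec.
    """
    return make_msgid(domain=domain)

first = new_message_id()
second = new_message_id()
print(first != second)  # two calls are expected to yield different IDs
```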

Item: 2

3.1.5 Timestamp - spec issue on why it is required. Need to specify that it should not be changed between retries.

Page: 22

Item: 1

3.2.2 - replyTo - Spec issue - why is it mandatory even if callback ack binding pattern is not in use?

Need to explain use of Reply to in request/response MEP.

Item: 2

3.2.3 - spec issue: why is time to live not a mandatory element?

If it is not there, what are the semantics required for the WSRM/soap layer?

Item: 3

3.2.4 - ack requested has an attribute which indicates which ack binding pattern is requested for use by the sender.

If there are more than two binding patterns, we would need to change the boolean to an enum.

Spec issue: 3.2.4 Needs semantics to be refined.
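A minimal sketch of what moving from a boolean to an enumeration could look like on the processing side. The three pattern names follow the patterns mentioned in these minutes (response, callback, polling); the attribute values and function name are invented for illustration and are not taken from the WS-Rel schema.

```python
from enum import Enum

class AckPattern(Enum):
    # Pattern names follow the discussion; the string values are made up.
    RESPONSE = "Response"
    CALLBACK = "Callback"
    POLL = "Poll"

def parse_ack_pattern(value):
    """Map an attribute value to a pattern, rejecting unknown values."""
    try:
        return AckPattern(value)
    except ValueError:
        raise ValueError(f"unsupported ack binding pattern: {value!r}") from None
```

With an enumeration, adding a further pattern later is a schema extension rather than a change of attribute type.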

4 Requirements Issues

4.1 Rel 19 continued

Sunil suggested we add a third requirement.

Proposal to have requirement for spec having a solution for Polling Ack Binding pattern, for both one-way and request/response MEP.

Pete stated this might be difficult for request/response MEP.

Payrits stated he does not need the polling Ack binding pattern for any MEP.

Payrits asked for a use case to justify this feature.

Sunil gave a use case with a client without a listener. It sends a message, then wants to poll for the ack.

Marc G gave a use case of remote satellite offices which do not have a dedicated connection, with intermittent dialup access.

Paolo suggested a new inquiry message to synchronize views of state of message exchange might satisfy those use cases.

This is an alternative way to convey that a message has been acked.

Payrits stated that we kept state synchronization out of scope of protocol.

This could also be used to ensure duplicate rejection.

Alan asked how state synch would save anything. Answer: it avoids resending.

Paolo suggested that a new message from sender to receiver be defined for this purpose as well as additional purposes.

Sunil: once an ack is lost, we have a problem with WS-rel spec as is. There is no other way of the ack being resent, if duplicate elimination is done prior to sending the ack. Spec needs to be clarified on this point. Ack must be sent before the dup is eliminated. Make this a new spec issue.
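The behaviour Sunil asks for can be sketched as follows: every received copy is acked, while delivery to the WSRM user happens only once. This is an illustrative sketch, not the WS-Rel processing model; the class name and the deliver/send_ack callables are assumptions.

```python
class Receiver:
    """Hypothetical receiver that re-acks duplicates instead of dropping them silently."""

    def __init__(self, deliver, send_ack):
        self.deliver = deliver      # hands the payload to the WSRM user
        self.send_ack = send_ack    # sends an Ack for a message ID
        self.seen = set()           # message IDs already delivered

    def on_message(self, message_id, payload):
        is_duplicate = message_id in self.seen
        if not is_duplicate:
            self.seen.add(message_id)
            self.deliver(payload)
        # Ack every copy, duplicates included, so that a lost Ack
        # is repaired by the sender's next retry.
        self.send_ack(message_id)
        return not is_duplicate
```

If the duplicate were discarded before this point, a sender whose first Ack was lost would keep retrying until its retries ran out.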

Payrits: the case of a client without an http server can be solved by the response ack binding. He stated this use case is for when the ack takes “too long” to wait for in the http reply.

Tom asked if there were concerns on this as a requirement for a feature of the protocol, which is optional for conformance.

Payrits stated that Delayed ack solution could also satisfy state synch use cases.

Raise new requirements issue: Protocol support for message state inquiry.

Alan stated the inquiry could return, at a minimum the messageID for the last message received correctly. Sunil stated it could give other information as well.

Doug wanted to clarify that some requirements may be met by use of other existing protocols. As we go forward, we do not assume that each requirement results in additional wsrm header elements. No opposition.

Sunil moved, Marc G seconded.

Add new requirement for spec having a solution for sender to receive delayed Acks, when it is not willing to receive underlying protocol requests, for one-way MEP.

No opposition, requirement added. Closes rel 19.

4.2 Rel 43 – Protocol support for message state inquiry

Proposed requirement:

Spec must support ability of sender to inquire of the receiver about the status of its sent message(s) as to whether they have been received.

Sunil asked what the status values are.

Doug – what are the semantics on responding to this inquiry regarding the semantics of acknowledgement.

Consensus: This is a performance optimization to avoid resending a large message to determine the status of that message.

Pete moved for new requirement, Venkat seconded:

Spec must support ability of sender to ask the receiver if one or more of its sent messages have been received.
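A minimal sketch of what such an inquiry could amount to, under the assumption that the receiver keeps a set of received message IDs. The function names and the dictionary-shaped reply are illustrative only; the minutes do not define status values or a message format.

```python
def handle_status_inquiry(received_ids, inquired_ids):
    """Receiver side: report, per inquired message ID, whether it was received."""
    return {mid: (mid in received_ids) for mid in inquired_ids}

def ids_to_resend(status):
    """Sender side: only messages reported as not received need resending."""
    return [mid for mid, received in status.items() if not received]

status = handle_status_inquiry({"m1", "m3"}, ["m1", "m2", "m3"])
print(ids_to_resend(status))  # ['m2']
```

This reflects the consensus above that the inquiry is a performance optimization: a small status exchange replaces resending a large message just to learn its state.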

Discussion of global uniqueness of message ID being violated. How will the protocol receiver react when uniqueness is violated?

Discussion of Lifetime of message ID.

Jeff gave new spec issue: duplicate elimination is directly related to time to live of message. The semantics of this need to be clarified in the spec. What happens if sender resends message which has expired?

Scott y, Eisaku y, Venkat y, Jeff y, Pete y, Paolo y, Doug n, Mark n, Sunil n, Iwasa n, Payrits y, Dock y, Mark y, Tom a, Alan y

10 yes

4 no

1 abs

Leave the vote open until the phone call, to reach the quorum of 16.

Marc G stated we will find uses for this requirement, once we refine our semantics for Acknowledgement.

Tom Rutt solicited email discussions on spec mechanisms to meet both of the requirements.

4.3 Movement of issues to Spec Issues

4.3.1 Rel 20 Negotiation

Drop this requirement issue as superseded by new requirement Rel 30.

The examples seem misleading.

4.3.2 Rel 21 from/to

Agreed to categorize as a spec issue

4.3.3 Rel 22 Optionality

Agreed to categorize as a spec issue. This is to ensure the spec uses the term optional properly.

4.3.4 Rel 23

Sunil moved to close this issue without change to requirements or spec. Pete seconded.

No opposition, issue rel 23 dropped.

4.4 Rel 05 Crash Tolerance

Original Proposal:

Crash tolerance:
We say that an implementation of a specification is crash tolerant, if it is able to resume sensibly or continue operation in case of a hardware failure.

Fault tolerance:
We say that an implementation of a specification is fault tolerant, if it is able to resume sensibly or continue operation in case of an application or hardware failure.

Paolo proposed, on email:

Rel 5: Crash Tolerance Definitions

___________________________________

In order to clarify the following definitions, I think it's worth introducing some fundamental fault tolerance terminology (reference: "Fault-Tolerant Computer System Design", Dhiraj K. Pradhan).

Fault: A fault is a physical defect, imperfection, or flaw that occurs within some hardware or software component.

Error: An error is the manifestation of a fault. Specifically, an error is a deviation from accuracy or correctness.

Failure: If an error results in the system performing one of its functions incorrectly then a system failure has occurred.

_________________________________________________________________

Next, before defining Crash Tolerance, I think we should formalize what is a Crash failure.

Crash failure (or simply Crash): Any failure that is a consequence of a fail-stop fault.

Fail-stop fault model: A fault is said to be fail-stop if whenever it occurs, the only visible effect is that the affected component stops functioning. Thus, any component affected by a fail-stop failure can show no incorrect or arbitrary behavior.

Byzantine fault model: A failure is said to be byzantine if whenever it occurs, the affected component can show any arbitrary, thus possibly malicious, behavior.

Crash Tolerance: Crash Tolerance is the ability of a system (either only specified or a software/hardware implementation) to ensure predetermined properties despite the occurrence of one or more unpredictable crash failures.

Non destructive crash (failure): Any crash which does not compromise the persistent state (i.e. the state of an application stored on a persistent storage) of an application.

Definition of Reliable Messaging: (freely inspired and rearranged from WS-Glossary of W3C...)

The ability:

1. of the intended receiver of the message to be assured that it receives and delivers a given message once and only once, i.e. exactly one time.

2. of a sender of a message to be able to determine whether a given message has been already received by its intended receiver.

3. of a sender to be assured that the messages are received and delivered by the intended receiver in the same order in which they were sent.

4. of both sender and receiver of a message to carry out (1), (2) and (3) in the face of inevitable, yet often unpredictable, non-destructive crashes which are eventually recovered.

Failure Recovery: Failure recovery is the process of regaining operational status or restoring the system's integrity after the occurrence of a failure.

___________________________________________________________________

I am also not fully satisfied by the current persistent storage definition.

I hope I'll find the time to reword the current definition before the F2F meeting.

Apart from the above considerations, as I wrote in one of my past mails, I believe that WS-RM suffers from the lack of multi-cast features. I can imagine several important use cases where such a feature would be useful (in general every time that some data has to be reliably exchanged between more than two endpoints). Practically, WS-RM may either directly exploit the multi-cast ability of multi-cast enabled transport protocols like SMTP, or for the common case of HTTP binding, WS-RM should take care of managing the correct data exchange over several TCP connections carrying the HTTP POST requests and corresponding responses.

I want to make a motion to include multicast support in the requirements list, but I would appreciate any idea/comments from you.

Looking forward to meeting you all at the F2F,

Paolo

Paolo Romano

Scott: an ack should carry the semantics that the sender can discard the message, for it has been delivered successfully.

The definition of success is application dependent.

Payrits: there are levels of guarantees. Whether we indicate the level in the ack or not is a question we can answer.

Sunil asked what is important about this in our protocol design.

Payrits: The semantics of Ack is an important thing to clarify by this discussion.

Paolo: Different semantics for different levels of crash tolerance.

Payrits suggested that we put these definitions in the requirements doc.

If we find ourselves using the terms in the spec, we could also put the terms in the spec later.

Straw poll: any opposition to Payrits' proposal to add the definitions from Paolo's email to the requirements document?

Consensus is to add these definitions to requirements.

Subject to final vote of approval later on for the Requirements document.

4.5 Rel 06 Persistence Definitions

Original Proposal:

Persistent data:
A message or part of a message is considered to be persistent at a given node if it is stored in a persistent storage during its lifecycle at that node.

Persistent storage:
Persistent storage is a repository for statically stored data.

Static data storage:
A static data storage is a data storage that retains information even while power is turned off.

Payrits stated these definitions are taken from WAP definitions.

Dock suggestion: collapse last two

Persistent storage: a data storage which retains data even while system power is not available.

Paolo suggested we do not need the persistent data definition.

Sunil suggested we accept Dock's definition. No objection.

Add definition, subject to TC approval of Requirements doc.

4.6 Rel 17 Persistence Requirements in Spec

Sunil stated that the spec is clear on persistence requirements for sender, but is unclear for receiver.

This is a spec issue.

Sunil: For Receiver persistence we could have two levels of persistence required:

· Once ack is sent, receiver can remove the message contents from the persistent storage and keep the status information for the message.

· Once time to live expires, receiver can remove from the persistent storage both the contents and the status information for the message.

Payrits wants to add one more:

· Do not store any message or message status information in a persistent storage.
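The first two levels Sunil described can be sketched as a small store that separates message contents from message status. This is a sketch under stated assumptions: the class and method names are invented, and a pair of in-memory dictionaries stands in for persistent storage.

```python
class ReceiverStore:
    """In-memory stand-in for the receiver's persistent storage (hypothetical)."""

    def __init__(self):
        self.contents = {}  # message_id -> payload
        self.status = {}    # message_id -> time at which time to live expires

    def store(self, message_id, payload, expires_at):
        self.contents[message_id] = payload
        self.status[message_id] = expires_at

    def on_ack_sent(self, message_id):
        # Level 1: once the ack is sent, drop the contents but keep the status.
        self.contents.pop(message_id, None)

    def purge_expired(self, now):
        # Level 2: once time to live expires, drop contents and status.
        for mid in [m for m, exp in self.status.items() if exp <= now]:
            self.contents.pop(mid, None)
            del self.status[mid]

    def is_duplicate(self, message_id):
        # Duplicate elimination only works while the status is retained.
        return message_id in self.status
```

After purge_expired has removed a message's status, a resend of that message is no longer recognised as a duplicate, which is the time to live question Jeff raised under Rel 43.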

Paolo stated that if we remove the word “persistent” from the first two bullets, this would alleviate the need for the third from Payrits.

If there is no persistent storage, the protocol would not work, which would leave the sender unaware (forever) of the state of delivery.

Mark suggested:

· Receiver can only remove message from persistent store when it is consumed by the WSRM user.

Sunil's concern about Paolo's suggestion is that it breaks the entire charter.

Jeff – if the server cannot persist, how can it state it is a reliable message receiver?

Jeff gave example, if receiver machine crashes, and comes back up, if the message status is lost, duplicate elimination cannot be accomplished by that receiver.

Payrits: a receiver should be able to state it is not tolerant to power failure.

Jeff – how can we define every level of fault tolerance.

Scott: could define minimal levels of fault tolerance.

Jeff – if the spec is too complicated, people will not implement it or use it. Different levels of reliability make the spec too complicated.

Venkat: we need semantics for at what time messages and message status information can be deleted.

Jeff - Persistent storage failure makes you no longer a reliable message receiver.

Sunil – Once persistent storage is unavailable, you can no longer guarantee reliability.

Mark – If we can say something meaningful about what a reliable cell phone is.

Payrits: 1) How to protect from comm failure between two parties. 2) How to protect ourselves from our own implementation failures.

Consensus: If persistent storage fails, the receiver can no longer meet the requirements for being a reliable receiver.

Payrits: users need to know under what conditions the receiver can satisfy the requirements.

Perhaps a failure notification could be used to indicate when persistence has been lost.

Payrits: receiver can state under which conditions it can satisfy the reliability requirements:

- no crash

- power off, data storage not compromised.

- data storage is compromised.

Marc G – might use different ack status codes for different levels

Sunil – timeline for spec is 4 more months. Say we assume persistent storage is available while the receiver is acting as a reliable Message receiver.

Payrits: why is it not enough to state the receiver must use as persistent a storage as possible?

Tom Rutt – is there a need for an error message for a recently rebooted receiver to indicate it cannot fulfill a guarantee.

This was considered unnecessary.

Agree to consensus statement.

However there is no way to indicate this to the sender.

Payrits proposed requirement: Data must be stored as securely as possible by the receiver implementation.

Several people expressed concerns about it.

Dock stated that power failure is such a common occurrence that it warrants a special status. That is why our definition of persistence is keyed to power.

Payrits: If system memory fails, the receiver can no longer meet the requirements for being a reliable receiver.

When do we say a system is in the state of being a reliable message receiver?

Ebxml spec: Persistence is storage available after system failure or interruption.

Payrits: persistence of a storage is a state for that storage.

If system moves into state of non-persistence of storage, it no longer meets the requirements to be a reliable message receiver.

Mark G – expectations on reliability of a system vary with what the sender is sending the data to. We need a way to identify something about the receiver’s persistent storage level.

Classes of persistence:

- persistent until permanent storage mechanism failure.

- Persistent until system power loss

Support two types of classes.

Potential requirement: Protocol must support the ability for a sender to ask the receiver what its level of persistence is.

This would give the sender the ability to not send messages to a system which does not persist after power loss.

Junichi: could be a service level agreement issue.

Scott stated that loss of persistence does not take away guaranteed delivery capabilities.

Proposed statements

If persistent storage fails, the receiver can no longer meet the requirements for duplicate elimination and message ordering. (this is generally agreed)

Define two Classes of persistence (requires a change to definition of persistence):

- persistent until permanent storage mechanism failure.

- Persistent until system power loss

Less agreement about requirement for query capability in protocol to ask what class of persistence is implemented by a reliable message endpoint.
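The two proposed classes, and the generally agreed statement about duplicate elimination and ordering, can be sketched as below. The enumeration, the event labels ("power_loss", "storage_failure") and the function names are assumptions for illustration; the minutes define neither a query message nor class identifiers.

```python
from enum import Enum

class PersistenceClass(Enum):
    UNTIL_STORAGE_FAILURE = "persistent until permanent storage mechanism failure"
    UNTIL_POWER_LOSS = "persistent until system power loss"

def state_survives(persistence_class, event):
    """Does the receiver's message state survive the given event?"""
    if event == "storage_failure":
        return False  # neither class survives failure of the storage itself
    if event == "power_loss":
        return persistence_class is PersistenceClass.UNTIL_STORAGE_FAILURE
    raise ValueError(f"unknown event: {event!r}")

def can_still_eliminate_duplicates(persistence_class, event):
    # Duplicate elimination and ordering need the stored message state.
    return state_survives(persistence_class, event)
```

A sender that could query the class might, for example, choose not to send to a receiver of class UNTIL_POWER_LOSS, which is the use described for the potential requirement above.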
