Subject: Crash Tolerance Definition + Multicast
I tried to write down some definitions related to Crash Tolerance (Rel 5). As I said during the teleconf, I think we should be as formal and precise as possible when we define the guarantees people can expect when using WS-RM. Unfortunately the definitions grew quite long...

Rel 5: Crash Tolerance Definitions
___________________________________

To clarify the following definitions, I think it is worth introducing some fundamental fault tolerance terminology (reference: "Fault-Tolerant Computer System Design", Dhiraj K. Pradhan).

Fault: A fault is a physical defect, imperfection, or flaw that occurs within some hardware or software component.

Error: An error is the manifestation of a fault. Specifically, an error is a deviation from accuracy or correctness.

Failure: If an error results in the system performing one of its functions incorrectly, then a system failure has occurred.
_________________________________________________________________

Next, before defining Crash Tolerance, I think we should formalize what a crash failure is.

Crash failure (or simply Crash): Any failure that is the consequence of a fail-stop fault.

Fail-stop fault model: A fault is said to be fail-stop if, whenever it occurs, the only visible effect is that the affected component stops functioning. Thus, a component affected by a fail-stop fault cannot show incorrect or arbitrary behavior.

Byzantine fault model: A fault is said to be Byzantine if, whenever it occurs, the affected component can show arbitrary, and thus possibly malicious, behavior.

Crash Tolerance: Crash Tolerance is the ability of a system (whether only a specification or a software/hardware implementation) to ensure predetermined properties despite the occurrence of any unpredictable crash failure.

Non-destructive crash (failure): Any crash that does not compromise the persistent state of an application (i.e. the state of the application stored on persistent storage).
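To make the last two definitions a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a proposal for the spec: the `Receiver` class, the JSON state file and the `demo` function are all hypothetical names. The "predetermined property" is that a message ID is never delivered twice, and the crash is non-destructive because the file holding the delivered IDs survives while the in-memory state is lost.

```python
import json
import os
import tempfile


class Receiver:
    """Hypothetical receiver: the set of delivered IDs is its persistent state."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.delivered_log = []             # volatile state: lost on a crash
        self.delivered_ids = self._load()   # persistent state: reloaded on recovery

    def _load(self):
        # A fresh receiver has no state file yet.
        if not os.path.exists(self.state_path):
            return set()
        with open(self.state_path) as f:
            return set(json.load(f))

    def _persist(self):
        # Write to a temporary file and rename it, so that a crash in the
        # middle of the write leaves the previous state file intact.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.delivered_ids), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)

    def receive(self, msg_id, payload):
        """Return True if the message was delivered, False if it was a duplicate."""
        if msg_id in self.delivered_ids:
            return False
        # The ID is recorded before delivery. This guarantees at-most-once
        # delivery only: a crash between the two steps would lose the message.
        self.delivered_ids.add(msg_id)
        self._persist()
        self.delivered_log.append(payload)
        return True


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        r1 = Receiver(path)
        first = r1.receive("m1", "hello")
        del r1                    # simulated fail-stop crash: volatile state is gone
        r2 = Receiver(path)       # recovery: persistent state is reloaded
        duplicate = r2.receive("m1", "hello")   # sender retransmits after the crash
        fresh = r2.receive("m2", "world")
        return first, duplicate, fresh
```

Calling `demo()` shows the retransmitted message being rejected after recovery while a new message is still accepted. Note that the sketch only tolerates crashes that leave the state file intact, and that it does not make recording and delivery atomic.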
Definition of Reliable Messaging: (freely inspired and rearranged from the W3C WS-Glossary...)

The ability:
1. of the intended receiver of a message to be assured that it receives and delivers a given message once and only once, i.e. exactly one time.
2. of a sender of a message to be able to determine whether a given message has already been received by its intended receiver.
3. of a sender to be assured that the messages are received and delivered by the intended receiver in the same order in which they were sent.
4. of both sender and receiver of a message to carry out (1), (2) and (3) in the face of inevitable, yet often unpredictable, non-destructive crashes which are eventually recovered.

Failure Recovery: Failure recovery is the process of regaining operational status or restoring the system's integrity after the occurrence of a failure.
___________________________________________________________________

I am also not fully satisfied with the current persistent storage definition. I hope I'll find the time to reword it before the F2F meeting.

Apart from the above considerations, as I wrote in one of my past mails, I believe that WS-RM suffers from the lack of multicast features. I can imagine several important use cases where such a feature would be useful (in general, every time some data has to be reliably exchanged between more than two endpoints). In practice, WS-RM may either directly exploit the multicast ability of multicast-enabled transport protocols like SMTP or, for the common case of the HTTP binding, take care of managing the correct data exchange over several TCP connections carrying the HTTP POST requests and the corresponding responses.

I want to make a motion to include multicast support in the requirements list, but I would appreciate any ideas/comments from you.

Looking forward to meeting you all at the F2F,
Paolo
--
Paolo Romano