
Subject: Re: [sdd] Fw: Rollback - Reqt 2.1.1.6.1

From: Christine Draper <cdraper@us.ibm.com>
To: sdd@lists.oasis-open.org
Date: Wed, 9 Aug 2006 16:15:36 -0500

James,

Responses below.

James Falkner <james.falkner@sun.com> wrote on 08/08/2006 11:30:04 AM:

> Christine,
>
> Fantastic summary and well thought out. A few questions below.
>
> Christine Draper wrote:
> > All,
> >
> > I was asked to put together a description of how the information in the
> > current SDD schema could be used during rollback, as part of considering
> > whether any further action is needed to address requirement 2.1.1.6.1:
> >
> > /The SDD specification must support the identification of components
> > that should be rolled back if later components fail, and identification
> > of components that should cause an overall rollback if that component
> > fails to install./
> >
>
> Generic question here. Is it the responsibility of the SDD author
> to implement a transaction-based update? For example, if an author
> wants to deliver an update to an app, and part of that update changes
> a config value from its current setting to some new setting, must
> the "config artifact" save the previous value, such that if rollback
> is needed, this previous value can be restored? Is it possible to
> declare this value (and its associated identifier) such that a
> runtime could handle the transaction semantics on its own?

The SDD author is responsible for providing the definition of the undo artifact/action that allows rollback for each IU/CU (or for deciding that rollback is not supported for that part of the package, which is the case if no "undoArtifact" element is defined). If the target environment knows how to roll back using only information the runtime can provide (e.g. resulting resource identity, IU identity), this could simply be an empty element. However, note that the target environment may need to be able to roll back to an arbitrary depth, not just the last action, because a composite SDD may include multiple IUs/CUs applied to the same target.

Specifically on configuration artifacts, assuming we agree to keep the declarative form of CU that's in the current schema ("configProperties"), I would propose that a Compliance Level TBD runtime should provide built-in support for rollback of "standard-format declarative" CUs. The runtime should be able to do this by storing the previous values of the properties. This is not possible for a general CU where the artifact format is "opaque".
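To make the "store the previous values" idea concrete, here is a minimal Python sketch, assuming CU properties are simple name/value strings. `DeclarativeConfigTarget`, `apply` and `undo_last` are hypothetical names, not part of the SDD schema. Keeping one saved snapshot per applied CU also covers the arbitrary-depth point above: undoing N CUs is N calls, most recent first.

```python
from typing import Any, Dict, List, Optional

_MISSING = object()  # marks a property that did not exist before a CU ran


class DeclarativeConfigTarget:
    """Hypothetical target that can undo declarative (configProperties-style) CUs."""

    def __init__(self, initial: Optional[Dict[str, str]] = None) -> None:
        self.values: Dict[str, str] = dict(initial or {})
        # One saved snapshot per applied CU, so rollback can go back to any depth.
        self._history: List[Dict[str, Any]] = []

    def apply(self, config_properties: Dict[str, str]) -> None:
        """Apply a CU's properties, first saving the values they replace."""
        previous = {name: self.values.get(name, _MISSING) for name in config_properties}
        self._history.append(previous)
        self.values.update(config_properties)

    def undo_last(self) -> None:
        """Undo the most recently applied CU by restoring the saved values."""
        previous = self._history.pop()
        for name, old in previous.items():
            if old is _MISSING:
                del self.values[name]  # property was introduced by this CU
            else:
                self.values[name] = old
```

This only works because the CU is declarative: the runtime can see exactly which properties change. An opaque CU artifact gives it nothing to snapshot.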

Note I'm suggesting a rename of the "undoArtifact" element and other similar elements to <<operation>>Action, to remove the implication that a file is always associated with a lifecycle operation.

>
> > Here's the analysis.
> >
> > *SUMMARY*
> >
> > The current schema can support:
> >
> > 1. Full rollback of a new solution or update to a solution, provided
> >    the appropriate artifacts are supplied.
> > 2. Partial rollback of a new solution to reestablish a valid system
> >    configuration, backing out optional components that had failed to
> >    install or caused functional problems.
> > 3. Partial rollback of an update to a solution to reestablish a valid
> >    system configuration, where updates are provided as requisites and
> >    the previous version already met the solution requirements.
> > 4. Fine-grained rollback of part of a solution update, before
> >    retrying the same update.
> >
>
> What's the difference between #4 and #3? Both sound like they are
> a partial rollback of an update. Oh, I think it's answered below.
>
Yup :-)
> >
> > The schema elements that support this are:
> >
> > * Selectable content, which indicates optional components of an IU
> >   package that may be selected for rollback.
> > * Requisites, which distinguish components of the solution from
> >   separately installable IUs which need not be rolled back.
> > * Internal dependencies, which can be used to identify IUs that must
> >   be rolled back as well as the selected IU.
> > * Requirements/resulting resources, which can be used to identify
> >   when an IU from one package is required by an IU from another
> >   package (and so must be rolled back as well).
> > * Uninstall element for new SIUs; Undo element for update SIUs;
> >   UndoConfig element for SCUs, which provide the content needed for
> >   rollback.
> > * Internal & external dependencies for a given lifecycle operation,
> >   which can be used to modify the order in which rollback occurs.
> >
> >
> > There are details to be worked out on what the SDD spec should say
> > about rollback order; see discussion below.
> >
> > There are details to be worked out about the connection between
> > features and selectable content, and what this should mean for
> > rollback granularity.
> >
> > Support for partial rollback of a maintenance update to a solution (3)
> > may require further discussion.
> >
> >
> > *GORY DETAILS*
> >
> > *Rollback of an SIU/SCU*
> >
> > The basic behavior for rollback of an SIU/SCU is:
> >
> > * For a new install SIU, to roll back after either the install or
> >   install-config operations, perform the uninstall operation on the
> >   SIU. If there is no uninstall element, then it is assumed that
> >   appropriate cleanup is performed by the uninstall of other SIUs in
> >   the package. If so, the package must ensure the uninstall is
> >   correctly sequenced; see note later. Of course, a package may omit
> >   the uninstall element and perform no cleanup on uninstall. This
> >   would usually be considered badly behaved, but there may be some
> >   circumstances in which this is the only option.
> > * For an update SIU, to roll back after either the install or
> >   install-config operations, perform the undo operation on the SIU. If
> >   undo is not supported (no undo element), then this SIU cannot be
> >   rolled back.
> > * For an SCU, to roll back after the install-config operation (in an
> >   IU package) or the config operation (in a CU package), perform the
> >   undoConfig operation. If undoConfig is not supported, then this
> >   SCU cannot be rolled back.
> >
> >
> > If an error occurs whilst an SIU/SCU operation is in-flight, then one of
> > three situations may occur:
> >
> > * Target environment is able to roll back the in-flight operation
> >   (guaranteed atomic transaction).
>
> How is this guarantee established? In other words, how will runtimes know
> that they can safely assume that a partially-failed SIU/SCU has not left
> any artifacts or other changes on the system?

Possibilities:

* It is declared as a characteristic of the resource type (as part of the resource model, not the SDD).
* The runtime can assert that the guarantee is required, and the target environment must not start the operation unless it can meet the guarantee.
* Some part of the failure return code from the target environment indicates that this wasn't the case, so the runtime knows there may be a problem.
* The runtime doesn't know and has to assume the worst.

>
> > * Target environment is not able to roll back the in-flight
> >   operation, but the "undo" operation is robust when being applied
> >   to a partially-done operation.
> > * Target environment is not able to roll back the in-flight
> >   operation, and the "undo" operation is not robust or doesn't exist.
> >   In this case, cleanup may not be complete. I don't think the
> >   runtime/SDD can do anything more about this (other than warn the
> >   user).
> >
>
> Is there a way to flag a particular undo/uninstall operation as "robust"
> such that it can be run when the SIU/Update SIU/SCU partially failed
> but was not atomic?

Not in the current schema. I think best practices would say they always should be robust, if specified at all, so I'm reluctant to give people this "out". Uninstall has to cope with "corrupted" installations, so a partially complete installation is just one flavor of this. I'd think it was equally unacceptable to say "you can rollback this update, but only if it succeeds in the first place".
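The per-unit rule quoted above (uninstall for a new SIU, undo for an update SIU, undoConfig for an SCU) could be expressed in a runtime roughly as follows. This is a sketch only: `Unit`, the `kind` strings and `NotRollbackable` are hypothetical, and the `has_*` flags stand in for the presence of the corresponding schema elements.

```python
from dataclasses import dataclass
from typing import Optional


class NotRollbackable(Exception):
    """Raised when a unit declares no action that can undo it."""


@dataclass
class Unit:
    # Hypothetical model of one SIU/SCU; each has_* flag records whether the
    # uninstall, undo or undoConfig element is present in the SDD.
    name: str
    kind: str  # "new_siu", "update_siu" or "scu"
    has_uninstall: bool = False
    has_undo: bool = False
    has_undo_config: bool = False


def rollback_operation(unit: Unit) -> Optional[str]:
    """Return the operation that rolls back `unit`, or None if none is needed."""
    if unit.kind == "new_siu":
        # No uninstall element: cleanup is assumed to happen in the uninstall
        # of other SIUs in the package, so this unit needs no action itself.
        return "uninstall" if unit.has_uninstall else None
    if unit.kind == "update_siu":
        if unit.has_undo:
            return "undo"
        raise NotRollbackable(f"{unit.name}: update SIU has no undo element")
    if unit.kind == "scu":
        if unit.has_undo_config:
            return "undoConfig"
        raise NotRollbackable(f"{unit.name}: SCU has no undoConfig element")
    raise ValueError(f"{unit.name}: unknown unit kind {unit.kind!r}")
```

Note the asymmetry: a missing uninstall element on a new SIU is tolerated, while a missing undo or undoConfig element makes the unit non-rollbackable.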
>
> >
> > *Order of Rollback*
> >
> > Typically, a runtime is likely to roll back by reversing the install
> > order. There are a couple of complications:
> >
> > * The split into install and install-config operations. These are
> >   undone by a single operation for an SIU.
> > * The case where uninstall order needs to be different from install
> >   order. This will be indicated using internal dependencies
> >   (specifying that it is an uninstall dependency).
> >
>
> So am I correct in saying that the default, for uninstall purposes,
> should be to use the same dependencies as install? So, if B depends
> on A, then install order is A,B and uninstall order is B,A.
>
We need to decide this. It could just be best practice. If we did specify this behavior in the spec, the question is how to override it in cases where it is not what is desired.
>
> > We need to decide what the spec should say about rollback order. Here is
> > one possibility.
> > For each composite IU:
> >
> > * Undo any child CUs in reverse order of install-config.
> > * Then undo any child update IUs in reverse order of install.
> > * Then uninstall any child new IUs based on explicit dependencies.
> >
>
> That sounds right to me, by default.
>
> >
> > Alternatively, we could assert that undo order has to be explicitly
> > defined using internal dependencies.
> >
>
> It seems runtimes could support both. For a given lifecycle operation,
> a dependency map is created and then used for ordering. So for
> uninstall/rollback, the default dependency map is the one used by
> install, and the order is driven by that. If there are any overrides
> (say, if the install order was A,B,C,D,E, but the author says that
> E depends on C for uninstall), then the dependency map would be:
>
>   E   D
>   |  /
>   C
>   |
>   B
>   |
>   A
>
> Then the uninstall order could be one of two orders:
>
>   E,D,C,B,A
>   D,E,C,B,A
>
> The point is that internal dependencies serve only to override
> default dependency order. The default order is the one driven
> by the install lifecycle operation dependency order.
>
I think this would be good practice for a runtime. The question is whether it is what the spec should require, and if so, making sure we specify the "override" behavior in an unambiguous way.
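James's dependency-map approach can be sketched as a reverse topological sort, assuming an override simply replaces a unit's default install dependencies for uninstall (the exact "override" semantics are still to be pinned down). `uninstall_order` is a hypothetical helper; it breaks ties by removing the most recently installed unit first, so for the example above it yields E,D,C,B,A, although D,E,C,B,A is equally valid.

```python
from typing import Dict, List, Set


def uninstall_order(
    install_order: List[str],
    install_deps: Dict[str, Set[str]],
    uninstall_overrides: Dict[str, Set[str]],
) -> List[str]:
    """Derive an uninstall order from install dependencies plus overrides."""
    # Start from the install-time dependency map; an uninstall override
    # replaces the default dependencies of that unit.
    deps: Dict[str, Set[str]] = {u: set(install_deps.get(u, set())) for u in install_order}
    for unit, override in uninstall_overrides.items():
        deps[unit] = set(override)

    # A unit may be removed once nothing still installed depends on it.
    dependents = {u: 0 for u in install_order}
    for reqs in deps.values():
        for req in reqs:
            dependents[req] += 1

    position = {u: i for i, u in enumerate(install_order)}
    ready = [u for u in install_order if dependents[u] == 0]
    order: List[str] = []
    while ready:
        # Tie-break: among removable units, take the most recently installed.
        ready.sort(key=lambda u: position[u])
        unit = ready.pop()
        order.append(unit)
        for req in deps[unit]:
            dependents[req] -= 1
            if dependents[req] == 0:
                ready.append(req)

    if len(order) != len(install_order):
        raise ValueError("dependency cycle: no valid uninstall order")
    return order
```

With no overrides, the result is just the reverse of the install dependency order, which matches the B,A default discussed above.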
> This also seems like a good place to allow authors to say that certain
> operations *must not* be rolled back during rollback (because they
> are known to be un-rollback-able or cause bad things to happen). This
> is part of the requirement we are trying to address.
>
If there is no "undoArtifact" element, then the IU is un-rollbackable or bad things happen. If the user has asserted that they want to install a composite solution with undo-ability, then having such an IU as part of the composite should prevent the operation. Of course, a runtime could permit this to be overridden, but I don't think it is within the scope of the standard to specify whether this should be permitted.
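A minimal sketch of the default order proposed above for a composite IU, together with the pre-install check for un-rollbackable children. `Child`, `require_undoable` and `default_rollback_order` are hypothetical names. For brevity, new IUs are ordered by reverse install order here rather than by explicit dependencies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    # Hypothetical model of one child unit of a composite IU.
    name: str
    kind: str       # "cu", "update_iu" or "new_iu"
    step: int       # position in install order (install-config order for CUs)
    undoable: bool  # True if the child declares the action needed to undo it


def require_undoable(children: List[Child]) -> None:
    """Pre-install check when the user has asked for an undo-able install."""
    missing = [c.name for c in children if not c.undoable]
    if missing:
        raise ValueError("composite is not rollbackable: " + ", ".join(missing))


def default_rollback_order(children: List[Child]) -> List[str]:
    """Three-phase default: CUs, then update IUs, then new IUs."""
    def reversed_names(kind: str) -> List[str]:
        same_kind = [c for c in children if c.kind == kind]
        same_kind.sort(key=lambda c: c.step, reverse=True)
        return [c.name for c in same_kind]

    # Simplification: new IUs use reverse install order here, standing in for
    # the "explicit dependencies" the proposal mentions.
    return reversed_names("cu") + reversed_names("update_iu") + reversed_names("new_iu")
```

Running `require_undoable` before install, rather than at rollback time, matches the point above: the install should be refused up front rather than discovered to be un-undoable halfway through a rollback.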

>
> >
> > *Rollback Extent*
> >
> > The simplest assumption is full rollback, i.e. that all operations
> > performed in a given "transaction" are to be rolled back.
> >
> > A finer-grained rollback may have one of two purposes:
> >
> > * Reestablish a valid system configuration, backing out updates that
> >   had failed or caused functional problems.
> > * Clean up before retrying the same system update. This could be much
> >   finer-grained. Any set of IUs might be rolled back, provided they
> >   are then reinstalled according to the constraints on install order
> >   (and aren't installed twice without an undo in between).
> >
>
> How will runtimes know that a given IU *must* be reinstalled
> after a finer-grained rollback? What if it is not reinstalled?
>
If a runtime supports the second type of fine-grained rollback, it had better keep track of what it rolled-back and what it needs to reinstall. Otherwise, the system will likely end up in an invalid state.
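One way a runtime might keep that bookkeeping, sketched under two assumptions: each IU is identified by a name, and reinstalling in the original install order satisfies the order constraints. `RetryLedger` and its methods are hypothetical.

```python
from typing import List, Set


class RetryLedger:
    """Hypothetical runtime bookkeeping for fine-grained rollback before a retry."""

    def __init__(self, install_order: List[str]) -> None:
        self.install_order = list(install_order)
        self.installed: Set[str] = set(install_order)
        self.pending_reinstall: Set[str] = set()

    def roll_back(self, unit: str) -> None:
        """Record that `unit` was rolled back and must be reinstalled."""
        if unit not in self.installed:
            raise ValueError(f"{unit} is not installed")
        self.installed.remove(unit)
        self.pending_reinstall.add(unit)

    def reinstall_plan(self) -> List[str]:
        """Units still to reinstall, in the original install order."""
        return [u for u in self.install_order if u in self.pending_reinstall]

    def reinstall(self, unit: str) -> None:
        if unit in self.installed:
            # Installing twice without an undo in between is not allowed.
            raise ValueError(f"{unit} is already installed")
        self.installed.add(unit)
        self.pending_reinstall.discard(unit)

    def is_consistent(self) -> bool:
        """True once every rolled-back unit has been reinstalled."""
        return not self.pending_reinstall
```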

>
> > The various possibilities for partial rollback will likely be driven by
> > user or policy decisions. With the current schema, an SDD author cannot
> > constrain partial rollback, provided it re-establishes a valid system
> > configuration.
> >
> > Possible granularities of partial rollback to reestablish a valid system
> > configuration are:
> >
> > 1. Roll back the top-level IU package and all its composed components.
> >    Do not roll back its requisites.
> > 2. Roll back a new optional component of the top-level IU package, and
> >    anything that depends on it.
> > 3. Roll back a new optional component of a composed IU package, and
> >    anything that depends on it.
> > 4. Roll back an update in a contained IU package which updates a
> >    shared (federated) resource, provided the pre-existing version
> >    satisfied the solution requirements. This is problematic; see below.
> > 5. Roll back an update in a requisite IU package which updates a
> >    shared resource, provided the pre-existing version satisfied the
> >    solution requirements (the user had chosen to apply the more recent
> >    level, even though the original level was within range).
> >
> >
> > I haven't defined "optional component" in (2) and (3). This could be
> > either:
> >
> > * A single selectable content IU/CU (elements immediately under
> >   selectableContent in an IU package).
> > * The set of selectable content IU/CUs that correspond to removing a
> >   feature (and not used in another feature).
> >
> > This point should be resolved when we close on the semantics of
> > features, e.g. are features there to assist a user in making good
> > selections, or are they a mandatory contract defining what can be
> > selected?
> >
> > For (4), we would need to agree under what circumstances it is valid to
> > roll back an update to one component of a solution, but not the overall
> > solution. The current assumption is that the solution requires the
> > composed level, or a more recent, backwards-compatible level. If the
> > update is actually "optional", it could be better shipped as a requisite
> > (although not all runtimes would support choosing to install a more
> > recent requisite if the base requirement was satisfied). So I would
> > argue that updates to contained components cannot be individually
> > rolled back.
> >
> > *Note on Schema Element Names*
> >
> > Note: Schema currently has element names of "xxxArtifact" for each
> > operation. I think it would be more accurate if we changed this to just
> > "xxx" (e.g. install, undo), as there does not have to be an actual
> > artifact for each operation.
>
> The only problem in naming elements using words that may be
> interpreted as "verbs" is that someone may indeed interpret
> it that way, and think of it as a procedure, instead of a
> declaration. Is there perhaps a more descriptive noun that
> could be used? "installArtifact" could also be misinterpreted
> as a verb. What about a nested structure, such as:
>
>   <install ....>
>     <artifact ....>
>     </artifact>
>   </install>
>
> -jhf-
As mentioned above, what do you think about "installAction", "undoAction"?
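On granularities (2) and (3) in the quoted summary, "anything that depends on it" amounts to a transitive closure over reverse dependencies. A minimal sketch, assuming dependencies are available as a map from each IU to the IUs it requires; `rollback_set` is a hypothetical helper, and what counts as an "optional component" is still open.

```python
from collections import deque
from typing import Dict, Set


def rollback_set(component: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return the component plus everything that (transitively) depends on it."""
    # Invert the map: for each unit, which units depend on it.
    dependents: Dict[str, Set[str]] = {}
    for unit, reqs in depends_on.items():
        for req in reqs:
            dependents.setdefault(req, set()).add(unit)

    # Breadth-first walk over the inverted map, starting at the component.
    result = {component}
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, set()):
            if dep not in result:
                result.add(dep)
                queue.append(dep)
    return result
```

The resulting set would then be ordered for removal using the same dependency rules as a full rollback.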

Regards,
Christine

Senior Technical Staff Member
IBM, 11501 Burnet Road, Mail Point 901-6B10
Austin, TX 78758
1-512-838-3482 tl 678-3482

References:
- Re: [sdd] Fw: Rollback - Reqt 2.1.1.6.1
  - From: James Falkner <james.falkner@sun.com>