Subject: Re: [sdd] Fw: Rollback - Reqt 2.1.1.6.1
From: James Falkner <james.falkner@sun.com>
To: Christine Draper <cdraper@us.ibm.com>
Date: Tue, 08 Aug 2006 12:30:04 -0400

Christine,

Fantastic summary and well thought out.  A few questions below.

Christine Draper wrote:
> All,
> 
> I was asked to put together a description of how the information in the 
> current SDD schema could be used during rollback, as part of considering 
> whether any further action is needed to address requirement 2.1.1.6.1:
> 
> /The SDD specification must support the identification of components 
> that should be rolled back if later components fail, and identification 
> of components that should cause an overall rollback if that component 
> fails to install./
> 

Generic question here.  Is it the responsibility of the SDD author
to implement a transaction-based update?  For example, if an author
wants to deliver an update to an app, and part of that update changes
a config value from its current setting to some new setting, must
the "config artifact" save the previous value, such that if rollback
is needed, this previous value can be restored?  Is it possible to
declare this value (and its associated identifier) such that a
runtime could handle the transaction semantics on its own?
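
To make the question concrete, here is a minimal sketch of what "the runtime
handles the transaction semantics" could look like if the SDD only declared
the identifier and the new value.  The ConfigTransaction class and its
journal are hypothetical names of mine, not anything in the current schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ConfigTransaction:
    """Hypothetical runtime-side journal; not part of the SDD schema."""
    store: Dict[str, str]
    # Each entry is (key, previous value), with None meaning the key was absent.
    journal: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def set(self, key: str, new_value: str) -> None:
        # Record the old value before overwriting it.
        self.journal.append((key, self.store.get(key)))
        self.store[key] = new_value

    def rollback(self) -> None:
        # Restore in reverse order so repeated writes to one key unwind correctly.
        while self.journal:
            key, old = self.journal.pop()
            if old is None:
                self.store.pop(key, None)
            else:
                self.store[key] = old


config = {"port": "8080"}
txn = ConfigTransaction(config)
txn.set("port", "9090")    # update changes an existing setting
txn.set("mode", "secure")  # update adds a new setting
txn.rollback()             # config is back to its pre-update state
```

With something like this the artifact would not need to save the previous
value itself; the runtime would journal it from the declared identifier.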

> Here's the analysis.
> 
> *SUMMARY*
> 
> The current schema can support:
> 
>    1. Full rollback of a new solution or update to a solution, providing
>       the appropriate artifacts are provided.
>    2. Partial rollback of a new solution to reestablish a valid system
>       configuration, backing out optional components that had failed to
>       install or caused functional problems.
>    3. Partial rollback of an update to a solution to reestablish a valid
>       system configuration, where updates are provided as requisites and
>       the previous version already met the solution requirements.
>    4. Fine-grained rollback of part of a solution update, before
>       retrying the same update. 
> 

What's the difference between #4 and #3?  Both sound like they are
a partial rollback of an update.  Oh, I think it's answered below.

> 
> The schema elements that support this are:
> 
>     * Selectable content, which indicates optional components of an IU
>       package that may be selected for rollback.
>     * Requisites, which distinguish components of the solution, from
>       separately installable IUs which need not be rolled-back.
>     * Internal dependencies, which can be used to identify IUs that must
>       be rolled-back as well as the selected IU.
>     * Requirements/resulting resources, which can be used to identify
>       when an IU from one package is required by an IU from another
>       package (and so must be rolled-back as well).
>     * Uninstall element for new SIUs; Undo element for update SIUs;
>       UndoConfig element for SCUs which provide the content needed for
>       rollback.
>     * Internal & external dependencies for a given lifecycle operation,
>       which can be used to modify the order in which rollback occurs. 
> 
> 
> There are details to be worked out on what the SDD spec should say about 
> rollback order, see discussion below.
> 
> There are details to be worked out about the connection between features and 
> selectable content, and what this should mean for rollback granularity.
> 
> Support for partial rollback of a maintenance update to a solution (3) 
> may require further discussion.
> 
> 
> *GORY DETAILS*
> 
> *Rollback of an SIU/SCU*
> 
> The basic behavior for rollback of an SIU/SCU is:
> 
>     * For a new install SIU, to rollback after either the install or
>       install-config operations, perform uninstall operation on the SIU.
>       If there is no uninstall element, then it is assumed that
>       appropriate cleanup is performed by the uninstall of other SIUs in
>       the package. If so, the package must ensure the uninstall is
>       correctly sequenced, see note later. Of course, a package may omit the
>       uninstall element and perform no cleanup on uninstall - this would
>       usually be considered badly-behaved, but there may be some
>       circumstances in which this is the only option.
>     * For an update SIU, to rollback after either the install or
>       install-config operations, perform undo operation on the SIU. If
>       undo is not supported (no undo element), then this SIU cannot be
>       rolled-back.
>     * For an SCU, to rollback after the install-config operation (in an
>       IU package) or the config operation (in a CU package), perform
>       undoConfig operation. If undoConfig is not supported, then this
>       SCU cannot be rolled-back.
> 
> 
> If an error occurs whilst an SIU/SCU operation is in-flight, then one of 
> three situations may occur:
> 
>     * Target environment is able to rollback the in-flight operation
>       (guaranteed atomic transaction).

How is this guarantee established?  In other words, how will a runtime know
that it can safely assume that a partially-failed SIU/SCU has not left
any artifacts or other changes on the system?

>     * Target environment is not able to rollback the in-flight
>       operation, but the "undo" operation is robust when being applied
>       to a partially-done operation.
>     * Target environment is not able to rollback the in-flight
>       operation, "undo" operation is not robust or doesn't exist. In
>       this case, cleanup may not be complete. I don't think the
>       runtime/SDD can do anything more about this (other than warn the
>       user).
> 

Is there a way to flag a particular undo/uninstall operation as "robust"
such that it can be run when the SIU/Update SIU/SCU partially failed
but was not atomic?
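
If such flags existed, the three situations above reduce to a small decision
on the runtime side.  A rough sketch follows; the "atomic" and "undo_robust"
flags are assumptions of mine for illustration and are not defined by the
current schema.

```python
from enum import Enum


class Recovery(Enum):
    NOTHING_TO_UNDO = "target environment rolled back the in-flight operation"
    RUN_UNDO = "run undo/uninstall against the partially-done operation"
    WARN_USER = "cleanup may not be complete; warn the user"


def recovery_for_inflight_failure(atomic: bool, has_undo: bool,
                                  undo_robust: bool) -> Recovery:
    """Pick a recovery action when an SIU/SCU operation fails in flight.

    All three inputs are hypothetical declarations an author might make.
    """
    if atomic:
        # Situation 1: guaranteed atomic transaction.
        return Recovery.NOTHING_TO_UNDO
    if has_undo and undo_robust:
        # Situation 2: undo tolerates a partially-done operation.
        return Recovery.RUN_UNDO
    # Situation 3: undo is missing or not robust.
    return Recovery.WARN_USER
```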

> 
> *Order of Rollback*
> 
> Typically, a runtime is likely to reverse the order of install in order 
> to rollback. There are a couple of complications:
> 
>     * The split into install and install-config operations. These are
>       undone by a single operation for an SIU.
>     * The case where uninstall order needs to be different from install
>       order. This will be indicated using internal dependencies
>       (specifying that it is an uninstall dependency).
> 

So am I correct in saying that the default, for uninstall purposes,
should be to use the same dependencies as install?  So, if B depends
on A, then install order is A,B and uninstall order is B,A.

> 
> We need to decide what the spec should say about rollback order. Here is 
> one possibility.
> For each composite IU:
> 
>     * Undo any child CUs in reverse order of install-config.
>     * Then undo any child update IUs in reverse order of install.
>     * Then uninstall any child new IUs based on explicit dependencies.
>

That sounds right to me, by default.
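
As I read the proposal, a runtime could derive the plan roughly like this.
This is only a sketch under my own assumptions: the "kind" labels are made
up, install and install-config are assumed to follow one recorded sequence,
and new IUs fall back to reverse install order because no explicit uninstall
dependencies are modelled here.

```python
from typing import List, NamedTuple


class Child(NamedTuple):
    name: str
    kind: str  # "cu", "update_iu" or "new_iu" (hypothetical labels)


def rollback_plan(children_in_install_order: List[Child]) -> List[str]:
    """Order rollback steps for one composite IU: CUs, update IUs, new IUs."""
    reverse = list(reversed(children_in_install_order))
    # Each kind maps to the operation that rolls it back.
    phases = [("cu", "undoConfig"), ("update_iu", "undo"), ("new_iu", "uninstall")]
    steps: List[str] = []
    for kind, operation in phases:
        steps += [f"{operation} {child.name}" for child in reverse
                  if child.kind == kind]
    return steps


children = [
    Child("db", "new_iu"),
    Child("app_fix", "update_iu"),
    Child("db_cfg", "cu"),
    Child("web", "new_iu"),
    Child("web_cfg", "cu"),
]
plan = rollback_plan(children)
```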

> 
> Alternatively, we could assert that undo order has to be explicitly 
> defined using internal dependencies.
> 

It seems runtimes could support both.  For a given lifecycle operation,
a dependency map is created and then used for ordering.  So for
uninstall/rollback, the default dependency map is the one used by
install, and the order is driven by that.  If there are any overrides
(say, the install order was A,B,C,D,E, but the author says that
E depends on C for uninstall), then the dependency map would be:

      E  D
      | /
      C
      |
      B
      |
      A

Then the uninstall order could be one of two orders:

E,D,C,B,A
D,E,C,B,A

The point is that internal dependencies serve only to override
default dependency order.  The default order is the one driven
by the install lifecycle operation dependency order.
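
Here is a small sketch of that idea, using the A..E example.  The function
names are mine, and it assumes the default map is a simple chain in which
each unit depends on the one installed just before it; a real runtime would
start from the declared install dependencies instead.

```python
from typing import Dict, List, Set


def default_map(install_order: List[str]) -> Dict[str, Set[str]]:
    # Assumption: each unit depends on the one installed just before it.
    deps: Dict[str, Set[str]] = {unit: set() for unit in install_order}
    for earlier, later in zip(install_order, install_order[1:]):
        deps[later].add(earlier)
    return deps


def uninstall_orders(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Enumerate every order in which no unit is removed before its dependents."""
    results: List[List[str]] = []

    def extend(order: List[str], remaining: Set[str]) -> None:
        if not remaining:
            results.append(order)
            return
        for unit in sorted(remaining):
            # A unit is removable once nothing still installed depends on it.
            if not any(unit in deps[other] for other in remaining if other != unit):
                extend(order + [unit], remaining - {unit})

    extend([], set(deps))
    return results


deps = default_map(["A", "B", "C", "D", "E"])
deps["E"] = {"C"}  # author override: for uninstall, E depends on C
orders = uninstall_orders(deps)
```

Enumerating every valid order is only practical for small maps like this
one; a runtime would normally pick a single order.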

This also seems like a good place to allow authors to say that certain
operations *must not* be rolled back during rollback, because they
are known to be un-rollback-able or cause bad things to happen.  This
is part of the requirement we are trying to address.

> 
> *Rollback Extent*
> 
> The simplest assumption is full rollback - i.e. that all operations 
> performed in a given "transaction" are to be rolled-back.
> 
> A finer-grained rollback may have one of two purposes:
> 
>     * Reestablish a valid system configuration, backing out updates that
>       had failed or caused functional problems.
>     * Cleanup before retrying the same system update. This could be much
>       finer-grained. Any set of IUs might be rolled-back, providing they
>       are then reinstalled according to the constraints on install order
>       (and aren't installed twice without an undo in between).
> 

How will runtimes know that a given IU *must* be reinstalled
after a finer-grained rollback?  What if it is not re-installed?

> 
> The various possibilities for partial rollback will likely be driven by 
> user or policy decisions. With the current schema, an SDD author cannot 
> constrain partial rollback providing it re-establishes a valid system 
> configuration.
> 
> Possible granularities of partial rollback to reestablish a valid system 
> configuration are:
> 
>    1. Rollback the top-level IU package and all its composed components.
>       Do not rollback its requisites.
>    2. Rollback a new optional component of the top-level IU package, and
>       anything that depends on it.
>    3. Rollback a new optional component of a composed IU package, and
>       anything that depends on it.
>    4. Rollback an update in a contained IU package which updates a
>       shared (federated) resource, providing the pre-existing version
>       satisfied the solution requirements. This is problematic, see below.
>    5. Rollback an update in a requisite IU package which updates a
>       shared resource, providing the pre-existing version satisfied the
>       solution requirements (user had chosen to apply the more recent
>       level, even though the original level was within range).
> 
> 
> I haven't defined "optional component" in (2) and (3). This could either 
> be:
> 
>     * A single selectable content IU/CU (elements immediately under
>       selectableContent in an IU package).
>     * The set of selectable content IU/CUs that correspond to removing a
>       feature (and not used in another feature)
> 
> This point should be resolved when we close on the semantics of features 
> - e.g. are features to assist a user in making good selections, or a 
> mandatory contract defining what can be selected.
> 
> For (4), we would need to agree under what circumstances it is valid to 
> rollback an update to one component of a solution, but not the overall 
> solution. The current assumption is that the solution requires the 
> composed level, or a more recent, backwards-compatible level. If the 
> update is actually "optional", it could be better shipped as a requisite 
> (although not all runtimes would support choosing to install a more 
> recent requisite if the base requirement was satisfied). So I would 
> argue that updates to contained components cannot be individually 
> rolled back.
> 
> *Note on Schema Element Names*
> 
> Note: Schema currently has element names of "xxxArtifact" for each 
> operation. I think it would be more accurate if we changed this to just 
> "xxx" (e.g. install, undo), as there does not have to be an actual 
> artifact for each operation.

The only problem in naming elements using words that may be
interpreted as "verbs" is that someone may indeed interpret
it that way, and think of it as a procedure, instead of a
declaration.  Is there perhaps a more descriptive noun that
could be used?  "installArtifact" could be misinterpreted
as a verb too.  What about a nested situation, such as:

<install ....>
   <artifact ....>
   </artifact>
</install>

-jhf-