RE: orchestration "mental model"

Thanks Peter, comments and questions in-line:

From: Bruun, Peter Michael (CMS RnD Orchestration) <peter-michael.bruun@hpe.com>
Sent: Thursday, July 22, 2021 12:01 AM
To: Chris Lauwers <lauwers@ubicity.com>; tosca@lists.oasis-open.org
Subject: RE: orchestration "mental model"

All,

I agree that asynchronous events indicate a deviation from the (current) intended state.

In principle, modification requests change the intended state, and so the current state could also be construed as a deviation from the (new) intended state.

If I understand correctly, you’re suggesting that “commands” (modification requests) and “events” (notifications of deviation from the intended state) can be harmonized by treating both of them as a “deviation from intended state” indication, which should then result in the orchestrator attempting to restore the desired state.

The challenge, in my opinion, is that we currently have two different mechanisms for handling these “deviations from intended state”:

“Commands” ultimately result in the execution of workflows (which ideally are created automatically) that are designed to arrive at the intended state.
“Notifications” correspond to “events” that might trigger event/condition/action policies.

Should we investigate whether we can harmonize handling of both of these? We have already harmonized the “actions” that can be executed by workflows and by policies. Perhaps we should go even further?

So at a higher level, in both cases the task of the Orchestrator will be to transform the current state of the infrastructure to the intended state.
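A minimal sketch of this harmonized view, assuming node state can be reduced to a simple name-to-status mapping (the function `diff_state` and the status strings are hypothetical and not part of TOSCA): a command changes the intended state, a notification changes the observed state, and in both cases the orchestrator is left with a set of deviations to resolve.

```python
def diff_state(intended: dict, observed: dict) -> dict:
    """Return {node: (observed_value, intended_value)} for every deviating node."""
    deviations = {}
    for node in intended.keys() | observed.keys():
        want = intended.get(node)
        have = observed.get(node)
        if want != have:
            deviations[node] = (have, want)
    return deviations


intended = {"web": "started", "db": "started"}
observed = {"web": "started", "db": "started"}

# A "command" changes the intended state (here: add a cache node)...
after_command = diff_state({**intended, "cache": "started"}, observed)

# ...while a "notification" changes the observed state (here: db has failed).
after_event = diff_state(intended, {**observed, "db": "failed"})
```

Either result could then be handed to the same planning step, which is the harmonization being discussed; whether that step is a workflow or a policy is left open here.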

Asynchronous events will typically not pertain to the service in its entirety, but to one or more nodes in the node representation graph, typically indicating that the node is somehow failing or degraded – for example, it could be overloaded.

Yes, that is why “notifications” (which represent external events) are part of interfaces that are associated with nodes or relationships (and not with the service as a whole).

This means that there must be some way to express how the orchestrator should handle each of the following steps:

1. Asynchronous events would often be coming from monitoring systems that are shared by many deployed service templates – monitoring systems are typically structured regionally and/or by the technology they monitor. So the first step would be to correlate an asynchronous event to the right deployed services. A failing server or network could easily impact multiple deployed templates.

2. Once it has been decided that an event pertains to a deployed service, it must be further correlated to one or more nodes in the node representation graph.

3. Based on policies take some action – like:

a. Send a mail or text message to an administrator

b. Record compensation in a billing system

c. Execute some operation supported by the interfaces defined for that node in the template – for example actions to tear down and recreate, or scale out/in.

d. Select a completely different template for some substitution within the service – e.g. if high-quality MPLS WAN services are failing or overloaded, fail over to a template for lower quality, IP-based networking.
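The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: events carry the identifier of an external resource, each deployed service records which resource realizes each node, and policies are a plain mapping from event kind to action name. All class, function, and action names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    resource: str  # identifier of the affected external resource
    kind: str      # e.g. "overload" or "failure"


@dataclass
class DeployedService:
    service_id: str
    # maps node name -> external resource that realizes the node
    node_resources: dict = field(default_factory=dict)


def correlate(event, services):
    """Steps 1 and 2: find every (service, node) pair affected by the event."""
    return [
        (svc.service_id, node)
        for svc in services
        for node, resource in svc.node_resources.items()
        if resource == event.resource
    ]


def select_actions(event, affected, policies):
    """Step 3: pick the action the policy table prescribes for this event kind."""
    action = policies.get(event.kind, "notify_admin")  # assumed default action
    return [(service_id, node, action) for service_id, node in affected]


services = [
    DeployedService("svc-a", {"vm1": "host-7", "lb": "host-2"}),
    DeployedService("svc-b", {"db": "host-7"}),
]
policies = {"overload": "scale_out", "failure": "recreate"}

event = Event(resource="host-7", kind="failure")
actions = select_actions(event, correlate(event, services), policies)
```

In this example a single failing host maps to nodes in two different deployed services, which is the multi-service impact described in step 1.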

We had discussions with Paul Jordan at TMF earlier this year about exactly this topic. The TMF models introduce a modeling pattern where each node is “married” to a monitoring node that observes external state. The monitoring agent updates the state of its corresponding node in response to such changes. Of course, this is just a design pattern, and it is not clear how much of this should be supported in the TOSCA language (e.g. using notifications and associated event/condition/action policies) vs. in the deployed entities themselves:

In one approach, TOSCA would be used to deploy the entity represented by the node as well as the monitoring agent for that node. TOSCA would “model” the relationship between the entity and its monitoring agent, but all notifications between the monitoring agent and the monitored entity are handled entirely outside of TOSCA and the TOSCA orchestrator is not involved in any corrective actions.
In a second approach, the monitoring agent would send notifications to the TOSCA orchestrator, which then result in the execution of policies that change the state of the corresponding entity. Using this approach, the TOSCA orchestrator is responsible for taking corrective actions.

Asynchronous events are “dangerous” in a number of ways:

· The rate of incoming asynchronous events could be very high – thousands or even millions of events per second are not uncommon in a Telco environment.

Yes, this is why it might be best to deploy a monitoring agent that collects the (very high rate of) asynchronous events (outside of the scope of TOSCA). This agent could then fire “summary events” to the orchestrator when corrective action needs to be taken.
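As a rough sketch of what such an agent might do, assuming raw events arrive in batches of (node, kind) pairs and that a fixed count threshold decides when a summary is worth sending (the function name and the threshold are illustrative assumptions, not anything defined by TOSCA):

```python
from collections import Counter


def summarize(raw_events, threshold):
    """Collapse a batch of (node, kind) raw events into summary events.

    A summary is emitted only for (node, kind) pairs that occur at least
    `threshold` times in the batch.
    """
    counts = Counter(raw_events)
    return [
        {"node": node, "kind": kind, "count": n}
        for (node, kind), n in sorted(counts.items())
        if n >= threshold
    ]


batch = [("vm1", "overload")] * 5000 + [("vm2", "overload")] * 3
summaries = summarize(batch, threshold=1000)
```

Here 5003 raw events are reduced to a single summary event for `vm1`; the three events for `vm2` stay below the threshold and never reach the orchestrator.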

· The asynchronous events may be “toggling” – like the person asked to tell if the yellow turn signal lights are working: “Now it works … now it doesn’t … now it works …”. If this is attached to orchestrator actions that may take minutes to execute, it could be devastating.

Again, this type of logic is probably better implemented inside a monitoring agent rather than in the TOSCA orchestrator.
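One common way to handle toggling inside a monitoring agent is a hold-down timer: a state change is reported only after it has persisted for some minimum time. The sketch below assumes timestamps in seconds and a hypothetical `HoldDown` class; it is one possible damping technique, not something prescribed by TOSCA.

```python
class HoldDown:
    """Report a state change only after it has persisted for `hold` seconds."""

    def __init__(self, initial, hold):
        self.stable = initial     # last state that was reported
        self.candidate = initial  # most recent raw state seen
        self.since = None         # time at which the candidate first appeared
        self.hold = hold

    def observe(self, state, now):
        """Feed one raw sample; return the new stable state, or None if unchanged."""
        if state != self.candidate:
            self.candidate = state
            self.since = now
        if self.candidate != self.stable and now - self.since >= self.hold:
            self.stable = self.candidate
            return self.stable
        return None


damper = HoldDown(initial="up", hold=60)
samples = [(0, "down"), (10, "up"), (20, "down"), (30, "up"), (40, "down"), (110, "down")]
reported = [s for t, raw in samples if (s := damper.observe(raw, t)) is not None]
```

With a 60-second hold, the rapid up/down samples are suppressed and only the final, persistent "down" is reported, so a slow orchestrator action is triggered once rather than five times.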

Some of these aspects are, of course, outside the scope of TOSCA, but they must be dealt with by Orchestrators nevertheless. In our implementation, we can handle all of the above aspects.

Agreed. In addition to making sure that the TOSCA language can handle all of these aspects, we should also suggest “best practices” and associated design patterns.

Peter

Chris

From: tosca@lists.oasis-open.org [mailto:tosca@lists.oasis-open.org] On Behalf Of Chris Lauwers
Sent: 21 July 2021 20:36
To: Chris Lauwers <lauwers@ubicity.com>; tosca@lists.oasis-open.org
Subject: [tosca] RE: orchestration "mental model"

Here is another way to think about the difference between “commands” and “notifications”:

A command is an expression of “intent”: it allows a user to express something that needs to happen
A notification is an expression of a deviation from an intended state.

I’m not sure if we’ll be able to harmonize these two into a single “event” model.

Thanks,

Chris

From: tosca@lists.oasis-open.org <tosca@lists.oasis-open.org> On Behalf Of Chris Lauwers
Sent: Tuesday, July 20, 2021 10:51 AM
To: tosca@lists.oasis-open.org
Subject: [tosca] orchestration "mental model"

During today’s TOSCA Language Ad-Hoc meeting, we discussed the lack of a clear “mental model” for events that may need to be processed by a TOSCA orchestrator. We discussed the following potential “categories” of events:

There seems to be a distinction between synchronous and asynchronous events. Synchronous events are “commands” issued by some external entity (e.g. an administrator or some management system that invokes APIs). Asynchronous events are the result of changes in the external “implementations” (i.e. the services under management). These changes are observed by some type of monitoring system, and are reported asynchronously as events or notifications.
While there was a suggestion that synchronous and asynchronous events should be treated the same by an orchestrator, I’m having a hard time seeing how this is possible. Based on the current TOSCA specification, synchronous events (“commands”) result in the execution of workflows, which ultimately call interface operations. Asynchronous events, on the other hand, result in the triggering of event/condition/action policies (which in turn may execute workflows). I’m not sure how these two could be harmonized.
As it relates to synchronous events (“commands”) it seems to me that all of these commands can be reduced to simple CRUD operations:

Create a service
Read a service
Update a service
Delete a service

In my implementation, I have exactly these 4 canonical operations. What allows all management operations to be reduced to one of these 4 operations is the set of arguments that is passed:

The “Create” operation takes a service template, a set of service input values, and (optionally) the name of the workflow that needs to be run to realize the service in the external world
The “Read” operation just takes a service id
The “Update” operation takes a service id and one or more of the following:

Updated service input values
An updated service template
The name of a workflow that needs to be run

The “Delete” operation takes a service id and (optionally) the name of a workflow that needs to be run
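The four operations and their arguments could be sketched as the interface below. This is an in-memory illustration only and does not reproduce the implementation mentioned above: the class names, the `svc-N` id format, and the template file name are hypothetical, and workflow names are merely recorded rather than executed.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Service:
    template: str
    inputs: dict = field(default_factory=dict)
    last_workflow: Optional[str] = None


class ServiceCatalog:
    """In-memory stand-in for an orchestrator's service store (hypothetical)."""

    def __init__(self):
        self._services = {}
        self._ids = itertools.count(1)

    def create(self, template, inputs, workflow=None):
        service_id = f"svc-{next(self._ids)}"
        self._services[service_id] = Service(template, dict(inputs), workflow)
        return service_id

    def read(self, service_id):
        return self._services[service_id]

    def update(self, service_id, inputs=None, template=None, workflow=None):
        svc = self._services[service_id]
        if inputs is not None:
            svc.inputs.update(inputs)
        if template is not None:
            svc.template = template
        if workflow is not None:
            svc.last_workflow = workflow
        return svc

    def delete(self, service_id, workflow=None):
        # A real orchestrator would run `workflow` before removing the record.
        return self._services.pop(service_id)


catalog = ServiceCatalog()
sid = catalog.create("web-tier.yaml", {"replicas": 2}, workflow="deploy")
catalog.update(sid, inputs={"replicas": 4}, workflow="scale")
```

Note that scaling, redeploying, and swapping templates all go through the same `update` call; only the arguments differ, which is what lets every management operation reduce to one of the four.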

Please provide feedback and comments so we can iterate on this and solidify the mental model, which will help streamline our future discussions.

Thanks,

Chris
