RE: [tosca] Groups - TOSCA Operational Modalities uploaded

Relaying to the ‘tosca’ distribution list

From: Bruun, Peter Michael (CMS RnD Orchestration) <peter-michael.bruun@hpe.com>
Sent: Monday, December 6, 2021 10:11 AM
To: Tal Liron <tliron@redhat.com>; tosca@lists.oasis-open.org
Cc: Chris Lauwers <lauwers@ubicity.com>
Subject: RE: [tosca] Groups - TOSCA Operational Modalities uploaded

Some comments (please relay to the rest of the group) on “TOSCA Operational Modalities”.

I have two concerns about the “Orchestration Centrifuge”:

If event flows turn circular, die out too quickly, or take a long time to converge, it is complicated to debug or analyze where the problem is, and even harder to determine how to fix it. I am aware that there are tools for this, but as Telco systems scale to topologies with tens of thousands of nodes, it becomes unmanageable.
Why is “Bandwidth Scaler” a special “hard-coded” entity in the diagram? My guess is that things like “bandwidth” are not represented as anything the cloud-native management systems (K8s, etc.) would know about, so it pops out as requiring a special platform. In real life there can be hundreds to thousands of such “special” entity types.

I have (unfortunately) tested this idea of autonomously collaborating subsystems without central coordination of events at production scale, and customers were not terribly happy.

I have even tried this in various scenarios, at both low and high levels of the slide 3 pyramid. The critical problems are:

There is a high risk of circular event storms that perpetuate themselves without ever converging to a stable state.
Even when there is convergence towards a stable state, the time to converge for N collaborating subsystems tends to grow as N^2 due to ripple effects. This means that the time to set up a topology of hundreds or thousands of nodes becomes unacceptable, as do the required resources in terms of event processing and network capacity during the convergence period.
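
To make the ripple effect concrete, here is a toy sketch. It is not taken from any real system: the chain model, the function names, and the worst-case event ordering are all assumptions chosen for illustration. Subsystem i derives its state from subsystem i-1, and every subsystem announces itself at startup. With uncoordinated delivery in an unlucky order, the number of events processed grows as N(N-1)/2, while processing in dependency order needs only N-1.

```python
from collections import deque


def uncoordinated_events(n: int) -> int:
    """Count events processed until a chain of n subsystems stabilises.

    Hypothetical model: state[i] must equal state[i-1] + 1. Every subsystem
    announces itself at startup, and the announcements are assumed to be
    handled in the worst (reverse) order.
    """
    state = [0] * n
    # Event "i" means: subsystem i changed, so subsystem i+1 must re-evaluate.
    queue = deque(range(n - 2, -1, -1))
    processed = 0
    while queue:
        source = queue.popleft()
        target = source + 1
        processed += 1
        new_state = state[source] + 1
        if new_state != state[target]:
            state[target] = new_state
            if target + 1 < n:
                queue.append(target)  # ripple: target changed, notify its successor
    assert state == list(range(n))  # the chain did reach its stable state
    return processed


def coordinated_events(n: int) -> int:
    """Same chain, but a coordinator updates subsystems in dependency order."""
    state = [0] * n
    processed = 0
    for target in range(1, n):
        state[target] = state[target - 1] + 1
        processed += 1
    return processed


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, uncoordinated_events(n), coordinated_events(n))
```

In this model the uncoordinated event count for 1000 subsystems is roughly 500 times the coordinated count. Real topologies are not simple chains, so the numbers are only indicative of the growth pattern.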

So I am fully aware of this beautiful dream, but in my experience, it does not scale to Telco grade topologies.

There needs to be central coordination in order to ensure a convergent process, and that, in my experience, is exactly the role of the orchestrator. It may be that this is what slide 2 is expressing, but that is not clear to me.
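
As one possible illustration of what central coordination can buy, the sketch below has a coordinator inspect the dependency graph before any events are dispatched, and refuse to proceed if the dependencies are circular. This is a minimal sketch under stated assumptions, not a description of any particular orchestrator: the function name `plan_order` and the node names are hypothetical, and it uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def plan_order(depends_on):
    """Return a processing order for the nodes, or None if dependencies are circular.

    depends_on maps each node name to the nodes it depends on
    (hypothetical input format chosen for this sketch).
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        # A cycle means events could keep circulating without converging,
        # so the coordinator rejects the plan up front.
        return None


if __name__ == "__main__":
    print(plan_order({"b": ["a"], "c": ["b"]}))  # a valid dependency chain
    print(plan_order({"a": ["b"], "b": ["a"]}))  # circular dependencies
```

The point of the design is that a potential event storm is detected once, centrally, instead of being discovered at runtime by subsystems that each see only their own neighbours.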

Peter

From: tosca@lists.oasis-open.org [mailto:tosca@lists.oasis-open.org] On Behalf Of Tal Liron
Sent: 6 December 2021 17:11
To: tosca@lists.oasis-open.org
Subject: [tosca] Groups - TOSCA Operational Modalities uploaded

Submitter's message
I'm uploading a new version of this presentation with two extra slides that I hope will assist in explaining the previous two.
-- Tal Liron

Document Name: TOSCA Operational Modalities

No description provided.

Submitter: Tal Liron
Group: OASIS Topology and Orchestration Specification for Cloud Applications (TOSCA) TC
Folder: Working Documents
Date submitted: 2021-12-06 08:11:05
Revision: 1
