
Subject: RE: [tosca] Using pre-existing representations at runtime (without breaking the TOSCA graph)


Hi Tal,

 

Thanks for the additional clarifications, but I don't believe the issue is that your viewpoint isn't properly understood. Instead, I believe we have a fundamentally different vision for TOSCA. Please correct my articulation, but I believe that you would prefer to keep TOSCA strictly a design-time language that is used to define service topology graphs that can then be used as inputs to 3rd-party orchestration systems such as Kubernetes, Ansible, and others. In this scenario, the semantics of how the TOSCA graph is interpreted and processed depend purely on the 3rd-party orchestration system. TOSCA would only be used for Day 0 (the design phase), but not for Day 1 (deployment) or Day 2 (ongoing management).

 

My vision aligns with our recently updated Charter, which says that "The Topology and Orchestration Specification for Cloud Applications (TOSCA) provides a language for describing application components and their relationships by means of a service topology, and for specifying the lifecycle management procedures for creation or modification of services using orchestration processes. The combination of topology and orchestration enables not only the automation of deployment but also the automation of the complete service lifecycle management". Under this vision, TOSCA also includes grammar (and associated semantics) for automating the deployment of services defined in the TOSCA language as well as Day 2 management of these services.

 

Of course, these visions are not necessarily incompatible. If you want to use TOSCA strictly as a design-time language, then you should feel free not to implement the lifecycle management features of TOSCA. We have discussed in the past an approach where we create two sets of compliance criteria: one for design-time features, and one for lifecycle management features. This is an approach we could continue to pursue.

 

The real challenge here is where to draw the line between which features are design-time features and which are lifecycle management features. Because of your viewpoint, it appears that you are constantly trying to re-interpret many of TOSCA's lifecycle management features as design-time-only features, since your implementation wants to be design-time only. I'm thinking specifically about features such as interfaces, operations, notifications, and event/condition/action policies. If your implementation does not deploy or manage services (i.e. does not include an orchestrator), then it should not process these features.
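
For concreteness, here is a minimal sketch of the kind of grammar these lifecycle management features use (the node type name and script paths below are purely illustrative):

node_types:

  Database:
    interfaces:
      Standard:
        type: tosca.interfaces.node.lifecycle.Standard
        operations:
          create: scripts/db_create.sh    # Day 1: deployment
          delete: scripts/db_delete.sh    # Day 2: teardown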

 

We seem to have gotten into a similar situation now with requirements. Requirements are one of the most powerful features of TOSCA: they allow component designers to define dependencies on other components and/or resource requirements directly in component designs (i.e. node types). The real value of requirements is as a composition feature at deployment time (Day 1): they enable powerful automation where these requirements can be fulfilled dynamically by a system (rather than a human operator) at deployment time. Recasting requirements as a design-time-only feature forces designers to "fulfill" all these requirements manually in their service templates. Not only is this error-prone, it also eliminates possibilities for automation, and will result in large, monolithic, and unmanageable service templates.
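
To make the concern concrete, here is a minimal sketch of what manual, design-time-only fulfillment looks like (the template names are illustrative): every template must name its own target, which is exactly what grows into large, monolithic service templates:

node_types:

  App:
    requirements:
    - host: Host             # the dependency is declared once, on the type

topology_template:

  node_templates:

    my-app:
      type: App
      requirements:
      - host: my-server      # manual fulfillment: the designer names the target explicitly

    my-server:
      type: Server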

 

Assuming we all agree that requirements (and capabilities) are a lifecycle management feature, then I believe the arguments about grammar largely go away:

 

  • You express a strong objection to "broken graphs". However, I don't really understand what the issue is here. Incomplete graphs can be fully validated (since the target types of the "dangling" requirements are fully defined), and they provide the only clean mechanism for supporting service composition at deployment time.
  • There is no "heuristic" for determining whether a requirement is dangling, nor is there a need to introduce a "dangling" keyword. If a requirement does not have a target node assigned in a template, it is dangling. There is nothing more to it.
  • Yes, there are scenarios where you want the same target node to be used to fulfill requirements for multiple other nodes. This use case is explicitly presented in Section 2.9.2 of the spec, and it uses the exact node template approach you're presenting below. There is no need to introduce a new "inventory" directive, since the "select" directive serves exactly this purpose. As explained in Section 3.4.3, the "select" keyword instructs the orchestrator to find a node in its inventory that can then serve as the target node for the requirements to the "select" node (see the sketch after this list). Just like dangling requirements, "select" nodes support a node filter that can constrain the set of candidates from which to select a suitable node. Note that your explanation below seems to suggest that node filters are a design-time feature. Since node filters are closely tied to requirement fulfillment, they are definitely a deployment-time feature.
  • While "select" nodes are semantically the same as dangling requirements, we cannot just replace all dangling requirements with select nodes, since select nodes do not support requirement mapping in substitution mapping. Dangling requirements are the "externally-visible pins" through which topologies can be stitched together with other topologies. Select nodes cannot provide that functionality.
  • And finally, you continue to treat substitution mapping and requirement fulfillment as though they are the same. These are fundamentally different features that should not be conflated.
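
For concreteness, here is a rough sketch of both mechanisms side by side: a "select" node constrained by a node filter, and a requirement exposed through substitution mapping. The node filter property name and the substituted node type are illustrative only; the point is the shape of the grammar:

topology_template:

  node_templates:

    web:
      type: App
      # the 'host' requirement is left unassigned here: it is the externally-visible
      # pin that is mapped out through substitution_mappings below

    admin:
      type: App
      requirements:
      - host: existing-server        # fulfilled by the 'select' node below

    existing-server:
      type: Server
      directives:
      - select                       # the orchestrator finds a suitable node in its inventory
      node_filter:
        capabilities:
        - host:
            properties:
            - mem_size: { greater_or_equal: 16 GB }   # illustrative property name

  substitution_mappings:
    node_type: App                   # illustrative: shows the shape of a requirement mapping
    requirements:
      host: [ web, host ]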

 

Let's please agree on a common understanding of the lifecycle management features of TOSCA so we can move forward with the spec.

 

Thanks,

 

Chris

 

From: tosca@lists.oasis-open.org <tosca@lists.oasis-open.org> On Behalf Of Tal Liron
Sent: Wednesday, January 26, 2022 12:15 PM
To: tosca@lists.oasis-open.org
Subject: [tosca] Using pre-existing representations at runtime (without breaking the TOSCA graph)

 

I want to clarify some comments I made in the discussion at the last ad hoc meeting. I feel I've repeated the same arguments again and again, but perhaps explaining them in a new way would help.

 

I am not at all opposed to the idea that an orchestrator could make use of pre-existing representations during Day 1 deployment. I indeed agree that it is a crucial aspect of TOSCA. But I specifically disagree that allowing for partial graphs, or "dangling" requirements, is a good way to achieve this.

 

Let's first examine the damage that "dangling" causes. Here's an example:

 

capability_types:

  Host: {}

  Storage: {}

node_types:

  App:
    requirements:
    - host: Host

  Server:
    capabilities:
      host: Host
    requirements:
    - backup: Storage

  Baremetal:
    derived_from: Server

  VM:
    derived_from: Server

  Store:
    capabilities:
      storage: Storage

topology_template:

  node_templates:

    public-web:
      type: App
      requirements:
      - host: Host
        dangling: true

    admin-web:
      type: App
      requirements:
      - host: Host
        dangling: true

    backup:
      type: Store

 

Note that I invented a "dangling" keyword here just to make the intent clear. (I think Chris is imagining a heuristic to automatically determine whether a requirement is "dangling" or not, but let's not get caught up on that here.)

 

The point of leaving these requirements "dangling" in this use case is that 1) we want to keep the design agnostic to the platform, and 2) we specifically do not want to create a new server, but instead use this topology to install a service on existing servers. In other words, we'd want the orchestrator to find us a suitable server from whatever inventory it manages or knows about (e.g. by querying the platform). Maybe it will be a VM, maybe a baremetal machine, etc.

 

There are several potentially fatal problems with this approach.

 

First, we have no way of telling whether public-web and admin-web would end up on the same server. Unfortunately, a single shared server is exactly what we want in this case. The reason is that this service also includes a backup node, and let's say that for technical reasons it can only back up a single server. But maybe it's not a technical limitation, but rather a design consideration: the architect wants both apps on the same server because they share some kind of configuration or whatever. Bottom line: it needs to be a single server.

 

Second, there is no way to specify that we want the server to use our backup node. Because the server does not exist in our topology, there is no way to assign requirements to it.

 

We essentially have a broken design here. We know there is a server (because the "dangling" requirement would have to be fulfilled) but we cannot refer to it because it simply does not exist in TOSCA.

 

The problem is not just that it's invisible, it's that it's non-deterministically invisible. Again, we have no way of knowing whether a single server will be used or two servers. That would have to be determined at runtime, perhaps according to availability in the inventory, and that makes it impossible to properly design the "backup" part of our topology. We don't have one topology, but a set of possible topologies. (Actually, it's even more complicated here because we are requiring a capability type, not a node type. So we don't even know if we will be getting a server-type node. Of course we could also require a Server node type, but I want to keep this loose for now in order to facilitate a discussion below.)

 

This goes against everything I think is important about TOSCA, namely that it allows Day 0 design to be validated deterministically. To this end we've actually limited TOSCA in several ways -- for example, there are no conditionals in the design, no "if/else" that can be applied to node templates. This is because we know that if the graph is not well determined and well understood, we lose the strict validation that is TOSCA's hallmark. "Dangling" is damage.

 

So, what can we do instead, if we don't want to break the graph? Quite a lot! We can do almost everything we need, in fact.

 

Here's the same use case as above with nothing left "dangling":

 

topology_template:

  node_templates:

    public-web:
      type: App
      requirements:
      - host: Host

    admin-web:
      type: App
      requirements:
      - host: Host

    backup:
      type: Store

    server:
      type: Server
      directives:
      - inventory
      requirements:
      - backup: Storage

 

The only difference is the addition of the explicit "server" node template. With this, none of the requirements are left dangling and we can 1) be sure that there will only be one server, and 2) link that server up to our backup node. The designer determines the topology.

 

You'll notice that I added an "inventory" directive to this node template. I chose that name rather arbitrarily; we can definitely decide to standardize this in the spec. The point is that it tells the orchestrator not to attempt to provision a server, but instead to use a pre-existing one. That's a runtime consideration. Otherwise, our design is clear, complete, and valid as it is.

 

But there are some subtleties to discuss. The "Server" node type is abstract, but what we have in the inventory would be representations of concrete types that derive from it: VMs and Baremetals. So, we need to allow the orchestrator to use compatible representations for this directive, meaning representations of the type or of any type derived from it. There should be no problem from the perspective of TOSCA, because those compatible representations are guaranteed to fit within this base-typed node template. We do that all the time in TOSCA.

 

Another point I want to make is that this is grammatically identical to substitution. The only difference would be the directive used. However, there would of course be a difference in what the orchestrator would be doing, and also a difference in representations. Chris mentioned that he thinks every node template should have exactly one node representation, but I think substitution is an obvious case of breaking that rule, because that single node template would be represented by the many node representations of the substituting topology. And, indeed, this "inventory" directive could allow for any compatible representation(s), not just one.
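
For comparison, here is roughly what the two cases would look like side by side (the "inventory" directive is, again, my invented placeholder, while "substitute" is the directive already used for substitution):

    server-from-inventory:
      type: Server
      directives:
      - inventory        # invented: use pre-existing representation(s) found in the inventory

    server-from-substitution:
      type: Server
      directives:
      - substitute       # existing: represented by the node representations of a substituting topology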

 

Now, I do want to discuss two deficiencies of this approach, which I think are non-fatal and could be improved upon. (As opposed to the "dangling" approach, which I think is totally fatal.)

 

The first is a certain apparent loss of flexibility in terms of type. Specifically, in the "dangling" example above we are requiring a capability type but not a node type. By using a node template we must give it a type. It could be an abstract base type, as we've done here, but that still limits the selection in a way that selecting by capability does not.

 

I personally don't think this loss of flexibility is too bad. In a way it is a strength, as it tightens the design and removes the risk of introducing unknown node types into the topology. On the other hand, we might have use cases in which we are dealing with arbitrary 3rd-party TOSCA profiles that do not all use the same base types, and thus we might want to leave the node type as a wildcard. I'm not 100% sure it's a good idea to support such use cases, but assuming we do, there are potential grammatical workarounds. For example, I can imagine a special "compatibility node type" that does not require object-oriented derivation, but instead can be tested for compatibility directly. (We discussed something like this at some point for another use case.) Here's an example of how it could look:

 

node_types:

  ServerLike:
    compatibility: true
    capabilities:
      host: Host
    requirements:
    - backup: Storage

topology_template:

  node_templates:

    public-web:
      type: App
      requirements:
      - host: Host

    admin-web:
      type: App
      requirements:
      - host: Host

    backup:
      type: Store

    server:
      type: ServerLike
      directives:
      - inventory
      requirements:
      - backup: Storage

 

This new "ServerLike" type has a "compatibility: true" marker (which I invented here) that specifies that representations are checked against its definitions. So, to be compatible with "ServerLike", a node type would have to have a capability definition named "host" of type Host and a requirement definition named "backup" of type Storage. It would not have to be derived from ServerLike. This would lower the bar for 3rd-party profile designers, because they would just need to adhere to these definitions and would not have to derive from a shared base type.
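
For example, a hypothetical 3rd-party node type like the following would be considered compatible with "ServerLike" purely because it matches those definitions, even though it does not derive from it:

node_types:

  acme.CloudServer:        # hypothetical 3rd-party type, not derived from ServerLike
    capabilities:
      host: Host           # matches ServerLike's "host" capability definition
    requirements:
    - backup: Storage      # matches ServerLike's "backup" requirement definition
    properties:
      flavor:
        type: string       # extra definitions of its own, presumably harmless for compatibility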

 

Again, I want to emphasize that I don't think this feature is necessary, but assuming we want it this could possibly work.

 

Another apparent deficiency of this approach is that it's not immediately apparent how to "softly" constrain the representation selection. When we deal with design-time requirements, we can use node filters (and capability filters) that allow us to constrain properties. So, for example, we could require a server that has at least 16 GB of RAM. That appears to be quite flexible. However, it's not really that flexible. The issue is that some of the most interesting aspects for runtime matching are not properties, but attributes. Even something like RAM is better understood as an attribute and not a property. Sure, for a baremetal machine it's probably something that will not change at runtime, unless a technician comes in and inserts more RAM into the motherboard. But virtual machines can change their RAM capacity dynamically, even without rebooting. And that's a very viable feature for orchestration. For example, if a new application is going to be installed into a VM and there's not enough RAM, then it makes a lot of sense for the orchestrator to increase the VM's RAM in order to make room. This is exactly the kind of smart optimization that we expect good orchestrators to do in order to conserve resources and maximize uptime. So, again, this shows just how closely TOSCA requirements are tied to design-time considerations, and why they are not the right tool for runtime matching.
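
For reference, this is roughly what that design-time node filter grammar looks like on a requirement (the "mem_size" property name is just illustrative):

    public-web:
      type: App
      requirements:
      - host:
          node: Server
          node_filter:
            capabilities:
            - host:
                properties:
                - mem_size: { greater_or_equal: 16 GB }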

 

Let's put aside "soft" constraints for a moment and look at an example of a "hard" constraint:

 

    server:
      type: Server
      directives:
      - inventory
      attributes:
        ram: 16 GB
      requirements:
      - backup: Storage

 

Here we specified an exact value for the RAM attribute, and it would not be a problem for the orchestrator to check against it when going through its inventory of representations. Exact values can work fine for some kinds of attributes (especially booleans), but in cases like these we want a "soft" constraint. Specifically, we want to say that we want a minimum of 16 GB.

 

It's not hard to imagine grammatical enhancements that could allow for insertion of constraints instead of exact values. Here's one way:

 

    server:
      type: Server
      directives:
      - inventory
      attribute_constraints:
        ram:
        - { greater_or_equal: 16 GB }
      requirements:
      - backup: Storage

 

I think that's quite straightforward and flexible. I'm sure we can think of other syntax, too.

 

In summary, we definitely want runtime matching of representations in TOSCA, but there are much, much better ways of doing so than breaking TOSCA's hallmark feature by breaking the design graph.

 


