RE: [tosca] More on dangling requirements

Thanks Tal. Comments in-line

From: tosca@lists.oasis-open.org <tosca@lists.oasis-open.org> On Behalf Of Tal Liron
Sent: Tuesday, November 17, 2020 10:10 AM
To: tosca@lists.oasis-open.org
Subject: [tosca] More on dangling requirements

This came up at the end of the ad-hoc today, so I thought to elaborate more on what I mean here and what the implications are for the instance model.

Here's what I am assuming a dangling requirement looks like:

topology_template:
  node_templates:
    web-server:
      type: Application
      requirements:
        - data: Database # node type

Yes, and presumably the Application node type already defines the "data" requirement to "require" a node of type "Database", in which case you don't even have to specify the "Database" type here. In addition, you will likely also want to specify a node filter to "filter" the set of Database nodes that will be considered to fulfill this particular requirement. The node filter (together with the node and capability types specified in the requirement definition) defines the "query" that you will run against your inventory to find the set of suitable nodes for fulfilling this requirement.
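
As a rough illustration of a dangling requirement with a node filter, here is a minimal sketch in TOSCA 1.3 grammar. The "engine" property and its value are hypothetical; they assume the Database node type defines such a property.

```yaml
topology_template:
  node_templates:
    web-server:
      type: Application
      requirements:
        - data:
            # Node type only, no node template: the requirement is dangling
            node: Database
            # Narrows the inventory query to matching Database nodes
            node_filter:
              properties:
                # "engine" is a hypothetical property of Database
                - engine: { equal: postgresql }
```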

From the perspective of a graph, this semantic model explicitly has a single vertex in it, with no edges.

Before the "requirement fulfillment phase", this is correct.

But how would the instance model look? Is it a single vertex? If it is, then there are no edges to the graph (edges are between vertexes, not floating in space). Or, do you understand this design to imply that there indeed are two vertexes with an edge between them? If that's the case, the instance model must have some kind of implicit "placeholder" vertex.

If the requirement is mandatory (i.e. the "occurrences" keyword in the requirement definition has a lower bound that is greater than zero), then the orchestrator will find a suitable node during the requirement fulfillment phase, and create an edge between the node that has the dangling requirement and the node from inventory that was used to fulfill the requirement.
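
For illustration, a requirement definition along these lines would make the "data" requirement mandatory. This is a sketch only: the Application type in this thread is not spelled out, so the capability and relationship types shown here (taken from the Simple Profile normative types) are assumptions.

```yaml
node_types:
  Application:
    derived_from: tosca.nodes.Root
    requirements:
      - data:
          # Assumed capability and relationship types
          capability: tosca.capabilities.Endpoint.Database
          node: Database
          relationship: tosca.relationships.ConnectsTo
          # Lower bound of 1: mandatory; [ 0, 1 ] would make it optional
          occurrences: [ 1, 1 ]
```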

Unfortunately, this vertex is not named in the TOSCA design and indeed is invisible to it.

The target node is not named in the service template that has the dangling requirement, but it surely was named in its own service template (i.e. the template from which that node was originally created).

You can't do a "get_property" and refer to the database used by the web server (note that you could use the TARGET keyword, but only within the relationship values).

You can absolutely do a "get_property" to refer to the database. Since your template specifies that the target node for the "data" requirement is of type Database, your parser can validate that "valid" property values are being retrieved.

In my opinion, this is one of the most useful features of TOSCA: it allows for the "pull" mechanism for getting data into node types without needing additional functions. I use this all the time in my orchestrator.
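
A sketch of what such a "pull" could look like, using the requirement name as the second argument to get_property. The "db_port" and "port" property names are hypothetical, and exactly how the target is resolved (target node versus target capability) may depend on the specification version and the processor.

```yaml
topology_template:
  node_templates:
    web-server:
      type: Application
      properties:
        # Hypothetical property; pulls "port" from whatever fulfills "data"
        db_port: { get_property: [ SELF, data, port ] }
      requirements:
        - data: Database # node type
```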

Here's the alternative grammar that I vastly prefer, in which we require explicit vertexes:

topology_template:
  node_templates:
    web-server:
      type: Application
      requirements:
        - data: database # node template
    database:
      type: Database
      directives:
        - substitutable

Yes, the TOSCA specification shows this as an alternative grammar for "dangling requirements", although your example shows the wrong directive: if the "database" node is supposed to be found using "requirement fulfillment", you use the "select" directive instead of the "substitute" directive. The "substitute" directive is only intended to be used for substitution mapping. If you use the "select" directive with a node template, then that node template can also define a "node filter" (similar to the way dangling requirements can define node filters).
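
A minimal sketch of the same topology using the "select" directive with a node filter on the node template, in TOSCA 1.3 grammar. The "engine" property is hypothetical and assumes the Database node type defines it.

```yaml
topology_template:
  node_templates:
    web-server:
      type: Application
      requirements:
        - data: database # node template
    database:
      type: Database
      directives:
        # Retrieve a matching node from inventory instead of creating one
        - select
      node_filter:
        properties:
          # "engine" is a hypothetical property of Database
          - engine: { equal: postgresql }
```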

The feature is exactly the same as with dangling requirements: it's the orchestrator's responsibility to provide some kind of database resource that complies with the Database node type.

According to the spec, this mechanism for specifying dangling requirements is useful when you want to indicate that multiple dangling requirements need to be fulfilled by the same node instance.

However, there is no confusion here regarding the instance model (two vertexes with one edge) because it indeed follows through from the semantic model. The design phase is complete and indeed it is a full graph with no "dangling", no need to consider edges floating in space.

I don't think there is confusion either way: if a dangling requirement is not mandatory, it will not result in an edge in the instance model. If it is mandatory, it will result in an edge in the instance model (or the orchestration will fail if a suitable target node cannot be found).

Moreover, I think this design is much more flexible in allowing for many different ways of specifying exactly how the orchestrator will provide the database:

TOSCA doesn't specify how the orchestrator is supposed to provide the (inventory) database, so I'm not sure what additional flexibility is needed?

1) Directives. I'm not a fan of this simplistic grammatical feature, but we could use it [did we remove it in 2.0?]:

database:
  type: Database
  directives:
    - substitutable
    - provisionable
    - allocatable

On the contrary, we made directives mandatory in v1.3 to make orchestration actions explicit. The designer uses the "substitute" directive to indicate that a node is abstract and needs to be substituted, or the "select" directive to indicate that a node must be retrieved from inventory. We need to expand on directives support to allow for multiple directives in the same template, which would support "create-if-not-exists" or "substitute-if-not-exists" scenarios.

2) Policies. Much better because these are typed and complex and can thus model the actual mechanism used by the orchestrator (and be part of a profile):

policies:
  database:
    type: Provisioning
    targets: [ database ]
    properties:
      optional: true
      machine-types: [ virtual, baremetal ]
      compatibility: [ mariadb, postgresql ]
      cluster: true
      redundancy: 2

The problem with policies as you use them is that they have absolutely no (language) semantics associated with them. All the semantics are encoded in the properties, which means that they can only be processed by an external domain-specific entity that knows what these properties mean.

Note that the dangling requirement grammar, as it stands, doesn't have a clear way to specify whether the requirement is optional or not, or whether it is conditionally optional, e.g. "nice to have" in certain situations, a "hard" requirement in others. A policy can allow us to have "optional: true" or something more complex, as needed.

As I stated earlier, the "occurrences" keyword in the requirement definition specifies whether the requirement is mandatory or optional. Do you have an example where "conditionally-optional" should be used?

This is why I keep emphasizing that a node instance (in the instance model!) can have zero corresponding resources in the real world. If the provisioning policy indeed allows for fulfilment to be optional then there might not be any database at all. It's still part of the template, it's just not part of what ended up being deployed. Likewise, the relationship is there in the design (complete graph), but there are zero actual database connections in the real world.

Fulfilling dangling requirements is strictly an "instance model" action: if necessary, it connects two nodes (vertices) in the instance model graph using an "edge". Whether anything happens in the real world because of this depends on the (domain) specific artifacts (or whatever else you use to reflect the instance model into the real world).

Also note the "redundancy: 2". This could mean that two database resources in the real world are provisioned. So, the application would need to configure two separate connections, one to each database resource. The instance model is two vertexes with a single edge between them, but that single edge represents two connections. (Whether it's "one to many" is very implementation specific, and indeed the connections might be of different types: perhaps there can be only one "read-write" primary database while the others are "read-only" secondaries).

Yes, that is completely fine and again independent of any instance model considerations. If your implementation creates two physical instances from one node instance in the instance model, it should feel free to do that. If that means you need lists of property values (one value for each physical instance) then of course you need to create your node types accordingly.

And it can get more complex: perhaps later on in the runtime lifecycle (day 2) a database becomes available, and because it's a "nice to have" suddenly there would be a non-zero correspondence between the instance model and the world.

I'm not sure I understand what you mean here.

Nevertheless, despite all this runtime complexity the instance model is always the same: two vertexes with a single edge between them.

Yes, I think we're in agreement on this. But again, this is completely orthogonal to the "requirement fulfillment" discussion.

(Final note on this feature: there are policy frameworks out there with much richer grammar than TOSCA can ever hope to provide. In those cases, we would probably want to include those specifications as artifacts. But artifacts are currently only attached to nodes, not policies. And generally to my knowledge we have never discussed how external policy frameworks would interact with TOSCA.)

We have not yet had any solid discussions about TOSCA policies so clearly this is an area that needs to be explored more. However, most "external policy frameworks" are really just "imperative programming languages" disguised as "policy frameworks". We clearly don't want to add all kinds of imperative support to TOSCA. Instead, we should add rich support for "declarative" policies.

3) Rich node and capability types that model ranges of possibilities rather than or in addition to specific hard values by using lists and maps instead of single values:

database:
  type: Database
  directives:
    - substitutable
  capabilities:
    machine:
      properties:
        types: [ virtual, baremetal ]

In this case the "machine" capability might have a "type" attribute, which is not a list, and which specifies the exact type of the node instance.

I'm not sure I understand what is intended here. How does a capability specify a node type (or a list of node types)?

It's worth discussing a possible limitation to all the above approaches (including the "dangling requirements" approach). How do we express more complex selection logic? For example: "provision a baremetal machine with at least 2 GB of RAM, but if it's a virtual machine require at least 4 GB of RAM".

If you're talking about "selection" (i.e. requirement fulfillment, i.e. finding a node from inventory), then this is exactly what the "expanded" node filter syntax that we proposed several months ago is for. However, you use the word "provision" in your example, which is different from "selection". Assuming you "provision" using substitution, then a substituting template for a virtual machine presumably will be different from a substituting template for a bare-metal server, and each substituting template will specify how much memory it needs.

It's definitely possible to create data types that can express and/or, but that only covers branching logic. The decision flow might not be tree-shaped at all, and indeed can itself be a graph. Moreover, it might not be a finite algorithm at all: there might be an ML-based system that makes the provisioning decision on its own, and at best we can provide certain hints to our preferences as well as hard (regulatory) requirements.

I think TOSCA currently supports all of this: a TOSCA orchestrator performs requirement fulfillment, or substitution. Both of these functions require "decision logic" to find the best node to fulfill a dangling requirement, or the best template to substitute an abstract node. If a TOSCA orchestrator wants to use AI/ML to help with this decision logic, it should feel free to do so.

My point is that this is not a limitation but a feature. :) We want TOSCA to be able to model all kinds of systems for requirement fulfilment, provisioning, allocation, and placement, including innovative approaches that have not been invented yet. In my view, doing so means improving our data type grammar for allowing more robust logic, e.g. support for anyOf or allOf for lists and maps, etc.

I think we all agree that we want the same thing. Based on your examples in this email, however, I'm not sure we need to make any changes to the current "model" for how a TOSCA orchestrator is expected to work. We do need all the enhancements to the instance model that we have proposed over the last several months (including support for cardinality, richer TOSCAPath syntax, and streamlined condition/constraint clauses to be used in filters and policies). I'm hoping we can make progress on those over the next several months.

Thanks,

Chris
