More on dangling requirements

This came up at the end of the ad-hoc today, so I thought I'd elaborate on what I mean here and what the implications are for the instance model.

Here's what I am assuming a dangling requirement looks like:

topology_template:
  node_templates:
    web-server:
      type: Application
      requirements:
      - data: Database # node type

From the perspective of a graph, this semantic model explicitly has a single vertex in it, with no edges. But how would the instance model look? Is it a single vertex? If it is, then there are no edges to the graph (edges are between vertexes, not floating in space). Or, do you understand this design to imply that there indeed are two vertexes with an edge between them? If that's the case, the instance model must have some kind of implicit "placeholder" vertex.

Unfortunately, this vertex is not named in the TOSCA design and indeed is invisible to it. You can't do a "get_property" and refer to the database used by the web server (note that you could use the TARGET keyword, but only within the relationship values).
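To make the "placeholder" reading concrete, here is a minimal Python sketch of what that instance model might look like. This is not TOSCA grammar or any orchestrator's API: the Vertex and Graph classes and the find() lookup are names I made up for illustration, and find() only stands in for a get_property-style reference by template name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vertex:
    node_type: str
    # None means no node template backs this vertex (hypothetical convention).
    template_name: Optional[str] = None


@dataclass
class Graph:
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (source, requirement name, target)

    def find(self, template_name):
        """Look up a vertex by template name, as a reference from the design would."""
        for vertex in self.vertices:
            if vertex.template_name == template_name:
                return vertex
        return None


def instantiate_dangling():
    """Build the instance graph under the 'implicit placeholder' interpretation."""
    graph = Graph()
    web = Vertex("Application", "web-server")
    placeholder = Vertex("Database")  # implied by "data: Database", but unnamed
    graph.vertices += [web, placeholder]
    graph.edges.append((web, "data", placeholder))
    return graph


g = instantiate_dangling()
print(len(g.vertices), len(g.edges))
print(g.find("web-server") is not None)
print(g.find("database"))
```

The graph ends up with two vertexes and one edge, but the second vertex has no name that the design can use, so the last lookup finds nothing.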

Here's the alternative grammar that I vastly prefer, in which we require explicit vertexes:

topology_template:
  node_templates:
    web-server:
      type: Application
      requirements:
      - data: database # node template
    database:
      type: Database
      directives:
      - substitutable

The feature is exactly the same as with dangling requirements: it's the orchestrator's responsibility to provide some kind of database resource that complies with the Database node type.

However, there is no confusion here regarding the instance model (two vertexes with one edge) because it indeed follows through from the semantic model. The design phase is complete and indeed it is a full graph with no "dangling", no need to consider edges floating in space.

Moreover, I think this design is much more flexible in allowing for many different ways of specifying exactly how the orchestrator will provide the database:

1) Directives. I'm not a fan of this simplistic grammatical feature, but we could use it [did we remove it in 2.0?]:

    database:
      type: Database
      directives:
      - substitutable
      - provisionable
      - allocatable

2) Policies. Much better because these are typed and complex and can thus model the actual mechanism used by the orchestrator (and be part of a profile):

    policies:
      database:
        type: Provisioning
        targets: [ database ]
        properties:
          optional: true
          machine-types: [ virtual, baremetal ]
          compatibility: [ mariadb, postgresql ]
          cluster: true
          redundancy: 2

Note that the dangling requirement grammar, as it stands, doesn't have a clear way to specify whether the requirement is optional or not, or whether it is conditionally optional, e.g. "nice to have" in certain situations, a "hard" requirement in others. A policy can allow us to have "optional: true" or something more complex, as needed.

This is why I keep emphasizing that a node instance (in the instance model!) can have zero corresponding resources in the real world. If the provisioning policy indeed allows for fulfilment to be optional then there might not be any database at all. It's still part of the template, it's just not part of what ended up being deployed. Likewise, the relationship is there in the design (complete graph), but there are zero actual database connections in the real world.

Also note the "redundancy: 2". This could mean that two database resources are provisioned in the real world, so the application would need to configure two separate connections, one to each database resource. The instance model is two vertexes with a single edge between them, but that single edge represents two connections. (Whether it's "one to many" is very implementation-specific, and indeed the connections might be of different types: perhaps there can be only one "read-write" primary database while the others are "read-only" secondaries.)

And it can get more complex: perhaps later on in the runtime lifecycle (day 2) a database becomes available, and because it's a "nice to have" suddenly there would be a non-zero correspondence between the instance model and the world.

Nevertheless, despite all this runtime complexity the instance model is always the same: two vertexes with a single edge between them.
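A small Python sketch of that invariant, under my own assumptions: NodeInstance, RelationshipInstance, the resources list, and the resource names ("app-1", "db-1", and so on) are all hypothetical, and "one connection per pair of resources" is just one possible reading of how an edge maps to the real world.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInstance:
    name: str
    # Real-world resources backing this vertex; may legitimately be empty.
    resources: list = field(default_factory=list)


@dataclass
class RelationshipInstance:
    source: NodeInstance
    target: NodeInstance

    def connections(self):
        # Assumption: one real connection per (source resource, target resource) pair.
        return [(s, t) for s in self.source.resources for t in self.target.resources]


def shape(nodes, relationships):
    """The shape of the instance model: (vertex count, edge count)."""
    return (len(nodes), len(relationships))


web = NodeInstance("web-server", ["app-1"])
db = NodeInstance("database")
edge = RelationshipInstance(web, db)
nodes, relationships = [web, db], [edge]

history = []
# Optional and unfulfilled, then one database on day 2, then redundancy of 2.
for resources in ([], ["db-1"], ["db-1", "db-2"]):
    db.resources = list(resources)
    history.append((shape(nodes, relationships), len(edge.connections())))

print(history)
```

Across all three situations the shape stays at two vertexes and one edge; only the number of real connections behind that edge changes (zero, one, two).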

(Final note on this feature: there are policy frameworks out there with much richer grammar than TOSCA can ever hope to provide. In those cases, we would probably want to include those specifications as artifacts. But artifacts are currently only attached to nodes, not policies. And generally to my knowledge we have never discussed how external policy frameworks would interact with TOSCA.)

3) Rich node and capability types that model ranges of possibilities rather than, or in addition to, specific hard values, by using lists and maps instead of single values:

    database:
      type: Database
      directives:
      - substitutable
      capabilities:
        machine:
          properties:
            types: [ virtual, baremetal ]

In this case the "machine" capability might have a "type" attribute, which is not a list, and which specifies the exact type of the node instance.

It's worth discussing a possible limitation to all the above approaches (including the "dangling requirements" approach). How do we express more complex selection logic? For example: "provision a baremetal machine with at least 2 GB of RAM, but if it's a virtual machine require at least 4 GB of RAM". It's definitely possible to create data types that can express and/or, but that only covers branching logic. The decision flow might not be tree-shaped at all, and indeed can itself be a graph. Moreover, it might not be a finite algorithm at all: there might be an ML-based system that makes the provisioning decision on its own, and at best we can provide certain hints about our preferences as well as hard (regulatory) requirements.

My point is that this is not a limitation but a feature. :) We want TOSCA to be able to model all kinds of systems for requirement fulfilment, provisioning, allocation, and placement, including innovative approaches that have not been invented yet. In my view, doing so means improving our data type grammar for allowing more robust logic, e.g. support for anyOf or allOf for lists and maps, etc.
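As a rough illustration of what anyOf/allOf buys us, here is the baremetal/virtual RAM example expressed with two combinators in Python. This is only a sketch of the branching (and/or) case, not a proposal for TOSCA syntax: any_of, all_of, machine_type, min_ram_gb, and the candidate dictionaries with their "type" and "ram_gb" keys are all hypothetical.

```python
def all_of(*predicates):
    """Satisfied only when every nested predicate is satisfied."""
    return lambda candidate: all(p(candidate) for p in predicates)


def any_of(*predicates):
    """Satisfied when at least one nested predicate is satisfied."""
    return lambda candidate: any(p(candidate) for p in predicates)


def machine_type(expected):
    return lambda candidate: candidate["type"] == expected


def min_ram_gb(minimum):
    return lambda candidate: candidate["ram_gb"] >= minimum


# "Baremetal with at least 2 GB of RAM, but a virtual machine needs at least 4 GB."
acceptable = any_of(
    all_of(machine_type("baremetal"), min_ram_gb(2)),
    all_of(machine_type("virtual"), min_ram_gb(4)),
)

candidates = [
    {"type": "baremetal", "ram_gb": 2},
    {"type": "virtual", "ram_gb": 2},
    {"type": "virtual", "ram_gb": 8},
]
matches = [c for c in candidates if acceptable(c)]
print(matches)
```

The 2 GB baremetal machine and the 8 GB virtual machine pass, while the 2 GB virtual machine is rejected. Anything that isn't tree-shaped, or that is decided by an external system, would still need the hints-plus-hard-requirements approach described above.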
