Subject: Using pre-existing representations at runtime (without breaking the TOSCA graph)

I want to clarify some comments I made in the discussion in the last ad hoc meeting. I feel I've repeated the same arguments again and again but perhaps explaining them in a new way would help.

I am not at all opposed to the idea that an orchestrator could make use of pre-existing representations during Day 1 deployment. I indeed agree that it is a crucial aspect of TOSCA. But I specifically disagree that allowing for partial graphs, or "dangling" requirements, is a good way to achieve this.

Let's first examine the damage that "dangling" causes. Here's an example:

capability_types:
  Host: {}
  Storage: {}

node_types:
  App:
    requirements:
    - host: Host
  Server:
    capabilities:
      host: Host
    requirements:
    - backup: Storage
  Baremetal:
    derived_from: Server
  VM:
    derived_from: Server
  Store:
    capabilities:
      storage: Storage

topology_template:
  node_templates:
    public-web:
      type: App
      requirements:
      - host: Host
        dangling: true
    admin-web:
      type: App
      requirements:
      - host: Host
        dangling: true
    backup:
      type: Store

Note that I invented a "dangling" keyword here just to make the intent clear. (I think Chris is imagining a heuristic to automatically determine whether a requirement is "dangling" or not, but let's not get caught up on that here.)

The point of leaving these requirements "dangling" in this use case is that 1) we want to keep the design agnostic to the platform, and 2) we specifically do not want to create a new server, but instead use this topology to install a service on existing servers. In other words, we'd want the orchestrator to find us a suitable server from whatever inventory it manages or knows about (e.g. by querying the platform). Maybe it will be a VM, maybe a baremetal machine, etc.

There are several potentially fatal problems with this approach.

First, we have no way of telling whether public-web and admin-web will end up on the same server. Unfortunately, a single shared server is exactly what we want in this case. The reason is that this service also includes a backup node and, let's say, for technical reasons it can only back up a single server. Or maybe it's not a technical limitation but a design consideration: the architect wants both apps on the same server because they share some kind of configuration or whatever. Bottom line: it needs to be a single server.

Second, there is no way to specify that we want the server to use our backup node. Because the server does not exist in our topology, there is no way to assign requirements to it.

We essentially have a broken design here. We know there is a server (because the "dangling" requirement would have to be fulfilled) but we cannot refer to it because it simply does not exist in TOSCA.

The problem is not just that it's invisible, it's that it's non-deterministically invisible. Again, we have no way of knowing whether a single server will be used or two servers. That would have to be determined at runtime, perhaps according to availability in the inventory, which makes it impossible to properly design the "backup" part of our topology. We do not have one topology, but a set of possible topologies. (Actually, it's even more complicated here because we are requiring a capability type, not a node type. So we don't even know if we will be getting a server-type node. Of course we can also require a Server node type, but I want to keep this loose for now in order to facilitate a discussion below.)
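To make the "set of possible topologies" concrete, here is a minimal Python sketch. It is only an illustration: the inventory names are hypothetical, and a real orchestrator would of course apply its own selection logic. It simply enumerates every way the two "dangling" host requirements could be resolved against an inventory of two existing hosts.

```python
from itertools import product

# Hypothetical inventory: existing nodes that expose a Host capability.
inventory = ["vm-1", "baremetal-1"]

# The two node templates whose "host" requirement is left dangling.
dangling = ["public-web", "admin-web"]


def possible_topologies(templates, hosts):
    """Return every way the dangling requirements could be resolved at runtime."""
    return [
        dict(zip(templates, choice))
        for choice in product(hosts, repeat=len(templates))
    ]


topologies = possible_topologies(dangling, inventory)

# Keep only the outcomes in which both apps land on one and the same host.
shared = [t for t in topologies if len(set(t.values())) == 1]

print(len(topologies), "possible topologies")
print(len(shared), "of them place both apps on a single server")
```

Even with this tiny inventory, the design alone cannot tell us which of the outcomes we will get, and only some of them satisfy the single-server intent.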

This goes against everything I think is important about TOSCA, namely that it allows Day 0 design to be validated deterministically. To this end we've actually limited TOSCA in several ways -- for example, there are no conditionals in the design, no "if/else" that can be applied to node templates. This is because we know that if the graph is not well determined and well understood, we lose the strict validation that is TOSCA's hallmark. "Dangling" is damage.

So, what can we do instead, if we don't want to break the graph? Quite a lot! We can do almost everything we need, in fact.

Here's the same use case as above with nothing left "dangling":

topology_template:
  node_templates:
    public-web:
      type: App
      requirements:
      - host: Host
    admin-web:
      type: App
      requirements:
      - host: Host
    backup:
      type: Store
    server:
      type: Server
      directives:
      - inventory
      requirements:
      - backup: Storage

The only difference is the addition of the explicit "server" node template. With this, none of the requirements are left dangling and we can 1) be sure that there will only be one server, and 2) link up that server to our backup node. The designer determines the topology.

You'll notice that I added an "inventory" directive to this node template. I chose that name rather arbitrarily; we can definitely decide to standardize it in the spec. The point is that it tells the orchestrator to not attempt to provision a server, but instead to use a pre-existing one. That's a runtime consideration. Otherwise our design is clear, complete, and valid as it is.

But there are some subtleties to discuss. The "Server" node type is abstract. But what we have in the inventory would be representations of concrete types that derive from it: VMs and Baremetals. So, we need to allow for the orchestrator to use compatible representations for the directive, meaning of the type or of any derived type. There should be no problem from the perspective of TOSCA because those compatible representations are guaranteed to fit within this base-typed node template. We do that all the time in TOSCA.
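As a rough illustration of what "compatible representation" means here, the following Python sketch walks the derived_from chain of the node types from the example. The inventory entries and the function names are hypothetical; this is one way an orchestrator might do the check, not a prescribed algorithm.

```python
# Type hierarchy from the example: child type -> parent type (None for a root).
DERIVED_FROM = {
    "Server": None,
    "Baremetal": "Server",
    "VM": "Server",
    "Store": None,
}


def is_compatible(node_type, required_type):
    """True if node_type is required_type or derives from it, at any depth."""
    current = node_type
    while current is not None:
        if current == required_type:
            return True
        current = DERIVED_FROM.get(current)
    return False


# Hypothetical inventory of pre-existing representations: (name, node type).
inventory = [("rack-7", "Baremetal"), ("vm-42", "VM"), ("nas-1", "Store")]


def candidates(required_type):
    """Names of inventory entries that fit a node template of required_type."""
    return [
        name for name, node_type in inventory
        if is_compatible(node_type, required_type)
    ]


print(candidates("Server"))  # ['rack-7', 'vm-42']
```

Both the baremetal machine and the VM qualify for the Server-typed node template, while the Store does not.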

Another point I want to make is that this is grammatically identical to substitution. The only difference would be the directive used. However, there would be a difference, of course, in what the orchestrator would be doing, and also a difference in representations. Chris mentioned that he thinks that every node template should have exactly a single node representation, but I think substitution is an obvious case of breaking that, because that single node template would be represented by the many node representations of the substituting topology. And, indeed, this "inventory" directive could allow for any compatible representation(s), not just one.

Now, I do want to discuss two deficiencies of this approach, which I think are non-fatal and could be improved upon. (As opposed to the "dangling" approach, which I think is totally fatal.)

The first is a certain apparent loss of flexibility in terms of type. Specifically, in the "dangling" example above we are requiring a capability type but not a node type. But by using a node template we must give it a type. It could be an abstract base type, as we've done here, but that still limits the selection in a way that selecting by capability does not.

I personally don't think this loss of flexibility is too bad. In a way it is a strength, as it tightens the design and removes the risk of introducing unknown node types to the topology. On the other hand, we might have use cases in which we are dealing with arbitrary 3rd-party TOSCA profiles that do not all use the same base types and thus we might want to leave the node type as a wildcard. I'm not 100% sure it's a good idea to support such use cases, but assuming we do, there are potential grammatical workarounds. For example, I can imagine a special "compatibility node type" that does not require object-oriented derivation, but instead can be tested for compatibility directly. (We discussed something like this at some point for another use case.) Here's an example of how it could look:

node_types:
  ServerLike:
    compatibility: true
    capabilities:
      host: Host
    requirements:
    - backup: Storage

topology_template:
  node_templates:
    public-web:
      type: App
      requirements:
      - host: Host
    admin-web:
      type: App
      requirements:
      - host: Host
    backup:
      type: Store
    server:
      type: ServerLike
      directives:
      - inventory
      requirements:
      - backup: Storage

This new "ServerLike" type has a "compatibility: true" marker (which I invented here) that specifies that representations are checked against its definitions. So, to be compatible with "ServerLike" a node type would have to have a capability definition named "host" of type "Host" and a requirement definition named "backup" of type "Storage". It would not have to be derived from ServerLike. This would lower the bar for 3rd-party profile designers, because they would just need to adhere to these definitions and would not have to derive from a shared base type.
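Here is a minimal Python sketch of how such a structural check could work. The third-party type names ("acme.Machine", "acme.Appliance") and the "Monitoring" capability are hypothetical, and for simplicity the sketch compares type names exactly rather than also accepting derived capability types.

```python
# The "ServerLike" compatibility type: required definition names and types.
SERVER_LIKE = {
    "capabilities": {"host": "Host"},
    "requirements": {"backup": "Storage"},
}

# Hypothetical third-party node types that share no base type with ours.
THIRD_PARTY_TYPES = {
    "acme.Machine": {
        "capabilities": {"host": "Host", "metrics": "Monitoring"},
        "requirements": {"backup": "Storage"},
    },
    "acme.Appliance": {
        "capabilities": {"host": "Host"},
        "requirements": {},
    },
}


def is_structurally_compatible(node_type, compat_type):
    """True if node_type has every definition that compat_type asks for.

    Extra definitions on node_type are allowed; missing or differently
    typed ones are not. No derivation relationship is consulted.
    """
    for section in ("capabilities", "requirements"):
        defined = node_type.get(section, {})
        for name, type_name in compat_type.get(section, {}).items():
            if defined.get(name) != type_name:
                return False
    return True


for name, definition in THIRD_PARTY_TYPES.items():
    print(name, is_structurally_compatible(definition, SERVER_LIKE))
```

In this sketch "acme.Machine" passes because it has both required definitions (plus an extra capability), while "acme.Appliance" fails because it lacks the "backup" requirement.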

Again, I want to emphasize that I don't think this feature is necessary, but assuming we want it this could possibly work.

Another apparent deficiency of this approach is that it's not obvious how to "softly" constrain the representation selection. When we deal with design-time requirements, we can use node filters (and capability filters) that allow us to constrain properties. So, for example, we could require a server that has at least 16 GB of RAM. That appears to be quite flexible.

However, it's not really that flexible. The issue is that some of the most interesting aspects for runtime matching are not properties, but attributes. Even something like RAM is better understood as an attribute and not a property. Sure, for a baremetal machine it's probably something that will not change at runtime, unless a technician comes in and inserts more RAM into the motherboard. But virtual machines can change their RAM capacity dynamically, even without rebooting. And that's a very viable feature for orchestration. For example, if a new application is going to be installed into a VM and there's not enough RAM, then it makes a lot of sense for the orchestrator to increase the VM's RAM in order to make room. This is exactly the kind of smart optimization that we expect good orchestrators to do in order to conserve resources and maximize uptime. So, again, it shows just how closely TOSCA requirements are tied to design-time considerations and why they are not the right tool for runtime.

Let's put aside "soft" constraints for a moment and look at an example of a "hard" constraint:

    server:
      type: Server
      directives:
      - inventory
      attributes:
        ram: 16 GB
      requirements:
      - backup: Storage

Here we specified an exact value for the RAM attribute and it would not be a problem for the orchestrator to check against it when going through its inventory of representations. Exact values can work fine for some kinds of attributes (especially booleans) but in cases like these we want a "soft" constraint. Specifically, we want to ask for a minimum of 16 GB.

It's not hard to imagine grammatical enhancements that could allow for insertion of constraints instead of exact values. Here's one way:

    server:
      type: Server
      directives:
      - inventory
      attribute_constraints:
        ram:
        - { greater_or_equal: 16 GB }
      requirements:
      - backup: Storage

I think that's quite straightforward and flexible. I'm sure we can think of other syntax, too.
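For what it's worth, evaluating such constraints against an inventory is not hard either. Below is a minimal Python sketch under a few assumptions: "attribute_constraints" is the keyname I invented above, the inventory entries are hypothetical, every constrained attribute is treated as a size, and the unit table is a small subset with decimal multipliers.

```python
import operator

# Assumed subset of size units, decimal multipliers (illustration only).
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Constraint operator names mapped to Python comparisons.
OPERATORS = {
    "equal": operator.eq,
    "greater_than": operator.gt,
    "greater_or_equal": operator.ge,
    "less_than": operator.lt,
    "less_or_equal": operator.le,
}


def parse_size(text):
    """Parse a value such as '16 GB' into a number of bytes."""
    number, unit = text.split()
    return float(number) * UNITS[unit.upper()]


def satisfies(attributes, attribute_constraints):
    """True if the current attribute values meet every constraint clause."""
    for name, clauses in attribute_constraints.items():
        if name not in attributes:
            return False
        actual = parse_size(attributes[name])
        for clause in clauses:
            for op_name, bound in clause.items():
                if not OPERATORS[op_name](actual, parse_size(bound)):
                    return False
    return True


# Hypothetical inventory: representation name -> current attribute values.
inventory = {
    "vm-small": {"ram": "8 GB"},
    "vm-large": {"ram": "32 GB"},
    "rack-7": {"ram": "16 GB"},
}

# Mirrors: attribute_constraints: { ram: [ { greater_or_equal: 16 GB } ] }
constraints = {"ram": [{"greater_or_equal": "16 GB"}]}

matches = [name for name, attrs in inventory.items() if satisfies(attrs, constraints)]
print(matches)  # ['vm-large', 'rack-7']
```

Because the check runs against current attribute values, an orchestrator could just as well decide to resize "vm-small" and re-evaluate instead of rejecting it outright.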

In summary, we definitely want runtime matching of representations in TOSCA, but there are much, much better ways of doing so than breaking the design graph, and with it TOSCA's hallmark feature.
