
Subject: RE: [tosca] Towards an instance model for TOSCA


Hi Tal, thanks for the detailed write-up. I'll need more time to digest the entire message, but I wanted to respond specifically to your comments about requirements, since it appears we have a fundamental difference of opinion here:

 

  • In my opinion, the only useful reason to have requirements in service templates (rather than relationships) is to allow an orchestrator to fulfill these requirements at orchestration time (i.e. to leave the requirements dangling). Therefore, requirements are exclusively a run-time issue. If you know at design time which node/capability is needed to fulfill a requirement, then use an explicit relationship instead.
  • Used in that way, requirements are actually the most important and most useful feature in TOSCA: they allow service designers (and more importantly, component designers) to explicitly define resources required by their components without having to specify where those resources are supposed to come from. Having a language that allows you to define a service/component topology as well as the resource requirements for that topology all in the same place is extremely powerful.
  • All components that are intended to be "created" by a TOSCA orchestrator need resources on which to orchestrate those components. Or said a different way: every TOSCA node in a TOSCA service template needs some other node THAT YOU ALREADY HAVE. An orchestrator can't create anything out of thin air. What's bugged me most about TOSCA is the idea that "some" components (such as Compute nodes) are considered to be "fundamental building blocks" that every orchestrator is supposed to be able to deploy, and as a result these components don't have requirements (specifically they don't have a HostedOn requirement). An orchestrator is supposed to know how to create them somehow and somewhere. However, if an orchestrator is supposed to create Compute nodes, it needs cloud infrastructure on top of which to create those Compute nodes. The need for cloud infrastructure should be expressed as a requirement in a Compute node. Instead, it is assumed that the orchestrator "knows about" available cloud infrastructure somehow, and information about available cloud infrastructure is communicated to the orchestrator without using TOSCA. This assumption is guaranteed to make service templates non-portable, since there is no way to discover what infrastructure an orchestrator has available, and how it is matched to nodes that need the infrastructure.
  • The obvious question, then, is where an orchestrator is supposed to find the "resources" that it needs to use to fulfill dangling requirements. As I said before, orchestrators can't just create resources out of thin air, and even if they could, it would be entirely inappropriate (as well as technically challenging) to orchestrate nodes just for the purpose of using those nodes as resources to be used for other nodes. The only reasonable approach, in my opinion, is for an orchestrator to use an inventory of available resources and fulfill dangling requirements by allocating nodes/capabilities from that inventory. That is exactly what I do in my implementation.
  • Of course, this means that nodes in inventory need a TOSCA representation so that their capabilities can be matched to dangling requirements in service templates. I accomplish that by using TOSCA service templates (containing node templates) to represent resources that need to be "onboarded" into an inventory. This also has the extremely useful side-effect that onboarded resources have the exact same representation in the inventory (the "instance model") as orchestrated services, and as a result either of them can be used to fulfill requirements for any future services that need to be orchestrated. (See the sketch right after this list for what this might look like.)
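
To make this concrete, here is a rough sketch of what both sides might look like in TOSCA grammar. All type and capability names here are hypothetical placeholders:

# A service template with a dangling requirement, to be fulfilled at
# orchestration time by matching against the inventory:
topology_template:
  node_templates:
    app:
      type: my.nodes.App
      requirements:
        - host:
            capability: Hosting      # no target node template: left dangling

# A resource onboarded into the inventory, described with the same grammar
# so that its capability can later be matched to dangling requirements:
topology_template:
  node_templates:
    server:
      type: my.nodes.Server          # exposes the Hosting capability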

 

Again, in my opinion dangling requirements are the most useful and most important feature in TOSCA. We need to discuss how to make them even more powerful, not how to remove them.

 

Thanks,

 

Chris

 

 

From: tosca@lists.oasis-open.org <tosca@lists.oasis-open.org> On Behalf Of Tal Liron
Sent: Wednesday, October 02, 2019 7:47 PM
To: tosca@lists.oasis-open.org
Subject: [tosca] Towards an instance model for TOSCA

 

To follow up on our Tuesday discussion, here's a potential starting point for thinking about clarifying TOSCA's instance model.

 

The core principle that we should maintain is that a node template != a node instance. To elaborate: a node template can have zero or more node instances associated with it.

 

To elaborate even further: TOSCA should not dictate the semantics of node instances. For example, in Kubernetes and other scheduling paradigms the actual number of instances is a cloud-native orchestration decision. TOSCA is here to provide the guidelines and hints for such orchestration. Those hints could come from the actual properties and attributes of the node template or its capabilities, from metadata in the node type, from directives and policies, and the decision could even take into consideration existing relationships from/to the node template, really the topology as a whole, the run-time environment that is external to TOSCA, and even past decisions in similar situations (machine learning). Likewise, I see TOSCA as very useful for describing cloud configuration topologies, such as datacenter infrastructure (classes of physical machines), software-defined networking, etc. In these cases the "instance" ends up being the act of configuring a virtual or physical component. It is not necessarily an enumerable item, neither "one" nor "zero". The less TOSCA has to say about what an "instance" is, the more widely usable it can be.

 

But of course, as it stands, there are aspects in TOSCA that are ambiguous or downright nonsensical without understanding the interplay between node template and instance. E.g. attributes in general, the get_property function, policy targets, etc.

 

I would say that what we're really trying to clarify here is the difference between design-time and run-time state in TOSCA, because TOSCA does have something to say about both.

 

PROPERTIES

 

Let's start with what is clear right now. :) First, node template properties -- we take for granted that all instances of a node template would have the same property values. In other words: properties are immutable. In yet other words: properties are design-time state. An interesting implication here is that, actually, properties are attached to the node template rather than to the node instances. So, even if there are zero instances of the template, the property should still be considered part of the state. get_property would thus work both at design-time and at run-time, because the node template always "exists" in this sense.
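
A quick illustration (the node and property names are made up): a property is fixed on the template and can be referenced whether or not any instances exist.

node_templates:
  db:
    type: my.nodes.Database                    # hypothetical type
    properties:
      port: 5432                               # design-time value, shared by all instances of "db"
  app:
    type: my.nodes.App
    properties:
      db_port: { get_property: [ db, port ] }  # resolvable even with zero "db" instances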

 

ATTRIBUTES

 

By contrast, attributes are run-time state. I've argued previously that get_attribute should always return a list -- in case there are zero node instances it would be an empty list. So, how many values would be returned in case there are more than zero instances? This should be entirely implementation-specific. There can be orchestration scenarios in which it would make sense to return only one value. In other words: an attribute represents a "class" of values of a specific name and type. Or maybe each instance has a different attribute value. An interesting implication could be that, actually, we could allow values that are of the attribute type or of derived types. The attribute type represents a contract for this dynamic state.
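
As a sketch of how this might read under that proposal (the output and attribute names are invented):

outputs:
  app_addresses:
    # would evaluate to one entry per live instance of "app" --
    # possibly an empty list if there are no instances at all
    value: { get_attribute: [ app, public_address ] }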

 

It's important to remember that attributes and/or their types can have constraints. This means that whatever mechanism is used to store attribute state, it is expected to comply with these constraints. There is some leeway here as to when the constraints would apply. For example, one could say that they would only be applied when calling get_attribute -- whatever state is there that does not comply would result in an error message or perhaps be discarded. Or, this could apply in the storage mechanism itself: non-compliant state would not be allowed to be inserted. The error (or warning) would occur elsewhere in the orchestration system.

 

Should TOSCA have a say as to how enforcing attribute compliance would work? I can't think of a reason why it should. The only thing that TOSCA should care about in this case is that get_attribute always returns compliant values when it does succeed. We need this function, at least, to be deterministic.

 

There's an important implication here: get_attribute can only make sense as a run-time function (similarly get_operation_output and get_artifact). A TOSCA parser can't technically "call" it, but instead must leave it as a stub to be called during orchestration and indeed implemented by an external system. This extremely important distinction is not brought up in the TOSCA spec right now, but I think we really need to divide TOSCA functions between design-time and run-time functions. (Note that the TOSCA parser could potentially still validate that the function is written correctly. For example, if an attribute name hasn't been defined in the node type, then there's no technical justification to wait until run-time to see that it would fail.)

 

There's another important grammatical point: the semantic meaning of SELF (I've argued that we should get rid of this keyword, but I acknowledge it as implicitly there). When you are using get_property, it doesn't matter if SELF is the node template or the node instance, because properties mean the same for both. But for get_attribute it's quite different. Likewise for attribute mappings in notifications, where SELF seems to be taken for granted. In these cases SELF could make sense only according to however the orchestration solution understands node instances. It's very important, then, to define what SELF is when used in various contexts. More on that below.

 

INTERFACES, OPERATIONS, AND NOTIFICATIONS

 

When an operation is executed, does this happen on all node instances at once in parallel? Or serially? Would it fail if only one of them fails? Would separate failure states need to be maintained? What happens if there are no instances of the node template? And what would all this mean for a workflow, in which a step could be the execution of an operation? And what do we do with operation outputs? (An especially poorly defined concept right now.) None of this is clear. (Relatedly: TOSCA does not have anything to say about run-time error handling for operation calls.)

 

To follow the general principles I stated above, I would say that TOSCA should not have an opinion here. The "classical" orchestration solutions that would work with these features are very varied in their approaches, and ideally we would want TOSCA to support all of them. So, whether the operation is done on all node instances or not, and even whether it makes sense if there are "zero" instances, should be left entirely up to the orchestrator. If some orchestrators need more "hints" in order for them to know what to do, well, these can be provided as inputs to operations and the interface as a whole. Examples: run in serial, execute asynchronously, fail-fast, provide a timeout for failure, fail on zero instances, etc. The possibilities obviously depend on the orchestrator and its features.
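
For instance, such hints might be passed like this (the input names below are hypothetical orchestrator-specific conventions, not standard TOSCA keynames):

node_templates:
  app:
    type: my.nodes.App
    interfaces:
      Standard:
        configure:
          implementation: scripts/configure.sh
          inputs:
            execution_mode: serial          # hypothetical hint: serial vs. parallel
            timeout_seconds: 300            # hypothetical hint: fail after this long
            fail_on_zero_instances: false   # hypothetical hint: zero instances is OK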

 

(Another related point: actually, I'm not sure these execution parameters are best defined as "inputs". If we do so, it would mean that each orchestrator would need its own interface types. The way TOSCA works right now would mean that the entire type hierarchies, really the entire profile, would have to be rewritten per orchestration solution. I think, instead, there should be some way to provide metadata for execution, so that the operation "signature" could remain constant. In the ARIA project we had a rather "creative" solution for this, but it's worth discussing in the future. I have some ideas.)

 

Remember the issue regarding SELF above? This is where it comes into play: what if an operation input calls get_attribute on SELF? The SELF here should, then, be the node instance. Which is an implementation detail. Remember, I've argued above that get_attribute should be implemented by the orchestrator, so in the same way SELF would be interpreted here as whatever is relevant to that orchestration solution.

 

GROUPS

 

Do groups even have an instance model? Do they have any run-time reality? Do we need them to have it?

 

It seems to me that there has been an evolution on thinking about groups. For example, in TOSCA 1.3 (as of now), we have removed support for interfaces in groups. This, to me, implies that some of us are thinking of groups as a design-time feature.

 

It's worth reconsidering this. Groups can provide a straightforward way to relate many nodes together at run-time. You could probably achieve the same result using requirements/relationships -- just have all the node templates require each other in some way. But obviously the group grammar is much easier.

 

Run-time semantics, though, could be very different indeed: a group instance would contain node instances, not node templates. In reality, several instances of the same group (template) might contain different node instances. But this gets quite complicated fast -- do we want to say that all node instances would need to be in at least one group instance? Could there be overlaps (a node instance belonging to more than one group instance)? Do we want to provide rules for grouping? We could potentially do something like this, but it's hard to see the benefits. So, to be clear, I am not advocating that groups be instantiable.

 

So what can groups give us at run-time? My thinking is this -- groups should not in themselves be instantiable, but they can allow us a way to add extra properties, attributes, and interfaces to node templates and node instances regardless of the node type. They thus represent an "add-on" contract that is still strongly typed according to TOSCA standards. To me that seems like a valuable and useful feature that does not change anything discussed above. The properties, attributes, and interfaces are all still at the node template and node instance level regarding state. A group would thus just be a grammatical feature, and quite a powerful one at that. For this to work, in many places where you can specify "node template name" in TOSCA we should also say "or group name". E.g., a workflow step to call an operation on a node template (really all instances of the node template) could be for a group instead. We'll still be executing the operation on node instances, but the contract will be the one from the group type rather than the node type.

 

So, yes, I am advocating that we reverse the decision to remove interfaces from groups, and indeed also add attributes to groups. Of course all the above has to be explained in order to make this happen: the idea is that these are added to all node instances in the group.
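
A sketch of what the proposed grammar might look like (the group type, interface, and member names are made up, and attributes/interfaces on groups are of course not in TOSCA 1.3 -- that is the proposal):

group_types:
  my.groups.WebTier:
    derived_from: tosca.groups.Root
    properties:
      min_size: { type: integer }
    attributes:                        # proposed addition
      current_size: { type: integer }
    interfaces:                        # proposed: re-adding interfaces to groups
      Scaling:
        scale_out: {}
        scale_in: {}

topology_template:
  groups:
    web_tier:
      type: my.groups.WebTier
      members: [ app, server ]         # the add-on contract applies to their node instances
      properties:
        min_size: 2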

 

Related point: I would like us to get rid of "interfaces" on relationships instead. :) They never sat comfortably with me, as it was always confusing as to where exactly the operation should "happen". Indeed, in Cloudify you must specify whether you want the operations to happen on the source node or the target node. It seemed to me to be a way to add yet another contract to nodes, "lent" to them via the incoming or outgoing relationships. I think putting "interfaces" in groups is a more coherent way to think about "add-ons".

 

Final suggestion: I also think we should allow for group recursion: I see no reason why a group can't also contain other groups. E.g., Cloudify allows for this. In the end it's just adding more node instances to the group.

 

REQUIREMENTS AND RELATIONSHIPS (AND MORE...)

 

I've left the most contentious topic for last. :D

 

In one important way, TOSCA does separate design-time from run-time for this topic: actually, we can only specify requirements in TOSCA, while it seems to be assumed that relationships are their run-time instantiation. This is similar to the movement from node template to node instance. The difference is that counting relationships is semantically important at design-time. While node templates can have zero or more instances -- and it shouldn't matter to TOSCA how many there are -- each relationship instance must be accounted for if the topology is to make sense as a topology. But this "sense" must be unpacked in detail.

 

First, the big problem: there might very well be a difference between the design-time topology and the run-time topology. That difference has always been rather hairy in TOSCA, even at its most trivial.

 

For example, let's say that the node template "App" specifies a requirement for the "Hosting" capability, and we have a "Server" node template that has that capability. (Let's assume that there are corresponding node types for each.) This seems to be a very clear design-time interdependency. But is it? If we have 5 instances of node template App, does that mean we also want 5 instances of node template Server? Or can a single Server handle 5 Apps? This can be asked in a different way: let's say we have a node template for Server, but actually there are no other node templates that require it. Perhaps, then, it shouldn't be instantiated at all (a "zero instances" case). This "orphaned" node template could still appear as part of the design, but it doesn't participate in the run-time topology.
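
In grammar, the example might look roughly like this (placeholder type names):

node_templates:
  App:
    type: my.nodes.App
    requirements:
      - host:
          capability: Hosting   # the design-time interdependency
  Server:
    type: my.nodes.Server       # its node type declares the Hosting capability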

 

One possible answer to all these questions is that none of them matter in terms of designing the topology: Apps need Servers, that's what we want to express. How this would be instantiated depends on the orchestrator, and we could possibly provide hints in TOSCA using properties, metadata, directives, or -- better yet -- policies (this is actually, as I see it, an important use for policies). So, as long as the TOSCA parser finds a node template with a capability that satisfies that requirement, then the TOSCA service template should be considered valid. It would be considered "properly designed". (It could fail to deploy, but that's another matter.) How many relationships would be instantiated? Zero or more. Just as with node instances.

 

As long as we're still discussing design-time, let's detour slightly to Chris's point about "dangling requirements". Back to our example: let's remove the Server node template from our topology template. So now we are left with an App node template that requires a Hosting capability, but there are no node templates that can satisfy that requirement. Is this "properly designed"? Should this be considered a valid service template? In my view: no. We could here say that this requirement would be satisfied at run-time, but semantically this would be implementation-specific. What kinds of nodes would provide a Hosting capability? If they don't exist, does the orchestrator need to provision them? How? If this node suddenly appears out of nowhere, would it be accounted for at all in the design? Are there limitations on what it should be? The thing is, there is a very good way to specify all of this in TOSCA: it's a node template. :) Node templates are templates for node instances. So explicitly describing a node template would tell the orchestrator exactly the parameters needed for provisioning a node instance. And we can use policies, directives, etc., to help, as usual. The bottom line is that we don't need to support "dangling requirements" if we have a broader understanding of what a node template is in relation to node instances. "Dangling" could mean that there would be zero node instances of the target node template, and that could be totally fine, or not, according to the implementation and our policies.

 

OK, let's take this a step further. As a real-world example, let's say that we want our App hosted, but we don't care if it's hosted on a physical Server or, say, a VirtualMachine. (Let's assume that both node types have the Hosting capability.) If we can't leave the requirement "dangling", as I suggested, then we must add a node template. According to our node type hierarchy, we might have the option of putting just one node template, let's call it "Host", and have it be of a base type from which both Server and VirtualMachine derive. We would thus be leaving it up to the orchestrator to decide what "an instance" would mean. (Maybe none will be available and it will fail to deploy, but at least our design is clear.) If we don't have a common base type (due to TOSCA's unfortunate reliance on object-orientation and single inheritance) then another option would be to specify two node templates, one of which is a Server and one of which is a VirtualMachine. But now we have a problem: the TOSCA parser would satisfy the requirement once, with either of the node templates. The selection criteria would be arbitrary. The problem is that this selection happens at design-time, whereas we want the decision to happen at run-time.
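
The first option (a common base type) might be as simple as this sketch (placeholder names again):

node_templates:
  App:
    type: my.nodes.App
    requirements:
      - host: Host              # the orchestrator decides what "an instance" of Host means
  Host:
    type: my.nodes.BaseHost     # hypothetical base type of both Server and VirtualMachine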

 

Are we stuck? No, because this is my setup to move the discussion along. :) The point I want to make is this: satisfying a requirement is entirely design-time, but instantiating a relationship is entirely run-time. This hard distinction can go a long way towards clarifying many ambiguities and allow us to improve the grammar.

 

Back to our example: rather than saying that the TOSCA parser must select either Server or VirtualMachine as the target node template of the requirement, let's say that both of them satisfy the requirement, as they indeed do. Wasn't that easy? And now we've removed any arbitrariness. And we've properly expressed exactly what we wanted to express: that either of these node templates -- or both -- would satisfy the requirement. A parser error would happen only if nothing satisfies the requirement. If we were to draw a diagram of our design-time topology, then there would be circles for App, Server, and VirtualMachine, and there would be a line from App to Server and a line from App to VirtualMachine. These lines are not relationships. They are something that we have neither identified nor articulated properly in TOSCA. I propose to call them "potential relationships", or perhaps just "potentials". It tells the orchestrator very precisely where a relationship can be instantiated in a way that agrees with the design. We have a coherent design-time topology, and I think it's quite easy to see how a run-time topology would differ from it while still having an unambiguous connection to it.

 

A "potential" would be instantiated as a "simple" relationship if we did not associate a relationship type with the requirement. Or, it could use that relationship type if we did provide one. (There is a grammatical feature in TOSCA called "relationship template", but I think it's a terrible feature that I hope we can remove. It is really more like a variation on a relationship type, and entirely unlike how node templates get instantiated to node instances. It's just bad.)

 

An important emphasis: for this diagram, we want to make sure that both of these "potential" lines come out from the same point from the App circle. The reason is that "potentials" are grouped together according to the requirement that created them. If App has other requirements, then they would have their own origin points on the App circle with their own "potentials" extending out. Also, the target of each line is a capability, so it would also be a certain point on the destination circles for Server and VirtualMachine. Each capability has its own incoming "potentials".

 

We've achieved another important thing here: we've freed relationship instantiation from some quirky and unclear specifications in TOSCA. Let's dive in! First, take the "occurrences" range keyword, which exists in two places in TOSCA: in requirement definitions within node types, and in capability definitions. It's always been a struggle for me to understand what to do with them, although I can appreciate the intent. Let's start with "occurrences" in requirements. I understand this as a syntactical count: it implies that the node template must specify the requirement a number of times that fits within this range. In other words, it can mean that some requirements are "required" (overloaded term alert!) while others are "optional". So, for our App node type we can give the requirement the occurrences range of [ 1, 1 ], meaning that it must be specified once and only once. (By the way, thank you to whoever finally fixed the definition of "range" to allow for >= in TOSCA 1.3! I always thought it was wrong to have it as >.) This seems to make sense semantically, right? An App instance is installed only in one place, right? And it also makes sense that we would have to provide it with a host (minimum 1 in the range), otherwise it can't be instantiated, right?
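
For reference, this is where that "occurrences" lives today (placeholder names):

node_types:
  my.nodes.App:
    derived_from: tosca.nodes.SoftwareComponent
    requirements:
      - host:
          capability: Hosting
          relationship: tosca.relationships.HostedOn
          occurrences: [ 1, 1 ]   # the node template must assign this requirement exactly once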

 

All of this makes sense in run-time, but we've expressed it in a very poor way in design-time. We've essentially implied that the App node template could have only one "potential" target satisfy that requirement, because otherwise we would want to allow a range of [ 1, UNBOUNDED ] so that we could specify the requirement more than once, supposedly meaning that various instances of App could be installable on various hosts. But, is that what specifying the requirement twice means? Does it mean either relationship is OK? Or does it mean both must be created? (logical or/and ambiguity.) Are we saying that we want one instance of App installed in two places? Or do we need to create two instances, one for each host? Let's ask this a different way: what could and what should it mean to specify the same requirement more than once? Do we need to add grammar for or/and?

 

If we understand that requirements create "potentials" rather than relationships then it's not so confusing anymore. Each time we specify the requirement we're allowing for more "potentials". Back to our example: App requires the "Hosting" capability. Both Server and VirtualMachine provide it, so "potentials" are created. But let's say that there's yet another node template in our topology: Container. Container also has the "Hosting" capability, but we don't want App to be hosted on it (our design decision). So we need to narrow down our requirement somehow. What we can do, then, is specify the requirement twice: first explicitly to the Server node template, and then explicitly to the VirtualMachine node template. This doesn't mean "create two relationships", but rather "create two potentials". Just to clarify this example: if we leave App with the general requirement for "Hosting", then it would create three "potentials": to Server, to VirtualMachine, and to Container. It's not a matter of "and" or "or" here: how the relationships are actually instantiated would depend entirely on the orchestration implementation, policies, etc. The relationships can only happen where "potentials" were created, which is the design we want. Indeed, their instantiation happens together with deciding how many node instances need to be created per node template. These decisions should actually be made together, depending on the nature of the nodes and the relationships.
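
A sketch of that narrowed-down design (placeholder type names):

node_templates:
  App:
    type: my.nodes.App
    requirements:
      - host:
          node: Server             # first "potential"
      - host:
          node: VirtualMachine     # second "potential"; Container is deliberately excluded
  Server:
    type: my.nodes.Server
  VirtualMachine:
    type: my.nodes.VirtualMachine
  Container:
    type: my.nodes.Container       # also has Hosting, but no potential is created to it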

 

So, I say: let's get rid of "occurrences" in requirement definitions. But the remaining issue to address is whether some requirements are "required" or "optional". Simple: let's add a "required" keyword, and make it "false" by default. And I'll say even more: if a requirement is "required" it doesn't mean that you have to specify it at the node template. We can easily rule that if you don't specify it then it will be assumed to automatically be there in the most generic sense (just a requirement for the capability type with no node filtering). That makes sense to me, because that's what the node type designer intended: that the requirement always be there. (This is kinda what Puccini does today if "occurrences" is not specified.)
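
The grammar for this might look something like the following sketch (the "required" keyname is the proposal, not existing TOSCA; other names are placeholders):

node_types:
  my.nodes.App:
    derived_from: tosca.nodes.SoftwareComponent
    requirements:
      - host:
          capability: Hosting
          required: true          # proposed: replaces the minimum of "occurrences";
                                  # if unassigned in a template, assume the generic requirement
      - monitoring:
          capability: Monitorable
          required: false         # proposed default: fine to leave unassigned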

 

Another place where we have "occurrences" is in capability definitions. This is equally if not more confusing. If "occurrences" in requirements is syntactical, here we have a vaguely defined semantic use. I think the intent is to somehow specify capacity. For example, let's say a NetworkSwitch node type has 20 ports, so its "Port" capability "occurrences" would be [ 0, 20 ]. Well, first let's think: what does it mean to have a minimum in this range? In Puccini, I've taken this to mean that the capability must have at least the lower bound and at most the upper bound of incoming "relationships", and indeed Puccini checks for these after satisfying all requirements (it gets quite complicated). But, again, this is a very poor way to express such an intent at design-time. It seems to assume that every time a requirement is satisfied a relationship is created: in this case, we can imagine a physical cable connecting to the "Port" of NetworkSwitch. But is that really what is expressed here? Because, remember, we are specifying the requirement for a node template, not for a node instance. What we've really done (I think?) is limit the number of incoming relationships from node templates. And if that's the case, it's really hard to understand what the semantics are. We might have 30 node templates in our topology that need to connect to the NetworkSwitch. Or just 5. But the actual number of node instances of each of these is what we want to refer to here. What is the point of limiting the number of satisfied requirements, or requiring a minimum number? It's hard to understand what "capacity" could mean in terms of design-time elements like node templates. The capacity of each NetworkSwitch instance should be a property of the node template or capability definition (possibly even an attribute of node instances, if we want to allow for various kinds of physical boxes for the NetworkSwitch node template, including some that might have broken ports and thus lower "run-time" capacity). Or maybe it can be a policy. The usual orchestration would then apply: node instances and relationship instances are created together according to various implementation-specific factors. Let's get rid of "occurrences" here, too.
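
A sketch contrasting the two approaches (placeholder names; only the "occurrences" line reflects today's grammar):

node_types:
  my.nodes.NetworkSwitch:
    derived_from: tosca.nodes.Root
    capabilities:
      port:
        type: Port
        occurrences: [ 0, 20 ]    # today: limits satisfied requirements at design-time
    # Proposed alternative: express capacity as plain data and let the
    # orchestrator do the counting at run-time.
    properties:
      port_count:
        type: integer
        default: 20
    attributes:
      usable_ports:               # could differ per physical box (broken ports, etc.)
        type: integer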

 

What we're left with is grammatically almost identical to what TOSCA is right now. The change I am proposing concerns the semantic interpretation of requirements and relationships -- with the addition of the concept of a "potential" -- all of which will of course have cascading effects on how we understand various mechanisms in TOSCA.

 

THANKS

 

As usual, thank you for being patient with my verbose emails and reading thus far. Unfortunately, I don't see a choice: there remain gaping holes in TOSCA and we need to spend quite a lot of effort to fill them with solid ideas. Onward!


