Re: [tosca-comment] Identifiers in reference to multiplicity discussions

Hi Paul,

A couple of clarifications (these may be obvious, but I just want to make sure we’re on the same page):

There are actually four “kinds” of entities that are relevant for this discussion:

1. TOSCA node types: these define re-usable components.
2. TOSCA node templates: these define (typed) components in a service. Node templates assign specific values (often using intrinsic functions) to the properties etc. defined in their types. It is not uncommon to have multiple node templates of the same node type in a service template.
3. TOSCA node representations: at deployment time, a TOSCA orchestrator “marries” service templates with deployment-specific input values in order to create a (run-time) representation of the service that is to be deployed/managed. Node representations must be fully resolved, i.e. all properties must have actual values (instead of using an intrinsic function), all requirements must be fulfilled, etc.
4. Node implementations: these are the actual service entities in the “external” world that correspond to the node representations managed by the orchestrator.
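
As a rough illustration of how these kinds of entities relate, here is a minimal Python sketch. It is not TOSCA grammar or any orchestrator's API: the class names, the resolve() helper and the use of a lambda to stand in for an intrinsic function such as get_input are all assumptions made for this example. Node implementations live outside the orchestrator, so they are deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class NodeType:
    """Re-usable component definition: declares which properties exist."""
    name: str
    property_names: Tuple[str, ...]


@dataclass
class NodeTemplate:
    """Typed component in a service template; values may still be unresolved."""
    name: str
    node_type: NodeType
    # A value is either a literal or a callable standing in for an intrinsic function.
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class NodeRepresentation:
    """Run-time view held by the orchestrator; every property is a concrete value."""
    template_name: str
    properties: Dict[str, Any]


def resolve(template: NodeTemplate, inputs: Dict[str, Any]) -> NodeRepresentation:
    """Combine a template with deployment inputs, evaluating function-valued properties."""
    resolved = {}
    for prop in template.node_type.property_names:
        if prop not in template.properties:
            raise ValueError(f"property {prop!r} has no value in template {template.name!r}")
        value = template.properties[prop]
        resolved[prop] = value(inputs) if callable(value) else value
    return NodeRepresentation(template.name, resolved)


# Hypothetical example data.
server_type = NodeType("Server", ("hostname", "os"))
server_tpl = NodeTemplate(
    "server",
    server_type,
    {"hostname": "web-1", "os": lambda inputs: inputs["os"]},  # stands in for get_input
)
rep = resolve(server_tpl, {"os": "linux"})
print(rep.properties)
```

The point of the sketch is only that the template may hold unevaluated functions while the representation never does.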

Based on this distinction between the different “kinds” of entities, it should be clear that “multiplicity” could be supported for any or all of these entities:

1. As I already stated, it is very common to have multiple node templates of the same node type in a given service template. These different node templates are uniquely identified using the “node template name” in the service template.
2. We could also allow multiple “node representations” to be created (at deployment time) from the same “node template”. The generic use case for this is an SD-WAN or VPN service where the service template might support an arbitrary number of VPN sites. The template for each of these sites would be identical, but at deployment time different input values could be specified for each individual site. The question that’s currently being discussed in the TOSCA meetings is whether this use case should be supported. Supporting this use case introduces (at least) two challenges:

   a. How is the node representation identified? If we only allow a one-to-one mapping between node templates and node representations, then each node representation could be uniquely identified (by the orchestrator) using the corresponding node template name. If we allow multiplicity (i.e. the creation of multiple node representations from the same node template) then the node template name can no longer be used to identify a specific node representation.
   b. What is the impact of multiplicity on the cardinality of relationships?

3. And finally, it is possible to create multiple (external) node implementations from the same node representation. You state that you would prefer to have one and only one node instance/implementation in the real world correspond to one node representation, but this is not something that can be controlled by TOSCA since the creation of the external entities is entirely implementation-specific (if you use pure TOSCA orchestration, then the TOSCA artifacts are responsible for creating the entities).
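
To make the identification challenge concrete, here is a small Python sketch. The first function indexes node representations by template name alone, which only works for a one-to-one mapping. The second uses a (template name, ordinal) pair as the key; that scheme is purely an assumption for illustration and not something proposed in the TOSCA meetings. The function names and the VPN site data are hypothetical.

```python
from typing import Dict, List, Tuple


def index_by_template_name(created: List[Tuple[str, dict]]) -> Dict[str, dict]:
    """Index representations by template name alone (valid only for one-to-one)."""
    index: Dict[str, dict] = {}
    for template_name, props in created:
        if template_name in index:
            raise KeyError(
                f"template name {template_name!r} no longer identifies one representation"
            )
        index[template_name] = props
    return index


def index_by_name_and_ordinal(created: List[Tuple[str, dict]]) -> Dict[Tuple[str, int], dict]:
    """Hypothetical alternative: key is (template name, ordinal among its siblings)."""
    index: Dict[Tuple[str, int], dict] = {}
    counters: Dict[str, int] = {}
    for template_name, props in created:
        ordinal = counters.get(template_name, 0)
        counters[template_name] = ordinal + 1
        index[(template_name, ordinal)] = props
    return index


# Two representations created from the same "vpn_site" template with different inputs.
sites = [("vpn_site", {"city": "London"}), ("vpn_site", {"city": "Austin"})]

try:
    index_by_template_name(sites)
    collided = False
except KeyError:
    collided = True

by_ordinal = index_by_name_and_ordinal(sites)
print(collided, sorted(by_ordinal))
```

Whatever key is eventually chosen, the sketch shows why the orchestrator needs something beyond the template name once multiplicity is allowed.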

Given this discussion, it should also be clear that there are two types of identifiers that may need to be managed:

1. Identifiers used by the TOSCA orchestrator to uniquely identify node representations. These should be built into the TOSCA language.
2. Identifiers used to uniquely identify entities in the external world. These are domain/implementation specific and should be modeled using TOSCA attributes.

You could combine the two types of IDs into one, as you suggested, by having TOSCA language support for specifying which attribute should be used as a unique ID, but I would caution against using that approach because:

It muddies the distinction between what’s in the language and what is in the type system. With version 2.0, we’ve spent a lot of time cleaning this up and we should be careful not to re-introduce requirements to have specific entities in the TOSCA types.
Specifically, it may not be necessary or desirable to define a “unique id” in some application domains. We shouldn’t force type designers to introduce an ID attribute just because the orchestrator expects it.

On a related note, I sympathize with your comment about “property_value_expression” being underdefined. We have a number of proposals to rectify this, but they require agreement on multiplicity support for TOSCA node representations before those proposals can be discussed.

Thanks,

Chris

From: tosca-comment@lists.oasis-open.org <tosca-comment@lists.oasis-open.org> On Behalf Of paul.m.jordan@bt.com
Sent: Friday, February 19, 2021 2:25 AM
To: tliron@redhat.com; tosca-comment@lists.oasis-open.org
Subject: RE: [tosca-comment] Identifiers in reference to multiplicity discussions

Tal,

My suggestion would be that by default a node_template refers to one and only one node instance in the real world. That way there is a clear distinction between node type definitions (which are the specification) and node templates (which are instances).

That default position would be overridden where there would otherwise be a need to write out in full multiple node templates which are all derived from the same node type definition and differ only in parameter value assignments. In effect the “occurrences” syntax would act as a loop over the node template.

The syntax for creating that loop would include a mandatory indicator of which parameter or attribute definition (defined in the node type definition) is to be used as the ID for items in the range. The assigned value for that ID would only need to be unique within the scope of the cluster and must be invariant for the lifetime of the instance.

In many cases the ID value will be assigned by the system or the orchestrator, in which case the ID would be defined as an attribute. In other cases the ID value would be assigned to a property. Property assignments would need a function which could be evaluated for each cluster member. TOSCA currently has very few intrinsic functions, and those which exist are unlikely to be enough for this purpose. The availability of more functions is one area where Helm currently has more functionality than TOSCA. property_value_expression does not seem to be defined at all at the moment, and I’m not sure how a processor would definitively distinguish between a property value assignment and an expression.

Rather than defining a programming language syntax within TOSCA, I believe it would be preferable to allow breakout to an existing language. I think that Puccini has this ability.

I agree with you that the range ID will need to be qualified to make it unique within the template so that the cluster members can be referenced from elsewhere in the template but think we should make it mandatory that the qualifier is the <modelable_entity_name> (which is what you happen to have chosen in your server example). This departs less from the current usage for single nodes.
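
The loop behaviour described above could be sketched roughly as follows in Python. This is only an illustration of the idea, not proposed TOSCA grammar: the expand_occurrences() function, its parameters, the dotted "template.id" qualification and the host-N naming function are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple


def expand_occurrences(
    template_name: str,
    identifier: str,
    instance_count: int,
    limit: Tuple[int, Optional[int]],
    naming_function: Callable[[int], str],
) -> Dict[str, dict]:
    """Expand one node template into instance_count cluster members.

    Each member gets an ID value that must be unique within the cluster and is
    qualified by the template name so it can be referenced from elsewhere.
    """
    lower, upper = limit  # upper of None stands in for UNBOUNDED
    if instance_count < lower or (upper is not None and instance_count > upper):
        raise ValueError(f"instance_count {instance_count} is outside limit {limit}")

    members: Dict[str, dict] = {}
    for i in range(instance_count):
        id_value = naming_function(i)
        qualified = f"{template_name}.{id_value}"  # assumed qualification format
        if qualified in members:
            raise ValueError(f"ID {id_value!r} is not unique within the cluster")
        members[qualified] = {identifier: id_value}
    return members


# Hypothetical naming function evaluated once per cluster member.
cluster = expand_occurrences(
    "server", "hostname", 3, (1, None), lambda i: f"host-{i + 1}"
)
print(list(cluster))
```

A naming function that returns the same value twice is rejected, which reflects the requirement that the ID be unique within the scope of the cluster.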

As for the select statement, I very much dislike the way that we drift off into implementation-specific syntax at certain points. I would much prefer that the query language used be explicitly declared. The same comment applies to the syntax used for schema definitions. I don’t think these language declarations need to be included each time a select directive or a schema statement occurs in the template; instead I suggest that the TOSCA header include a statement which defines the context for the whole document.

tosca_definitions_version: tosca_2_0

schema_syntax_context:
  path: org.json-schema/specification.html
  version: 2019-09

select_syntax_context:
  path: net.goessner/articles/JsonPath/
  version: 2007-02-21| e1

function_syntax_context:
  path: org.golang
  version: 1.16

node_types:
  Server:
    properties:
      hostname:
        type: string
    attributes:
      current_ram:
        type: scalar-unit.size

topology_template:
  inputs:
    os:
      type: string
      default: linux
    numberOfServersInCluster:
      type: integer

  node_templates:
    server:
      type: Server
      occurrences:
        identifier: hostname
        limit: [1, UNBOUNDED]
        instance_count: { get_input: numberOfServersInCluster }
      properties:
        hostname: [naming_function_in_golang]

  outputs:
    ram_use_for_named_server:
      get_attribute: [ server, current_ram, hostname_one ]
    ram_use_array_all_servers:
      select: [$.server..current_ram] ## a JSON Path query
    ram_use_array_selected_:
      select: [$.server.????.current_ram] ## a JSON Path query but how to pass in the os input? In what order are the different languages substituted/processed?
    ram_use_sum:
      # ??? Some function for summing the results of the array. In what order are the different languages substituted/processed?

Paul Jordan
OSS Specialist
BT Technology | Tel +44 (0) 3316252643 | paul.m.jordan@bt.com

This email contains information from BT that might be privileged or confidential. And it's only meant for the person above. If that's not you, we're sorry - we must have sent it to you by mistake. Please email us to let us know, and don't copy or forward it to anyone else. Thanks.

We monitor our email systems and may record all our emails.
British Telecommunications plc
R/O : 81 Newgate Street, London EC1A 7AJ
Registered in England: No 1800000

From: Tal Liron <tliron@redhat.com>
Sent: 17 February 2021 19:12
To: Jordan,PM,Paul,TNK6 R <paul.m.jordan@bt.com>; tosca-comment@lists.oasis-open.org
Subject: Re: [tosca-comment] Identifiers in reference to multiplicity discussions

Thanks Paul, we started to discuss this challenge in depth in the ad-hoc meeting.

In my view, you're on the right track. For specific systems that need to identify node instances we can use attributes. That allows the values to be filled by an external system, whether it's the platform itself or an orchestrator that manages IDs. TOSCA would then let you model the data type for that identifying attribute as is appropriate.

The problem is that we still don't entirely understand what attributes are in TOSCA. :) Even more specifically we don't have tools for using attributes as unique identifiers.

Here's one possible approach:

We add a keyword called "unique" (boolean) to attribute declarations. When "unique: true" is set on an attribute it is a signal to orchestrators that multiple instances (whatever that would mean in any specific implementation) would require unique identification management for this attribute. How this is done would be out of scope for TOSCA, and indeed in many cases would be handled by the platform itself, e.g. you spin up a virtual machine and get a GUID after it's created. Is it not a GUID, but rather an ID that is unique only per cluster? Then maybe your attribute needs to comprise a combination of cluster ID and resource ID in order to be unique. You can model that easily in TOSCA. Note that it might also be possible to have multiple attributes marked as unique. Why not? There might be different unique IDs coming from different parts of the system but all refer to the same "node" as you've encapsulated it in TOSCA.

So why have the keyword at all? Well, now that we know this attribute is unique we can use it as a grammatical reference in functions, specifically the get_attribute function (but also get_artifact and possibly others). Right now get_attribute uses either a node template name or magic keyword (SELF, SOURCE, TARGET) as the "modelable entity name", but I think we would all agree that this is poorly defined. A unique ID attribute can help us narrow what we mean. In trying to brainstorm, here's something I came up with:

A new intrinsic function called "select". This implementation-specific function can search through the runtime or orchestration universes and return a list of IDs. This result (the list of IDs) can then be used as the "modelable entity name" for get_attribute. An example:

data_types:
  ID:
    properties:
      cluster_name:
        type: string
      serial_number:
        type: integer

node_types:
  Server:
    attributes:
      identifier:
        type: ID
        unique: true
      current_ram:
        type: scalar-unit.size

topology_template:
  inputs:
    os:
      type: string
      default: linux

  node_templates:
    server:
      type: Server

  outputs:
    total_ram_use: { get_attribute: [ { select: [ server, identifier, "where os = ", { get_input: os } ] }, current_ram ] }

The "select" function here has the following arguments: first argument is template name (or SELF, SOURCE, TARGET) and the second is the name of an attribute that must be marked as "unique: true". The rest of the arguments will remain as implementation specific. In this example I'm assuming an orchestrator that has some kind of textual querying language. The get_attribute function then uses the result of this function as its first argument, which would be zero or more instance IDs of that specific node template. For those instances it would extract the "current_ram" attribute. What would the result be? I think it should be a dict of ID mapped to attribute value. So that output can then be used by other parts of orchestration to calculate averages, create a total sum, etc.
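
As a sketch of the semantics I have in mind, here is the same idea in Python. None of this is an existing orchestrator API: the ServerInstance class, its os and current_ram_mib fields, the predicate standing in for the textual query, and the sample data are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ID:
    """Composite unique ID, mirroring the ID data type in the example."""
    cluster_name: str
    serial_number: int


@dataclass
class ServerInstance:
    """Hypothetical run-time instance of the 'server' node template."""
    identifier: ID
    os: str
    current_ram_mib: int


def select(instances: List[ServerInstance],
           predicate: Callable[[ServerInstance], bool]) -> List[ID]:
    """Return the unique IDs of matching instances (zero or more)."""
    return [inst.identifier for inst in instances if predicate(inst)]


def get_attribute(instances: List[ServerInstance],
                  ids: List[ID], attribute: str) -> Dict[ID, object]:
    """Return a dict of ID mapped to attribute value for the selected instances."""
    by_id = {inst.identifier: inst for inst in instances}
    return {i: getattr(by_id[i], attribute) for i in ids}


# Hypothetical sample data.
instances = [
    ServerInstance(ID("east", 1), "linux", 2048),
    ServerInstance(ID("east", 2), "windows", 4096),
    ServerInstance(ID("west", 1), "linux", 1024),
]

linux_ids = select(instances, lambda inst: inst.os == "linux")
ram_by_id = get_attribute(instances, linux_ids, "current_ram_mib")
total_ram_use = sum(ram_by_id.values())  # downstream tooling computes the total
print(total_ram_use)
```

Because the result is keyed by ID, a second query over a different attribute can be joined with this one by looking up the same keys.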

And those IDs are also consistent. So if, for example, you have multiple outputs with different "select" queries and different attributes, well, that's fine, somewhere down the toolchain you are guaranteed that they refer to the same instance of that node template, so you can cross-reference and construct whatever totals or graphs you want from those outputs.

On Wed, Feb 17, 2021 at 6:01 AM <paul.m.jordan@bt.com> wrote:

I see in the TOSCA TC mailing list a discussion about multiplicity. Good, as that has been an area of confusion for me.

In particular an email from Peter Bruun about node identities.

I’d just like to mention that in my work on mapping TMForum to TOSCA I have a profile where every one of my node types is derived from a root type which represents the SID RootEntity. The SID RootEntity is defined as having a mandatory Name and ID plus an optional description, as follows:

ID (RootEntity, required): Unambiguously distinguishes different object instances.

description (RootEntity): This is a string, and defines a textual free-form description of the object. Notes: This attribute doesn’t exist in M.3100. The CIM has two attributes for this purpose, Caption (a short description) and Description.

name (RootEntity, required): Represents a user-friendly identifier of an object. It is a (possibly ambiguous) name by which the object is commonly known in some limited scope (such as an organization) and conforms to the naming conventions of the country or culture with which it is associated. It is NOT used as a naming attribute (i.e., to uniquely identify an instance of the object).

Paul Jordan
