Using property or attribute values as unique IDs for node representations is challenging, in my opinion:
I think we’re all converging on creating unique “node representation” names based on “node template names” instead, either by:
We’ll continue to debate the pros and cons of these approaches during our meetings.
Thank you Chris, that's a very clear explanation. I think the TOSCA cause would be advanced if such explanations were publicly available in a tutorial document.
I accept the four kinds and the two identifiers you mention.
With regard to the 'Identifiers used by the TOSCA orchestrator to uniquely identify node representations', I'm not clear why one option is not to allow a property to be nominated as the identifier when multiple node representations are derived from a named node template.
You also say 'it may not be necessary or desirable to define a “unique id” in some application domains. We shouldn’t force type designers to introduce an ID attribute just because the orchestrator expects it.' But if there is multiplicity from node template to node representation, then, as you have already stated, a unique id is needed. The language would only require a property to be nominated in cases where multiple representations from a single node template are actually used.
Has the TC also considered the option of allowing node templates to take names which are derived from some function which yields a unique value from a loop variable, and then wrapping such node templates in a while/wend loop? Once the loop has been expanded, each resultant node template would map to one, and only one, node representation. The challenge you mention of relationship cardinality in such a solution would be addressed by permitting relationship targets to be expressed either as a constant or as a function of the loop variable.
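The loop expansion suggested above could be sketched roughly as follows. This is only an illustration in Python, not TOSCA grammar: the function name expand_loop, the "name_index" naming scheme and the example template names are all assumptions made for the sketch.

```python
from typing import Callable, Union

# A relationship target is either a constant template name
# or a function of the loop variable (both forms are assumptions of this sketch).
Target = Union[str, Callable[[int], str]]


def expand_loop(base_name: str, count: int,
                targets: dict[str, Target]) -> dict[str, dict[str, str]]:
    """Expand one looped node template into `count` uniquely named templates."""
    expanded: dict[str, dict[str, str]] = {}
    for i in range(count):
        # Hypothetical naming function: a unique name derived from the loop variable.
        name = f"{base_name}_{i}"
        # Resolve each relationship target: call it if it depends on the
        # loop variable, otherwise use the constant as-is.
        expanded[name] = {
            requirement: (target(i) if callable(target) else target)
            for requirement, target in targets.items()
        }
    return expanded


templates = expand_loop(
    "site", 3,
    {"hub": "central_router", "link": lambda i: f"access_link_{i}"},
)
```

After expansion every resulting template name maps to exactly one node representation, and each relationship has a single, fully resolved target.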
Hi Paul,
A couple of clarifications (these may be obvious, but I just want to make sure we’re on the same page):
1. There are actually four “kinds” of entities that are relevant for this discussion:
   1. TOSCA node types: these define re-usable components.
   2. TOSCA node templates: these define (typed) components in a service. Node templates assign specific values (often using intrinsic functions) to the properties etc. defined in their types. It is not uncommon to have multiple node templates of the same node type in a service template.
   3. TOSCA node representations: at deployment time, a TOSCA orchestrator “marries” service templates with deployment-specific input values in order to create a (run-time) representation of the service that is to be deployed/managed. Node representations must be fully resolved, i.e. all properties must have actual values (instead of using an intrinsic function), all requirements must be fulfilled, etc.
   4. Node implementations: these are the actual service entities in the “external” world that correspond to the node representations managed by the orchestrator.
2. Based on this distinction between the different “kinds” of entities, it should be clear that “multiplicity” could be supported for any or all of these entities:
   1. As I already stated, it is very common to have multiple node templates of the same node type in a given service template. These different node templates are uniquely identified using the “node template name” in the service template.
   2. We could also allow multiple “node representations” to be created (at deployment time) from the same “node template”. The generic use case for this is an SD-WAN or VPN service where the service template might support an arbitrary number of VPN sites. The template for each of these sites would be identical, but at deployment time different input values could be specified for each individual site. The question that’s currently being discussed in the TOSCA meetings is whether this use case should be supported. Supporting this use case introduces (at least) two challenges:
      1. How is the node representation identified? If we only allow a one-to-one mapping between node templates and node representations, then each node representation could be uniquely identified (by the orchestrator) using the corresponding node template name. If we allow multiplicity (i.e. the creation of multiple node representations from the same node template) then the node template name can no longer be used to identify a specific node representation.
      2. What is the impact of multiplicity on the cardinality of relationships?
   3. And finally, it is possible to create multiple (external) node implementations from the same node representation. You state that you would prefer to have one and only one node instance/implementation in the real world correspond to one node representation, but this is not something that can be controlled by TOSCA since the creation of the external entities is entirely implementation-specific (if you use pure TOSCA orchestration, then the TOSCA artifacts are responsible for creating the entities).
3. Given this discussion, it should also be clear that there are two types of identifiers that may need to be managed:
   1. Identifiers used by the TOSCA orchestrator to uniquely identify node representations. These should be built into the TOSCA language.
   2. Identifiers used to uniquely identify entities in the external world. These are domain/implementation specific and should be modeled using TOSCA attributes.
4. You could combine the two types of IDs into one as you suggested by having TOSCA language support for specifying which attribute should be used as a unique ID, but I would caution against using that approach because:
   1. It muddies the distinction between what’s in the language and what is in the type system. With version 2.0, we’ve spent a lot of time cleaning this up and we should be careful not to re-introduce requirements to have specific entities in the TOSCA types.
   2. Specifically, it may not be necessary or desirable to define a “unique id” in some application domains. We shouldn’t force type designers to introduce an ID attribute just because the orchestrator expects it.
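The separation between the two kinds of identifiers described in items 3 and 4 might be pictured as in the following Python sketch. It is an illustration only: the NodeRepresentation class, the "template[index]" format and the circuit_id attribute are hypothetical and are not taken from the TOSCA specification.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class NodeRepresentation:
    template_name: str
    index: int  # hypothetical: position among representations of one template
    attributes: dict[str, Any] = field(default_factory=dict)

    @property
    def orchestrator_id(self) -> str:
        # Orchestrator-side identifier: derived from the template name only,
        # so it does not depend on any attribute defined in the node type.
        return f"{self.template_name}[{self.index}]"


def create_representations(template_name: str, count: int) -> list[NodeRepresentation]:
    """Create `count` representations from a single node template."""
    return [NodeRepresentation(template_name, i) for i in range(count)]


sites = create_representations("vpn_site", 2)

# External-world identifier: domain specific, modeled as an ordinary attribute
# and present only where the type designer chose to define one.
sites[0].attributes["circuit_id"] = "C-1001"

ids = [site.orchestrator_id for site in sites]
```

In this sketch the orchestrator can always tell the representations apart, while a type without any ID attribute remains perfectly valid.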
On a related note, I sympathize with your comment about “property_value_expression” being underdefined. We have a number of proposals to rectify this, but they require agreement on multiplicity support for TOSCA node representations before those proposals can be discussed.
Thanks,
Chris
Tal,
My suggestion would be that by default a node_template refers to one and only one node instance in the real world. That way there is a clear distinction between node type definitions (which are the specification) and node templates (which are instances).
That default position would be overridden where there would otherwise be a need to write out in full multiple node templates which are all derived from the same node type definition and differ only in parameter value assignments. In effect the 'occurrences' syntax would act as a loop over the node template.
The syntax for creating that loop would include a mandatory indicator of which parameter or attribute definition (defined in the node type definition) is to be used as the ID for items in the range. The assigned value for that ID would only need to be unique within the scope of the cluster and must be invariant for the lifetime of the instance.
In many cases the ID value will be assigned by the system or the orchestrator, in which case the ID would be defined as an attribute. In other cases the ID value would be assigned to a property. Property assignments would need a function which could be evaluated for each cluster member. TOSCA currently has very few intrinsic functions and those which exist are unlikely to be enough for this purpose. The availability of more functions is one area where Helm currently has more functionality than TOSCA. property_value_expression does not seem to be defined at all at the moment and I’m not sure how a processor would definitively distinguish between a property value assignment and an expression.
Rather than defining a programming language syntax within TOSCA I believe it would be preferable to allow breakout to an existing language. I think that Puccini has this ability.
I agree with you that the range ID will need to be qualified to make it unique within the template so that the cluster members can be referenced from elsewhere in the template, but I think we should make it mandatory that the qualifier is the <modelable_entity_name> (which is what you happen to have chosen in your server example). This departs less from the current usage for single nodes.
As for the select statement, I very much dislike the way that we drift off into implementation-specific syntax at certain points. I would much prefer that the query language used be explicitly declared. The same comment applies to the syntax used for schema definitions. I don’t think these language declarations need to be included each time a select directive or a schema statement occurs in the template; instead I suggest that the TOSCA header includes a statement which defines the context for the whole document.
tosca_definitions_version: tosca_2_0

schema_syntax_context:
  path: org.json-schema/specification.html
  version: 2019-09
select_syntax_context:
  path: net.goessner/articles/JsonPath/
  version: 2007-02-21| e1
function_syntax_context:
  path: org.golang
  version: 1.16

node_types:
  Server:
    properties:
      hostname:
        type: string
    attributes:
      current_ram:
        type: scalar-unit.size

topology_template:
  inputs:
    os:
      type: string
      default: linux
    numberOfServersInCluster:
      type: integer
  node_templates:
    server:
      type: Server
      occurrences:
        identifier: hostname
        limit: [1, UNBOUNDED]
        instance_count: { get_input: numberOfServersInCluster }
      properties:
        hostname: [naming_function_in_golang]
  outputs:
    ram_use_for_named_server:
      { get_attribute: [ server, current_ram, hostname_one ] }
    ram_use_array_all_servers:
      select: [$.server..current_ram] ## a JSON Path query
    ram_use_array_selected_:
      select: [$.server.????.current_ram] ## a JSON Path query but how to pass in the os input? In what order are the different languages substituted/processed?
    ram_use_sum:
      # ??? Some function for summing the results of the array.
In what order are the different languages substituted/processed?
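The intended behaviour of the proposed 'occurrences' block might be sketched in Python as follows. This is a guess at the semantics, not a defined TOSCA feature: the function expand_occurrences, the dotted "template.identifier" qualification and the host-N naming function are assumptions introduced for illustration.

```python
from typing import Callable, Optional


def expand_occurrences(template_name: str,
                       identifier: str,
                       instance_count: int,
                       naming_function: Callable[[int], str],
                       limit: tuple[int, Optional[int]] = (1, None),
                       ) -> dict[str, dict[str, str]]:
    """Expand one node template into a cluster keyed by qualified identifier."""
    low, high = limit  # high=None stands in for UNBOUNDED
    if instance_count < low or (high is not None and instance_count > high):
        raise ValueError("instance_count is outside the occurrences limit")

    members: dict[str, dict[str, str]] = {}
    for i in range(instance_count):
        properties = {identifier: naming_function(i)}
        # Qualify the range ID with the template name so that cluster members
        # can be referenced unambiguously from elsewhere in the template.
        key = f"{template_name}.{properties[identifier]}"
        if key in members:
            # The ID only has to be unique within the cluster, but it must be unique there.
            raise ValueError(f"duplicate identifier within cluster: {key}")
        members[key] = properties
    return members


cluster = expand_occurrences("server", "hostname", 3, lambda i: f"host-{i}")
```

A naming function that returns the same value twice is rejected, which reflects the requirement that the identifier be unique within the scope of the cluster.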
Paul Jordan
OSS Specialist
BT Technology | Tel +44 (0) 3316252643 | paul.m.jordan@bt.com
This email contains information from BT that might be privileged or confidential. And it's
only meant for the person above. If that's not you, we're sorry - we must have sent it to you by mistake. Please email us to let us know, and don't copy or forward it to anyone else. Thanks.
We monitor our email systems and may record all our emails.
British Telecommunications plc
R/O : 81 Newgate Street, London EC1A 7AJ
Registered in England: No 1800000
From: Tal Liron <tliron@redhat.com>
Sent: 17 February 2021 19:12
To: Jordan,PM,Paul,TNK6 R <paul.m.jordan@bt.com>; tosca-comment@lists.oasis-open.org
Subject: Re: [tosca-comment] Identifiers in reference to multiplicity discussions
Thanks Paul, we started to discuss this challenge in depth in the ad-hoc meeting.
In my view, you're on the right track. For specific systems that need to identify node instances we can use attributes. That allows the values to be filled by an external system, whether it's the platform itself or an orchestrator that manages IDs. TOSCA would then let you model the data type for that identifying attribute as is appropriate.
The problem is that we still don't entirely understand what attributes are in TOSCA. :) Even more specifically we don't have tools for using attributes as unique identifiers.
Here's one possible approach:
We add a keyword called "unique" (boolean) to attribute declarations. When "unique: true" is set on an attribute it is a signal to orchestrators that multiple instances (whatever that would mean in any specific implementation) would require unique identification management for this attribute. How this is done would be out of scope for TOSCA, and indeed in many cases would be handled by the platform itself, e.g. you spin up a virtual machine and get a GUID after it's created. Is it not a GUID, but rather an ID that is unique only per cluster? Then maybe your attribute needs to comprise a combination of cluster ID and resource ID in order to be unique. You can model that easily in TOSCA. Note that it might also be possible to have multiple attributes marked as unique. Why not? There might be different unique IDs coming from different parts of the system but all refer to the same "node" as you've encapsulated it in TOSCA.
So why have the keyword at all? Well, now that we know this attribute is unique we can use it as a grammatical reference in functions, specifically the get_attribute function (but also get_artifact and possibly others). Right now get_attribute uses either a node template name or magic keyword (SELF, SOURCE, TARGET) as the "modelable entity name", but I think we would all agree that this is poorly defined. A unique ID attribute can help us narrow what we mean. In trying to brainstorm, here's something I came up with:
A new intrinsic function called "select". This implementation-specific function can search through the runtime or orchestration universes and return a list of IDs. This result (the list of IDs) can then be used as the "modelable entity name" for get_attribute. An example:
total_ram_use: { get_attribute: [ { select: [ server, identifier, "where os = ", { get_input: os } ] }, current_ram ] }
The "select" function here has the following arguments: the first argument is the template name (or SELF, SOURCE, TARGET) and the second is the name of an attribute that must be marked as "unique: true". The rest of the arguments remain implementation-specific. In this example I'm assuming an orchestrator that has some kind of textual querying language. The get_attribute function then uses the result of this function as its first argument, which would be zero or more instance IDs of that specific node template. For those instances it would extract the "current_ram" attribute. What would the result be? I think it should be a dict of ID mapped to attribute value. That output can then be used by other parts of orchestration to calculate averages, create a total sum, etc.
And those IDs are also consistent. So if, for example, you have multiple outputs with different "select" queries and different attributes, well, that's fine, somewhere down the toolchain you are guaranteed that they refer to the same instance of that node template, so you can cross-reference and construct whatever totals or graphs you want from those outputs.
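The select/get_attribute combination described above could behave roughly like the Python sketch below. It is an assumption-laden illustration: the in-memory instances table, the predicate-based select signature and the RAM values (taken to be in MiB) are invented for the example, and a real orchestrator would use its own query language.

```python
from typing import Any, Callable

# Hypothetical runtime universe: unique instance ID -> attribute values.
instances: dict[str, dict[str, Any]] = {
    "vm-a": {"os": "linux", "current_ram": 2048},
    "vm-b": {"os": "windows", "current_ram": 4096},
    "vm-c": {"os": "linux", "current_ram": 1024},
}


def select(universe: dict[str, dict[str, Any]],
           predicate: Callable[[dict[str, Any]], bool]) -> list[str]:
    """Return the unique IDs of all instances matching the predicate."""
    return [instance_id for instance_id, attrs in universe.items() if predicate(attrs)]


def get_attribute(universe: dict[str, dict[str, Any]],
                  ids: list[str], name: str) -> dict[str, Any]:
    """Return a dict mapping each selected ID to the named attribute's value."""
    return {instance_id: universe[instance_id][name] for instance_id in ids}


linux_ids = select(instances, lambda attrs: attrs["os"] == "linux")
ram_by_id = get_attribute(instances, linux_ids, "current_ram")

# Downstream tooling can aggregate the dict however it likes.
total_ram = sum(ram_by_id.values())
```

Because the result is keyed by the unique ID, outputs produced by different queries can be cross-referenced against the same instances.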
I see in the TOSCA TC mailing list a discussion about multiplicity. Good, that has been an area of confusion for me.
In particular an email from Peter Bruun about node identities.
I’d just like to mention that in my work on mapping TMForum to TOSCA I have a profile where every one of my node types is derived from a root type which represents the SID RootEntity. The SID RootEntity is defined as having a mandatory Name and ID plus an optional description thus:
ID | RootEntity | required | Unambiguously distinguishes different object instances.
description | RootEntity | | This is a string, and defines a textual free-form description of the object. Notes: This attribute doesn’t exist in M.3100. The CIM has two attributes for this purpose, Caption (a short description) and Description.
name | RootEntity | required | Represents a user-friendly identifier of an object. It is a (possibly ambiguous) name by which the object is commonly known in some limited scope (such as an organization) and conforms to the naming conventions of the country or culture with which it is associated. It is NOT used as a naming attribute (i.e., to uniquely identify an instance of the object).
Paul Jordan
OSS Specialist
BT Technology | Tel +44 (0) 3316252643 | paul.m.jordan@bt.com