I've been rethinking the artifact processing topic, and I want to propose an alternative point of view.
What if we are looking at the problem from the wrong angle?
The problem of "where to process the artifact" tries to solve HOW the orchestrator has to work, but that is an "imperative" problem.
The real trouble I ran into was asking myself: "I have this script in the TOSCA archive; how do I tell the orchestrator where it has to execute it?"
The question itself was wrong, since I should not have a script in the TOSCA archive in the first place: that is, again, IMPERATIVE.
We cannot mix declarative and imperative.
The real problem is that we have an oversimplified set of properties for many nodes; if we look at the "code in the script", we find many things that should instead be properties of nodes.
This is obviously because we needed a simple example to work with, but to have a simple working example we had to hardcode the missing information somewhere, and it ended up in the shell script.
Probably the better solution would be to place all the needed information in the proper node (with proper relationships); then the orchestrator (which could be implemented in any way) would use that information to realize the topology.
In my case I do not implement much in the orchestrator; instead I use an existing DevOps tool like Puppet, and all the properties for the node go into a template associated with the node.
I do not have any script in the TOSCA archive, and I leverage existing tools (which I do not have to code myself, and which have a wide range of modules available for many purposes).
The properties I use in the nodes are very detailed (such as the reverse-proxy rules from Apache's httpd.conf, for example), but this lets the orchestrator use the information and do with it whatever it likes, and it is potentially much more interoperable than a shell script.
In my case I use Puppet, and I could either add the manifests that the orchestrator dynamically generates to a Puppet master, so that the newly created machine retrieves the catalog and applies it, or copy all the files over SSH and run "puppet apply" on the VM. That is the imperative work related to how I implemented the orchestrator, and it does not concern the TOSCA archive designer.
I believe that if we look at how those various DevOps tools model Apache, for example, we could come up with a better example of an Apache node that has all the right properties: interoperable, and representative of the information needed for a real production installation.
Then the orchestrator could also work by invoking shell scripts, but it would compile the final script from a template using the information provided in the TOSCA file.
I believe that this should be the philosophy of TOSCA.
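To make the idea concrete, here is a sketch of what such a node template might look like, with the reverse-proxy rule carried as a declarative property instead of being buried in a shell script. The node type name and property names below are illustrative assumptions, not part of any existing TOSCA profile:

```yaml
node_templates:
  web_proxy:
    type: tosca.nodes.WebServer.Apache   # hypothetical node type
    properties:
      listen_port: 80
      reverse_proxy_rules:               # information that today hides inside a script
        - path: /app
          backend_url: http://10.0.0.5:8080/
    requirements:
      - host: web_vm
```

An orchestrator could render these properties into a Puppet manifest, an Ansible task, or a generated shell script; the archive designer never has to know which.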
On January 10, 2017 at 2:27 AM, Chris Lauwers <email@example.com> wrote:
“Prescriptive” orchestration languages (such as Ansible or StackStorm) have already addressed the issue of how and where to process arbitrary scripts. I suggest we borrow from the approaches taken by these tools to generalize support for arbitrary artifacts in TOSCA.
- Ansible playbooks contain a list of plays, where each play consists of a list of tasks, each of which is executed by an Ansible “module”. Modules are intended to communicate with a remote “host” (typically specified in an inventory file) to configure that host and provision services on that host.
- An Ansible module is typically (but not always) a piece of python code that runs on the Ansible host. Modules expect input parameters in a certain format and return results and errors in JSON format.
- Ansible includes a “commands” module that is intended to execute arbitrary commands and/or scripts on a remote host. There are a number of flavors of these commands, but in general Ansible commands work in a way that is similar to how we currently envision implementing operations in TOSCA.
- Ansible plays have a “transport” parameter that specifies how modules associated with the play communicate with the host. Typically, the transport will be set to “ssh”, but other values are possible (for example, many network devices use “cli”, “rest”, or “netconf”).
- Ansible allows the value of the “transport” variable to be set to “local”. When used with the commands module, this value indicates that the command or script will be executed on the local host rather than on the remote host.
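The two transport settings described above can be sketched in a minimal playbook (host group names and script paths are made up; current Ansible spells the transport as the play-level “connection” keyword):

```yaml
# Same command module, two execution locations.
- hosts: webservers
  connection: ssh        # module runs against the remote host
  tasks:
    - name: Run an install script on the remote host
      command: /opt/scripts/install.sh

- hosts: localhost
  connection: local      # module runs on the Ansible control host itself
  tasks:
    - name: Call an external API from the orchestrator side
      command: /opt/scripts/call_api.py
```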
- StackStorm is built around workflows that execute “actions”. An “action” in StackStorm is an arbitrary piece of executable code (although in most cases it’s a bash script or a Python script), bundled with metadata (in YAML) that specifies how the script is supposed to be run by StackStorm.
- The main parameter of action metadata is the “runner”. StackStorm runners are part of the StackStorm platform and are responsible for “running” the script specified by the action. Runners are similar to the “artifact processors” that I proposed in my email below.
- StackStorm comes with a number of built-in runners. The most notable ones are:
- local-shell-cmd - executes a Linux command on the same host where StackStorm components are running.
- local-shell-script - executes a script on the same host where StackStorm components are running.
- remote-shell-cmd - executes a Linux command on one or more remote hosts provided by the user.
- remote-shell-script - Actions are implemented as scripts. They run on one or more remote hosts provided by the user.
- python-script - This is a Python runner. Actions are implemented as Python classes with a run() method. They run locally on the same machine where StackStorm components are running.
- http-request - HTTP client which performs HTTP requests for running HTTP actions.
- This shows that in StackStorm, the location where the action is run is implicitly specified by selecting the type of runner, rather than by specifying a parameter value.
- StackStorm action metadata also specifies how the script expects input values (e.g. via named or positional command line arguments).
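For reference, a StackStorm action metadata file roughly takes this shape (the action name, script path, and parameter are invented for illustration):

```yaml
# Sketch of StackStorm action metadata: the runner choice implies WHERE the script runs.
name: install_app
runner_type: remote-shell-script   # run the script on remote hosts over SSH
entry_point: scripts/install.sh
parameters:
  version:
    type: string
    required: true
    position: 0                    # passed as the first positional argument
```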
Recommendations for TOSCA
- These examples show that other orchestrators give script developers a lot of options for what types of scripts can be supported, where these scripts are run, and how to connect to the “hosts” to which the orchestration applies. I suggest TOSCA should be equally flexible.
- I personally like the StackStorm approach better than the Ansible approach, since it is closer to what we already do in TOSCA. Lifecycle operations in TOSCA are expressed in a way that is similar to how StackStorm actions are specified. Specifically, the “inputs” section of an operation is “metadata” for the script that describes the input variables expected by the script.
- As stated below, I recommend introducing the concept of an “artifact processor” that specifies how the artifact is supposed to be run (similar to action runners in StackStorm). This processor would be specified in a “processor” keyname under the “operation” section of a TOSCA interface. The TOSCA spec needs to include a number of built-in processors, but should also allow for development of user-provided processors.
- By default (and to preserve current behavior), TOSCA will use a “remote shell script” processor that uses SSH to connect to the remote host.
- Artifact processors may need to include a parameter that specifies how the processor connects to the host (similar to the “transport” parameter in Ansible). Alternatively, different processor types could be introduced for different types of transport.
- If we introduce the concept of “artifact processor”, then it’s not clear if there is any value in also specifying artifact types (since presumably artifact processors would need to specify mime types and/or file extensions of the artifacts they are able to process).
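As a strawman, the proposed “processor” keyname might look as follows. The keyname and the processor names are proposals for discussion, not existing TOSCA syntax:

```yaml
interfaces:
  Standard:
    create:
      implementation: scripts/create.py
      processor: python-script        # proposed keyname; script runs on the orchestrator
      inputs:
        port: 8080
    configure:
      implementation: scripts/configure.sh
      processor: remote-shell-script  # the proposed default: SSH to the remote host
```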
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Chris Lauwers
> Sent: Monday, December 12, 2016 11:26 AM
> To: firstname.lastname@example.org
> Subject: [tosca] artifact processing
For tomorrow’s Simple Profile meeting, I suggest we keep thinking about how to “formalize” mechanisms that describe how artifacts need to be processed.
Just to recap: most (if not all) of the prose in the document uses examples where artifacts are “install scripts” that need to be run on a “Host”, where a host is assumed to be a Compute node that is the target of a HostedOn relationship.
However, in practice we need to be able to handle artifacts other than install scripts. I can think of the following four different types of artifacts (there may be others):
- Install scripts: like the install scripts just described
- API scripts: scripts that “deploy” nodes by making API calls to an external entity (e.g. Python scripts that call OpenStack or OpenDaylight APIs)
- Playbooks/recipes (e.g. Ansible playbooks, or Chef recipes)
- Images: “snapshots” of deployed entities.
Each of these types of artifacts requires a different mechanism for getting the artifact deployed. Said a different way, each of these types of artifacts may need to get “processed” differently. This means that in order to fully specify operations, we can’t just specify the artifact for the operation, we also need to be clear about the processor that is needed to process that artifact:
operation: <artifact> + <artifact processor>
Flexible artifact processing, then, requires the following:
- Specifying the type of processor required for the artifact
- Specifying any configuration parameters for the artifact
- Specifying tenant/user-specific parameters for the artifact
Specifying the type of processor
Ideally, each type of artifact would have a unique artifact processor, which would allow us to “standardize” on artifact processors based on the type of artifact. However, how do we handle similar artifacts that can belong to multiple types? For example:
- A Python script could be an install script to be run on a Host
- A Python script could be an API script to be run by the Orchestrator
If we statically “define” artifact processor types, we can’t base the choice of processor solely on file extensions or artifact types.
Specifying configuration parameters for the processor
In order to “use” a processor, we may need configuration parameters for this processor. This could involve:
- DNS names (or IP addresses) for contacting the processor (e.g. Chef servers, or API servers).
In some cases, the processor may not already be running, in which case the processor itself might need to get orchestrated (e.g. using TOSCA). In this case, the configuration parameters would be the result of the orchestration, but we would need a CSAR file representing the processor.
Specifying tenant/user-specific parameters
Some processor-related parameters, such as user credentials, may be necessary to “use” the processor. We may need to specify those as well.
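Putting the pieces together, a processor definition carrying both configuration and tenant-specific parameters might look like this. Every keyname here except the standard get_input function is hypothetical:

```yaml
# Hypothetical processor definition (illustrative keynames, not spec syntax).
processors:
  chef_processor:
    type: chef-recipe
    configuration:
      server_url: https://chef.example.com   # DNS name for contacting the processor
    credentials:
      user: deployer
      token: { get_input: chef_token }       # tenant/user-specific parameter
```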
Let’s discuss if this is the “right” way to think about artifact processing, and if so how do we reflect this in the TOSCA spec.