Re: [tosca] Re: Proposal: CSARs should be tarballs, not ZIPs

I agree with Peter that we should make sure TOSCA templates and related artifacts can be interchanged outside of CSAR; there are many contexts where CSAR files are secondary. Unfurl, for example, relies on git, not CSARs, for packaging and interchange. CSARs are just something it imports.

BTW, tar.gz is fairly universal at this point and many specifications and technologies rely on it -- licensing and file system compatibility are no longer concerns.

Adam

On Sat, Apr 17, 2021 at 12:57 AM Bruun, Peter Michael (CMS RnD Orchestration) <peter-michael.bruun@hpe.com> wrote:

All,

From my perspective the purpose of CSAR is to have a standardized format that is guaranteed to be understood by all TOSCA software, so that interchange of TOSCA between tools, processors, orchestrators, etc. is possible, at least at the lowest level.

For me the CSAR is, however, secondary as a representation of your TOSCA content. There are many other, equally respectable, ways that TOSCA software could represent the TOSCA content *internally* and for interchange within a software family: filesystem, git, database, etc.

So as long as any software family is able to ingest and export the CSAR formats that we choose to standardize, and assuming that such formats are reasonably open and common (zip, tar, tar.gz, etc) then I am happy enough.

I do have a small reservation with respect to gz, since the GNU licenses tend to turn any software that incorporates GNU-licensed code into open software as well. That could have some impact on how those of us who produce commercial software have to package the CSAR capabilities.

I am ancient, and I remember having had trouble with some versions of tar having a limit on the number of characters in the full file paths. I assume that modern implementations (other than GNU) do not have such limitations? Otherwise, this could be a problem with the suggestion.

Another point that I have been trying to make is that even the "Node Representation Graph" takes the form of one or more small, large, or huge sets of TOSCA nodes and relationships. I believe that it ought to be possible to *export* that graph (or coherent subsets of that graph) in TOSCA template format.

This will require that the nodes in the Node Representation Graph retain a relationship to the TOSCA node types, relationship types, artifacts, interfaces, etc. from which they were created. What would be missing is where the inputs came from (but the values would be there). The sources of properties and attributes should still be derivable based on the types and templates that the nodes and relationships were derived from. Of course there would not (necessarily) be any abstractions in that representation (no requirements and capabilities or filters), because at the Node Representation Graph level all such abstractions will have been resolved.

Where the above may seem pointless from a day-0 and day-1 perspective, from a day-2 perspective this becomes a serious requirement because without that information being retained (I deliberately say "information", not postulating any specific representation of that information). So the ability to make such a low.

Also, by elevating *any* internal representation of the Node Representation Graph to be formally representable as TOSCA, the question about "dangling requirements" goes away, because within that TOSCA representation all requirements will have been completely resolved.

In my context, such a Node Representation Graph can easily contain hundreds of millions of nodes, making representation as one huge CSAR impractical, but theoretically possible.

The point is:

Any representation that can be exported as TOSCA, is TOSCA.

That broadens the view from TOSCA being equated with CSAR to TOSCA being anything that can be described using a CSAR. So representations in git, files, a database, or whatever are TOSCA as long as they can be converted to a valid CSAR format.

Peter

From: tosca@lists.oasis-open.org [mailto:tosca@lists.oasis-open.org] On Behalf Of Tal Liron
Sent: 16 April 2021 19:24
To: tosca@lists.oasis-open.org
Subject: [tosca] Re: Proposal: CSARs should be tarballs, not ZIPs

Some additional thoughts:

Remember that the CSAR version is separate from the TOSCA version. The current CSAR version is 1.1. So my proposal here would be for CSAR 2.0 (it's a significant enough change that I think it would warrant a major semantic version change).

But, backwards compatibility would mean that systems would still be able to support CSAR 1.1, which is in ZIP. To be 100% clear: you could write a TOSCA 2.0 service template and package it in CSAR 1.1. We would have to be clear in the TOSCA 2.0 spec that this is supported.

Another thought regarding extensions -- if we move to tarballs, it might be a good idea to choose a different extension than ".csar" so that processors would easily know if they're dealing with a new-style vs. old-style container. (This is a common problem with systems that upgrade their formats.) So, perhaps something like this:

".csar" extension: means CSAR 1.1 or CSAR 1.0, meaning it's a ZIP
".csar2" extension: means CSAR 2.0 (and beyond), meaning it's a TAR
".csar2.gz" extension: GZIPped TAR

It's a bit awkward, but 100% deterministic.
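
As a rough illustration of how a processor might dispatch on these extensions, here is a minimal Python sketch. The function name `container_kind` and the returned labels are made up for this example and are not part of any CSAR specification.

```python
def container_kind(filename: str) -> str:
    """Map a file name to its container format under the naming scheme above.

    Hypothetical helper: the name and the returned labels are illustrative only.
    """
    name = filename.lower()
    # Check the longest suffix first so ".csar2.gz" is not mistaken for anything else.
    if name.endswith(".csar2.gz"):
        return "tar.gz"  # CSAR 2.0 and beyond, GZIPped TAR
    if name.endswith(".csar2"):
        return "tar"     # CSAR 2.0 and beyond, plain TAR
    if name.endswith(".csar"):
        return "zip"     # CSAR 1.0 or 1.1
    raise ValueError(f"not a recognized CSAR file name: {filename!r}")
```

The processor never has to open the file to decide which reader to use, which is the determinism the scheme is after.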

On Thu, Apr 15, 2021 at 12:03 PM Tal Liron <tliron@redhat.com> wrote:

In a conversation I had with someone who professes to "hate TOSCA" one of the issues that came up was how bad CSARs are. And one point made hit home.

CSARs are currently defined as ZIP containers. Unfortunately, ZIP is not a streaming format; it requires random access to locations in the container, because the central directory that indexes the entries is stored at the end of the file. In practice the entire container needs to be available in order to access an individual entry. Thus any processing of a CSAR has to take place on an accessible file system, which means that if the CSAR is at a URL then the whole package would have to be downloaded first.
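
To make the random-access point concrete: a ZIP reader locates entries through the end-of-central-directory record, which is written last. The Python sketch below builds a small ZIP in memory with the standard `zipfile` module and finds that record by its signature. The entry names and contents are placeholders, and `build_zip` and `eocd_offset` are names invented for this example.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record marker


def build_zip() -> bytes:
    """Build a small in-memory ZIP with placeholder entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("TOSCA-Metadata/TOSCA.meta", "placeholder metadata\n")
        zf.writestr("artifacts/image.bin", b"\0" * 4096)
    return buf.getvalue()


def eocd_offset(data: bytes) -> int:
    """Return the offset of the last end-of-central-directory signature."""
    return data.rfind(EOCD_SIGNATURE)


if __name__ == "__main__":
    data = build_zip()
    # With no archive comment the record is the final 22 bytes of the file.
    print(len(data) - eocd_offset(data))
```

A reader that only sees the bytes in arrival order cannot consult this index until the last bytes have arrived.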

If you're dealing with a CSAR with very big artifacts (virtual machine images) then this quickly becomes a major burden on different parts of the system which need to process specific parts of a CSAR. This is indeed a pain point with currently existing TOSCA solutions, e.g. ONAP.

There's a reason why "tarballs" are so often preferred in packaging. A ".tar.gz" file is streamable for two reasons: gunzip is streaming decompression of a single file, and that single file is a "tape archive" (tar), which is a straightforward concatenation, likewise streaming. There is no random access. Thus a CSAR processor can choose to process just a specific entry and not have to download the entirety. It can throw away bytes that do not interest it.
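
A minimal Python sketch of that idea, using the standard `tarfile` module in its stream mode (`"r|gz"`), which only ever reads forward. The `ForwardOnly` wrapper stands in for a network stream; it, `build_sample`, `read_entry`, and the entry names are all invented for this example.

```python
import io
import os
import tarfile


class ForwardOnly:
    """Stand-in for a network stream: offers read() only, no seeking."""

    def __init__(self, data: bytes):
        self._src = io.BytesIO(data)
        self.bytes_read = 0  # how much of the archive was actually consumed

    def read(self, size: int = -1) -> bytes:
        chunk = self._src.read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_sample() -> bytes:
    """Build a .tar.gz in memory: a small metadata entry, then a 1 MiB artifact."""
    entries = [
        ("TOSCA-Metadata/TOSCA.meta", b"placeholder metadata\n"),
        ("artifacts/image.bin", os.urandom(1 << 20)),  # incompressible filler
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in entries:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_entry(stream, wanted: str) -> bytes:
    """Scan the stream front to back and return the first entry named `wanted`."""
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            if member.name == wanted:
                return tar.extractfile(member).read()
    raise KeyError(wanted)


if __name__ == "__main__":
    archive = build_sample()
    stream = ForwardOnly(archive)
    meta = read_entry(stream, "TOSCA-Metadata/TOSCA.meta")
    print(len(meta), stream.bytes_read, len(archive))
```

Because the metadata entry comes first in this sample, the reader stops after a small prefix of the archive. An entry placed after a large artifact would still require reading and discarding the bytes in front of it, so entry order matters.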

Note that if one can benefit from random access to a tarball, then it's easy enough to unpack it in its entirety, and indeed in a much more efficient way than a ZIP: the tarball can be unpacked and streamed directly to the filesystem. A ZIP would still have to be downloaded first to accomplish the same function, leading indeed to more than double the storage requirement.

So, it's very obvious to me that this needs to change in TOSCA 2.0 with a new CSAR specification.

My specific recommendations:

1. Let's first standardize on TAR. So a raw ".csar" extension would be exactly a "tape archive" (a tarball).

2. Let's then standardize on GZIP as the supported compression algorithm. So a ".csar.gz" extension would imply a GZIPped CSAR. There are many other popular algorithms in use (bz2, xz), but in the interests of interoperability it's best to recommend one. The usefulness of adding the extra ".gz" is to clarify whether decompression is needed, and indeed many toolchains recognize that convention automatically.
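
A minimal sketch of the two recommendations in Python, using only the standard `tarfile` module. The helper names (`pack_csar`, `is_gzipped`, `list_entries`) and the file names are hypothetical, and nothing here is a proposed reference implementation.

```python
import io
import tarfile


def pack_csar(files: dict, compress: bool = False) -> bytes:
    """Pack {name: bytes} as a plain TAR, or as a GZIPped TAR if compress is set."""
    mode = "w:gz" if compress else "w"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def is_gzipped(data: bytes) -> bool:
    """GZIP streams begin with the two magic bytes 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


def list_entries(data: bytes) -> list:
    """List entry names; mode "r:*" lets tarfile detect compression itself."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        return tar.getnames()
```

The extension tells a processor up front whether decompression is needed, and the magic-byte check is a cheap fallback when a file has been renamed.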
