OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

tosca message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Proposal: CSARs should be tarballs, not ZIPs


In a conversation I had with someone who professes to "hate TOSCA" one of the issues that came up was how bad CSARs are. And one point made hit home.

CSARs are currently defined as ZIP containers. Unfortunately, ZIP is not a streaming format, instead requiring random access to locations in the container. The entire container needs to be read in order to access an individual entry. Thus The any processing of a CSAR has to take place on an accessible file system, which means that if the CSAR is at a URL then the whole package would have to be downloaded first.

If you're dealing with a CSAR with very big artifacts (virtual machine images) then this quickly becomes a major burden on different parts of the system which need to process specific parts of a CSAR. This is indeed a pain point with currently existing TOSCA solutions, e.g. ONAP.

There's a reason why "tarballs" are so often preferred in packaging. A ".tar.gz" file is streamable for two reasons: gunzip is streaming decompression of a single file, and that single file is a "tape archive" (tar), which is a straightforward concatenation, likewise streaming. There is no random access. Thus a CSAR processor can choose to process just a specific entry and not have to download the entirety. It can throw away bytes that do not interest it.

Note that if one can benefit from random access to a tarball, then it's easy enough to unpack it in its entirety, and indeed in a much more efficient way than a ZIP: the tarball can be unpacked and streamed directly to the filesystem. A ZIP would still have to be downloaded first to accomplish the some function, leading indeed to more than double the storage requirement.

So, it's very obvious to me that this needs to change in TOSCA 2.0 with a new CSAR specification.

My specific recommendations:

1. Let's first standardize on TAR. So a raw ".csar" extension would be exactly a "tape archive" (a tarball).
2. Let's then standardize on GZIP for the supported algorithm. So a ".csar.gz" extension would imply a GZIPped CSAR. There are many other popular algorithms used (bz2, xz) but in the interests of interoperability it's best to recommend one. The usefulness of adding the extra ".gz" is to clarify if decompression is needed, and indeed many toolchains recognize that convention automatically.



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]