
Subject: Title in Particular, Strings in General

From: Eric Burger <Eric.Burger@georgetown.edu>
To: cti-stix@lists.oasis-open.org
Date: Wed, 8 Jun 2016 15:13:49 -0400

I think I figured out what bugs me about the Title proposal (a 64-‘thing’ length limit, where a ‘thing’ is either a character, of some flavor, or a byte).

If it were 1976, it would be obvious why we need to limit the Title. It has to fit on a single punch card, and it should not wrap on a 72-column line printer or an ASR-33.

If it were 1986, it would be obvious why we need to limit the Title. Every byte counts; that is why we don’t waste a byte on a length indicator when we can just use a NUL to terminate the string.

If it were 1996, it would be obvious why we need to limit the Title. There is no way I can do a full-text search, and a human curator saying what is in a document is incredibly useful.

But it is 2016. In 2016, what is the purpose of a Title at all? I would get it if it were 1916: one needs a card catalog to find something, and index cards are incredibly useful for finding things. They are nice and compact, and you can fit a whole multi-story library’s worth of information into a few rows of card catalogs. That logic carries through to 1996 in the electronic world. In 1996 I cannot realistically do full-text searches to find what I am looking for, so searching titles is a very useful trick. But in 2016, what is the use? I would accept the idea of a human-curated brief introduction. However, it is highly likely to become wrong over time and to be of little use. For example, what would the title for a detection of STUXNET have looked like in 2009? My guess would be “harmless, mystery worm”.

So, why limit the title? The argument has been that because there are physical limitations (there are only so many disks and cloud providers and atoms in the universe), implementations will have some sort of limit. Even implementations with no hard limits will have to deal with what happens when all the connected disks are full and all nuclear spin states in the universe are already encoding STIX documents (i.e., we have run out of subatomic particles). So, given that we know we will run out, the belief is that interoperability would be enhanced if we set some hard limits for data elements. That is great, if you don’t mind redoing standards every now and then. It also presumes that database migrations are free. People come up with new encodings that take up more space. People who study history know the punched card went from 8 columns to 24 columns to 80 columns to 96 columns for a reason. 64 ‘characters’ for a title? Why the limit?
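
To make the ‘thing’ ambiguity concrete, here is a minimal Python sketch, assuming the proposal's limit of 64 units. The helper names `title_lengths` and `fits` are hypothetical, not part of any STIX tooling. It measures one title three ways: Unicode code points, UTF-8 bytes, and UTF-16 code units.

```python
LIMIT = 64  # the proposal's number; which unit it counts is the open question


def title_lengths(title: str) -> dict[str, int]:
    """Length of `title` under three plausible meanings of 'thing'."""
    return {
        "code points": len(title),
        "UTF-8 bytes": len(title.encode("utf-8")),
        # 'utf-16-le' avoids the 2-byte BOM that plain 'utf-16' prepends,
        # so byte count / 2 is exactly the number of UTF-16 code units.
        "UTF-16 code units": len(title.encode("utf-16-le")) // 2,
    }


def fits(title: str, limit: int = LIMIT) -> dict[str, bool]:
    """Whether `title` fits `limit` under each interpretation."""
    return {unit: n <= limit for unit, n in title_lengths(title).items()}
```

For example, `"無害な謎のワーム" * 5` (Japanese for "harmless, mysterious worm", repeated) is 40 code points and 40 UTF-16 code units but 120 UTF-8 bytes. `"🐛" * 40` is 40 code points, 80 UTF-16 code units, and 160 UTF-8 bytes. Both titles pass or fail "64 things" depending entirely on which ‘thing’ the spec meant.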

Again, why limit the title? Another argument is “because people will use the Title field for the Description.” To me, this says the Title field is useless. If Title is a mini-Description, it is redundant, unusable data. If it is something else, then people would not put the Description in the Title field. It is 2016, not 1996. If you need to find a report about something, you will search for that something. You will not be thumbing through Titles in a card catalog looking for the right book.

I am very willing to entertain coherent arguments for why we need a title. For that matter, I would entertain arguments for why that title needs a limit. However, the more I hear arguments for why the title needs a limit, the less I see the need for a Title field in the first place.

Now, on to strings in general. My feeling is that the general rule should be no limitations, though I can see how some strings could have limits. For example, RFC 1035 limits a fully qualified domain name to 255 octets (including the dots). Before getting excited about there being limits on the Internet, note that the same spec says a DNS name component cannot start with a digit. Oops. Then again, if we say that a data element representing a domain name is a string limited to 255 characters, what would we have said about defining an IP address as 32 bits? That is what the spec says! Except today it is 128 bits. Oops. IPv6 has been available for twenty years, and even with 4G and the IoT driving it, it has not broken 5% penetration.
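
Both "Oops" cases can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a validator anyone should ship: `valid_name` and `pack_ipv4_field` are hypothetical names. The 255-octet test is applied to the dotted text form, as described above, which simplifies RFC 1035's wire-format rule. The two label patterns follow the RFC 1035 preferred name syntax and the RFC 1123 relaxation that allows a leading digit.

```python
import ipaddress
import re

# RFC 1035 "preferred name syntax": a label starts with a letter, ends with a
# letter or digit, may contain hyphens inside, and is at most 63 characters.
RFC1035_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
# RFC 1123 later relaxed the first character to allow a digit as well.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def valid_name(name: str, label_re: re.Pattern) -> bool:
    """Check a dotted host name against a label rule and a 255-octet cap."""
    if len(name) > 255:  # simplification: text form, not wire format
        return False
    labels = name.rstrip(".").split(".")
    return all(label_re.fullmatch(label) for label in labels)


def pack_ipv4_field(addr: str) -> bytes:
    """Hypothetical data element sized the way IPv4 sized it: 32 bits."""
    ip = ipaddress.ip_address(addr)
    if ip.max_prefixlen > 32:
        raise ValueError(f"{addr} needs {ip.max_prefixlen} bits; field holds 32")
    return ip.packed  # 4 bytes for an IPv4 address
```

Under the RFC 1035 rule, `valid_name("4g.example.com", RFC1035_LABEL)` is `False`, while the RFC 1123 rule accepts it. `pack_ipv4_field("192.0.2.1")` returns four bytes, but `pack_ipv4_field("2001:db8::1")` raises `ValueError`. A limit written into a data element outlives the assumptions behind it.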

More on strings: because all storage is finite, you are always at risk of running out of space. That means you always need code that checks whether you have run out of space. Moreover, if you have limits, you need code that checks whether what you received fits within those limits. So, for at least two reasons (there are more), you must have code that checks whether what you receive will fit into storage. If that is the case, you have already written the code to handle someone sending you a nominally unbounded string that you cannot store. In fact, eliminating the redundant “is this string less than 64 characters” test improves the testability of the code and reduces the chance of latent defects hiding in it.
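
A minimal Python sketch of the argument, assuming a hypothetical `BoundedStore` whose capacity is counted in UTF-8 bytes. `StorageFull`, `BoundedStore`, and `put_title_with_limit` are illustrative names only. The point is structural: the capacity check has to exist regardless, and it already covers an oversized string. A per-field limit adds a second failure path without removing the first.

```python
class StorageFull(Exception):
    """Raised when a received string cannot be stored."""


class BoundedStore:
    """Hypothetical finite store; capacity is counted in UTF-8 bytes."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.items: list[str] = []

    def remaining(self) -> int:
        return self.capacity_bytes - self.used_bytes

    def put(self, value: str) -> None:
        size = len(value.encode("utf-8"))
        # The check every implementation needs, because storage is finite.
        # It already rejects "a string longer than I can hold", whatever
        # field it arrived in and however long the spec said it could be.
        if size > self.remaining():
            raise StorageFull(f"need {size} bytes, only {self.remaining()} left")
        self.items.append(value)
        self.used_bytes += size


def put_title_with_limit(store: BoundedStore, title: str, limit: int = 64) -> None:
    """Same path plus a per-field limit: one more failure mode to test."""
    if len(title) > limit:
        raise ValueError(f"title is {len(title)} characters; limit is {limit}")
    # Still required: many short titles can exhaust storage as surely as one long one.
    store.put(title)
```

With `BoundedStore(100)`, an 80-character string is accepted, and a following 30-character one raises `StorageFull` with no title-specific rule involved. `put_title_with_limit` shows the alternative: an extra `ValueError` branch that needs its own tests, yet it still has to fall through to `store.put`.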

Said differently, not having arbitrary limits improves the security and stability of products processing STIX. To me, that is a much more laudable goal than making sure that short, human-readable titles no one will read are actually short.
