cti-stix message

Subject: Re: [cti-stix] Strings and Limits

From: "Jason Keirstead" <Jason.Keirstead@ca.ibm.com>
To: Eric Burger <Eric.Burger@georgetown.edu>
Date: Fri, 3 Jun 2016 13:06:37 -0300

I agree with some of this, but not this statement:

"we are not the United Nations: so long as the minimum is long enough, it is irrelevant that some choices of language, like Sanskrit, may be at a disadvantage to other choices of language, like ancient Egyptian hieroglyphics."

As a standards body we may not be the United Nations, but for those of us who build product, we have to support and treat different languages equally.

It actually isn't acceptable for Chinese or Japanese (or any other language) users to be at a disadvantage, that is, to be limited to shorter titles or descriptions than English users when using our products.

"because some languages use more octets per character than others."

As has been pointed out by @JMG and myself a couple of times... it is not even that simple. The number of octets depends on both the encoding being used and the language, and neither is part of STIX: the encoding is part of the serialization, not STIX itself, unless we're going to mandate UTF-8 always and disallow all other Unicode formats.
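To make this concrete, here is a small Python sketch that counts octets for the same kind of short title in three Unicode encodings. The sample titles and the choice of encodings are illustrative assumptions; none of them come from STIX or from this thread.

```python
# Illustrative only: the sample titles below are made up for this sketch.
SAMPLES = {
    "English": "Phishing campaign",
    "Chinese": "网络钓鱼活动",
    "Japanese": "フィッシング攻撃",
}
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def octet_counts(text):
    """Return {encoding: number of octets} for one string."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


for language, title in SAMPLES.items():
    print(language, len(title), "code points", octet_counts(title))
```

With these samples, the English title is smallest in UTF-8 while the Chinese title is smallest in UTF-16, so which language is "at a disadvantage" under an octet limit changes with the serialization.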

Re Mark: "If we have no limits and a Soltra Edge user creates a 100GB title and $compatible-product falls over – how does that get resolved?"

The argument here is to remove length limits for the sake of interoperability and internationalization... I don't think that pulling out crazy edge cases helps this argument. I do not think a 100GB title is a reasonable expectation for anyone to handle. That isn't what this is about - it is more about specifying a limit of 255 or 1K or 2K, and having that mean vastly different things depending on what language you happen to use.
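A rough Python sketch of what a fixed limit means in practice, assuming the limit is counted in octets and the text is serialized as UTF-8 unless stated otherwise. The helper name `chars_that_fit` and the three sample characters are hypothetical choices for illustration.

```python
def chars_that_fit(char, limit_octets, encoding="utf-8"):
    """How many copies of a single character fit within limit_octets."""
    return limit_octets // len(char.encode(encoding))


# The same 255-octet limit, measured in characters of different scripts.
for label, ch in (("Latin 'a'", "a"), ("Cyrillic 'д'", "д"), ("CJK '中'", "中")):
    print(label, chars_that_fit(ch, 255))
```

Under UTF-8 the limit allows 255 Latin letters but only 85 of the CJK character; switching the encoding argument to "utf-16-le" changes the numbers again.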

-
Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown


From: Eric Burger <Eric.Burger@georgetown.edu>
To: cti-stix@lists.oasis-open.org
Date: 06/03/2016 12:01 PM
Subject: [cti-stix] Strings and Limits
Sent by: <cti-stix@lists.oasis-open.org>

The world has lots and lots of interoperating protocols that do not have string limits. Remember, there was a time when 640KB for an entire address space seemed infinite. That got hacked in order to address an entire megabyte. Wow! Saying a particular field value will not see a length greater than 255, 16383, 32767, 65535, or even 4294967295 may sound quaint in five years.

One comment was that string length is not a standards issue. I disagree. It most definitely is a standards issue. Saying a standard allows for arbitrary length strings is a standards statement. In fact, being silent invites stack overflow: if there are no limits and the standard is silent, implementations may impose limits and then barf. As another comment mentioned, if everyone sets their own limit, there may be interoperability problems.

One line of comments argued that in the real world, which does not have Von Neumann’s infinite card deck, there will be some limit to what any particular implementation can digest. Experience shows that since any seemingly infinite limit gets reached decades before anyone anticipates it, the common solution is to take one of two negotiation paths. The first, simpler one is that the recipient simply rejects a request containing a string that is too long to digest, returning an appropriate error code. The downside is that the recipient needs to start parsing the document before realizing it cannot process it. The second, more complex one is that the sender first says it expects the recipient to be able to digest strings up to N octets long (notice the use of octets (bytes) here, not characters, code points, or glyphs; more on that below). If the recipient cannot honor that request, it tells the sender so up front. Note that this does not relieve the recipient from counting bytes as in the first case. Assuming that the sender is not lying and has not miscounted because of a bug is really bad form and will lead to buffer overflows or other indigestion at the recipient. Note that this negotiation would occur at the transport layer, most likely in TAXII.
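The two paths described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TAXII behavior: the `Recipient` class, the `StringTooLong` exception (standing in for a protocol error code), and the use of UTF-8 for counting octets are all hypothetical.

```python
class StringTooLong(Exception):
    """Hypothetical error; stands in for an appropriate protocol error code."""


def octet_length(text, encoding="utf-8"):
    """Length in octets, not characters, code points, or glyphs."""
    return len(text.encode(encoding))


class Recipient:
    def __init__(self, max_octets):
        self.max_octets = max_octets  # this implementation's own limit

    def negotiate(self, sender_max_octets):
        """Second path: the sender announces N up front; accept or refuse."""
        return sender_max_octets <= self.max_octets

    def accept(self, text):
        """First path, still required after negotiating: count what arrives."""
        n = octet_length(text)
        if n > self.max_octets:
            raise StringTooLong(f"{n} octets exceeds limit of {self.max_octets}")
        return n


recipient = Recipient(max_octets=1024)
print(recipient.negotiate(512))    # the sender's declared maximum fits
print(recipient.negotiate(4096))   # refused before any document is sent
print(recipient.accept("short title"))
```

Note that `accept` never trusts the negotiated value; it measures each string itself, which is the point about not assuming the sender counted correctly.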

One thing to point out is that more modern protocols like SIP impose minimum string lengths for various fields. Again, the idea is to foster interoperability. If there is some minimum length that everyone knows about, then you know you can safely send something up to that length. In the real world, implementations creep up their limits. Unlike Web servers that needed to cater to Netscape 0.8, still up to 2012, or SSL 1.0 up to 2015, I think our user base is small enough and painfully paranoid about applying updates that even if someone only implements the minimum string length, that will expand over time in a meaningful way.
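A minimal Python sketch of the minimum-length idea, assuming a specification that requires every implementation to accept at least some number of octets. The constant `SPEC_MIN_OCTETS`, its value, and both function names are hypothetical; the thread did not settle on a number.

```python
SPEC_MIN_OCTETS = 1024  # hypothetical value, chosen only for illustration


def is_conformant(implementation_limit):
    """An implementation may accept more than the minimum, never less."""
    return implementation_limit >= SPEC_MIN_OCTETS


def safe_to_send_anywhere(text, encoding="utf-8"):
    """True if every conformant recipient is required to accept this string."""
    return len(text.encode(encoding)) <= SPEC_MIN_OCTETS


print(is_conformant(65535))
print(safe_to_send_anywhere("Title of the incident"))
```

A sender that stays at or below the minimum needs no negotiation; anything longer depends on what a particular recipient has chosen to support.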

I have never seen any specification for lengths in anything other than octets. There were some comments that it would not be fair to specify octets, because some languages use more octets per character than others. Here is a surprising take on that: Chinese may have up to five octets per character, but that character might encode 30+ characters of English text. Said differently, we are not the United Nations: so long as the minimum is long enough, it is irrelevant that some choices of language, like Sanskrit, may be at a disadvantage to other choices of language, like ancient Egyptian hieroglyphics.

Now, does this mean I am advocating for no limits on any kind of string? While the right hand side of a JSON key-value pair should not have a limit, it is OK to specify a limit on the left hand side of a JSON key-value pair. The key is just an opaque string of bytes that may happen to have meaning to an English-speaking person. However, that meaning is not normative. “Title” and “Foobar” can mean whatever we specify they mean in the STIX standards document. It may mean, “Name of the system administrator's second pet goldfish.” It does not have to mean, “Title of the incident.” Because we specify meaning in the standards document, we can restrict the character set, language, and maximum length however we see fit.
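As a sketch of that distinction, the following Python checks only the keys of a JSON document and leaves the values unrestricted. The limit `MAX_KEY_OCTETS`, the pattern `KEY_PATTERN`, and the function `check_keys` are hypothetical; STIX defines none of them here.

```python
import json
import re

MAX_KEY_OCTETS = 64                           # hypothetical key length limit
KEY_PATTERN = re.compile(r"[a-z][a-z0-9_]*")  # hypothetical key character set


def check_keys(obj):
    """Recursively collect keys that break the key rules; values are not checked."""
    bad = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            too_long = len(key.encode("utf-8")) > MAX_KEY_OCTETS
            if too_long or not KEY_PATTERN.fullmatch(key):
                bad.append(key)
            bad.extend(check_keys(value))
    elif isinstance(obj, list):
        for item in obj:
            bad.extend(check_keys(item))
    return bad


doc = json.loads('{"title": "任意长度的标题", "Bad Key!": 1}')
print(check_keys(doc))
```

Here the Chinese value under "title" passes untouched regardless of its length, while the key "Bad Key!" is reported because it falls outside the assumed character set.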

