
cti-stix message



Subject: Re: [cti-stix] Strings and Limits


Thanks Eric

Bret 

Sent from my Commodore 64

On Jun 3, 2016, at 8:01 AM, Eric Burger <Eric.Burger@georgetown.edu> wrote:

The world has lots and lots of interoperating protocols that do not have string limits. Remember, there was a time when 640KB for an entire address space seemed infinite. That got hacked in order to address an entire megabyte. Wow! Saying a particular field value will not see a length greater than 255, 16383, 32767, 65535, or even 4294967295 may sound quaint in five years.

One comment was that string length is not a standards issue. I disagree. It most definitely is a standards issue. Saying a standard allows for arbitrary-length strings is a standards statement. In fact, being silent invites stack overflow: if there are no limits and the standard is silent, implementations may impose their own limits and then barf. As another comment mentioned, if everyone sets their own limit, there may be interoperability problems.

One line of comments argued that in the real world, which does not have Von Neumann’s infinite card deck, there will be some limit to what any particular implementation can digest. Experience shows that any seemingly infinite limit gets reached decades before anyone anticipates it, so the common solution is to take one of two negotiation paths. The first, simpler one is that the recipient rejects, with an appropriate error code, any request containing a string that is too long to digest. The downside is that the recipient needs to start parsing the document before realizing it cannot process the document. The second, more complex one is that the sender first says it expects the recipient to be able to digest strings up to N octets long (notice the use of octets (bytes) here, not characters, code points, or glyphs; more on that below). If the recipient cannot honor that request, it tells the sender so up front. Note that this does not relieve the recipient from counting bytes as in the first case. Assuming that the sender is not lying, and has no bug that made it miscount, is really bad form and will lead to buffer overflows or other indigestion at the recipient. Note that this negotiation would occur at the transport layer, most likely in TAXII.
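
To make the two paths concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not anything defined by STIX or TAXII: the names StringTooLong, check_string, negotiate, and receive are hypothetical, the 65536-octet limit is an arbitrary number for one imaginary implementation, and UTF-8 is assumed as the encoding.

```python
class StringTooLong(Exception):
    """Raised when a string exceeds what this recipient can digest."""


# Hypothetical limit for this one implementation; not taken from any spec.
RECIPIENT_MAX_OCTETS = 65536


def check_string(value: str, limit: int = RECIPIENT_MAX_OCTETS) -> int:
    """Path 1: count octets while parsing and reject oversize strings."""
    octets = len(value.encode("utf-8"))  # assumes UTF-8 on the wire
    if octets > limit:
        raise StringTooLong(f"{octets} octets exceeds limit of {limit}")
    return octets


def negotiate(sender_max_octets: int, limit: int = RECIPIENT_MAX_OCTETS) -> bool:
    """Path 2: the sender announces N up front; the recipient answers yes or no."""
    return sender_max_octets <= limit


def receive(values, announced_max: int) -> list:
    """Even after a successful negotiation, the recipient still counts octets."""
    if not negotiate(announced_max):
        raise StringTooLong(f"cannot honor strings of {announced_max} octets")
    # Never trust the announced maximum: the sender may be buggy or lying.
    return [check_string(v, announced_max) for v in values]
```

The point of receive is that a successful negotiation changes nothing about the counting: every string is still measured against the announced maximum before it is accepted.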

One thing to point out is that more modern protocols like SIP impose minimum supported string lengths for various fields. Again, the idea is to foster interoperability. If there is some minimum length that everyone knows about, then you know you can safely send something up to that length. In the real world, implementations creep up their limits. Unlike Web servers that needed to cater to Netscape 0.8, still up to 2012, or SSL 1.0 up to 2015, I think our user base is small enough, and paranoid enough about applying updates, that even if someone implements only the minimum string length, that limit will expand over time in a meaningful way.
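
A guaranteed minimum could look something like the sketch below. The 1024-octet floor and both function names are hypothetical, chosen only for illustration; nothing here comes from SIP, STIX, or TAXII, and UTF-8 is assumed.

```python
# Hypothetical floor: every conforming implementation must accept at least this.
SPEC_MIN_SUPPORTED_OCTETS = 1024


def effective_limit(implementation_limit: int) -> int:
    """An implementation may accept more than the floor, never less."""
    if implementation_limit < SPEC_MIN_SUPPORTED_OCTETS:
        raise ValueError("limit is below the minimum the standard requires")
    return implementation_limit


def safe_without_negotiation(value: str) -> bool:
    """A sender knows any string within the floor is accepted everywhere."""
    return len(value.encode("utf-8")) <= SPEC_MIN_SUPPORTED_OCTETS
```

An implementation that later raises its own limit still passes effective_limit, which is the "creep up" behavior described above.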

I have never seen any specification for lengths in anything other than octets. There were some comments that it would not be fair to specify octets, because some languages use more octets per character than others. Here is a surprising take on that: Chinese may have up to five octets per character, but that character might encode 30+ characters of English text. Said differently, we are not the United Nations: so long as the minimum is long enough, it is irrelevant that some choices of language, like Sanskrit, may be at a disadvantage to other choices of language, like ancient Egyptian hieroglyphics.
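
For anyone who wants to see the difference between counting octets and counting characters, a small sketch follows. It assumes UTF-8, which this message does not actually specify, and the sample strings and the measure helper are my own illustration.

```python
samples = {
    "English": "threat",
    "Chinese": "威胁",
    "Combining": "e\u0301",  # 'e' followed by a combining acute accent
}


def measure(text: str) -> tuple:
    """Return (code points, UTF-8 octets) for a string."""
    return len(text), len(text.encode("utf-8"))


for label, text in samples.items():
    code_points, octets = measure(text)
    print(f"{label}: {code_points} code points, {octets} octets")
```

The combining example renders as a single glyph but is two code points and three octets, which is why a limit stated in octets is the only one of the three that a recipient can check without interpreting the text.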

Now, does this mean I am advocating for no limits on any kind of string? No. While the right hand side of a JSON key-value pair should not have a limit, it is OK to specify a limit on the left hand side of a JSON key-value pair. The key is just an opaque string of bytes that may happen to have meaning to an English-speaking person. However, that meaning is not normative. “Title” and “Foobar” can mean whatever we specify they mean in the STIX standards document. One of them may mean, “Name of the system administrator’s second pet goldfish.” It does not have to mean, “Title of the incident.” Because we specify meaning in the standards document, we can restrict the character set, language, and maximum length of keys however we see fit.
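
As a rough sketch of what restricting only the left hand side might look like: the rule below (1 to 30 lowercase ASCII letters, digits, or underscores) is a hypothetical example, not a proposal for the actual STIX key rules, and invalid_keys is an illustrative name. It assumes the document is a single top-level JSON object and does not descend into nested objects.

```python
import json
import re

# Hypothetical key rule: 1 to 30 characters from a restricted ASCII set.
# Because the set is ASCII-only, characters and octets are the same count.
KEY_PATTERN = re.compile(r"[a-z0-9_]{1,30}")


def invalid_keys(document: str) -> list:
    """Return the top-level keys that break the key rule; values are not limited."""
    obj = json.loads(document)
    return [key for key in obj if not KEY_PATTERN.fullmatch(key)]
```

A document such as {"title": "<a very long value>", "Bad Key!": 1} would report only "Bad Key!", however long the value of "title" is.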



