Subject: RE: [cti-stix] Unicode, strings, and STIX
There is simply no logical way to define a "max length" in a way that protects against "buffer overflow" problems with Unicode... so if buffer overflow is the main motivation for this
- If we say "max_length" of title means 255 *BYTES*, then in some languages that is going to result in a much shorter title than in other languages - and furthermore, you could be truncating it in the middle of a character (grapheme), making it all the more invalid for the person entering it on their screen.
- If we say "max_length" of title means 255 *code points*, then some languages will be allowed shorter titles than others, and it could also equal an arbitrary number of bytes, as it depends on the encoding and language being encoded. And you still have the problem of truncating in the middle of a character (grapheme).
- If we say "max_length" of title means 255 *graphemes*, then all languages are allowed the same title length, and you have no problems truncating in the middle of a character. However, it means a title could equal an arbitrary number of bytes.
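To make the three definitions above concrete, here is a minimal Python sketch that measures the same title in bytes, code points, and (approximately) graphemes. The function names are hypothetical and not part of any STIX tooling. The grapheme count is only a rough approximation that skips combining marks; real grapheme segmentation follows Unicode UAX #29 and needs more than the standard library.

```python
import unicodedata


def byte_length(text, encoding="utf-8"):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def code_point_length(text):
    """Number of Unicode code points (what len() counts in Python 3)."""
    return len(text)


def approx_grapheme_length(text):
    """Rough grapheme count: code points that are not combining marks.

    Assumption: this is NOT full UAX #29 segmentation (it ignores
    ZWJ emoji sequences, Hangul jamo, and so on).
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def byte_cut_is_valid(text, max_bytes, encoding="utf-8"):
    """True if cutting the encoded text at max_bytes still decodes cleanly."""
    try:
        text.encode(encoding)[:max_bytes].decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# "Cafe" followed by U+0301 COMBINING ACUTE ACCENT: one visible "é".
title = "Cafe\u0301"
print(byte_length(title), code_point_length(title), approx_grapheme_length(title))
print(byte_length(title, "utf-16-le"))
print(byte_cut_is_valid("日本語", 4))
```

The same five code points take 6 bytes in UTF-8 and 10 in UTF-16, and a byte-based cut of "日本語" at 4 bytes lands inside the second character, which is the truncation problem described in the first bullet.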
I say throw it out.
-
Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com
Without data, all you are is just another person with an opinion - Unknown
Terry MacDonald ---06/01/2016 07:19:19 PM---I think having built in maximum field size is pragmatic. We don't want to design buffer overflow sus
From: Terry MacDonald <terry.macdonald@cosive.com>
To: Rich Piazza <rpiazza@mitre.org>
Cc: John-Mark Gurney <jmg@newcontext.com>, Jason Keirstead/CanEast/IBM@IBMCA, "Jordan, Bret" <bret.jordan@bluecoat.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Date: 06/01/2016 07:19 PM
Subject: RE: [cti-stix] Unicode, strings, and STIX
Sent by: <cti-stix@lists.oasis-open.org>
I personally think that maximum field length should be defined in the STIX standards doc for each STIX type (e.g. boolean, number), and that it should be sized in Unicode characters. Then in each serialisation document (e.g. in a JSON serialisation doc) we should convert that Unicode character length into whatever length definition makes sense for that serialisation format, e.g. code points for JSON.
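As a rough illustration of that conversion step, assuming "Unicode characters" is read as code points, a serialisation document could derive a worst-case byte bound from the limit in the standard. The table and function names below are hypothetical. The per-code-point maxima are properties of the encodings themselves: UTF-8 uses at most 4 bytes per code point, UTF-16 at most 4 (a surrogate pair), and UTF-32 always 4.

```python
# Worst-case bytes needed per code point in each encoding (no BOM).
MAX_BYTES_PER_CODE_POINT = {
    "utf-8": 4,
    "utf-16-le": 4,
    "utf-32-le": 4,
}


def worst_case_bytes(max_code_points, encoding):
    """Upper bound on encoded size for a limit expressed in code points."""
    return max_code_points * MAX_BYTES_PER_CODE_POINT[encoding]


def within_limit(text, max_code_points):
    """Check a value against a code-point limit (len() counts code points)."""
    return len(text) <= max_code_points


# A hypothetical 255 code-point title limit:
print(worst_case_bytes(255, "utf-8"))
print(within_limit("a" * 255, 255), within_limit("a" * 256, 255))
```

This bound only holds for a limit in code points. If the limit were expressed in graphemes instead, no fixed byte bound would exist, because a single grapheme can contain many code points.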
I really don't want to be responsible for creating threat intelligence hacks in 2-5 years from now because of a decision we made today.
Cheers
Terry MacDonald
Cosive
On 2/06/2016 04:17, "Piazza, Rich" <rpiazza@mitre.org> wrote:
Then, implementers would have to make sure they could support that.
In STIX 1.2.1, the description field of all of the objects had this text in the specification documents. I’m not sure in which direction that will sway you :-)
If we do not define a max length then everyone will set their own. And we will have problems.
Bret
Sent from my Commodore 64
On Jun 1, 2016, at 8:08 AM, Piazza, Rich <rpiazza@mitre.org> wrote:
In addition, I kinda agree that the length of strings isn’t a “standards” issue, or an implementation issue that we need to comment on anywhere.
RE the encoding language question, I posted some sample language to slack that I think solves the problem: "Any serialization of STIX MUST encode all String values in an encoding that follows the Unicode standard".
I do not think the below proposal solves some of the other key questions JMG poses. The most critical question we have is with regards to all of these "max length" properties in the spec and how they will be validated. These things actually *can not* be validated in an encoding-independent way. I have asked a few times and will ask again - in 2016, is "max length" really anything we need to care about here? DBAs may have a bit of heartburn, but IMO it is not something we should be concerned with in STIX. Modern databases do not pre-allocate storage for columns anymore anyway. I would rather just forget about the idea. It makes things a lot simpler.
Also, the idea that we should say for example "a title should only be 255 code points long" is completely arbitrary IMO and imposing undue limits on the analyst.
-
Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com
Without data, all you are is just another person with an opinion - Unknown
"Piazza, Rich" ---06/01/2016 11:39:45 AM---+1 From: cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org] On Behalf Of Terry Mac
From: "Piazza, Rich" <rpiazza@mitre.org>
To: Terry MacDonald <terry.macdonald@cosive.com>, John-Mark Gurney <jmg@newcontext.com>
Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Date: 06/01/2016 11:39 AM
Subject: RE: [cti-stix] Unicode, strings, and STIX
Sent by: <cti-stix@lists.oasis-open.org>
+1