cti-cybox message

Subject: Re: [cti-cybox] Recent CybOX Changes
From: "Kirillov, Ivan A." <ikirillov@mitre.org>
To: "Back, Greg" <gback@mitre.org>, "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>
Date: Fri, 29 Jul 2016 17:46:46 +0000
Thanks for the comments Greg!

>I'm still not clear on what the value of a separate file-path-type is. The 
>only thing that immediately comes to mind is being able to write 
>patterns that match a specific path component vs. relying entirely on 
>regex to match substrings in a file path. I'm not sure if that is worth 
>the additional complexity.

I think our rationale when we implemented this (it’s been a while!) was that it made specifying paths across different operating systems more flexible and could be useful in patterning as well. We added an example to the patterning spec showing how patterns work against the current file-path-type [1]. This is a generic example against a full path, but one could see how it could be useful for matching against a specific part (e.g., the beginning or end) of a file path. That said, you could also do this with a regex, so I think we can reconsider going back to a single-string-based file path.
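
To make the trade-off concrete, here is a rough sketch of matching the last component of a path both ways. It is only an illustration: the function names are made up for this message, and splitting with Python's pathlib stands in for whatever structure file-path-type actually defines.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath


def path_components(path, windows=False):
    """Split a single-string path into its components (hypothetical helper)."""
    cls = PureWindowsPath if windows else PurePosixPath
    return cls(path).parts


def last_component_is(path, name, windows=False):
    """Component-based check: compare only the final path component."""
    parts = path_components(path, windows)
    return bool(parts) and parts[-1].lower() == name.lower()


def last_component_regex(path, name):
    """Regex-based check against the single string.

    The separator handling has to be spelled out in the pattern itself.
    """
    pattern = r"(?:^|[\\/])" + re.escape(name) + r"$"
    return re.search(pattern, path, re.IGNORECASE) is not None
```

Both checks give the same answer for simple cases; the difference is mostly whether the separator logic lives in the data model or in every pattern.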

>I'm concerned about the increased complexity during parsing. In contrast 
>to combining "encoding" with "base64_value" (which is needed when a 
>character sequence cannot be represented in UTF-8, and *required* in 
>order to determine the "natural" or "native" representative of the 
>string), using "encoding" with "value" is really just "ancillary" 
>information ("this is the encoding I saw this string as, before 
>converting it to UTF-8"). In other words, you can safely ignore the 
>"encoding" field in the latter case if you don't care about it, but not 
>in the former case. Combined with needing to distinguish between a 
>string and an object when parsing, this is a lot of additional effort 
>for *every* field where that choice is provided.
>
>I know this is a tough problem, and it's possible that this is the best 
>solution. But the idea of writing code to support this does not make me 
>excited.

Yeah, I think the additional parsing complexity is definitely the biggest problem with this approach. That said, is it really that difficult to test whether something is a string or an object when parsing a field? At this point, this does seem like the best solution that we’ve been able to put forth, because it means that content producers who don’t care/know about observed encoding (which will likely be most) will always just specify a string, while those who need observed encoding can specify the corresponding object.
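
As a rough sketch of what that string-or-object test might look like on the consuming side: the keys "value", "base64_value", and "encoding" are the ones discussed in this thread, but the function name, the return shape, and the error handling are assumptions for illustration, not anything the spec defines.

```python
import base64


def read_string_field(field):
    """Return (text_or_bytes, observed_encoding) for a string-or-object field.

    Hypothetical helper: a plain string carries no observed encoding; an
    object carries either "value" (encoding is ancillary) or "base64_value"
    (encoding is needed to recover the native string).
    """
    if isinstance(field, str):
        return field, None
    if isinstance(field, dict):
        encoding = field.get("encoding")
        if "base64_value" in field:
            raw = base64.b64decode(field["base64_value"])
            if encoding is None:
                # Without an encoding we can only hand back the raw bytes.
                return raw, None
            return raw.decode(encoding), encoding
        if "value" in field:
            # The encoding can be safely ignored by callers that do not care.
            return field["value"], encoding
    raise ValueError("expected a string or a string-with-encoding object")
```

A consumer that does not care about observed encoding would simply drop the second element of the returned pair.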

Also, it’s worth noting that one of the reasons we’ve been trying to fit this into the MVP release is that if we push this back to a later release it will likely require significant changes to CybOX Core, the existing Object data models, or both.

>As one of the people who raised this point, I'd also like to add that 
>I've never seen magic_number well-defined, and it is usually (in my 
>experience) inconsistently specified. There are some resources online, 
>but I don't think we should necessarily incorporate those by reference 
>or require people to learn about well- and less-well-known magic numbers.
>
>Also, the mime_type for various files (as reported by the "file" utility 
>via libmagic) is not necessarily stable between versions. Hopefully it's 
>pretty uniform for common file types, but I'm worried that mime_type is 
>less "fact" and more "assertion by a specific tool at a specific time".
>
>I realize that file extension by itself is easily spoofed, and that 
>noting when an extension doesn't match the file content is incredibly 
>significant in the CTI domain. But I can't think of a better way to 
>capture this information.

Definitely good points. From the feedback we’ve received, it seems like mime type and magic number are still used by many to get some idea of what a file purports to be. This is a valid use case, so maybe what we can do is try to write text that is as normative as possible around them, while also stating their nature as assertions by a specific tool (for mime type).
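
For illustration only, a minimal sketch of the kind of extension-versus-content check being discussed. The signature table is a tiny illustrative sample of well-known leading bytes, not a normative list, and the function name is made up for this message.

```python
import os

# Illustrative sample of well-known leading bytes; not a normative list.
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".zip": b"PK\x03\x04",  # non-empty archives
}


def extension_matches_content(filename, header):
    """Compare a file's extension with its leading bytes.

    Returns True or False when the extension is in the sample table, and
    None when nothing can be asserted either way.
    """
    ext = os.path.splitext(filename)[1].lower()
    expected = SIGNATURES.get(ext)
    if expected is None:
        return None
    return header.startswith(expected)
```

Even in this small form the result depends on which table the tool ships with, which is why it reads more like an assertion by a tool than a fact about the file.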

[1] https://docs.google.com/document/d/1suvd7z7YjNKWOwgko-vJ84jfGuxSYZjOQlw5leCswPY/edit#heading=h.esieydsmfktm

Regards,
Ivan

On 7/29/16, 10:19 AM, "cti-cybox@lists.oasis-open.org on behalf of Greg Back" <cti-cybox@lists.oasis-open.org on behalf of gback@mitre.org> wrote:

>On 7/28/2016 3:56 PM, Kirillov, Ivan A. wrote:
>> Added File Path Type to Common Object Types (8.1.4.2) so that it can be re-used for file paths as needed in the various CybOX Objects
>
>I'm still not clear on what the value of a separate file-path-type is. The 
>only thing that immediately comes to mind is being able to write 
>patterns that match a specific path component vs. relying entirely on 
>regex to match substrings in a file path. I'm not sure if that is worth 
>the additional complexity.
>
>> Based on comments and discussions, removed Object Property Metadata section and instead added String with Encoding Type to Common Object Types (8.1.4.2). This type permits the capture of observed encodings for strings in Objects wherever appropriate (see example: https://docs.google.com/document/d/1DdS-NrVTjGJ3wvCJ7dbSlhYeiaWS6G6dOXu2F3POpUs/edit#heading=h.47ju1z5ea7t). Accordingly, updated the type definitions throughout the CybOX Objects to be an OR between a string and this new type wherever it made sense. We realize that this may complicate parsing (e.g., having to distinguish between strings and objects) and creation of CybOX data so we look forward to your feedback.
>
>I'm concerned about the increased complexity during parsing. In contrast 
>to combining "encoding" with "base64_value" (which is needed when a 
>character sequence cannot be represented in UTF-8, and *required* in 
>order to determine the "natural" or "native" representative of the 
>string), using "encoding" with "value" is really just "ancillary" 
>information ("this is the encoding I saw this string as, before 
>converting it to UTF-8"). In other words, you can safely ignore the 
>"encoding" field in the latter case if you don't care about it, but not 
>in the former case. Combined with needing to distinguish between a 
>string and an object when parsing, this is a lot of additional effort 
>for *every* field where that choice is provided.
>
>I know this is a tough problem, and it's possible that this is the best 
>solution. But the idea of writing code to support this does not make me 
>excited.
>
>> Moved magic_number from File Metadata Extension to base File Object, since it is analogous to mime_type which was already on the base. Accordingly, renamed File Metadata to File Metadata Mismatch and removed redundant has_mismatch field. However, a point was raised about this particular extension, namely that it represents an assertion rather than a “fact” such as a magic number or hash. Accordingly, we need to consider the question of whether such assertions belong in CybOX or not.
>
>As one of the people who raised this point, I'd also like to add that 
>I've never seen magic_number well-defined, and it is usually (in my 
>experience) inconsistently specified. There are some resources online, 
>but I don't think we should necessarily incorporate those by reference 
>or require people to learn about well- and less-well-known magic numbers.
>
>Also, the mime_type for various files (as reported by the "file" utility 
>via libmagic) is not necessarily stable between versions. Hopefully it's 
>pretty uniform for common file types, but I'm worried that mime_type is 
>less "fact" and more "assertion by a specific tool at a specific time".
>
>I realize that file extension by itself is easily spoofed, and that 
>noting when an extension doesn't match the file content is incredibly 
>significant in the CTI domain. But I can't think of a better way to 
>capture this information.
>
>Greg