Re: [cti-users] STIX 2.0 Pattern Expressiveness

On Tue, Jan 3, 2017 at 12:01 PM, John-Mark Gurney <jmg@newcontext.com> wrote:

> (4): Good point here, I can see how the divergence between dictionary and list type lookup syntax is odd. I’m wondering if we should just use square bracket notation for dictionary values (as in Python) as well? E.g., “file:hashes[MD5]” instead of “file:hashes.MD5”. This would also allow us to use an equivalent syntax for ANY.

Yeah, this is a tricky one, and complicated by the fact that some keys
have a hyphen (minus sign) in them making them not safe from a parsing
perspective... There was discussion of using ["key"], but it was nixed
partly due to the extra characters needed, and when going into deep
dictionaries, you'd have something like
network-traffic:extended_properties["http-ext"]["request_header"]["Basic-Auth"]

Actually, I prefer both using strings as keys and keeping the brackets. In practice, I think dictionary-typed properties will mostly map directly to dictionary fields in the observed data. There's no requirement that these keys conform to the "shape" of a STIX property per the spec (i.e., the requirements around use of dash vs. underscore). The spec also says these keys are case-sensitive, as in the case of HTTP request headers. Thus, IMHO it's better to treat them as the data that they are, with this explicit syntax.

Also, IIRC, per RC4, your example would be

network-traffic:extensions['http-request-ext']['request_header']['Basic-Auth']

Yes, verbose, but also clear that it's "stepping into" userdata. Compared to:

network-traffic:extensions.http-request-ext.request_header.Basic-Auth

I think the former is less confusing to read once you get to the 'Basic-Auth' bit.
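
To make the parsing point concrete, here is a rough Python sketch of splitting an object path that uses quoted-bracket keys. Everything in it is an assumption for illustration only: the helper name split_object_path is hypothetical, the character set allowed for plain property names (lowercase letters, digits, underscore) is my reading of the spec, and the single-quote style simply follows the RC4 example above. It is not taken from any STIX implementation.

```python
import re

# Hypothetical helper, not part of any STIX library.
# Assumption: plain property names use only a-z, 0-9 and underscore,
# while quoted keys are arbitrary user data (hyphens, mixed case, ...).
_STEP_RE = re.compile(r"""
    \.?(?P<prop>[a-z0-9_]+)     # plain property, optionally preceded by a dot
  | \['(?P<key>[^']*)'\]        # quoted dictionary key
""", re.VERBOSE)


def split_object_path(path):
    """Split 'type:prop['key']...' into (object_type, [(kind, name), ...])."""
    obj_type, _, rest = path.partition(":")
    steps = []
    pos = 0
    while pos < len(rest):
        match = _STEP_RE.match(rest, pos)
        if match is None:
            raise ValueError(
                f"cannot parse object path at offset {pos}: {rest[pos:]!r}"
            )
        if match.group("prop") is not None:
            steps.append(("property", match.group("prop")))
        else:
            steps.append(("key", match.group("key")))
        pos = match.end()
    return obj_type, steps


bracket_form = (
    "network-traffic:extensions['http-request-ext']"
    "['request_header']['Basic-Auth']"
)
print(split_object_path(bracket_form))
```

With the bracket form, the hyphens and capital letters in 'http-request-ext' and 'Basic-Auth' never reach the property-name rule. Feeding the dot-notation form to the same function raises ValueError at "-request-ext", which is the kind of ambiguity a dot-based grammar would have to resolve some other way.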

If verbosity is truly a concern, it seems like there are other places to save some characters: short-hand aliases for object types, 'exts' instead of 'extensions', 'httpreq' instead of 'http-request-ext' (you know it's an extension, after all), 'header' instead of 'request_header' (you already know it's an HTTP request; there's no response data encoded anywhere), &c. We could also do away with "extensions" and support conceptual sub-typing. In that case, an 'http request' would be in an 'is a' relationship with 'network-traffic' and the whole 'extensions' part of the object-path goes away.

Quite the contrary, my impression is that this specification aims to support interoperability through specificity. If I'm correct, I would think verbosity is a secondary consideration to precision.

> (7), (8): We’ve thought a bit about defining functions in the language and decided to postpone them to a future release. Most of our discussion has been on functions for casting back and forth between constant types (e.g., hex -> integer), but other types of functions for primitive types definitely make sense. As far as user-defined functions, this isn’t something we’ve discussed, and while I see the utility in them I think they would also have the potential to make patterning much more complex, especially for implementers/consumers. That said, I think it’s an interesting idea and one worth discussing amongst our community.

The biggest issue w/ adding UDFs is that it now means that the
patterning language is not universally readable... Once we add UDFs,
we now have to decide how to tell the consumer which functions they
need, etc. It's something that will take a lot of work, and my gut
feeling is that we really don't want to support too-flexible UDFs due
to compatibility.

I agree UDFs open a can of worms. Forget I mentioned it. I think it could still be useful to define some "functions" or "methods" for the basic types. This could actually be done simply as ephemeral/implicit properties: the 'string' primitive type could have a 'length' property that's available for use in expressions. There are no parameters to pass, so no new syntax is needed in the grammar.
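
As a rough illustration of the implicit-property idea, here is a minimal Python sketch of a path resolver in which any string value exposes a 'length' step. This is only a sketch under assumptions: the function name resolve, the plain-dict representation of an observed object, and the 'length' property itself are all hypothetical and appear nowhere in the spec.

```python
def resolve(value, steps):
    """Walk `steps` through nested dicts; return None if a step is absent.

    Assumption for illustration: observed objects are plain dicts, and a
    string value exposes a hypothetical implicit 'length' property.
    """
    for step in steps:
        if isinstance(value, dict) and step in value:
            # Real keys always win, so a dictionary key that happens to be
            # called 'length' is still reachable.
            value = value[step]
        elif isinstance(value, str) and step == "length":
            # Implicit property: no call syntax and no parameters needed.
            value = len(value)
        else:
            return None
    return value


observed_file = {"type": "file", "name": "evil.exe"}
print(resolve(observed_file, ["name", "length"]))
```

A comparison such as "file:name.length > 5" would then reuse the existing object-path and comparison grammar unchanged; only the evaluator needs to know about the extra step.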

Don't forget that the more functions are added, the slower matching
will be, and it'll be an interesting issue to decide on how to
optimize it...

See my earlier observation re: this being a spec for interoperability. It would be up to implementations to evaluate patterns efficiently.

Thanks for your comments!

Thanks for considering them!

-n

> From: <cti-users@lists.oasis-open.org> on behalf of Nick Dimiduk <ndimiduk@gmail.com>
> Date: Thursday, December 22, 2016 at 2:35 PM
> To: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
> Subject: [cti-users] STIX 2.0 Pattern Expressiveness
>
> Hello,
>
> I'm new to STIX and I've been evaluating the use of STIX 2.0RC3 Patterns ([0]) for some use-cases. I find it to be quite a powerful tool. However, there are a couple concepts I don't know how to express. I'm hoping the community might be able to help me out -- either there's a usage I don't see (missed in my reading) or there's an oversight in the language. For any of the latter, I hope I'm not too late for my suggestions to be considered for 2.0 timelines.
>
> (1) Dictionary key absence. For instance, looking for some network-traffic property AND the absence of a specific HTTP header. Do I treat the dictionary as a collection and use NOT IN comparison operator? That seems to violate the grammar rules, which say the LHS (left-hand side) of an IN clause is the Object Path and the RHS (right-hand side) is a set of constant values.
>
> (2) Related to (1), the same syntax question is raised for optional Object Paths (properties) which are not present. For example, a path like "file:size" describes an optional property of the File Object. How to check for (the absence of) that field? Is there a "null/nil/None" object value that can be checked for? What's the syntax for the check? What primitive types support it? Some candidate syntax comes to mind: "[file:size != nil]" or maybe "[file:size IS NOT null]". Where is that discussed in the spec?
>
> (3) Also related to (1), the comparison expression is always Object Path on the LHS, literal value on the RHS. This is inflexible, and means there's no way to compare two Object Paths to each other. It also means I cannot check to see if Object Path A is present in Object Path B where B is a collection.
>
> (4) Speaking of collections, collection object lookup syntax is divergent. For list types (Part 4, Section 5.2), we have a 0-based index with square-brackets ('[]'). We also have a convenience syntax of "list_property[*]" as syntactic sugar for the logical ANY operator from SQL. However, dictionary types (Part 4, Section 5.3) are referenced with "dot-notation", just like any object property. This obscures the property's type for a casual reader and restricts (or at least confuses) the body of syntax available for expressions on dictionary elements. For example, there's no ANY equivalent for matching dictionary members like we can for lists.
>
> (5) Related to (4), there appears to be no syntax for the other collection-based logical operators provided by SQL -- ALL, SOME, EXISTS. As mentioned in (2), (3), there is IN syntax, but it's not available for Object Path element collections, only constant set literals.
>
> (6) While talking about logical operators, I haven't noticed the equivalent of a BETWEEN expression. One must say, for example, "[type:property > A AND type:property < B]". I haven't thought in depth on this topic, but there could be non-numerical types for which it's less obvious how to express value constraints, and for which BETWEEN syntax goes from "nice to have" to "required for expression" -- i.e., where the semantics of greater-than and less-than do not make sense but BETWEEN does.
>
> (7) Is there any thought around a library of functions for primitive types? For instance, a length() function that operates on strings.
>
> (8) The next logical question from (7) is where would UDFs or implementation-specific extensions be installed into the syntax? It would be very powerful to have a Function Object (including a function implementation) that can be exported along with the Indicators that contain Patterns that make use of that function.
>
> (9) Also from (7), any consideration for aggregation operators/functions? "Match when the combined file size of all attachments on the Email Message is greater than 5 MB". Assuming the Email Message Object had a property "attachment_refs" of type object-ref list that's restricted to File Object types, that might look like "[sum(email-message:attachment_refs[*].size) > 5 * 1024 * 1024]".
>
> (10) It's not clear to me if the RHS from the example in (9) is even valid -- are arbitrary mathematical expressions legal for either LHS or RHS of comparison expressions?
>
> (11) Capability for backward references. I'd like to be able to refer to the value that matched a previous expression later in the pattern. For instance, "match when an email contains a URL hosted on baddomain.com and subsequent http traffic contains a 200 success request for that URL". One approach might be to borrow group backreferences from regex syntax, "[(email-message:url_refs[*].value MATCHES '*.baddomain.com/.*\.docx')] FOLLOWED BY [network-traffic:extensions.http-request-ext.request_value = \1 AND network-traffic:extensions.x-examplecom-http-request-ext.status_code = 200]". I don't quite know how this would work -- you'd want to define capturing groups of arbitrary expressions and also allow for capturing groups to be embedded into the RHS of a MATCHES expression. Tricky.

>
> Thanks a lot!
> Nick
>
> [0]: https://docs.google.com/document/d/1suvd7z7YjNKWOwgko-vJ84jfGuxSYZjOQlw5leCswPY
>

--
John-Mark
