cti-users message

Subject: Re: [cti-users] STIX 2.0 Pattern Expressiveness
From: John-Mark Gurney <jmg@newcontext.com>
To: "Kirillov, Ivan A." <ikirillov@mitre.org>
Date: Tue, 3 Jan 2017 12:01:13 -0800

Kirillov, Ivan A. wrote this message on Fri, Dec 23, 2016 at 16:05 +0000:
> Thanks for the great feedback on STIX Patterning! Overall, I think we had considered many of the points that you’ve raised, and pushed back on them to focus on a “minimum viable product” release of STIX Patterning that would be useful for the vast majority of basic patterns (i.e., those seen in the wild today). However, I think many of these are great topics for a future release of STIX Patterning.
> 
> (1), (2): This isn’t something we’ve considered, though I agree that it’s a useful and likely necessary capability. I think we could probably use the same syntax for testing for absent dictionary keys and Object Paths. I rather like the "[file:size != nil]" syntax that you’ve proposed for this purpose.
> 
> (3): This was done intentionally, as we felt that for the initial patterning release it would be simpler and more consistent to have the Object Path always on the LHS and literal value on the RHS. However, I think it’s likely that in a future release we will allow Object Paths on the RHS as well.
> 
> (4): Good point here, I can see how the divergence between dictionary and list type lookup syntax is odd. I’m wondering if we should just use square bracket notation for dictionary values (as in Python) as well? E.g., “file:hashes[MD5]” instead of “file:hashes.MD5”. This would also allow us to use an equivalent syntax for ANY.

Yeah, this is a tricky one, and complicated by the fact that some keys
have a hyphen (minus sign) in them, making them not safe from a parsing
perspective...  There was discussion of using ["key"], but it was nixed
partly due to the extra characters needed, and when going to deep
dictionaries, you'd have something like
network-traffic:extended_properties["http-ext"]["request_header"]["Basic-Auth"]
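
To make the parsing concern concrete, here is a rough Python sketch of a
tokenizer for the quoted-bracket form.  The grammar it accepts is purely an
illustration (the function name and step labels are made up here), not the
STIX 2.0 grammar: quoted keys may contain hyphens, while dot-notation names
may not.

```python
import re

# Hypothetical step grammar, for illustration only:
#   .name     plain property (letters, digits, underscore)
#   ["key"]   quoted dictionary key (hyphens allowed)
#   [0], [*]  list index or "any element"
_STEP = re.compile(r'\.([A-Za-z_][A-Za-z0-9_]*)|\["([^"\]]+)"\]|\[(\d+|\*)\]')


def split_object_path(path):
    """Return (object_type, steps) for a path like type:prop["key"][0]."""
    obj_type, sep, rest = path.partition(":")
    if not sep or not rest:
        raise ValueError("expected 'type:property': %r" % path)
    # The first property has no leading dot; add one so one regex fits all.
    rest = "." + rest
    steps, pos = [], 0
    while pos < len(rest):
        m = _STEP.match(rest, pos)
        if m is None:
            raise ValueError("cannot parse %r at offset %d" % (path, pos - 1))
        name, key, index = m.groups()
        if name is not None:
            steps.append(("prop", name))
        elif key is not None:
            steps.append(("key", key))
        elif index == "*":
            steps.append(("any", None))
        else:
            steps.append(("index", int(index)))
        pos = m.end()
    return obj_type, steps
```

With this sketch, file:hashes.SHA-256 is rejected at the hyphen, while
file:hashes["SHA-256"] parses, at the cost of the extra characters.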

> (5) I think we can consider adding support for something like ALL, SOME, and EXISTS in a future release.
> 
> (6) As you mentioned, right now we can support value constraints using AND with the same property. If there are indeed constraints that we can’t express using this notation (maybe for timestamps?), then I think adding something like a BETWEEN or INRANGE operator makes sense.

There is the SQL CONTAINS operator, and that can be implemented by
expanding it out...

It may not be pretty, but for a while there, PostgreSQL wouldn't use
indexes when using the CONTAINS operator, but *would* when it was
spelled out to its equivalent...
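
As a sketch of the "expanding it out" idea applied to the range test asked
about in (6): a hypothetical BETWEEN-style operator (not part of STIX 2.0
patterning) could be rewritten into the two comparisons that are already
legal.  The helper name is made up, the bounds are assumed to be inclusive
as in SQL's BETWEEN, and only numeric literals are handled.

```python
def expand_between(object_path, low, high):
    """Rewrite a hypothetical 'path BETWEEN low AND high' as two comparisons.

    Bounds are inclusive, mirroring SQL's BETWEEN.  Only numeric literals
    are handled; string or timestamp literals would need quoting rules.
    """
    return "[{p} >= {lo} AND {p} <= {hi}]".format(
        p=object_path, lo=low, hi=high)


print(expand_between("file:size", 1024, 4096))
```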

There is a lot to add, but we wanted to make sure we got something usable
out for 2.0...

> (7), (8): We’ve thought a bit about defining functions in the language and decided to postpone them to a future release. Most of our discussion has been on functions for casting back and forth between constant types (e.g., hex -> integer), but other types of functions for primitive types definitely make sense. As far as user-defined functions, this isn’t something we’ve discussed, and while I see the utility in them I think they would also have the potential to make patterning much more complex, especially for implementers/consumers. That said, I think it’s an interesting idea and one worth discussing amongst our community.

The biggest issue w/ adding UDF's is that it now means that the
patterning language is not universally readable...  Once we add UDF's,
we now have to decide how to tell the consumer which functions they
need, etc.  It's something that will take a lot of work, and my gut
feeling is that we really don't want to support too flexible UDF's due
to compatibility.

Now there isn't anything preventing someone from creating a "new" language
that is similar to this, but w/ a few UDF's defined, and putting the
name in the pattern_lang field..  In fact, we could do UDF's that way,
where the [future] pattern_lang field is "stix+ext1+ext2" to document
up front which UDF's are required, and allowing implementations that
do not implement the various UDF's to not have problems...
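
A minimal sketch of how a consumer might act on such a field, assuming the
"stix+ext1+ext2" convention floated above.  Neither the convention nor the
extension names exist in the spec, and the supported set below is just a
placeholder for whatever a given implementation provides.

```python
# Hypothetical: the UDF extensions this particular consumer implements.
SUPPORTED_EXTENSIONS = frozenset({"ext1"})


def missing_extensions(pattern_lang, supported=SUPPORTED_EXTENSIONS):
    """Return the extensions named in pattern_lang that are not supported.

    pattern_lang is assumed to look like "stix" or "stix+ext1+ext2".
    """
    base, *extensions = pattern_lang.split("+")
    if base != "stix":
        raise ValueError("not a stix-based pattern language: %r" % pattern_lang)
    return [ext for ext in extensions if ext not in supported]


missing = missing_extensions("stix+ext1+ext2")
if missing:
    # The consumer can skip the pattern up front instead of failing
    # part-way through evaluation.
    print("skipping pattern, unsupported extensions:", ", ".join(missing))
```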

Some UDF's can also be implemented by adding custom properties to the
CybOX objects ahead of time...  When we support more generic functions,
things will get interesting...

Don't forget the more functions that are added, the slower matching
will be, and it'll be an interesting issue to decide on how to
optimize it...

> (9), (10): We’ve briefly discussed aggregator functions and I think it’s certainly something we can add once we incorporate functions in general. As far as arbitrary mathematical expressions, they are currently not legal, though this is also something that we’ll likely add in a later release as well (we actually had them in an early draft and decided to remove them for the sake of simplicity).
> 
> (11): This has also been discussed, and will likely be implemented in a future release. One possibility we’ve floated for such a capability is to add the ability to define variables and accordingly substitute them in Object Paths. E.g., [{0} = “foo.dll” AND file:name = {0}] ALONGWITH [win-registry-key:key MATCHES {0}].

This is also a big one for me..  Trying to figure out how to use data
from one object to match against data on another object...  These will
be needed to help correlate the data between CybOX objects, but we've got
to crawl before we can walk.. :)
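
To show the kind of matching I mean, here is a rough Python sketch that
pairs up objects where a property on one equals a property on another.
The objects are plain dicts, and the type and property names in the usage
are invented for the example; this says nothing about what the pattern
syntax for it should look like.

```python
def correlate(objects, first_type, first_prop, second_type, second_prop):
    """Yield (a, b) pairs where a[first_prop] == b[second_prop].

    objects is an iterable of dicts, each with a "type" key.  Property
    values are assumed to be hashable (e.g. strings or numbers).
    """
    objects = list(objects)
    # Index the second kind of object by value so each lookup is cheap.
    by_value = {}
    for obj in objects:
        if obj.get("type") == second_type and second_prop in obj:
            by_value.setdefault(obj[second_prop], []).append(obj)
    for obj in objects:
        if obj.get("type") == first_type and first_prop in obj:
            for other in by_value.get(obj[first_prop], []):
                yield obj, other


# Hypothetical observed data: a URL seen in an email, then HTTP requests.
observed = [
    {"type": "url", "value": "http://baddomain.com/a.docx"},
    {"type": "http-request", "request_value": "http://baddomain.com/a.docx"},
    {"type": "http-request", "request_value": "http://ok.example/"},
]
for url, request in correlate(observed, "url", "value",
                              "http-request", "request_value"):
    print("correlated:", url["value"])
```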

Thanks for your comments!

> From: <cti-users@lists.oasis-open.org> on behalf of Nick Dimiduk <ndimiduk@gmail.com>
> Date: Thursday, December 22, 2016 at 2:35 PM
> To: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
> Subject: [cti-users] STIX 2.0 Pattern Expressiveness
> 
> Hello,
> 
> I'm new to STIX and I've been evaluating the use of STIX 2.0RC3 Patterns ([0]) for some use-cases. I find it to be quite a powerful tool. However, there are a couple concepts I don't know how to express. I'm hoping the community might be able to help me out -- either there's a usage I don't see (missed in my reading) or there's an oversight in the language. For any of the latter, I hope I'm not too late for my suggestions to be considered for 2.0 timelines.
> 
> (1) Dictionary key absence. For instance, looking for some network-traffic property AND the absence of a specific HTTP header. Do I treat the dictionary as a collection and use NOT IN comparison operator? That seems to violate the grammar rules, which say the LHS (left-hand side) of an IN clause is the Object Path and the RHS (right-hand side) is a set of constant values.
> 
> (2) Related to (1), the same syntax question is raised for optional Object Paths (properties) which are not present. For example, a path like "file:size" describes an optional property of the File Object. How to check for (the absence of) that field? Is there a "null/nil/None" object value that can be checked for? What's the syntax for the check? What primitive types support it? Some candidate syntax comes to mind: "[file:size != nil]" or maybe "[file:size IS NOT null]". Where is that discussed in the spec?
> 
> (3) Also related to (1), the comparison expression is always Object Path on the LHS, literal value on the RHS. This is inflexible, and means there's no way to compare two Object Paths to each other. It also means I cannot check to see if Object Path A is present in Object Path B where B is a collection.
> 
> (4) Speaking of collections, collection object lookup syntax is divergent. For list types (Part 4, Section 5.2), we have a 0-based index with square-brackets ('[]'). We also have a convenience syntax of "list_property[*]" as syntactic sugar for the logical ANY operator from SQL. However, dictionary types (Part 4, Section 5.3) are referenced with "dot-notation", just like any object property. This obscures the property's type for a casual reader and restricts (or at least confuses) the body of syntax available for expressions on dictionary elements. For example, there's no ANY equivalent for matching dictionary members like we can for lists.
> 
> (5) Related to (4), there appears to be no syntax for the other collection-based logical operators provided by SQL -- ALL, SOME, EXISTS. As mentioned in (2), (3), there is IN syntax, but it's not available for Object Path element collections, only constant set literals.
> 
> (6) While talking about logical operators, I haven't noticed the equivalent of a BETWEEN expression. One must say, for example "[type:property > A AND type:property < B]". I haven't thought in depth on this topic, could be other non-numerical types for which it's less obvious how to express value constraints, for which BETWEEN syntax goes from not just "nice to have" but "required for expression" -- ie, where the semantics of greater-than, less-than do not make sense but BETWEEN does.
> 
> (7) Is there any thought around a library of functions for primitive types? For instance, a length() function that operates on strings.
> 
> (8) The next logical question from (7) is where would UDF's or implementation-specific extensions be installed into the syntax? It would be very powerful to have a Function Object (including a function implementation) that can be exported along with the Indicators that contain Patterns that make use of that function.
> 
> (9) Also from (7), any consideration for aggregation operators/functions? "Match when the combined file size of all attachments on the Email Message is greater than 5mb". Assuming the Email Message Object had a property "attachment_refs" of type object-ref list that's restricted to File Object types, that might look like "[sum(email-message:attachment_refs[*].size) > 5 * 1024 * 1024]".
> 
> (10) It's not clear to me if the RHS from the example in (9) is even valid -- are arbitrary mathematical expressions legal for either LHS or RHS of comparison expressions?
> 
> (11) Capability for backward references. I'd like to be able to refer to the value that matched a previous expression later in the pattern. For instance, "match when an email contains a URL hosted on baddomain.com and subsequent http traffic contains a 200 success request for that URL". One approach might be to borrow group backreferences from regex syntax, "[(email-message:url_refs[*].value MATCHES '*.baddomain.com/.*\.docx')] FOLLOWED BY [network-traffic:extensions.http-request-ext.request_value = \1 AND network-traffic:extensions.x-examplecom-http-request-ext.status_code = 200]". I don't quite know how this would work -- you'd want to define capturing groups of arbitrary expressions and also allow for capturing groups to be embedded into the RHS of a MATCHES expression. Tricky.
> 
> Thanks a lot!
> Nick
> 
> [0]: https://docs.google.com/document/d/1suvd7z7YjNKWOwgko-vJ84jfGuxSYZjOQlw5leCswPY
> 

-- 
John-Mark