
Subject: Re: [cti-users] STIX 2.0 Pattern Expressiveness
From: John-Mark Gurney <jmg@newcontext.com>
To: Nick Dimiduk <ndimiduk@gmail.com>
Date: Mon, 9 Jan 2017 17:17:02 -0800
Nick Dimiduk wrote this message on Tue, Jan 03, 2017 at 13:23 -0800:
> On Tue, Jan 3, 2017 at 12:01 PM, John-Mark Gurney <jmg@newcontext.com> wrote:
> 
> > > (4): Good point here; I can see how the divergence between dictionary
> > > and list type lookup syntax is odd. I’m wondering if we should just use
> > > square bracket notation for dictionary values (as in Python) as well, e.g.,
> > > “file:hashes[MD5]” instead of “file:hashes.MD5”. This would also allow us
> > > to use an equivalent syntax for ANY.
> >
> > Yeah, this is a tricky one, and it's complicated by the fact that some keys
> > contain a hyphen (minus sign), which makes them unsafe from a parsing
> > perspective...  There was discussion of using ["key"], but it was nixed,
> > partly due to the extra characters needed; with deeply nested
> > dictionaries, you'd have something like
> > network-traffic:extended_properties["http-ext"]["request_header"]["Basic-Auth"]
> 
> Actually, I prefer both the use of strings as keys and keeping the
> brackets. In practice, I think dictionaries will mostly be used for
> properties that map directly to observed-data dictionary fields. There's
> no requirement that these keys conform to the "shape" of a STIX property
> per the spec (i.e., the requirements around use of dash vs. underscore).
> The spec also says these keys are case-sensitive, as with HTTP request
> headers. Thus, IMHO it's better to treat them as the data that they are,
> with this explicit syntax.
> 
> Also, IIRC, per RC4, your example would be
> 
> network-traffic:extensions['http-request-ext']['request_header']['Basic-Auth']

Per chapter 5 of WD01 ( https://docs.google.com/document/d/1suvd7z7YjNKWOwgko-vJ84jfGuxSYZjOQlw5leCswPY/edit#heading=h.i7kzkq2evwxj ),
it would be:
network-traffic:extensions.'http-request-ext'.request_header.'Basic-Auth'

There is no mention of square brackets in the spec...  It looks like my
original example was incorrect, as I'm pretty sure I forgot the single
quotes around the hyphenated keys...
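For illustration only, a minimal Python sketch of how a consumer might split a WD01-style object path into steps, treating single-quoted components as literal dictionary keys and matching them case-sensitively. The names `parse_object_path` and `resolve` are hypothetical, the step grammar is simplified (no list indexes, no escaped quotes), and this is not the grammar from the spec.

```python
import re

# One path step: either a single-quoted key (may contain '-' or '.') or a bare
# identifier. Simplified for illustration: no list indexes, no escaped quotes.
_STEP = re.compile(r"'([^']*)'|([A-Za-z_][A-Za-z0-9_]*)")


def parse_object_path(path):
    """Split "type:a.'b-c'.d" into ("type", ["a", "b-c", "d"])."""
    obj_type, sep, rest = path.partition(":")
    if not sep:
        raise ValueError("missing ':' after the object type")
    steps, pos = [], 0
    while True:
        m = _STEP.match(rest, pos)
        if m is None:
            raise ValueError(f"bad path step at offset {pos}: {rest[pos:]!r}")
        quoted, bare = m.groups()
        steps.append(quoted if quoted is not None else bare)
        pos = m.end()
        if pos == len(rest):
            return obj_type, steps
        if rest[pos] != ".":
            # e.g. an unquoted hyphen: http-request-ext stops after "http"
            raise ValueError(f"expected '.' at offset {pos}, got {rest[pos]!r}")
        pos += 1


def resolve(obj, steps):
    """Walk nested dicts with case-sensitive keys; None if any step is missing."""
    for step in steps:
        if not isinstance(obj, dict) or step not in obj:
            return None
        obj = obj[step]
    return obj


traffic = {  # hypothetical observed network-traffic object
    "extensions": {
        "http-request-ext": {"request_header": {"Basic-Auth": "dXNlcjpwYXNz"}}
    }
}
path = "network-traffic:extensions.'http-request-ext'.request_header.'Basic-Auth'"
obj_type, steps = parse_object_path(path)
print(obj_type, steps)
print(resolve(traffic, steps))
```

In this sketch the unquoted form (extensions.http-request-ext...) raises ValueError, because the bare-identifier rule stops at the hyphen, which is the parsing problem with hyphenated keys raised earlier in the thread.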

> Yes, verbose, but also clear that it's "stepping into" userdata. Compared
> to:
> 
> network-traffic:extensions.http-request-ext.request_header.Basic-Auth
> 
> I think the former is less confusing to read once you get to the
> 'Basic-Auth' bit.
> 
> If verbosity is truly a concern, it seems like there are other places to save
> some characters: short-hand aliases for object types, 'exts' instead of
> 'extensions', 'httpreq' instead of 'http-request-ext' (you know it's an
> extension, after all), 'header' instead of 'request_header' (you already
> know it's an http request, there's no response data encoded anywhere), &c.
> We could also do away with "extensions" and support conceptual sub-typing.
> In that case, an 'http request' would be in an 'is a' relationship with
> 'network-traffic' and the whole 'extensions' part of the object-path goes
> away.
> 
> On the contrary, my impression is that this specification aims to support
> interoperability through specificity. If I'm correct, I would think
> verbosity is a secondary consideration to precision.
> 
> > > (7), (8): We’ve thought a bit about defining functions in the language and
> > > decided to postpone them to a future release. Most of our discussion has
> > > been on functions for casting back and forth between constant types (e.g.,
> > > hex -> integer), but other types of functions for primitive types
> > > definitely make sense. As for user-defined functions, this isn’t
> > > something we’ve discussed, and while I see the utility in them, I think they
> > > would also have the potential to make patterning much more complex,
> > > especially for implementers/consumers. That said, I think it’s an
> > > interesting idea and one worth discussing amongst our community.
> >
> > The biggest issue with adding UDFs is that the patterning language is
> > then no longer universally readable...  Once we add UDFs, we have to
> > decide how to tell the consumer which functions they need, etc.  It's
> > something that will take a lot of work, and my gut feeling is that we
> > really don't want to support overly flexible UDFs, for the sake of
> > compatibility.
> >
> 
> I agree UDFs open a can of worms. Forget I mentioned it. I think it could
> still be useful to define some "functions" or "methods" for the basic
> types. This could actually be done simply with ephemeral/implicit
> properties: the 'string' primitive type could have a 'length' property
> that's available for use in expressions. There are no parameters to pass,
> so there's no new syntax in the grammar.
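To make the "implicit property" idea concrete, a minimal Python sketch assuming a hypothetical table of read-only properties that only applies once path resolution reaches a primitive value. Nothing like `IMPLICIT_PROPERTIES` or `resolve_with_implicit` exists in STIX 2.0; a hypothetical pattern such as [file:name.length > 100] would then reuse the ordinary object-path syntax.

```python
# Hypothetical table of implicit, read-only properties for primitive values.
# Nothing like this exists in STIX 2.0; it only illustrates the idea above.
IMPLICIT_PROPERTIES = {
    str: {"length": len},
}


def resolve_with_implicit(obj, steps):
    """Walk dict keys as usual; once a primitive value is reached, allow only
    its implicit properties (e.g. 'length' on strings). None if unresolvable."""
    for step in steps:
        if isinstance(obj, dict):
            if step not in obj:
                return None
            obj = obj[step]
        else:
            getter = IMPLICIT_PROPERTIES.get(type(obj), {}).get(step)
            if getter is None:
                return None
            obj = getter(obj)
    return obj


file_obj = {"name": "invoice.docx", "size": 25536}  # hypothetical File object
print(resolve_with_implicit(file_obj, ["name", "length"]))
```

Because the lookup only happens on non-dict values, a real dictionary key named "length" would still take precedence, which is one way to keep such properties from colliding with observed data.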

Yeah, we can discuss this for 2.1, but it is too late to add anything to 2.0...

> > Don't forget that the more functions are added, the slower matching
> > will be, and it'll be an interesting issue to decide how to
> > optimize it...
> 
> See my earlier observation re: this being a spec for interoperability. It
> would be up to implementations to evaluate patterns efficiently.

I agree...

> > Thanks for your comments!
> >
> 
> Thanks for considering them!
> 
> -n
> 
> > > From: <cti-users@lists.oasis-open.org> on behalf of Nick Dimiduk <ndimiduk@gmail.com>
> > > Date: Thursday, December 22, 2016 at 2:35 PM
> > > To: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
> > > Subject: [cti-users] STIX 2.0 Pattern Expressiveness
> > >
> > > Hello,
> > >
> > > I'm new to STIX and I've been evaluating the use of STIX 2.0 RC3 Patterns
> > > ([0]) for some use-cases. I find it to be quite a powerful tool. However,
> > > there are a couple of concepts I don't know how to express. I'm hoping the
> > > community might be able to help me out -- either there's a usage I don't
> > > see (missed in my reading) or there's an oversight in the language. For any
> > > of the latter, I hope I'm not too late for my suggestions to be considered
> > > for the 2.0 timeline.
> > >
> > > (1) Dictionary key absence. For instance, looking for some
> > > network-traffic property AND the absence of a specific HTTP header. Do I
> > > treat the dictionary as a collection and use NOT IN comparison operator?
> > > That seems to violate the grammar rules, which say the LHS (left-hand side)
> > > of an IN clause is the Object Path and the RHS (right-hand side) is a set
> > > of constant values.
> > >
> > > (2) Related to (1), the same syntax question is raised for optional
> > > Object Paths (properties) which are not present. For example, a path like
> > > "file:size" describes an optional property of the File Object. How do I
> > > check for (the absence of) that field? Is there a "null/nil/None" object
> > > value that can be checked for? What's the syntax for the check? What
> > > primitive types support it? Some candidate syntax comes to mind:
> > > "[file:size != nil]" or maybe "[file:size IS NOT null]". Where is that
> > > discussed in the spec?
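A minimal Python sketch of the absence semantics asked about in (1) and (2), assuming a hypothetical "IS NULL"-style check that is true only when the path is not present at all. The helpers `get_path` and `is_absent` and the sample objects are illustrative; a sentinel is used instead of None so that absence cannot be confused with any value the data might carry.

```python
MISSING = object()  # sentinel: distinguishes "not present" from any real value


def get_path(obj, steps):
    """Follow dict keys; return MISSING if any step is absent."""
    for step in steps:
        if not isinstance(obj, dict) or step not in obj:
            return MISSING
        obj = obj[step]
    return obj


def is_absent(obj, steps):
    """Hypothetical semantics for something like "[file:size IS NULL]"."""
    return get_path(obj, steps) is MISSING


traffic = {  # hypothetical observed network-traffic object
    "dst_port": 80,
    "extensions": {
        "http-request-ext": {"request_header": {"Host": "example.com"}}
    },
}
headers = ["extensions", "http-request-ext", "request_header"]

# (1): a network-traffic property AND the absence of a specific HTTP header
print(get_path(traffic, ["dst_port"]) == 80
      and is_absent(traffic, headers + ["Authorization"]))

# (2): an optional property that is simply not present
print(is_absent({"name": "a.txt"}, ["size"]))
```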
> > >
> > > (3) Also related to (1), the comparison expression is always Object Path
> > > on the LHS, literal value on the RHS. This is inflexible, and means there's
> > > no way to compare two Object Paths to each other. It also means I cannot
> > > check to see if Object Path A is present in Object Path B where B is a
> > > collection.
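A minimal Python sketch of what path-to-path comparisons from (3) could mean, assuming hypothetical operators along the lines of [x:a = x:b] and [x:a IN x:b]. The function names are illustrative, and the email object is flattened for brevity (in STIX the sender and recipients are object references, not plain strings).

```python
MISSING = object()  # sentinel for a path that is not present


def get_path(obj, steps):
    for step in steps:
        if not isinstance(obj, dict) or step not in obj:
            return MISSING
        obj = obj[step]
    return obj


def paths_equal(obj, lhs, rhs):
    """Hypothetical "[x:a = x:b]": both paths present and values equal."""
    a, b = get_path(obj, lhs), get_path(obj, rhs)
    return a is not MISSING and b is not MISSING and a == b


def path_in_path(obj, needle, haystack):
    """Hypothetical "[x:a IN x:b]" where b resolves to a list."""
    value, collection = get_path(obj, needle), get_path(obj, haystack)
    return (value is not MISSING and isinstance(collection, list)
            and value in collection)


traffic = {"src_port": 443, "dst_port": 443}  # hypothetical network-traffic
# Flattened for illustration: in STIX, sender/recipients are object references.
email = {"from": "alice@example.com",
         "to": ["bob@example.com", "alice@example.com"]}
print(paths_equal(traffic, ["src_port"], ["dst_port"]))
print(path_in_path(email, ["from"], ["to"]))
```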
> > >
> > > (4) Speaking of collections, collection object lookup syntax is
> > > divergent. For list types (Part 4, Section 5.2), we have a 0-based index
> > > with square-brackets ('[]'). We also have a convenience syntax of
> > > "list_property[*]" as syntactic sugar for the logical ANY operator from
> > > SQL. However, dictionary types (Part 4, Section 5.3) are referenced with
> > > "dot-notation", just like any object property. This obscures the property's
> > > type for a casual reader and restricts (or at least confuses) the body of
> > > syntax available for expressions on dictionary elements. For example,
> > > there's no ANY equivalent for matching dictionary members as there is for
> > > lists.
> > >
> > > (5) Related to (4), there appears to be no syntax for the other
> > > collection-based logical operators provided by SQL -- ALL, SOME, EXISTS. As
> > > mentioned in (1) and (3), there is IN syntax, but it's not available for
> > > Object Path element collections, only constant set literals.
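A minimal Python sketch of the collection operators discussed in (4) and (5), assuming that a dictionary is treated as a collection of its values for ANY/ALL and that an EXISTS-like check tests for a key. These helper names are hypothetical; ALL over an empty collection is vacuously true here, as it is in SQL.

```python
def _members(collection):
    """Dictionaries are treated as collections of their values (an assumption)."""
    return collection.values() if isinstance(collection, dict) else collection


def any_member(collection, pred):
    """SQL ANY/SOME: true if at least one member satisfies pred."""
    return any(pred(v) for v in _members(collection))


def all_members(collection, pred):
    """SQL ALL: true if every member satisfies pred (vacuously true when empty)."""
    return all(pred(v) for v in _members(collection))


def key_exists(dictionary, key):
    """EXISTS-like check on a dictionary key (case-sensitive)."""
    return key in dictionary


headers = {"Host": "example.com", "User-Agent": "curl/7.50.1"}
urls = ["http://a.example/x.docx", "http://b.example/y.docx"]

print(any_member(headers, lambda v: v.startswith("curl/")))
print(all_members(urls, lambda u: u.endswith(".docx")))
print(key_exists(headers, "user-agent"))
```

The last line prints False because keys are compared case-sensitively, matching the point made earlier in the thread about header names.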
> > >
> > > (6) While talking about logical operators, I haven't noticed the
> > > equivalent of a BETWEEN expression. One must say, for example,
> > > "[type:property > A AND type:property < B]". I haven't thought in depth on
> > > this topic, but there could be non-numerical types for which it's less
> > > obvious how to express value constraints, and for which BETWEEN syntax goes
> > > from "nice to have" to "required for expression" -- i.e., where the
> > > semantics of greater-than and less-than do not make sense but BETWEEN does.
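One detail worth noting about (6): SQL's BETWEEN is inclusive on both ends, while the "> A AND < B" workaround excludes both ends. A minimal Python sketch of the difference, with hypothetical helper names, for numbers and for timezone-aware timestamps:

```python
from datetime import datetime, timezone


def between(x, lo, hi):
    """SQL-style BETWEEN: inclusive on both ends."""
    return lo <= x <= hi


def strictly_between(x, lo, hi):
    """The workaround from (6): "[p > A AND p < B]" excludes both ends."""
    return lo < x < hi


print(between(5, 5, 10), strictly_between(5, 5, 10))

# Works for any totally ordered type, e.g. timezone-aware timestamps.
start = datetime(2017, 1, 1, tzinfo=timezone.utc)
end = datetime(2017, 1, 9, tzinfo=timezone.utc)
seen = datetime(2017, 1, 3, 20, 23, tzinfo=timezone.utc)
print(between(seen, start, end))
```

So a faithful rewrite of BETWEEN in terms of comparisons uses >= and <= rather than > and <.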
> > >
> > > (7) Is there any thought around a library of functions for primitive
> > > types? For instance, a length() function that operates on strings.
> > >
> > > (8) The next logical question from (7) is where UDFs or
> > > implementation-specific extensions would be installed into the syntax.
> > > It would be very powerful to have a Function Object (including a function
> > > implementation) that can be exported along with the Indicators that
> > > contain Patterns that make use of that function.
> > >
> > > (9) Also from (7), any consideration for aggregation
> > > operators/functions? "Match when the combined file size of all attachments
> > > on the Email Message is greater than 5 MB". Assuming the Email Message
> > > Object had a property "attachment_refs" of type object-ref list that's
> > > restricted to File Object types, that might look like
> > > "[sum(email-message:attachment_refs[*].size) > 5 * 1024 * 1024]".
> > >
> > > (10) It's not clear to me if the RHS from the example in (9) is even
> > > valid -- are arbitrary mathematical expressions legal for either LHS or RHS
> > > of comparison expressions?
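A minimal Python sketch of the aggregation in (9), assuming a hypothetical sum() over already-dereferenced File objects and skipping attachments whose size was not observed (one possible choice). The RHS arithmetic from (10) is folded to a single constant up front, which is one way an implementation could allow literal arithmetic without evaluating it per observation.

```python
FIVE_MIB = 5 * 1024 * 1024  # the RHS from (9), folded to a constant: 5242880


def total_attachment_size(attachments):
    """Hypothetical sum() aggregate over attachment_refs[*].size.
    Attachments without a 'size' are skipped (one possible choice)."""
    return sum(f["size"] for f in attachments if "size" in f)


# Hypothetical, already-dereferenced File objects for one email message.
attachments = [
    {"name": "report.docx", "size": 3_000_000},
    {"name": "data.xlsx", "size": 2_500_000},
    {"name": "notes.txt"},  # size not observed
]
total = total_attachment_size(attachments)
print(total, total > FIVE_MIB)
```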
> > >
> > > (11) Capability for backward references. I'd like to be able to refer to
> > > the value that matched a previous expression later in the pattern. For
> > > instance, "match when an email contains a URL hosted on baddomain.com
> > > and subsequent HTTP traffic contains a successful (200) request for
> > > that URL". One approach might be to borrow group backreferences from
> > > regex syntax: "[(email-message:url_refs[*].value MATCHES
> > > '*.baddomain.com/.*\.docx')] FOLLOWED BY
> > > [network-traffic:extensions.http-request-ext.request_value = \1 AND
> > > network-traffic:extensions.x-examplecom-http-request-ext.status_code =
> > > 200]". I don't quite know how this would work -- you'd want to define
> > > capturing groups of arbitrary expressions and also allow for capturing
> > > groups to be embedded into the RHS of a MATCHES expression. Tricky.
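A minimal Python sketch of the backreference idea in (11), assuming time-ordered, flattened observations with hypothetical "urls", "request_value", and "status_code" fields (real STIX objects use url_refs and extensions). The leading '*' in the example reads like a glob and is not valid at the start of a Python regex, so the sketch uses a regex meaning "baddomain.com or any subdomain of it". Captured values are kept as a set of candidates, which grows with the number of matching emails.

```python
import re

# Regex version of the '*.baddomain.com/.*\.docx' idea: baddomain.com or any
# subdomain of it, with a path ending in .docx.
URL_RE = re.compile(r"^https?://([A-Za-z0-9-]+\.)*baddomain\.com/.*\.docx$")


def match_with_backref(observations):
    """Capture URLs from emails that match URL_RE, then look for a LATER
    request whose request_value is one of the captured URLs and whose
    status_code is 200. Returns that URL, or None."""
    captured = set()
    for obs in observations:  # assumed to be in time order
        if obs.get("type") == "email-message":
            captured.update(u for u in obs.get("urls", []) if URL_RE.match(u))
        elif obs.get("type") == "network-traffic":
            url = obs.get("request_value")
            if url in captured and obs.get("status_code") == 200:
                return url
    return None


url = "http://cdn.baddomain.com/files/invoice.docx"
events = [
    {"type": "email-message", "urls": [url, "http://example.com/"]},
    {"type": "network-traffic", "request_value": url, "status_code": 200},
]
print(match_with_backref(events))
print(match_with_backref(list(reversed(events))))  # request came first
```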
> > >
> > > Thanks a lot!
> > > Nick
> > >
> > > [0]: https://docs.google.com/document/d/1suvd7z7YjNKWOwgko-vJ84jfGuxSYZjOQlw5leCswPY

-- 
John-Mark