Subject: Re: [cti-users] STIX 2.0 Pattern Expressiveness
Nick Dimiduk wrote this message on Tue, Jan 03, 2017 at 13:23 -0800:
> On Tue, Jan 3, 2017 at 12:01 PM, John-Mark Gurney <jmg@newcontext.com>
> wrote:
> >
> > (4): Good point here, I can see how the divergence between dictionary
> > and list type lookup syntax is odd. I’m wondering if we should just use
> > square bracket notation for dictionary values (as in Python) as well? E.g.,
> > “file:hashes[MD5]” instead of “file:hashes.MD5”. This would also allow us
> > to use an equivalent syntax for ANY.
> >
> > Yeah, this is a tricky one, and complicated by the fact that some keys
> > have a hyphen (minus sign) in them making them not safe from a parsing
> > perspective... There was discussion of using ["key"], but was nixed
> > partly due to the extra characters needed, and when going to deep
> > dictionaries, you'd have something like
> > network-traffic:extended_properties["http-ext"]["request_header"]["Basic-Auth"]
>
> Actually I prefer both the use of strings as keys, and keeping the
> brackets. In practice, I think dictionaries will be mostly used for
> dictionaries that map directly to observed data dictionary fields. There's
> no requirement that these keys conform to the "shape" of a STIX property
> per the spec (ie, requirements around use of dash vs underscore). The spec
> also says these keys are case-sensitive, such as the http request headers.
> Thus, IMHO it's better to treat them as the data that they are, with this
> explicit syntax.
>
> Also, IIRC, per RC4, your example would be
>
> network-traffic:extensions['http-request-ext']['request_header']['Basic-Auth']

Per chapter 5 of WD01
(https://docs.google.com/document/d/1suvd7z7YjNKWOwgko-vJ84jfGuxSYZjOQlw5leCswPY/edit#heading=h.i7kzkq2evwxj),
it would be:
network-traffic:extensions.'http-request-ext'.request_header.'Basic-Auth'

There is no mention of square brackets in the spec... Looks like my
original was incorrect, as I'm pretty sure I forgot the single quotes
around them...
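The dot-and-quote form from chapter 5 of WD01 can be made concrete with a small tokenizer. What follows is a minimal Python sketch, not the normative grammar: the name `parse_object_path` is hypothetical, and it assumes bare names are letters, digits and underscores, quoted keys contain no single quote, and list steps are `[n]` or `[*]`. It shows why a hyphenated key such as `http-request-ext` has to be quoted: unquoted, the hyphen is not a legal name character, so the path is rejected rather than guessed at.

```python
import re
from typing import List, Tuple, Union

# A step is a property or dictionary key (str), a list index (int), or "*".
Step = Union[str, int]

# Hypothetical token grammar, assumed from the WD01 examples in this thread:
#   name           bare property or key: letters, digits, underscore
#   'quoted-key'   dictionary key that may contain '-' or mixed case
#   [0] / [*]      list index or "any element" wildcard
_STEP = re.compile(r"""
    (?P<dot>\.)?
    (?: '(?P<quoted>[^']*)'
      | (?P<bare>[A-Za-z_][A-Za-z0-9_]*)
    )
  | \[(?P<index>\d+|\*)\]
""", re.VERBOSE)


def parse_object_path(path: str) -> Tuple[str, List[Step]]:
    """Split 'type:step.step...' into the object type and its steps."""
    obj_type, sep, rest = path.partition(":")
    if not sep or not obj_type or not rest:
        raise ValueError(f"not an object path: {path!r}")
    steps: List[Step] = []
    pos = 0
    while pos < len(rest):
        m = _STEP.match(rest, pos)
        if m is None:
            raise ValueError(f"cannot parse {rest[pos:]!r} in {path!r}")
        if m.group("index") is not None:
            if pos == 0:
                raise ValueError(f"path cannot start with an index: {path!r}")
            idx = m.group("index")
            steps.append("*" if idx == "*" else int(idx))
        else:
            # Only the first step may omit the leading dot.
            if (m.group("dot") is not None) != (pos > 0):
                raise ValueError(f"misplaced '.' in {path!r}")
            key = m.group("quoted")
            steps.append(key if key is not None else m.group("bare"))
        pos = m.end()
    return obj_type, steps
```

For example, `parse_object_path("network-traffic:extensions.'http-request-ext'.request_header.'Basic-Auth'")` returns `("network-traffic", ["extensions", "http-request-ext", "request_header", "Basic-Auth"])`, while the unquoted `network-traffic:extensions.http-request-ext` raises `ValueError`. The sketch deliberately accepts only the quoted-dot form, not the bracketed `['key']` form recalled from RC4.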
> Yes, verbose, but also clear that it's "stepping into" userdata. Compared
> to:
>
> network-traffic:extensions.http-request-ext.request_header.Basic-Auth
>
> I think the former is less confusing to read once you get to the
> 'Basic-Auth' bit.
>
> If verbosity is truly a concern, it seems like there's other places to save
> some characters: short-hand aliases for object types, 'exts' instead of
> 'extensions', 'httpreq' instead of 'http-request-ext' (you know it's an
> extension, after all), 'header' instead of 'request_header' (you already
> know it's an http request, there's no response data encoded anywhere), &c.
> We could also do away with "extensions" and support conceptual sub-typing.
> In that case, an 'http request' would be in an 'is a' relationship with
> 'network-traffic' and the whole 'extensions' part of the object-path goes
> away.
>
> Quite the contrary, my impression of this specification is to support
> interoperability through specificity. If I'm correct, I would think
> verbosity would be a secondary consideration from precision.
>
> > (7), (8): We’ve thought a bit about defining function in the language and
> > decided to postpone them to a future release. Most of our discussion has
> > been on functions for casting back and forth between constant types (e.g.,
> > hex -> integer), but other types of functions for primitive types
> > definitely makes sense. As far as user-defined functions, this isn’t
> > something we’ve discussed, and while I see the utility in them I think they
> > would also have the potential to make patterning much more complex,
> > especially for implementers/consumers. That said, I think it’s an
> > interesting idea and one worth discussing amongst our community.
> >
> > The biggest issue w/ adding UDF's is that it now means that the
> > patterning language is not universally readable... Once we add UDF's,
> > we now have to decide how to tell the consumer which functions they
> > need, etc.
> > It's something that will take a lot of work, and my gut
> > feeling is that we really don't want to support too flexible UDF's due
> > to compatibility.
>
> I agree UDFs opens a can of worms. Forget I mentioned it. I think it could
> still be useful to define some "functions" or "methods" for the basic
> types. Could be done simply as ephemeral/implicit properties actually. the
> 'string' primitive type could have a 'length' property that's available for
> use in expressions. There's no parameters to pass, so there's no new syntax
> in the grammar.

Yeah, we can discuss this for 2.1, but it is too late to add anything to
2.0...

> > Don't forget the more functions that are added, the slower matching
> > will be, and it'll be an interesting issue to decide on how to
> > optimize it...
>
> See my earlier observation re: this being a spec for interoperability. It
> would be up to implementations to evaluate patterns efficiently.

I agree...

> > Thanks for your comments!
> >
> > Thanks for considering them!
> >
> > -n
> >
> > > From: <cti-users@lists.oasis-open.org> on behalf of Nick Dimiduk <ndimiduk@gmail.com>
> > > Date: Thursday, December 22, 2016 at 2:35 PM
> > > To: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
> > > Subject: [cti-users] STIX 2.0 Pattern Expressiveness
> > >
> > > Hello,
> > >
> > > I'm new to STIX and I've been evaluating the use of STIX 2.0RC3 Patterns
> > > ([0]) for some use-cases. I find it to be quite a powerful tool. However,
> > > there are a couple concepts I don't know how to express. I'm hoping the
> > > community might be able to help me out -- either there's a usage I don't
> > > see (missed in my reading) or there's an oversight in the language. For any
> > > of the latter, I hope I'm not too late for my suggestions to be considered
> > > for 2.0 timelines.
> > >
> > > (1) Dictionary key absence. For instance, looking for some
> > > network-traffic property AND the absence of a specific HTTP header.
> > > Do I treat the dictionary as a collection and use NOT IN comparison
> > > operator? That seems to violate the grammar rules, which say the LHS
> > > (left-hand side) of an IN clause is the Object Path and the RHS
> > > (right-hand side) is a set of constant values.
> > >
> > > (2) Related to (1), the same syntax question is raised for optional
> > > Object Paths (properties) which are not present. For example, a path like
> > > "file:size" describes an optional property of the File Object. How to check
> > > for (the absence of) that field? Is there a "null/nil/None" object value
> > > that can be checked for? What's the syntax for the check? What primitive
> > > types support it? Some candidate syntax comes to mind: "[file:size != nil]"
> > > or maybe "[file:size IS NOT null]". Where is that discussed in the spec?
> > >
> > > (3) Also related to (1), the comparison expression is always Object Path
> > > on the LHS, literal value on the RHS. This is inflexible, and means there's
> > > no way to compare two Object Paths to each other. It also means I cannot
> > > check to see if Object Path A is present in Object Path B where B is a
> > > collection.
> > >
> > > (4) Speaking of collections, collection object lookup syntax is
> > > divergent. For list types (Part 4, Section 5.2), we have a 0-based index
> > > with square-brackets ('[]'). We also have a convenience syntax of
> > > "list_property[*]" as syntactic sugar for the logical ANY operator from
> > > SQL. However, dictionary types (Part 4, Section 5.3) are referenced with
> > > "dot-notation", just like any object property. This obscures the property's
> > > type for a casual reader and restricts (or at least confuses) the body of
> > > syntax available for expressions on dictionary elements. For example,
> > > there's no ANY equivalent for matching dictionary members like we can for
> > > lists.
> > >
> > > (5) Related to (4), there appears to be no syntax for the other
> > > collection-based logical operators provided by SQL -- ALL, SOME, EXISTS.
> > > As mentioned in (2), (3), there is IN syntax, but it's not available for
> > > Object Path element collections, only constant set literals.
> > >
> > > (6) While talking about logical operators, I haven't noticed the
> > > equivalent of a BETWEEN expression. One must say, for example
> > > "[type:property > A AND type:property < B]". I haven't thought in depth on
> > > this topic, could be other non-numerical types for which it's less obvious
> > > how to express value constraints, for which BETWEEN syntax goes from not
> > > just "nice to have" but "required for expression" -- ie, where the
> > > semantics of greater-than, less-than do not make sense but BETWEEN does.
> > >
> > > (7) Is there any thought around a library of functions for primitive
> > > types? For instance, a length() function that operates on strings.
> > >
> > > (8) The next logical question from (7) is where would UDF's or
> > > implementation-specific extensions be installed into the syntax? It would
> > > be very powerful to have a Function Object (including a function
> > > implementation) that can be exported along with the Indicators that contain
> > > Patterns that make use of that function.
> > >
> > > (9) Also from (7), any consideration for aggregation
> > > operators/functions? "Match when the combined file size of all attachments
> > > on the Email Message is greater than 5mb". Assuming the Email Message
> > > Object had a property "attachment_refs" of type object-ref list that's
> > > restricted to File Object types, that might look like
> > > "[sum(email-message:attachment_refs[*].size) > 5 * 1024 * 1024]".
> > >
> > > (10) It's not clear to me if the RHS from the example in (9) is even
> > > valid -- are arbitrary mathematical expressions legal for either LHS or RHS
> > > of comparison expressions?
> > >
> > > (11) Capability for backward references. I'd like to be able to refer to
> > > the value that matched a previous expression later in the pattern.
> > > For instance, "match when an email contains a URL hosted on baddomain.com
> > > and subsequent http traffic contains a 200 success request for that URL".
> > > One approach might be to borrow group backreferences from regex syntax,
> > > "[(email-message:url_refs[*].value MATCHES '*.baddomain.com/.*\.docx')]
> > > FOLLOWED BY [network-traffic:extensions.http-request-ext.request_value = \1
> > > AND network-traffic:extensions.x-examplecom-http-request-ext.status_code =
> > > 200]". I don't quite know how this would work -- you'd want to define
> > > capturing groups of arbitrary expressions and also allow for capturing
> > > groups to be embedded into the RHS of a MATCHES expression. Tricky.
> > >
> > > Thanks a lot!
> > > Nick
> > >
> > > [0]: https://docs.google.com/document/d/1suvd7z7YjNKWOwgko-vJ84jfGuxSYZjOQlw5leCswPY

--
John-Mark
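Several of the gaps in Nick's list, namely key absence in (1) and (2), the BETWEEN workaround in (6) and the attachment-size aggregate in (9), describe checks that are easy to state outside the pattern language. The minimal Python sketch below evaluates them by hand against observed objects modelled as plain dicts. The helper names (`key_absent`, `strictly_between`, `total_size`) and the dict layout are assumptions for illustration only, not STIX 2.0 pattern syntax or any library's API.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def key_absent(obj: Mapping[str, Any], *path: str) -> bool:
    """(1)/(2): True if any step of the nested dictionary path is missing."""
    cur: Any = obj
    for key in path:
        if not isinstance(cur, Mapping) or key not in cur:
            return True
        cur = cur[key]
    return False


def strictly_between(value: float, low: float, high: float) -> bool:
    """(6): the 'prop > A AND prop < B' workaround, with exclusive bounds."""
    return low < value < high


def total_size(files: Sequence[Mapping[str, Any]]) -> int:
    """(9): sum of 'size' over files; a missing size counts as 0 (an assumption)."""
    return sum(f.get("size", 0) for f in files)


# Hypothetical observed objects, shaped loosely after the thread's examples.
traffic = {
    "type": "network-traffic",
    "extensions": {
        "http-request-ext": {"request_header": {"Host": "example.com"}}
    },
}
attachments = [
    {"type": "file", "size": 3 * 1024 * 1024},
    {"type": "file", "size": 3 * 1024 * 1024},
]

no_auth = key_absent(traffic, "extensions", "http-request-ext",
                     "request_header", "Basic-Auth")
too_big = total_size(attachments) > 5 * 1024 * 1024
print(no_auth, too_big)  # prints: True True
```

`strictly_between` mirrors the strict comparisons written out in (6); SQL's BETWEEN is inclusive of both bounds, so a dedicated pattern operator would need to state which behaviour it means.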