OASIS Mailing List Archives

 



cti-cybox message



Subject: simplifying the data model


Hello list,


I tried to catch up on previous discussions before posting, and this idea seems to have been raised already [0][1], so I’m going to emphasize it with practical examples. We all have different experiences with this new standard, but the real success of STIX/CybOX is that sharing has happened across communities, and I think simplifying the standard will make adoption easier and faster.

This submission is based on my own experience of actually implementing the standard, meaning writing code. I have spent time on this standard, but I don’t claim to be an expert in it, and I surely have some misunderstandings. So yes, I’m just sharing thoughts and comments here :)


1. Too many options for one single indicator.

The standard permits too many ways of describing one single indicator. Because of this, it’s really difficult to implement the standard and be fully compliant with it, so it’s a blocker to wider adoption (how many vendors are actually talking about supporting STIX/CybOX, and how many really do it?).

Example:
To describe an IPv4 address, we can today have the following representations:
- 127.0.0.1, 
- 127.0.0.1/32, 
- 127.0.0.1/255.255.255.0, 
- 127.0.0.1-127.0.0.2, 
- the awful ‘##’ notation (or sometimes a comma separated list of values),
- and probably others.

If the standard allowed only one way to describe IPv4 addresses, such as CIDR notation (127.0.0.1/32), it would be super easy for anyone to actually implement the standard. And there is no loss of information, because any single IP or range of IPs can be expressed as one or more CIDR blocks (for example, 127.0.0.1-127.0.0.2 becomes 127.0.0.1/32 and 127.0.0.2/32). Possibly, for convenience, we may want to have 2 formats: the CIDR notation and the single IP notation, but no more than that.

Don’t get me wrong, I’m not saying an analyst shouldn’t be able to input different IPv4 formats in the software they use (like Soltra Edge for example); I’m saying that this particular need is out of the scope of the standard. It is the job of the software to do the transformation into what the standard expects (the CIDR notation).

In short, I think the problem here is that the standard covers some things that should be part of software specifications and not part of the standard itself.
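
To illustrate the transformation step that belongs in software rather than in the standard, here is a minimal Python sketch that reduces the representations listed above to CIDR blocks, using the standard library ipaddress module. The function name to_cidr is hypothetical, and treating a dotted netmask with host bits set (127.0.0.1/255.255.255.0) as the enclosing network is an assumption of this sketch, not something the standard defines.

```python
import ipaddress
from typing import List


def to_cidr(value: str) -> List[str]:
    """Normalize one loosely formatted IPv4 string to a list of CIDR blocks."""
    value = value.strip()

    # List notation ('##' or comma separated): normalize each item in turn.
    for sep in ("##", ","):
        if sep in value:
            blocks = []
            for item in value.split(sep):
                blocks.extend(to_cidr(item))
            return blocks

    # Range notation (first-last): summarize into the minimal set of CIDR blocks.
    if "-" in value:
        first, last = (ipaddress.IPv4Address(p.strip()) for p in value.split("-", 1))
        return [str(net) for net in ipaddress.summarize_address_range(first, last)]

    # Single address, prefix length, or dotted netmask.
    # strict=False accepts host bits being set and returns the enclosing network.
    return [str(ipaddress.IPv4Network(value, strict=False))]


for raw in ("127.0.0.1", "127.0.0.1/32", "127.0.0.1/255.255.255.0",
            "127.0.0.1-127.0.0.2", "10.0.0.1##10.0.0.2"):
    print(raw, "->", to_cidr(raw))
```

With this sketch, 127.0.0.1 and 127.0.0.1/32 both become 127.0.0.1/32, 127.0.0.1/255.255.255.0 becomes 127.0.0.0/24, and the range 127.0.0.1-127.0.0.2 becomes the two blocks 127.0.0.1/32 and 127.0.0.2/32.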


2. Logic errors

Because current objects are not atomic, they can lead to logic errors and a lot of confusion, as in the following example. According to the current specifications, the following object is valid (and validated by the script stix_validator.py):

    <cybox:Properties xsi:type="AddressObj:AddressObjectType" category="e-mail">
        <AddressObj:Address_Value>pouet@whatever.tld</AddressObj:Address_Value>
        <AddressObj:VLAN_Name>This is the name of a VLAN</AddressObj:VLAN_Name>
    </cybox:Properties>

This is a valid object mixing an email definition and a VLAN name, which, in my understanding, has no meaning. Note also that I left the “Address_Value” in for demonstration purposes, but the very same object is still valid without this field, which is even more awkward.
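
One way to picture the atomic alternative, as a minimal Python sketch: each kind of value gets its own single-purpose type, so an e-mail address carrying a VLAN name cannot even be constructed. The class names EmailAddress and VlanName are hypothetical and are not part of CybOX or of any existing library, and the e-mail check is deliberately simplistic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    """Hypothetical atomic object: one e-mail address and nothing else."""
    value: str

    def __post_init__(self):
        # Deliberately simple check: non-empty local part and domain around '@'.
        local, sep, domain = self.value.partition("@")
        if not (local and sep and domain):
            raise ValueError(f"not an e-mail address: {self.value!r}")


@dataclass(frozen=True)
class VlanName:
    """Hypothetical atomic object: one VLAN name and nothing else."""
    value: str

    def __post_init__(self):
        if not self.value.strip():
            raise ValueError("a VLAN name cannot be empty")


email = EmailAddress("pouet@whatever.tld")
vlan = VlanName("This is the name of a VLAN")

# Mixing the two is rejected at construction time, because EmailAddress
# has no field that could hold a VLAN name.
try:
    EmailAddress(value="pouet@whatever.tld", vlan_name="This is the name of a VLAN")
except TypeError as exc:
    print("rejected:", exc)
```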


3. Too many objects

Another problem is that we can describe the same information with different objects. Keeping the previous e-mail example, to define an email address we can choose between at least an EmailMessageObject and an AddressObject.

This is even more complicated with DNS-related objects: HostnameObject, AddressObject, DomainNameObject, DNSCacheObject, DNSQueryObject, DNSRecordObject, etc.


4. Wide objects but still missing coverage

To my understanding, the only way to describe a MAC address in CybOX is by using an AddressObject with the field category set to “mac”. This covers the definition of a MAC address, but it doesn’t tell me the format of the MAC address itself (is the separator a hyphen? a colon? none? dots? are the characters grouped by 2 or 4? which manufacturer is associated with the first 3 bytes of the MAC? etc).

By extension, updating an atomic indicator over time seems easier than updating complex types like today’s AddressObject. For example, if we extend the AddressObject type for full coverage of MAC addresses, we probably have a greater chance of side effects in existing products than if we were using a dedicated atomic object.
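
As an illustration of what a dedicated atomic MAC object could pin down, here is a minimal Python sketch that accepts the common separators and groupings and reduces them to one canonical form (lowercase, colon-separated pairs, chosen arbitrarily here). The function names are hypothetical. Mapping the first 3 bytes to a manufacturer name would need the IEEE registry, which this sketch does not include.

```python
import re


def normalize_mac(raw: str) -> str:
    """Return a 48-bit MAC address as lowercase, colon-separated pairs."""
    # Lenient on purpose: drop hyphens, colons, dots and whitespace wherever they are.
    digits = re.sub(r"[-:.\s]", "", raw).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def oui(raw: str) -> str:
    """Return the first 3 bytes (the manufacturer prefix) in canonical form."""
    return normalize_mac(raw)[:8]


for raw in ("00-1A-2B-3C-4D-5E", "00:1a:2b:3c:4d:5e", "001a.2b3c.4d5e", "001A2B3C4D5E"):
    print(raw, "->", normalize_mac(raw), "prefix", oui(raw))
```

All four inputs in the loop normalize to 00:1a:2b:3c:4d:5e, with the prefix 00:1a:2b.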


5. Lists of Objects

Another fact is that CybOX allows the ‘##’ notation as an attempt to describe a list of objects. I think this notation is neither efficient nor convenient, so I see 2 options here:

1- We don’t need lists: the standard already allows describing multiple objects of the same nature multiple times within the same IOC file, so there is no need for this in the standard, and this notation disappears.

2- We need lists, which means we need a proper object to handle lists, and not a trick like the current notation. For example, something like <ListObj name="myList"><obj1>,<obj2>, ... </ListObj> (or maybe the “relatedTo” could do the job?)



6. Benefits

To my understanding, the benefits of this reduction or simplification of the standard are:
- Easier implementation, whether from scratch or using existing libraries
- Wider adoption, because of the previous point
- Objects become building blocks that are easier to work with when building real logic within the IOCs
- By having atomic objects, we avoid logic errors
- Faster code execution, because fewer conditional branches are required in the code






To follow up on Trey’s proposal on creating working groups, I would be happy to join the one on Simplifying the Data Model. I would like to see the standard move in a direction where atomic objects are defined, just like we have atomic types in C (int, char, char *, etc). Based only on those few types, we can build a whole operating system with complex rendering. Of course there are many implications of such a move, starting with backward compatibility, but I think it’s good for the adoption of the standard by a wider audience.


To conclude, and as a general rule, indicators should be atomic building blocks, meaning they should be in a form that cannot be reduced any further, and they shouldn’t be ambiguous. In short, keep it simple :)


Happy to discuss during BH/Defcon.




Thanks,
Cedric


--

Cédric Le Roux

Principal Security Engineer

Minister of Segfault

Splunk Inc.

cleroux@splunk.com


Paris | San Francisco | Cupertino | London | Hong Kong | Washington D.C. | Seattle | Singapore | Munich








