Re: [cti] JSON or what???


Subject: Re: [cti] JSON or what???

From: Joep Gommers <joep@eclecticiq.com>

To: "Jordan, Bret" <bret.jordan@bluecoat.com>, Paul Patrick <ppatrick@isightpartners.com>

Date: Thu, 19 Nov 2015 14:16:01 +0000

Paul, All,

Attached is the work we previously shared on this list about how we implement STIX in JSON. It may help in thinking about patterns, parts of which I believe are being adopted in the STIX 2.0 discussions.

Happy to discuss usage of our libraries offline if anyone is interested.

Best regards,

Joep

From: <cti@lists.oasis-open.org> on behalf of "Jordan, Bret" <bret.jordan@bluecoat.com>
Date: Wednesday, November 18, 2015 at 4:12 PM
To: Paul Patrick <ppatrick@isightpartners.com>
Cc: Jerome Athias <athiasjerome@GMAIL.COM>, "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>, Aharon Chernin <achernin@soltra.com>, "jwunder@mitre.org" <jwunder@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: Re: [cti] JSON or what???

EclecticIQ might be willing to share an example or two from their JSON STIX implementation. You can see what we did in JSON TAXII and compare that with XML TAXII and get a really good idea of how it would look in STIX land.

1) Remember the reason it will be easier is developers prefer JSON and thus are more familiar with working with it.

2) JSON types map to code types

3) No namespace and xsi-type cruft to deal with

4) Generally a flatter and easier to consume structure.
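
As a minimal illustration of point 2 (not taken from any STIX tooling), Python's standard json module turns each JSON type directly into a native type. The sample document and its field names are made up for the example.

```python
import json

# Hypothetical indicator-like document, invented for illustration only.
doc = json.loads(
    '{"type": "indicator", "negate": false, "confidence": 0.8,'
    ' "tags": ["a", "b"], "note": null}'
)

# Each JSON type arrives as the matching built-in Python type.
for key, value in doc.items():
    print(f"{key}: {type(value).__name__}")
```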

Yes, it will take a bit of work to get the JSON binding done. It is not as simple as a direct conversion from XML. Using the UML models to go to JSON, versus using the XSDs to go to JSON, is a LOT easier.

Thanks,

Bret

Bret Jordan CISSP

Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Nov 18, 2015, at 06:50, Paul Patrick <ppatrick@isightpartners.com> wrote:

I tend to agree with Jerome here. I hear a lot of statements about the simplicity of JSON, and yet I hear that a straight transform from XML to JSON isn’t so pretty. I suspect a bunch of us have read papers, presentations, etc., but for me, I like to see something real in a head-to-head comparison.

By any chance, are there any samples that show a comparison between a STIX example in the XML format and the proposed JSON format? It would be great if someone would take a handful of the idioms for STIX and show the equivalent in JSON.

Paul Patrick

iSIGHT Partners

From: <cti@lists.oasis-open.org> on behalf of Jerome Athias <athiasjerome@GMAIL.COM>
Date: Friday, November 13, 2015 at 11:35 PM
To: "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>
Cc: "achernin@soltra.com" <achernin@soltra.com>, "bret.jordan@bluecoat.com" <bret.jordan@bluecoat.com>, "jwunder@mitre.org" <jwunder@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: [cti] Re: JSON or what???

In short
Why JSON?

Could you put more effort into showing us how you can convince the majority of us?

Study papers, presentations with pros/cons...

Sell me your thing

On Friday, 13 November 2015, Taylor, Marlon <Marlon.Taylor@hq.dhs.gov> wrote:

Changed the thread title since the topic changed.

We have had several discussions about JSON in the past, none of which resulted in a complete STIX implementation. Going from XML to JSON, as a format, can be done. I think we should show the JSON validation mechanism(s) that will be used by the CTI/SC to assure producers/consumers that we can provide a means of testing schema/spec conformity.

-Marlon

From: Aharon Chernin [mailto:achernin@soltra.com]
Sent: Friday, November 13, 2015 02:18 PM
To: Jordan, Bret <bret.jordan@bluecoat.com>; Jerome Athias <athiasjerome@GMAIL.COM>
Cc: Wunder, John A. <jwunder@mitre.org>; cti@lists.oasis-open.org <cti@lists.oasis-open.org>
Subject: Re: [cti] The Adaptive Object-Model Architectural Style

I am sold on JSON. Is there an argument against JSON? If so, let’s hear it so that we can hash through it.

Aharon

From: <cti@lists.oasis-open.org> on behalf of "Jordan, Bret" <bret.jordan@bluecoat.com>
Date: Friday, November 13, 2015 at 10:42 AM
To: Jerome Athias <athiasjerome@GMAIL.COM>
Cc: "Wunder, John A." <jwunder@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: Re: [cti] The Adaptive Object-Model Architectural Style

The reason I push for JSON is all of the developers and CTOs I have talked to in various organizations, companies, vendors, and open-source groups always ask for "anything but XML". I then ask what would you prefer, and they all say, without exception "JSON".

So a novel idea... Let's give them what they want, JSON and a simple STIX 2.0 model, and let's drive for massive adoption. Our number 1 goal should be adoption, followed by a model that can meet at least 70-80% of the market use cases.

Let's get STIX 2.0 support in every networking product, every security tool, and every security broker. Then, as we gain massive adoption, let's iterate and figure out what we need to do to solve the problems we are running into. Let's first get adoption, and I do not mean a few niche groups here and there and one large eco-system. I am talking about every networking and security product on the planet.

I want to remove as many hurdles development shops have against STIX. I want to make it so easy for them to adopt it that there is no question of them adopting it. I do not want to see more groups go off and do their own thing or move over to FB's ThreatExchange or OpenTPX.

It would be a great problem to have, where we had SO MUCH adoption and SO MANY STIX documents flowing across the network each day that we had to do something to address the load. That would be a GREAT problem to have.

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Nov 13, 2015, at 11:26, Jerome Athias <athiasjerome@GMAIL.COM> wrote:

I do appreciate the "let's do it" attitude, as long as it is not a "just do it".
For the JSON approach, I would just like to see (with facts) what % of
use cases/requirements it can cover, and when.

2015-11-13 21:17 GMT+03:00 Jordan, Bret <bret.jordan@bluecoat.com>:

John this is really well said.

I feel like we listened to every possible user requirement out there for
STIX 1.0 and we tried to create a data-model that could solve every possible
use case and corner case regardless of how small. The one thing we sorely
forgot to do is figure out what developers can actually implement in code or
what product managers are willing to implement in code.

Let's make STIX 2.0 something that meets 70-80% of the use cases and can
actually be implemented in code by the majority of software development
shops. Yes, I am talking about a STIX Lite. People can still use STIX 1.x
if they want everything. Over time we can add more and more features to the
STIX 2.0 branch as software products that use CTI advance and users can do
more and more with it.

Let's start with JSON + JSON Schema and go from there. I would love to have
to migrate to a binary solution or something that supports RDF in the future
because we have SO MUCH demand and there is SO MUCH sharing that we really
need to do something.

1) Let's not put the cart before the horse
2) Let's fail fast, and not ride the horse to the glue factory
3) Let's start small and build massive adoption.
4) Let's make things so easy for development shops to implement that there is
no reason for them not to

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can
not be unscrambled is an egg."

On Nov 13, 2015, at 08:09, Wunder, John A. <jwunder@mitre.org> wrote:

So I’ve been waiting for a good time to outline this and I guess here is as
good a place as any. I’m sure people will disagree, but I’m going to say it
anyway :)

Personally I think of these things as four levels:

- User requirements
- Implementations
- Instantiation of the data model (XML, JSON, database schemas, an object
model in code, etc)
- Data model

User requirements get supported in running software. Running software uses
instantiations of the data model to work with data in support of those user
requirements. The data model and specification define the instantiations of
the data and describe how to work with them in a standard way.

The important bit here is that there’s always running software between the
user and the data model. That software is (likely) a tool that a vendor or
open source project supports that contains custom code to work specifically
with threat intel. It might be a more generic tool like Palantir or whatever
people do RDF stuff with these days. But there’s always something.

This has a couple implications:

- Not all user requirements get met in the data model. It’s perfectly valid
to decide not to support something in the data model if we think it’s fine
that implementations do it in many different ways. For example,
de-duplication: do we need a standard approach or should we let tools decide
how to do de-duplication themselves? It’s a user requirement, but that
doesn’t mean we need to address it in the specs.

- Some user requirements need to be translated before they get to the data
model. For example, versioning: users have lots of needs for versioning.
Systems also have requirements for versioning. What we put in the specs
needs to consider both of these.

- This is the important part: some user requirements are beyond what
software can do today. I would love it if my iphone would get 8 days of
battery life. I could write that into some specification. That doesn’t mean
it’s going to happen. In CTI, we (rightfully) have our eyes towards this end
state where you can do all sorts of awesome things with your threat intel,
but just putting it in the data model doesn’t automatically make that
happen. We’re still exploring this domain and software can only do so much.
So if the people writing software are telling us that the user requirements
are too advanced (for now), maybe that means we should hold off on putting
it in the data model until it’s something that we can actually implement? In
my mind this is where a lot of the complexity in STIX comes from: we
identified user requirements to do all these awesome things and so we put
them in the data model, but we never considered how or whether software
could really implement them. The perfect example here is data markings:
users wanted to mark things at the field level, most software isn’t ready
for that yet, and so we end up with data markings that are effectively
broken in STIX 1.2. This is why many standards bodies have requirements for
running code: otherwise the temptation is too great to define specification
requirements that are not implementable and you end up with a great spec
that nobody will use.

Sorry for the long rant. Been waiting to get that off my chest for a while
(as you can probably tell).

John

On Nov 13, 2015, at 9:17 AM, Jerome Athias <athiasjerome@GMAIL.COM> wrote:

Apologies to the others if this is off-topic.

Remember that software is good only if it satisfies its users (meets,
or exceeds, their requirements).
You can write 'perfect/optimized' code, but if the users are not
satisfied, it's bad software.

Then,
"If you can't explain it simply, you don't understand it well
enough.", Albert Einstein

Challenges are exciting, but sometimes difficult. It's about
motivation and satisfaction.

No programming language is better than another (just like operating systems);
you simply select the best one for your needs.

I did a conceptual map for the 'biggest Ruby project of the internet'
(Metasploit Framework). It's just a picture, but it represents 100 pages
of documentation.
I think we could optimize (as with a maturity model) our approach to
resolving problems.

2015-11-13 17:02 GMT+03:00 John Anderson <janderson@soltra.com>:

The list returns my mail, so probably you'll be the only one to get my
reply.

Funny, I missed that quote from the document. And it's spot on. As an
architect myself, I have built several "elegant" architectures, only to
find that the guys who actually had to use it just. never. quite. got it.
(sigh)

My best architectures have emerged when I've written test code first.
("Test-first" really does work.) I've learned that writing code--while
applying KISS, DRY and YAGNI--saves me from entering the architecture
stratosphere. That's why I ask the architects to express their creations in
code, and not only in UML.

I'm pretty vocal about Python, because it's by far the simplest popular
language out there today. But this principle applies in any language: If the
implementation is hard to explain, it's a bad idea. (Another quote from the
Zen of Python.) Our standard has a lot that's hard to explain, esp. to
new-comers. How can we simplify, so that it's almost a no-brainer to adopt?

Again, thanks for the article, and the conversation. I really do appreciate
your point-of-view,
JSA

________________________________________
From: Jerome Athias <athiasjerome@gmail.com>
Sent: Friday, November 13, 2015 8:45 AM
To: John Anderson
Cc: cti@lists.oasis-open.org
Subject: Re: [cti] The Adaptive Object-Model Architectural Style

Thanks for the feedback.
Kindly note that I'm not strongly defending this approach for the CTI
TC (at least for now).
Since you're using quotes:
"Architects that develop these types of systems are usually very proud
of them and claim that they are some of the best systems they have
ever developed. However, developers that have to use, extend or
maintain them, usually complain that they are hard to understand and
are not convinced that they are as great as the architect claims."

I hope this helps our developers understand that
what sometimes feels difficult is not difficult by design, but
because we are dealing with a complex domain, and that
the use of abstraction/conceptual approaches/ontology has benefits.

Hopefully we can reach consensus on a well-balanced, suitable approach.

2015-11-13 16:24 GMT+03:00 John Anderson <janderson@soltra.com>:

Jerome,
Thanks for the link. I really enjoy those kinds of research papers.

On Page 20, the section "Maintaining the Model" [1] states pretty clearly
that this type of architecture is very unwieldy, from an end-user
perspective; consequently, it requires a ton of tooling development.

The advantage of such a model is that it's extensible and easily changed.
But I'm not convinced that extensibility is really our friend. In my
(greatly limited) experience, the extensibility of STIX and CybOX have made
them that much harder to use and understand. I'm left wishing for "one
obvious way to do things." [2]

If I were given the choice between (1) a very simple data model that's not
extensible, but clear and easy to approach and (2) a generic, extensible
data model whose extra layers of indirection make it hard to find the actual
data, I'd gladly choose the first.

Keeping it simple,
JSA

[1] The full wording from "Maintaining the Model":
The observation model is able to store all the metadata using a
well-established mapping to relational databases, but it was not
straightforward for a developer or analyst to put this data into the
database. They would
have to learn how the objects were saved in the database as well as the
proper semantics for describing the business rules. A common solution to
this is to develop editors and programming tools to assist users with using
these black-box components [18]. This is part of the evolutionary process of
Adaptive Object-Models as they are in a sense, “Black-Box” frameworks,
and as they mature, they need editors and other support tools to aid in
describing and maintaining the business rules.

[2] From "The Zen of Python": https://www.python.org/dev/peps/pep-0020/

________________________________________
From: cti@lists.oasis-open.org <cti@lists.oasis-open.org> on behalf of
Jerome Athias <athiasjerome@gmail.com>
Sent: Friday, November 13, 2015 5:20 AM
To: cti@lists.oasis-open.org
Subject: [cti] The Adaptive Object-Model Architectural Style

Greetings,

realizing that the community members have different backgrounds,
experience, expectations and uses of CTI in general, from a high-level
(abstracted/conceptual/ontology oriented) point of view, through a
day-to-day use (experienced) point of view, to a technical
(implementation/code) point of view...
I found this diagram (and document) interesting, easy to read, and
potentially applicable to our current effort.
So I just wanted to share.

http://www.adaptiveobjectmodel.com/WICSA3/ArchitectureOfAOMsWICSA3.pdf

Regards


--- Begin Message ---

From: Joep Gommers <joep@intelworks.com>
To: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Date: Tue, 16 Jun 2015 16:09:17 +0200

Dear all,

With the excellent work going on from @Bret Jordan on STIX in JSON, we thought it would be helpful to share Intelworks' approach to STIX in JSON so that the community can learn from our mistakes and investments. Props to list and team member @Wouter Bolsterlee for his work on this!

In short, lessons learned:

- Compound structures are objects
- Attributes and child elements are key/value pairs
- Relations are nested objects (or arrays of objects)
- Flat is better than nested
- ID handling and some corner cases (see below)

Full details further down in this email. Your feedback is much appreciated.

We do have work-in-progress libraries available for (store-less) bi-directional transformation of XML, JSON and YAML notations, which might help those implementing STIX in JSON down the road. If you'd like to know more, please contact me off-list.

Best regards,

Joep

Founder & CEO

Intelworks Intelligence Powered Defence

www.intelworks.com

Find me at

+31 615489825

@joepgommers

====

The STIX language uses quite a few advanced XML modelling techniques (multiple namespaces, xsi:type substitutions in instance documents, QName identifiers, and so on), making it quite complex to work with/implement. The JSON format used by Intelworks tries to be much simpler to work with. Structurally it mirrors most of the original XML tree structure, but the resulting tree structures are not identical since the JSON representation favours flat objects over nested structures.

Compound structures are objects

In general, each compound structure is converted into a JSON object (dict in Python). These objects always have a type key to indicate the type of the structure:

{
  "type": "indicator",
  "...": "..."
}

Each of the main STIX constructs (see the STIX architecture) is represented as a JSON object. The type keys used are:

| Defining schema | XML Schema type | Object `type` field |
|---|---|---|
| STIX (Core) | `STIXType` | `package` |
| STIX (Campaign) | `CampaignType` | `campaign` |
| STIX (Course of Action) | `CourseOfActionType` | `course-of-action` |
| STIX (Exploit Target) | `ExploitTargetType` | `exploit-target` |
| STIX (Incident) | `IncidentType` | `incident` |
| STIX (Indicator) | `IndicatorType` | `indicator` |
| STIX (TTP) | `TTPType` | `ttp` |
| STIX (Threat Actor) | `ThreatActorType` | `threat-actor` |
| CybOX | `ObservableType` | `observable` |
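
A minimal Python sketch of how this mapping could be held in code. The dictionary values are copied from the table of main STIX constructs; the names XML_TYPE_TO_JSON_TYPE and new_object are hypothetical and not part of any Intelworks library.

```python
# Mapping from XML Schema type name to the JSON object's "type" value,
# copied from the table of main STIX constructs.
XML_TYPE_TO_JSON_TYPE = {
    "STIXType": "package",
    "CampaignType": "campaign",
    "CourseOfActionType": "course-of-action",
    "ExploitTargetType": "exploit-target",
    "IncidentType": "incident",
    "IndicatorType": "indicator",
    "TTPType": "ttp",
    "ThreatActorType": "threat-actor",
    "ObservableType": "observable",
}


def new_object(xml_type, **fields):
    """Return a dict for the given XML Schema type with its "type" key set."""
    obj = {"type": XML_TYPE_TO_JSON_TYPE[xml_type]}
    obj.update(fields)
    return obj


print(new_object("IndicatorType", title="This is the title."))
```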

Secondary constructs use these additional types (this list is NON-EXHAUSTIVE and only illustrates what is possible):

| Defining schema | XML Schema type | Object `type` field |
|---|---|---|
| STIX (Common) | `IdentityType` | `identity` |
| STIX (Common) | `InformationSourceType` | `information-source` |
| STIX (Common) | `StatementType` | `statement` |
| STIX (Course of Action) | `ObjectiveType` | `objective` |
| STIX (Indicator) | `ValidTimeType` | `valid-time` |
| STIX (Markings) | `MarkingSpecificationType` | `marking-specification` |
| STIX (Markings) | `MarkingStructureType` (and extensions) | `marking-structure` |
| STIX (TTP) | `InfrastructureType` | `infrastructure` |
| STIX (TTP) | `MalwareInstanceType` | `malware-instance` |
| STIX (TTP) | `ResourceType` | `resource` |
| STIX (TTP) | `ToolInformationType` | `tool-information` |
| STIX (TTP) | `VictimTargetingType` | `victim-targeting` |
| CybOX | `MeasureSourceType` | `measure-source` |
| CybOX | `ObjectType` | `cybox-object` |
| CybOX | `ToolInformationType` | `tool-information` |

Attributes and child elements are key/value pairs

Both the attributes and child elements defined for a compound structure usually map to additional key/value pairs of the JSON objects:

{
  "type": "indicator",
  "negate": false,
  "title": "This is the title."
}
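
A rough sketch of this rule in Python, assuming a simplified, namespace-free XML snippet rather than real STIX XML. The function name element_to_object and the lower-casing of tag names are illustrative assumptions, not the actual converter.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for a STIX indicator element;
# real STIX XML uses namespaces and xsi:type, which this sketch ignores.
XML = '<Indicator negate="false"><Title>This is the title.</Title></Indicator>'


def element_to_object(element):
    """Map attributes and simple child elements to key/value pairs."""
    obj = {"type": element.tag.lower()}
    for name, value in element.attrib.items():
        # Treat the literal strings "true"/"false" as booleans.
        obj[name.lower()] = {"true": True, "false": False}.get(value, value)
    for child in element:
        obj[child.tag.lower()] = child.text
    return obj


print(element_to_object(ET.fromstring(XML)))
```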

Relations are nested objects (or arrays of objects)

For one-to-one relations, the value is a nested object, and the key is a singular noun (observable in the example):

{
  "type": "indicator",
  "observable": {
    "type": "observable",
    "...": "..."
  },
  "...": "..."
}

For one-to-many relations, the value is a JSON array containing the child objects, and the key is a plural noun (indicators in the example):

{
  "type": "package",
  "indicators": [
    {
      "type": "indicator",
      "...": "..."
    },
    {
      "type": "indicator",
      "...": "..."
    }
  ],
  "...": "..."
}

Additionally, the many RelatedXYZ constructs (and the surrounding container objects) in STIX are also flattened: the target of the relation is the child object (or a list of those), and any additional relationship information is embedded into the child object(s):

{
  "type": "indicator",
  "indicated_ttps": [
    {
      "type": "ttp",
      "relationship": "...",
      "relationship_information_source": "...",
      "...": "..."
    },
    {
      "type": "ttp",
      "relationship": "...",
      "relationship_information_source": "...",
      "...": "..."
    }
  ],
  "...": "..."
}
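
The flattening of RelatedXYZ wrappers could be sketched as follows. The naive input shape (a wrapper dict with a "target" key) is a hypothetical intermediate form invented for this example, as are the sample titles and the function name flatten_related.

```python
def flatten_related(related):
    """Merge relationship metadata from a wrapper into the target object."""
    child = dict(related["target"])  # copy so the input is left untouched
    for key in ("relationship", "relationship_information_source"):
        if key in related:
            child[key] = related[key]
    return child


# Hypothetical naive structure: one wrapper per related TTP.
naive = {
    "type": "indicator",
    "indicated_ttps": [
        {"relationship": "uses", "target": {"type": "ttp", "title": "Phishing"}},
        {"target": {"type": "ttp", "title": "Malware"}},
    ],
}

# Replace each wrapper with its flattened child object.
flat = dict(naive, indicated_ttps=[flatten_related(r) for r in naive["indicated_ttps"]])
print(flat)
```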

Flat is better than nested

The STIX XML representation is deeply nested, partly due to the way XML is typically used. The JSON representation tries to be a bit more pragmatic and adheres to the "flat is better than nested" adage.

In practice, this means that nested container structures are flattened as much as possible. Unnecessary container structures are simply removed. For example, the <stix:Indicators> container structure used in the XML representation does not exist as such in the JSON representation, since using an array is sufficient.

To further reduce the number of nested objects, various XML constructs using container elements with (optional) attributes are flattened into the parent object by using multiple related keys. This is best explained using an example.

For example, the StructuredTextType used in both STIX and CybOX is basically a string that can optionally carry a structuring_format attribute. A naive conversion would require a nested object to represent this:

{
  "type": "...",
  "description": {
    "structuring_format": "html",
    "value": "Description goes here."
  },
  "...": "..."
}

Since the structuring_format is optional, this approach would often result in a small nested object with only a single key/value pair (the value). To avoid this, objectivistix takes an alternative approach using two related keys in the containing object:

{
  "type": "...",
  "description": "Description goes here.",
  "description_structuring_format": "html",
  "...": "..."
}

In case the structuring_format is not specified, the description_structuring_format key/value pair would simply not be present:

{
  "type": "...",
  "description": "Description goes here.",
  "...": "..."
}
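
A small Python sketch of the two-key convention described here. The helper names set_structured_text and get_structured_text are hypothetical; only the key naming (foo and foo_structuring_format) comes from the description.

```python
def set_structured_text(obj, key, value, structuring_format=None):
    """Store structured text as flat keys on the containing object."""
    obj[key] = value
    if structuring_format is not None:
        obj[f"{key}_structuring_format"] = structuring_format
    return obj


def get_structured_text(obj, key):
    """Return (value, structuring_format); the format is None when absent."""
    return obj[key], obj.get(f"{key}_structuring_format")


with_format = set_structured_text({"type": "..."}, "description", "Description goes here.", "html")
without_format = set_structured_text({"type": "..."}, "description", "Description goes here.")
print(with_format)
print(without_format)
```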

ID handling

The id and idref attributes in STIX XML are not simple string values but qualified names (QName in XML), meaning that they contain a namespace prefix that resolves to a namespace URI. To avoid any explicit mappings for these prefixes and their associated namespace URIs, the JSON representation always expresses id and idref values in their canonical form using the so-called Clark notation, which looks like this: {http://example.com/ns/uri}local-name.

The top level object may optionally contain an id_namespaces mapping that maps prefixes to namespace URIs. This mapping will be used to determine the prefixes used for id and idref attribute values when converting the object to XML, as illustrated by the example below:

{
  "type": "package",
  "id": "{http://example.org/}Package-b3ba766b-d3e6-4d92-82b2-5940f0cb763c",
  "id_namespaces": {
    "example": "http://example.org/"
  }
}

<stix:STIX_Package
  xmlns:stix="http://stix.mitre.org/stix-1"
  xmlns:example="http://example.org/"
  id="example:Package-b3ba766b-d3e6-4d92-82b2-5940f0cb763c">
  ...
</stix:STIX_Package>

In case no id_namespaces mapping is present, a unique namespace prefix will be used instead. The id_namespaces mapping can safely be left out with no semantic loss, since the prefix is arbitrary and only used for serialized XML data, not for the in-memory model.
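
The conversion between Clark notation and prefixed QNames might look roughly like this in Python. The function names are hypothetical, and the sketch raises an error when no prefix is declared rather than generating a unique one.

```python
import re

# Clark notation: "{namespace-uri}local-name"
CLARK = re.compile(r"^\{(?P<uri>[^}]*)\}(?P<local>.+)$")


def clark_to_qname(clark_id, id_namespaces):
    """Convert '{uri}local' to 'prefix:local' using a prefix -> URI mapping."""
    match = CLARK.match(clark_id)
    if match is None:
        raise ValueError(f"not in Clark notation: {clark_id!r}")
    uri, local = match["uri"], match["local"]
    for prefix, namespace in id_namespaces.items():
        if namespace == uri:
            return f"{prefix}:{local}"
    # Assumption: a fuller converter would generate a unique prefix here.
    raise KeyError(f"no prefix declared for namespace {uri!r}")


def qname_to_clark(qname, id_namespaces):
    """Convert 'prefix:local' back to '{uri}local'."""
    prefix, local = qname.split(":", 1)
    return "{" + id_namespaces[prefix] + "}" + local


namespaces = {"example": "http://example.org/"}
clark = "{http://example.org/}Package-b3ba766b-d3e6-4d92-82b2-5940f0cb763c"
print(clark_to_qname(clark, namespaces))
```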

Special conversion notes

STIX package header

The package header is not treated as a first-class structure. Since the STIX_Header construct only applies to STIX_Package, it is merged completely into the main package object (this avoids having an additional nested object for the header):
```
{
  "type": "package",
  "description": "Description goes here.",
  "...": "..."
}
```
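
A minimal sketch of such a merge, assuming a hypothetical intermediate dict that still carries the header under a "header" key. Both that key and the conflict policy (keys already on the package win) are assumptions made for illustration.

```python
def merge_header(package):
    """Return a copy of the package with its header keys merged in."""
    merged = {k: v for k, v in package.items() if k != "header"}
    for key, value in package.get("header", {}).items():
        # Keys already on the package win; a real converter would need
        # a deliberate policy for such conflicts.
        merged.setdefault(key, value)
    return merged


# Hypothetical intermediate form with a separate header object.
nested = {"type": "package", "header": {"description": "Description goes here."}}
print(merge_header(nested))
```
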
Structured text

The StructuredTextType construct is not transformed into a child object. Instead, the keys foo and (optionally) foo_structuring_format are added to the containing object.

Observable composition

An ‘observable composition’ structure does not result in a nested object for the composition itself. Instead, the composition key contains the child objects, and the composition_operator specifies the operator:

{
  "type": "indicator",
  "observable": {
    "composition_operator": "or",
    "composition": [
      {
        "type": "observable",
        "...": "..."
      },
      {
        "type": "observable",
        "...": "..."
      }
    ]
  },
  "...": "..."
}
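
One way a consumer might walk such a structure, sketched in Python. The function name leaf_observables and the "title" fields in the sample data are hypothetical; only the composition and composition_operator keys come from the format described here.

```python
def leaf_observables(observable):
    """Yield the non-composite observables, descending through compositions."""
    if "composition" in observable:
        for child in observable["composition"]:
            yield from leaf_observables(child)
    else:
        yield observable


# Sample data with one nested composition; titles are invented.
indicator = {
    "type": "indicator",
    "observable": {
        "composition_operator": "or",
        "composition": [
            {"type": "observable", "title": "A"},
            {
                "composition_operator": "and",
                "composition": [
                    {"type": "observable", "title": "B"},
                    {"type": "observable", "title": "C"},
                ],
            },
        ],
    },
}

titles = [o["title"] for o in leaf_observables(indicator["observable"])]
print(titles)
```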

--- End Message ---
