
Subject: Re: The Adaptive Object-Model Architectural Style

The best writeup of JSON schema that I’ve seen is here: http://spacetelescope.github.io/understanding-json-schema/index.html

My understanding is that in terms of validation, the two are roughly equivalent. XML Schema can validate some things JSON Schema can’t, and JSON Schema can validate some things that XML Schema can’t.

Eldan is correct that JSON schema doesn’t use a type-based validation approach: it checks that the structure of the document is what you expect but does not do that via a type system like XML Schema does. I don’t personally have a problem with this but it’s probably the biggest difference between the two.
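To illustrate the structural approach, here is a minimal sketch in Python. The schema and field names are invented for illustration, and the hand-rolled checker handles only the "type", "required", and "properties" keywords; a real deployment would use a full JSON Schema validator library rather than this toy.

```python
import json

# Hypothetical schema: checks structure (shape and required members),
# not a nominal type hierarchy the way XML Schema does.
SCHEMA = {
    "type": "object",
    "required": ["id", "title"],
    "properties": {
        "id": {"type": "string"},
        "title": {"type": "string"},
    },
}

# Map JSON Schema type names to Python types for the structural check.
TYPES = {"object": dict, "string": str, "number": (int, float), "array": list}

def validate(instance, schema):
    """Collect structural errors; empty list means the instance conforms."""
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES[expected]):
        errors.append(f"expected {expected}, got {type(instance).__name__}")
        return errors
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property: {key}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema))
    return errors

doc = json.loads('{"id": "indicator--1", "title": "Example"}')
print(validate(doc, SCHEMA))         # []
print(validate({"id": 42}, SCHEMA))  # missing-property and wrong-type errors
```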


On Nov 16, 2015, at 3:05 AM, Jerome Athias <athiasjerome@GMAIL.COM> wrote:

Any pointers to some documentation on how you do this?

Best regards

On Monday, 16 November 2015, Jordan, Bret <bret.jordan@bluecoat.com> wrote:
You can validate the data; JSON Schema works great.  We are using it today in TAXII.


Sent from my Commodore 64

On Nov 16, 2015, at 12:26 AM, Jerome Athias <athiasjerome@gmail.com> wrote:

Well, if I have information from one tool, like my FW, IDS, or AV, and I have to take it and send it to a human or another tool, and I want to use JSON to do so, and I have no validation constraint on the format or language/fields in the middle,
I will do what I am doing today.
And if I am not sure (can't validate) what I would receive (the format of the content/data),
I won't invest in STIX.
I will do it my way.

On Monday, 16 November 2015, Jordan, Bret <bret.jordan@bluecoat.com> wrote:
Forgive me in advance, but this is one area where I strongly disagree with the abstract view of the model world.  Talking about things in English or French is not like dealing with STIX and TAXII.  Sorry...

So let's keep it real....

1) If you try to send an IODEF package to my TAXII server and tell me it is IODEF, then I will reject it and send a TAXII error status message back to you.

2) If you send the IODEF package but mask all of the content bindings and try to tell me it is STIX, then I will try to validate it (if I actually care about such things), or let's say I just send it to the parser.  One of two things will then happen:
a) The parser will blow up and crash, and as I recover from it I will send you a TAXII error status message.
b) The parser will parse what it can and might only be able to get one or two fields of data, at which point an exception will happen and I should send you a TAXII error status message.

3) Say you send me a STIX package with only half of the STIX package filled out.  That is not enough for me to do something with, because my internal implementation policy says I need XYZ.  So I will parse it, figure out there is not enough detail there, and send you a TAXII error status message.

4) Say you send me a STIX package with a LOT of extra stuff in it because you are used to talking to trust group XYZ, which understands the extra fields.  When I parse it, one of two things will happen:
a) The parser will crash and I will send you a TAXII error status message.
b) The parser will parse what it can and throw away the rest.  At this point I may send you a TAXII status message or I may not, depending on my implementation/deployment policies.
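The four scenarios above boil down to a validate-then-report loop. A minimal sketch in Python, assuming a toy parse_stix() (here just JSON decoding) and invented status strings rather than the real TAXII status message types:

```python
import json

def parse_stix(payload):
    # Toy stand-in for a real STIX parser: plain JSON decoding here.
    return json.loads(payload)

def handle_inbound(content_binding, payload, required_fields):
    """Sketch of the receiver's decision tree from the scenarios above."""
    if content_binding != "stix":
        return "ERROR: unsupported content binding"   # scenario 1
    try:
        package = parse_stix(payload)                 # may blow up (2a, 4a)
    except Exception:
        return "ERROR: parse failure"
    # Keep only the fields we understand; extras are silently dropped (4b).
    parsed = {f: package[f] for f in required_fields if f in package}
    if len(parsed) < len(required_fields):
        return "ERROR: required fields missing"       # scenarios 2b and 3
    return "OK"

print(handle_inbound("stix", '{"id": "x", "extra": 1}', ["id"]))  # OK
```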

Today we have no way of knowing if the end solution will support all of the fields in the STIX package.  That is NOT a problem we can solve for, nor should we try to.  And this is why getting bent around the axle on believing that we can enforce data markings is foolish.  Once the data leaves your shop, if you do not have an out-of-band legally binding document that says the recipient will honor the markings, there is no way to enforce them in disparate code.  Someone may write a parser that just strips out the data markings and throws them on the floor.  So you need the out-of-band legal frameworks, just like you do today.

So let's relate this whole mess to a tried and true standard: IETF TCP and IP.  The IETF specs say that certain fields should be used in certain ways, but this does not guarantee that they will be used that way.  NICs have to be able to handle these conditions and just work.  In fact, abusing TCP/IP has been a great way to perform data exfiltration and other bad stuff for years.



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

On Nov 15, 2015, at 23:05, Jerome Athias <athiasjerome@GMAIL.COM> wrote:

Somehow yes
My point is more about validating that you can correctly understand the message that I will send.
If I am allowed to send it, let's say in French (assuming that is the only language I know), but I am not confident about your translator (a potentially weak parser/interpreter), maybe I will decide not to send it at all. Or you could misinterpret it if sent.

It's a risk. We just need to decide whether we accept it, or how we deal with it.

On Monday, 16 November 2015, Jordan, Bret <bret.jordan@bluecoat.com> wrote:
Well that problem exists regardless and in fact we have that problem today with STIX 1.2 XML.  




On Nov 15, 2015, at 20:26, Jerome Athias <athiasjerome@GMAIL.COM> wrote:

And that could be an issue if I send you "extra data" that is there for you to react to, and you ignore it because of your parser. I have no decent confidence that you will do something with my data. (So why try sending anything...)

On Monday, 16 November 2015, Jordan, Bret <bret.jordan@bluecoat.com> wrote:
Thanks for the feedback... As others have commented, the fact that JSON is less typed than XML is not necessarily a bad thing; in fact, it can be a strength.  Further, the fact that JSON maps better to structs in code is a huge advantage.

I do understand the need for schema validation in XML, due to the fact that XML parsers are notoriously bad about handling data they do not expect.  However, from what others have said on this list, and from my own research, this is not as big of a problem for JSON, since most JSON parsers do not care if there is extra data or missing data.
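A quick illustration of that tolerance with Python's standard json module (the field names below are invented for the example):

```python
import json

# Most JSON parsers simply decode whatever is present: unexpected extra
# members are kept (and can be ignored), and absent members are just not
# there; the consumer decides how to handle each case.
doc = json.loads('{"id": "x-1", "title": "Example", "unexpected": true}')

print(doc["unexpected"])       # the extra member decodes fine: True
print(doc.get("description"))  # a missing member yields None, no crash
```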




On Nov 15, 2015, at 04:12, Eldan Ben-Haim <ELDAN@il.ibm.com> wrote:


Been following this in the past few days and thought I'd try to make my first contribution to this list.
While I personally find it difficult to understand arguments tagging XML as "complex", I am pretty confident that most developers I know would prefer JSON over XML nowadays. I'll allow myself to speculate that the reasons for this are only remotely connected to the technicalities of building software -- but this is really not all that important. The bottom line, I think, is that going for JSON would have tremendous value around adoption and interoperability.

However, working extensively with both JSON and XML, I am also well aware that as far as validation/schema is concerned, JSON Schema is far, far behind XSD. I would say that JSON Schema may fit some basic validation needs, but in the absence of a real typing system it is hardly suitable for specification.

Over here we have developed an augmented version of JSON Schema which I'd be happy to share, along with Java-based validation code (we also have a full Eclipse feature set that provides assisted editing and validation based on this notation). We can probably contribute both to the community -- but this is yet another specification/standard that we'll need to maintain here, and hardly the core of what we're tasked with. If there are any suggestions around this, let me know. Either way, I think that relying on "vanilla" JSON Schema for the specification is going to be a problem given the scale of STIX/TAXII as a specification.



Eldan Ben-Haim
CTO, Trusteer
Software Group, Security Systems


Phone: +972-73-225-4610 | Mobile: +972-54-779-7359
13 Noah Mozes Street
Tel Aviv, TA 67442

From:        Cory Casanave <cory-c@modeldriven.com>
To:        "Jordan, Bret" <bret.jordan@bluecoat.com>
Cc:        "Wunder, John A." <jwunder@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Date:        11/13/2015 11:13 PM
Subject:        RE: [cti] The Adaptive Object-Model Architectural Style
Sent by:        <cti@lists.oasis-open.org>

I may have been too strong in that statement. I am fine with JSON and having a JSON schema. What I -also- want is that every tag, everywhere, can easily, directly, and with brutal consistency be tied to its formal definition, and that there is no possibility of term confusion across namespaces. Base JSON does not do this, even with JSON Schema. All that seems to be needed is namespace prefixes in the names and something (perhaps a single line) that ties those prefixes to referenceable URIs.
This would seem to have zero impact on ease of implementation using the approach you would like; from that perspective it is just a naming convention. From an understandability point of view, knowing where to go to find a term would seem to be good for everyone. So I think there is a simple and strategic approach that we just need to work out. I would hate to lose that good-for-everyone solution.
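A minimal sketch of what that naming convention might look like: the prefixes and example.org URIs below are invented for illustration, not proposed vocabulary.

```python
# One small mapping ties each namespace prefix to a referenceable URI;
# property names then carry the prefix, so every tag can be traced to
# its formal definition. Prefixes and URIs here are hypothetical.
PREFIXES = {
    "stix": "http://example.org/ns/stix#",
    "acme": "http://example.org/ns/acme-extensions#",
}

def expand(name, prefixes=PREFIXES):
    """Resolve a prefixed property name to its full URI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local

doc = {"stix:title": "Example", "acme:severity": "high"}
print(expand("acme:severity"))  # http://example.org/ns/acme-extensions#severity
```

From the implementer's side this is indeed just string handling; the payoff is that no two producers can collide on a bare name like "severity".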
I also hope you are right about the “billions of STIX documents”!
From: Jordan, Bret [mailto:bret.jordan@bluecoat.com]
Sent: Friday, November 13, 2015 3:55 PM
To: Cory Casanave
Cc: Wunder, John A.; cti@lists.oasis-open.org
Subject: Re: [cti] The Adaptive Object-Model Architectural Style

Your points are valid and well taken.  But this begs a very interesting question...
You say:
My implementations are mostly not hard-coded, I use the metadata provided by models to drive most of the behavior. I would find a “pure structure” specification like JSON Schema very hard to implement, understand or test for interoperability.
So I will respond with: who or what will be the biggest consumers and producers of STIX, if we are successful?
We should understand the answer to that question and then make sure their lives are not painfully difficult.  It is my belief that software, web applications, mobile apps, network devices, security products, and analyst tools will be the vast majority of producers and consumers.  Further, it is my belief that if we are successful, those tools will be generating and consuming billions of STIX documents a day.  So I want to understand the scope of people like you who will find it hard to work with JSON.
On Nov 13, 2015, at 12:09, Cory Casanave <cory-c@modeldriven.com> wrote:
John had something under his skin, this is mine. Sorry if it is a bit of a soapbox!
My implementations are mostly not hard-coded, I use the metadata provided by models to drive most of the behavior. I would find a “pure structure” specification like JSON Schema very hard to implement, understand or test for interoperability.
There are different “edges” to simple, there are times when an overly simple “foundation” leads to vast complexity – I think the extension mechanisms you don’t like are like that, if these were in the foundation “data model”, it would not be an issue. The one-off extension mechanisms become complex because they are a force-fit to the foundation platform (XSD). Of course there are also times when unneeded complexity obscures an underlying simplicity. So I think “investments” up front to get the foundation data model right will pay-off 10x over the entire process. Solve problems like metadata, versioning, relations, marking, extension, external references, etc. up front and the domain model we all deal with becomes so much simpler, clearer and easier to implement.
Other edges of simplicity are (brutal) consistency and semantic precision. A bunch of special cases for the “edge cases” are the source of 80% of the implementation errors and complexity, when we deny those edge cases up front and then hack them in later, we get inconsistent and hard to implement specifications. Perhaps a bit more effort, and even “complexity” up front can reduce downstream costs and complexity.
I only engaged with STIX last year, it is a very hard road. It is not hard because of the syntax, but because some of the concepts are just not clear, consistent or put together in a way that would make sense (at least to me, but then I’m a bit odd). When you jumble together a data model to solve, what seems like, several very simple and direct problems you can lose what that data really means – this results in time, confusion, anti-operability  and hacks to deal with it. It is very hard to produce a specification that people not involved in the process can pick up, understand, implement and then be interoperable with the rest of the community. It requires more rigor than an internal implementation, and this introduces “complexity”, but one that is necessary for interoperability and supporting a community.
I think we have all seen “replacements” for technologies marketed as simple, then when all the real needs are met it ends up being more complex than what we started with. E.g. I think this happened to SOAP. Of course, we have also seen specs that no human can deal with. It is a tough balance.
Use of standards is also a complexity balance. Most standards are more “complex” than just adding your own specific tags as you initially think of it. On the other hand, they have support, libraries and people who have thought a lot about that one problem. Perhaps some of that perceived complexity will be needed. From a community standpoint, embracing (good) standards is a win.
Another edge of complexity, particularly in specifications, is overly constraining the specification to a perceived solution. As long as I can send you some "statement" and you understand it, we have interoperated. Having a lot of statements that can be made in a spec does not, in itself, make it complex (it may make it large). What keeps it simple is to be able to say things clearly and say just what we need, together, without a lot of baggage. The big difference for interop specs is that I really don't care why you want to know something, or whether you can make great use of it -- the only thing we should focus on is that the statements we need to make to each other are well formed and well defined. What statements we need to make is our scope. As John said, this may be VERY different from the scope of applications using these interoperability specifications -- many people who have built substantial applications have never had to think about the "other guy" in this way.
One final edge of complexity is stakeholder understandability, if we have the cyber threat concepts so intertwined with the technical representation the “real people” (if you know any) will not be able to understand or validate it. A clear separation of concerns has been good architecture since day 0.
So I understand that for you, where you already understand STIX deeply and expect to write code for each tag, pure JSON may be simple. For me it would not be. So keep it simple YES, but simple for real and for newcomers and other ways to use STIX.
Ok, end of soapbox.
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Friday, November 13, 2015 1:18 PM
To: Wunder, John A.
Subject: Re: [cti] The Adaptive Object-Model Architectural Style

John this is really well said.  
I feel like we listened to every possible user requirement out there for STIX 1.0 and tried to create a data model that could solve every possible use case and corner case, regardless of how small.  The one thing we sorely forgot to do is figure out what developers can actually implement in code, or what product managers are willing to implement in code.
Let's make STIX 2.0 something that meets 70-80% of the use cases and can actually be implemented in code by the majority of software development shops.  Yes, I am talking about a STIX Lite.  People can still use STIX 1.x if they want everything.  Over time we can add more and more features to the STIX 2.0 branch as software products that use CTI advance and users can do more and more with it.
Let's start with JSON + JSON Schema and go from there.  I would love to have to migrate to a binary solution or something that supports RDF in the future because we have SO MUCH demand and there is SO MUCH sharing that we really need to do something.
1) Let's not put the cart before the horse.
2) Let's fail fast, and not ride the horse to the glue factory.
3) Let's start small and build massive adoption.
4) Let's make things so easy for development shops to implement that there is no reason for them not to.
On Nov 13, 2015, at 08:09, Wunder, John A. <jwunder@mitre.org> wrote:
So I’ve been waiting for a good time to outline this and I guess here is as good a place as any. I’m sure people will disagree, but I’m going to say it anyway :)

Personally I think of these things as four levels:

- User requirements
- Implementations
- Instantiation of the data model (XML, JSON, database schemas, an object model in code, etc)
- Data model

User requirements get supported in running software. Running software uses instantiations of the data model to work with data in support of those user requirements. The data model and specification define the instantiations of the data and describe how to work with them in a standard way.

The important bit here is that there’s always running software between the user and the data model. That software is (likely) a tool that a vendor or open source project supports that contains custom code to work specifically with threat intel. It might be a more generic tool like Palantir or whatever people do RDF stuff with these days. But there’s always something.

This has a couple implications:

- Not all user requirements get met in the data model. It’s perfectly valid to decide not to support something in the data model if we think it’s fine that implementations do it in many different ways. For example, de-duplication: do we need a standard approach or should we let tools decide how to do de-duplication themselves? It’s a user requirement, but that doesn’t mean we need to address it in the specs.

- Some user requirements need to be translated before they get to the data model. For example, versioning: users have lots of needs for versioning. Systems also have requirements for versioning. What we put in the specs needs to consider both of these.

- This is the important part: some user requirements are beyond what software can do today. I would love it if my iPhone got 8 days of battery life. I could write that into some specification; that doesn't mean it's going to happen. In CTI, we (rightfully) have our eyes on an end state where you can do all sorts of awesome things with your threat intel, but just putting it in the data model doesn't automatically make that happen. We're still exploring this domain and software can only do so much. So if the people writing software are telling us that the user requirements are too advanced (for now), maybe that means we should hold off on putting them in the data model until they're something we can actually implement? In my mind this is where a lot of the complexity in STIX comes from: we identified user requirements to do all these awesome things, so we put them in the data model, but we never considered how or whether software could really implement them. The perfect example here is data markings: users wanted to mark things at the field level, most software isn't ready for that yet, and so we end up with data markings that are effectively broken in STIX 1.2. This is why many standards bodies have requirements for running code: otherwise the temptation is too great to define specification requirements that are not implementable, and you end up with a great spec that nobody will use.

Sorry for the long rant. Been waiting to get that off my chest for awhile (as you can probably tell).


On Nov 13, 2015, at 9:17 AM, Jerome Athias <athiasjerome@GMAIL.COM> wrote:

sorry for the others if off-topic.

Remember that software is good only if it satisfies the users (meets,
or exceeds, their requirements).
You can write 'perfect/optimized' code. If the users are not
satisfied, it's bad software.

"If you can't explain it simply, you don't understand it well
enough.", Albert Einstein

Challenges are exciting, but sometimes difficult. It's about
motivation and satisfaction.

There is no programming language better than another (just like OSes);
only you can select the best one for your needs.

I did a conceptual map for the 'biggest Ruby project of the internet'
(the Metasploit Framework); it's just a picture, but it represents 100 pages
of documentation.
I think we could optimize (as with a maturity model) our approach to
resolving problems.

2015-11-13 17:02 GMT+03:00 John Anderson <

The list returns my mail, so probably you'll be the only one to get my reply.

Funny, I missed that quote from the document. And it's spot on. As an architect myself, I have built several  "elegant" architectures, only to find that the guys who actually had to use it just. never. quite. got it. (sigh)

My best architectures have emerged when I've written test code first. ("Test-first" really does work.) I've learned that writing code--while applying KISS, DRY and YAGNI--saves me from entering the architecture stratosphere. That's why I ask the architects to express their creations in code, and not only in UML.

I'm pretty vocal about Python, because it's by far the simplest popular language out there today. But this principle applies in any language: If the implementation is hard to explain, it's a bad idea. (Another quote from the Zen of Python.) Our standard has a lot that's hard to explain, esp. to newcomers. How can we simplify, so that it's almost a no-brainer to adopt?

Again, thanks for the article, and the conversation. I really do appreciate your point-of-view,

From: Jerome Athias <
Sent: Friday, November 13, 2015 8:45 AM
To: John Anderson
Subject: Re: [cti] The Adaptive Object-Model Architectural Style

Thanks for the feedback.
Kindly note that I'm not strongly defending this approach for the CTI
TC (at least for now).
Since you're using quotes:
"Architects that develop these types of systems are usually very proud
of them and claim that they are some of the best systems they have
ever developed. However, developers that have to use, extend or
maintain them, usually complain that they are hard to understand and
are not convinced that they are as great as the architect claims."

This, I hope, could help our developers understand
that what sometimes feels difficult is not difficult
by design, but because we are dealing with a complex domain
where abstraction, conceptual approaches, and ontology have benefits.

Hopefully we can obtain consensus on a good balanced adapted approach.

2015-11-13 16:24 GMT+03:00 John Anderson <

Thanks for the link. I really enjoy those kinds of research papers.

On Page 20, the section "Maintaining the Model" [1] states pretty clearly that this type of architecture is very unwieldy, from an end-user perspective; consequently, it requires a ton of tooling development.

The advantage of such a model is that it's extensible and easily changed. But I'm not convinced that extensibility is really our friend. In my (greatly limited) experience, the extensibility of STIX and CybOX have made them that much harder to use and understand. I'm left wishing for "one obvious way to do things." [2]

If I were given the choice between (1) a very simple data model that's not extensible, but clear and easy to approach and (2) a generic, extensible data model whose extra layers of indirection make it hard to find the actual data, I'd gladly choose the first.

Keeping it simple,

[1] The full wording from "Maintaining the Model":
The observation model is able to store all the metadata using a well-established
mapping to relational databases, but it was not straightforward
for a developer or analyst to put this data into the database. They would
have to learn how the objects were saved in the database as well as the
proper semantics for describing the business rules. A common solution to
this is to develop editors and programming tools to assist users with using
these black-box components [18]. This is part of the evolutionary process of
Adaptive Object-Models as they are in a sense, “Black-Box” frameworks,
and as they mature, they need editors and other support tools to aid in
describing and maintaining the business rules.

[2] From "The Zen of Python":

From: cti@lists.oasis-open.org on behalf of Jerome Athias <athiasjerome@gmail.com>
Sent: Friday, November 13, 2015 5:20 AM
Subject: [cti] The Adaptive Object-Model Architectural Style


Realizing that the community members have different backgrounds,
experience, expectations and uses of CTI in general, from a high-level
(abstracted/conceptual/ontology-oriented) point of view, through a
day-to-day use (experienced) point of view, to a technical
(implementation/code) point of view...
I found this diagram (and document) interesting, easy to read, and
potentially suited to our current effort.
So I just wanted to share.



