Since I proposed the idea of this working group 12+ months ago, and begged Eric to run with it, a lot of what I was originally asking for has been lost in some really weird discussions about the object model.
So let's rewind 12 months and get back to what I was asking for in the first place...
What I want out of this group is some guidelines and database schemas for developers wanting to write TAXII / STIX / CybOX implementations. Basically, from a backend database standpoint, how do they get started? Which database systems should they use based on various implementation strategies? What should the base configurations be for said databases? And maybe even some example implementations.
For example, say an open source app developer wanted to write a basic STIX/CybOX Indicator/Observable UI that could read data in from a TAXII server, add comments and context, and spit it back out. What type of database should he/she use to store the data, and what should it look like?
For very simple things, where you are NOT doing all of STIX and CybOX, maybe a relational database would work fine and in fact work really well. In these cases, it would be nice if there were a .sql file the developer could pull down that would build all of the necessary table structure for him/her. If they are doing something more complex, maybe they need a document database. In that case it would be really great if we produced some documents, papers, implementation guides, and maybe even some examples that could help them get up and running faster.
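To make the ".sql file" idea concrete, here is a minimal sketch of what such a starter schema might look like for the simple Indicator/Observable app described above. The table and column names (`indicator`, `observable`, `annotation`) are hypothetical and are not derived from the STIX or CybOX specifications; Python's built-in `sqlite3` module is used only so the DDL can be tried without a database server.

```python
import sqlite3

# Hypothetical starter schema for a simple Indicator/Observable app.
# Table and column names are illustrative assumptions, not taken from STIX/CybOX.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS indicator (
    id       TEXT PRIMARY KEY,   -- indicator id as received from the TAXII feed
    title    TEXT,
    raw_xml  TEXT NOT NULL       -- original content, kept so it can be re-exported
);
CREATE TABLE IF NOT EXISTS observable (
    id            TEXT PRIMARY KEY,
    indicator_id  TEXT NOT NULL REFERENCES indicator(id) ON DELETE CASCADE,
    object_type   TEXT NOT NULL,  -- e.g. AddressObject
    value         TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS annotation (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    indicator_id  TEXT NOT NULL REFERENCES indicator(id) ON DELETE CASCADE,
    author        TEXT,
    body          TEXT NOT NULL   -- the analyst comment or added context
);
"""


def build_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database and build all tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA_SQL)
    return conn


if __name__ == "__main__":
    conn = build_database()
    conn.execute(
        "INSERT INTO indicator (id, title, raw_xml) VALUES (?, ?, ?)",
        ("example:indicator-1", "Known bad IP", "<Indicator/>"),
    )
    conn.execute(
        "INSERT INTO observable VALUES (?, ?, ?, ?)",
        ("example:observable-1", "example:indicator-1", "AddressObject", "192.0.2.1"),
    )
    conn.execute(
        "INSERT INTO annotation (indicator_id, author, body) VALUES (?, ?, ?)",
        ("example:indicator-1", "analyst", "Seen in a phishing campaign"),
    )
    conn.commit()
    for row in conn.execute("SELECT id, title FROM indicator"):
        print(row)
```

Keeping the original XML next to the extracted columns is one possible design choice: the app can re-export the content unchanged while still querying the few fields it actually uses.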
The problems I see are:
1) STIX is massive and very complex. Just trying to learn it and figure out how much of it you have to implement is a monumental task.
2) Then, if you need something in, say, Objective-C, you have to write the API to support STIX.
3) Then you need to write a basic TAXII system to do what you want.
4) Then you need to figure out what kind of database you are going to use to store the data.
If we could help out on #4, that alone might make things easier for people getting started.
Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."
Eric, I definitely agree that there will need to be considerable coordination between STIX and CybOX efforts. I would hope that that is not a surprise to anyone. :-)
Even if it means I’m writing myself out of a ‘chair,’ I agree with Sean. The most important task that people are talking about for the “database subcommittee” is the formal modeling of STIX/CybOX (and to a lesser extent, TAXII). If that has to be a part of the STIX or CybOX SCs, so be it.
The downside is that, based on the work we have done so far at Georgetown, it is very difficult to build a model that treats STIX and CybOX as separate entities.
We agree that these sub-topics have value and should be managed appropriately to ensure they are addressed consistently with minimal impact to other areas and sub-topics.
As the co-chairs for the CTI STIX SC we wanted to express our thoughts on how these various sub-topics might be addressed most effectively.
We propose that issues relevant to specific languages (language specifications at all levels of abstraction, from ontology to logical data model to specific schematic implementations such as XML and JSON, as well as database and implementation guidance, etc.) should be managed from within the appropriate STIX SC or CybOX SC. The concept of language specifications existing at different levels of abstraction is already part of our way of doing things. For the last year or so we have been working to lift the STIX specification from just XML Schema to a set of implementation-independent specifications based on a UML model and explanatory textual documents. These specification documents will form the normative basis of the language specification that will be transferred to OASIS.
It has also been the goal to eventually evolve the implementation-independent specification and lift it to a more formal and explicit semantic form. All of these differing levels of abstraction are part of the effort to specify the language. For consistency's sake they should all be part of a single evolutionary thread for the language and not separate parallel efforts. Similarly, specific guidance on database approaches or other implementations would in practice be tied to the language they are implemented to support, and as such should likely fall within the scope of the SC working on that language. Within the language SCs these topics can be broken down and managed using different work products as appropriate.
Issues that tend to be relevant to the broader ecosystem (engagement, interoperability, etc.) may best be managed as separate SCs under the TC.
We believe this approach will yield the best balance between focusing on specific issues, ensuring the right people are involved in the right efforts, and achieving consistency across efforts, and that at this time it will likely improve our focus and support more rapid progress. If at some future time the TC decides that a different approach is needed, the approach can be modified then.
The first CTI STIX SC meeting next week will likely flesh out in a bit more detail how we see this approach taking form for the STIX SC.
We appreciate your consideration of our thoughts on the matter.
Sean Barnum and Aharon Chernin
CTI STIX SC Co-chairs
[+1] "I have not been comfortable with calling this group the “database subcommittee” specifically because it is the data model, not the data model implementation, that needs focus."
[+1] "...once people start looking at terms and concepts from a model perspective instead of XML (or SQL, etc) data structures they discover issues, complexities, simplifications and opportunities that are not very apparent looking at schema. This activity represents a different viewpoint that when combined with the more “bottom up” implementation and representation concerns makes the specification that much better. For this reason it would be my suggestion that such a viewpoint should drive the vocabulary and semantics and work in concert with but not be the same as the team that focuses on the best representation and implementation in XML or a DBMS."
I have not been comfortable with calling this group the “database subcommittee” specifically because it is the data model, not the data model implementation, that needs focus. Cory nails it in one (second paragraph below). In order to build real data migration tools, you really need to understand what you are migrating. I would suggest that the first task (as opposed to a parallel sub-subcommittee) is to do the modeling.
That is why we have been working on an OWL model for STIX/CybOX at Georgetown. We built it for a different purpose, but the result could be generally useful.
I purposely did not suggest a particular language for expressing the conceptual/logical model, as that is a worthy topic of discussion for the group. In the related OMG activity we are using a profile of UML that adds more semantic capabilities but has the tooling, established base and graphic support of UML. This profile is currently going through the standards process and can then be used to generate OWL. You can say 90% of what you can say in OWL with less complexity. We have also used OWL for other projects, as it has some valuable features, but it is also far from perfect. This is a good topic for discussion. But we get ahead of ourselves: the purpose and scope should drive such choices.
What I have found in every similar activity is that once people start looking at terms and concepts from a model perspective instead of XML (or SQL, etc.) data structures, they discover issues, complexities, simplifications and opportunities that are not very apparent from looking at the schema. This activity represents a different viewpoint that, when combined with the more “bottom up” implementation and representation concerns, makes the specification that much better. For this reason I would suggest that such a viewpoint should drive the vocabulary and semantics, and should work in concert with, but not be the same as, the team that focuses on the best representation and implementation in XML or a DBMS.
In the best scenario the former would then generate the latter based on transformation rules that map the terms, structure and semantics onto the technology framework of choice. The existing schemas provide a valuable resource to start with, whereas the models provide a better way to evolve and certainly a better way to support multiple technologies. This can then be considered a candidate strategy for phase 2; it is a different SDLC than starting with XML schema. Coming to consensus on our approach and SDLC should, perhaps, precede forming subcommittees to start the work.
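One way to picture "the former would then generate the latter" is a single technology-neutral entity definition plus one set of transformation rules per target technology. The sketch below is illustrative only: the `Entity`/`Attribute` classes, the abstract type names and the type mappings are assumptions, not part of any STIX, CybOX or OMG specification.

```python
from dataclasses import dataclass, field

# Hypothetical technology-neutral logical model. Entity names, attributes and
# abstract datatypes are illustrative assumptions, not taken from STIX/CybOX.
@dataclass
class Attribute:
    name: str
    datatype: str          # abstract type: "string", "integer" or "datetime"
    required: bool = False


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


# Transformation rules: one type mapping per target technology.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "datetime": "TEXT"}
JSON_TYPES = {"string": "string", "integer": "integer", "datetime": "string"}


def to_sql(entity: Entity) -> str:
    """Generate a CREATE TABLE statement for the entity."""
    columns = [
        f"    {a.name} {SQL_TYPES[a.datatype]}" + (" NOT NULL" if a.required else "")
        for a in entity.attributes
    ]
    return f"CREATE TABLE {entity.name.lower()} (\n" + ",\n".join(columns) + "\n);"


def to_json_schema(entity: Entity) -> dict:
    """Generate a JSON Schema object description for the same entity."""
    properties = {}
    for a in entity.attributes:
        prop = {"type": JSON_TYPES[a.datatype]}
        if a.datatype == "datetime":
            prop["format"] = "date-time"  # JSON Schema's standard date-time format
        properties[a.name] = prop
    return {
        "title": entity.name,
        "type": "object",
        "properties": properties,
        "required": [a.name for a in entity.attributes if a.required],
    }


if __name__ == "__main__":
    indicator = Entity("Indicator", [
        Attribute("id", "string", required=True),
        Attribute("title", "string"),
        Attribute("produced_time", "datetime"),
    ])
    print(to_sql(indicator))
    print(to_json_schema(indicator))
```

A change made once in the model (a new attribute, a different type) flows into both generated artifacts, which is the consistency benefit described above; a real tool chain would likely need much richer rules for relationships, cardinality and extension points.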
Building on Cory's suggestion... Jerome's observations... and Sean's note about using OWL or RDFS....
Would it make sense to establish a Sub-Committee that combines some of the issues associated with database design that have been discussed previously (RDBMS vs. NoSQL) with this need for clarification at the abstract level (conceptual & logical)?
If so, would the scope of such a Sub-Committee also cover implementation and tooling issues, as was suggested earlier by Patrick?
Further, what would be the tangible outputs, and how would they map to the STIX, TAXII, and CybOX Sub-Committees?
Jane Ginn, MSIA, MRP
Cyber Threat Intelligence Network, Inc.
-------- Original Message --------
From: "Barnum, Sean D." <firstname.lastname@example.org>
Sent: Wednesday, June 24, 2015 10:41 AM
To: Jerome Athias <email@example.com>,Cory Casanave <firstname.lastname@example.org>
Subject: Re: [cti] Database Subcommittee / conceptual/logical model subcommittee
CC: Eric Burger <Eric.Burger@georgetown.edu>,"email@example.com "
I just wanted to add a note of clarification here for the intent/scope of STIX and CybOX to date.
STIX and CybOX are intended to be Languages for expressing cyber threat information and cyber observable information respectively.
As such, they are more than simple data models or schemas. They also involve the conceptual model for their scope.
To date, the emergent and exploratory nature of this community, seeking not only to formalize expressive representations for cyber threat information but also to work collaboratively and iteratively to figure out what that even meant, led to some necessary choices to work from the bottom up.
This is why the language has initially been developed, refined and defined in the form of XML schema. The schematic level of abstraction gave us something concrete to discuss, let us model specific technical details, and let us experiment with real-world data and implementations in order to iterate and improve. XML schema was chosen not because it is some magical answer that everyone everywhere should use, but rather because it is ubiquitous, is supported by a mature body of tooling and synergistic standards (XPath, XPointer, XQuery, etc.), and provides a powerful formal schema language to explicitly constrain syntax while enabling necessary flexibility. All of these things were needed to model and evolve a representation of an emergent knowledge space among a very diverse set of players.
This approach served us well and got us where we are today, but it has always been recognized that specifying the language at this level of abstraction has significant downsides. First, it is difficult to define semantics and high-level concepts effectively at this level. Second, choosing any particular technical implementation (XML, JSON, etc.) inherently introduces technology-specific characteristics that really are not part of the more generalized language.
In recognition of this, it has always been the plan to move the specification of the languages to a more general form once an appropriate level of maturity and stability had been reached (very similar to the plan to move to a formal standards body at the appropriate time). The first steps toward this were put into motion several months back, when work began on an implementation-independent specification for STIX and a separate but related one for CybOX. It was decided, based on community needs and maturity, that the appropriate first step in generalization would be to capture language structure and syntax in the form of a UML model, accompanied by a set of textual specifications that explain and characterize the UML model in a more human-consumable form. The draft set of these specifications for STIX 1.1.1 is currently available in the STIXProject on GitHub, and the updated versions for STIX 1.2 should be completed within the next couple of weeks. This will be the primary normative contribution to the CTI TC. There is also a UML model available for CybOX, but a full set of accompanying textual specs similar to those for STIX will not be created before the transition to the CTI TC, so that work will likely fall to the CybOX SC.
While UML models are formal and are abstracted from particular syntactic implementations (XML, JSON, etc.), in all honesty they are not really built to convey high-level conceptual models or explicit semantics of knowledge. They can be somewhat twisted to serve this purpose (as we have done in the implementation-independent specs), but the fact that they were designed to serve a systems engineering rather than a knowledge engineering purpose leads to some shortcomings. The inability of UML models to effectively convey high-level conceptual models and explicit knowledge semantics in a formal fashion is one of the key reasons the textual specification documents are required in addition to the UML. They not only provide more human-consumable characterizations of what is in the UML, but are also needed to explain semantics that cannot effectively be expressed in the UML. The upside is that some of these semantics can now be made explicit in the documents, but only in an informal form that is still open to human interpretation. What is ultimately needed for the language specs is a way to formally express the full range of language semantics and structure.
I have personally asserted for a long while, and I know many in the community agree, that the long-term solution for specifying the languages is to define and express them using mechanisms purpose-built to define languages like this, that is, semantic forms of specification such as OWL and RDFS. These forms, while less familiar to many (part of the reason we decided to work from the bottom up), provide a way to clearly, explicitly, unambiguously and formally specify the high-level conceptual model for the languages, directly map it to any number of more detailed conceptual models, and then directly map those to specific syntactic/schematic representations (logical models).
Many members of the community have been eager to begin working at this level, but it was deemed important to first complete the abstraction work to the UML/textual specification level to serve as an XML-bias-free basis for initial semantic modeling. I propose that some of the CTI TC's early work should be focused on these activities. In fact, I would fairly strongly assert that many of the refactoring issues on the table for STIX 2.0 (e.g., abstraction of several embedded structures (relationships, sources, assets, victims, etc.) into separate constructs) will require semantic modeling in order to fully understand and get right. I think the semantic discussions and modeling that are part of these activities could serve as some great initial steps towards more formal specifications for the languages, supporting not only better integration of each language across abstraction levels (conceptual to logical) but also better alignment and integration with related information representations within the cyber security sphere (MAEC, CAPEC, CVRF, OVAL, OpenIOC, etc.) and outside it.
So, that was a long contextual way of saying that I strongly agree with the need to understand and specify these languages across the abstraction spectrum (conceptual to lexical), but strongly feel that this should/must be done within the context of each language (i.e., within the STIX and CybOX SCs, with cross-coordination via the TC) rather than as a separate activity.
I'm a great fan of conceptual models!
I skipped this step while reading the specifications to go directly to
a relational data model, but I can see a lot of benefits in producing a
CMap (concept map), especially for new adopters (a picture is worth a
thousand words). It's also easy to share (e.g. with CmapTools).
The issue that I think we would encounter is not so much about the
level of abstraction (multiple CMaps could resolve that), since there
are not that many concepts there (in CTI). (I used to do CMap for complex
It is mainly, AGAIN, related to the taxonomy.
You can see that when dealing with the extension points and figuring
out which standard/specification would be the most appropriate to map
CTI to: the things that are around CTI and that you have to deal with,
such as Assets, Vulnerabilities, Exploits, Shellcodes, etc.
But I assure you that it's fascinating ;)
And since all these things are somehow linked together, it is quite
difficult to choose how to -split- this into multiple models.
(you could look at it in many ways, like asset-centric, risk-based,
There is certainly value in a DBMS capability, perhaps one that can be implemented across multiple technologies. This may then also relate to the "conceptual model" initiatives which have already started. A conceptual model can bridge the exchange and repository viewpoints and also allow for greater flexibility in implementation technologies. We have had great success generating schemas, as well as transformations between them, from models.
With this in mind, perhaps a conceptual and/or logical model subcommittee should be considered. Depending on the approach, this could provide some of the value that is being sought for the database. A separation of concerns would allow the database to be defined in models and implemented in one or more chosen technologies. Such implementation would probably be a separate activity.
There is some grey area in what people call conceptual and logical models and the levels of abstraction each represents. For me (and many others), a conceptual model is a model of how the world is understood; it is a model of the terms and concepts of the world, not a data model. An "instance" of a person in a conceptual model is a real person, not data. A logical model is then a technology-independent data model about the world, where choices are made as to structure and representation. An "instance" of a person in a logical model is data. An initial activity of a conceptual/logical model subcommittee could be to define the purpose, scope and appropriate level of abstraction.
Of course, the model activity is just as relevant to the exchange schemas: it can help make them more understandable as well as provide a basis for supporting other technologies (essentially a model-driven architecture approach). This works best when the models are the normative definition and the technology schemas are generated from them. Since this tends to introduce more change (as well as more consistency), it would best be coupled with the second phase.
There has already been work on conceptual models, and this direction seems consistent with the community's direction. With the above in mind, we may want to consider a conceptual and/or logical model subcommittee.
Sent: Wednesday, June 24, 2015 7:06 AM
Subject: Re: [cti] Database Subcommittee
I wonder if providing consumer-oriented XQuery examples (maybe based on the STIX idioms) would help, by providing both guidance and test/validation cases.
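As a rough idea of what a consumer-oriented query example could look like, the sketch below answers "which indicators carry an observable of a given type?" against a simplified, STIX-like document. XQuery is not in the Python standard library, so `xml.etree.ElementTree` (which supports only a limited XPath subset) stands in for it here; the namespace URI and element names are illustrative assumptions, not the real STIX 1.x schema.

```python
import xml.etree.ElementTree as ET

# Simplified, STIX-like sample. The namespace URI and element/attribute names
# are illustrative assumptions, NOT the actual STIX 1.x schema.
DOC = """<Package xmlns="urn:example:stix-like">
  <Indicator id="example:indicator-1">
    <Title>Malicious IP</Title>
    <Observable type="IPv4">192.0.2.10</Observable>
  </Indicator>
  <Indicator id="example:indicator-2">
    <Title>Phishing domain</Title>
    <Observable type="Domain">bad.example.com</Observable>
  </Indicator>
</Package>"""

# Prefix map used by ElementTree's find/findall to resolve "s:" in paths.
NS = {"s": "urn:example:stix-like"}


def observables_of_type(xml_text: str, obs_type: str) -> list[tuple[str, str]]:
    """Return (indicator id, observable value) pairs for observables of obs_type.

    Roughly what an XQuery FLWOR expression such as
        for $i in //s:Indicator, $o in $i/s:Observable[@type = $t]
        return ($i/@id, $o/text())
    would express.
    """
    root = ET.fromstring(xml_text)
    results = []
    for indicator in root.findall("s:Indicator", NS):
        for obs in indicator.findall("s:Observable", NS):
            # Filter in Python rather than building an XPath predicate from
            # caller input, which would break on values containing quotes.
            if obs.get("type") == obs_type:
                results.append((indicator.get("id"), obs.text))
    return results


if __name__ == "__main__":
    for ind_id, value in observables_of_type(DOC, "IPv4"):
        print(ind_id, value)
```

Pairing each such query with a fixed sample document and its expected result is one way the examples could double as the test/validation cases mentioned above.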
Jerome (as he often does) gets this right in one (how about that - use a British colloquialism instead of a US one!).
We just submitted a paper for publication at MILCOM looking at STIX/TAXII/CybOX versus IODEF/RID from the perspective of humans versus machines doing the processing. You can probably guess the end of the story: STIX/TAXII/CybOX is much better for machines, and IODEF/RID is much better for people. Since the goal is inter-machine communication, you get the point.
It does mean there is a lot riding on VERY clear, implementable, interoperable specifications. Debugging this stuff is going to be a nightmare, especially if the language is so nuanced that there are dozens of ways of saying the same thing.
To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at: