Building on Cory's suggestion... Jerome's observations... and Sean's note about using OWL or RDFS....
Would it make sense to establish a Sub-Committee that combines some of the issues associated with database design that have been discussed previously (RDBMS vs. NoSQL) with this need for clarification at the abstract level (conceptual & logical)?
If so.... would the scope of such a Sub-Committee also cover implementation and tooling issues as was earlier suggested by Patrick?
Further, what would be the tangible outputs, and how would they map to the STIX/TAXII/ & CYBOX Sub-Committees?
Jane Ginn, MSIA, MRP
Cyber Threat Intelligence Network, Inc.
-------- Original Message --------
From: "Barnum, Sean D." <firstname.lastname@example.org
Sent: Wednesday, June 24, 2015 10:41 AM
To: Jerome Athias <email@example.com
>,Cory Casanave <firstname.lastname@example.org
Subject: Re: [cti] Database Subcommittee / conceptual/logical model subcommittee
CC: Eric Burger <Eric.Burger@georgetown.edu
I just wanted to add a note of clarification here for the intent/scope of STIX and CybOX to date.
STIX and CybOX are intended to be Languages for expressing cyber threat information and cyber observable information respectively.
As such, they are more than simple data models or schemas. They also involve the conceptual model for their scope.
To date, the emergent and exploratory nature of this community seeking not only to formalize expressive representations for cyber threat information but to work collaboratively and iteratively to even figure out what that meant led to some necessary choices
to work from the bottom up.
This is why the language has initially been developed, refined and defined in the form of XML schema. The schematic level of abstraction gave us something concrete to discuss, model specific technical details and to experiment with real world data and
implementations in order to iterate and improve. XML schema was chosen not because it is some magical answer that everyone everywhere should use but rather because it is ubiquitous, supported by a mature body of tooling and synergistic standards (XPATH, Xpointer,
Xquery, etc.) and provides a powerful formal schema language to explicitly constrain syntax while enabling necessary flexibility. All of these things were needed to model and evolve a representation of an emergent knowledge space among a very diverse set of
This approach served us well to successfully get us where we are today but it has always been recognized that specifying the language at this level of abstraction has significant downsides. First, it is difficult to define semantics and high level concepts
effectively at this level and choosing any particular technical implementation (XML, JSON, etc.) inherently introduces technology-specific characteristics that really are not part of the more generalized language.
In recognition of this, it has always been the plan to move the specification of the languages to a more general form once an appropriate level of maturity and stability had been reached (very similar to the plan to move to a formal standards body at the
appropriate time). The first steps toward this were put into motion several months back when work began on an implementation independent specification for STIX and a separate but related one for CybOX. It was decided that based on community needs and maturity
the appropriate first step in generalization would be to capture language structure and syntax in the form of a UML model that would be accompanied by a set of textual specifications to explain and characterize the UML model in a more human consumable form.
The draft set of these specifications for STIX 1.1.1 are currently available in the
STIXProject on github
and the updated versions to STIX 1.2 should be completed within the next couple weeks. This will be the primary normative contribution to the CTI TC. There is a UML model for
CybOX also available but the set of accompanying full textual specs similar to STIX will not be created before transition to the CTI TC so that work will likely fall to the CybOX SC.
While UML models are formal and are abstracted from particular syntactic implementations (XML, JSON, etc.), they are not in all honesty really built to convey high-level conceptual models or explicit semantics of knowledge. They can be somewhat twisted
to serve this purpose (as we have done in the implementation independent specs) but the fact that they were designed to serve a systems engineering rather than knowledge engineering purpose leads to some shortcomings. The inability of UML models to effectively
convey high-level conceptual models and explicit knowledge semantics in a formal fashion is one of the key reasons the textual specification documents are required in addition to the UML. They not only provide more human-consumable characterizations of what
is in the UML but they are also needed to explain semantics that cannot effectively be expressed in the UML. The upside is that some of these semantics can now be explicit in the documents but it is in an informal form and still open to human interpretation.
What is ultimately needed for the language specs is a way to formally express the full range of language semantics and structure.
I have personally asserted for a long while, and I know many in the community agree, that the long term solution for specifications of the languages is to define and express them using mechanisms purpose built to define languages like this. That is, utilizing
semantic forms of specification such as OWL and RDFS. These forms while less familiar to many (part of the reason we decided to work from the bottom up) provide a way to clearly, explicitly, unambiguously and formally specify the high-level conceptual model
for the languages, directly map it to any number of more detailed conceptual models, and then directly map it to specific syntactic/schematic representations (logical models).
Many members of the community have been eager to begin working at this level but it was deemed important to first complete the abstraction work to the UML/textual specification level to serve as a XML-bias-free basis for initial semantic modeling. I propose
that some of the CTI TCs early work should be focused on these activities. In fact, I would fairly strongly assert that many of the refactoring issues on the table for STIX 2.0 (e.g., abstraction of several embedded structures (relationships, sources, assets,
victims, etc.) to separate constructs) will require semantic modeling in order to fully understand and get right. I think the semantic discussions and modeling as part of these activities could serve as some great initial steps towards more formal specifications
for the languages that serve not only better integration for each language across abstraction levels (conceptual to logical) but also better alignment and integration with related information representations within the cyber security sphere (MAEC, CAPEC, CVRF,
OVAL, OpenIOC, etc.) and outside the cyber security sphere.
So, that was a long contextual way of saying that I strongly agree with the need to understand and specify these languages across the abstraction spectrum (conceptual to lexical) but strongly feel that this should/must be done within the context of each
language (I.e. within the STIX and CybOX SCs with cross coordination via the TC) rather than as a separate activity.
I'm a great fan of conceptual models!
I skipped this step while reading the specifications to go directly to
a data relational model, but I can see a lot of benefits producing a
CMap, especially for new adopters (just because one picture can tell
thousands of words). It's easy to share also (e.g. CmapTools)
The issue that I think we would encounter, is not so much about the
level of abstraction (multiple CMaps could resolve that), while there
is not so much concepts there (in CTI). (I used to do CMap for complex
It is mainly, AGAIN, related to the taxonomy.
You could see that when dealing with the extensions points, figuring
out what would be the most appropriate standard/specification to map
CTI to. Things that are around CTI and that you have to deal with,
such like Assets, Vulnerabilities, Exploits, Shellcodes, etc.
But I assure you that it's fascinating ;)
And while all these things are somehow linked together, it makes quite
difficult to make choice to -split- this into multiple models.
(you could look at it in many ways, like asset-centric, risk-based,
There is certainly a value in a DBMS capability, perhaps one that can be implemented across multiple technologies. This may then also relate to the "conceptual model" initiatives which have already started. A conceptual model can bridge the exchange and
repository viewpoints and also allow for greater flexibility in implementation technologies. We have had great success in generating schema as well as transformations between them from models.
With this in mind perhaps a conceptual and/or logical model subcommittee should be considered. Depending on the approach this could provide some of the value that is being sought for the database. A separation of concerns would allow for the definition
of the database in models with implementation in one or more chosen technologies. Such implementation would probably be another activity.
There is some grey area in what people call conceptual and logical models and the levels of abstraction each represents. For me (and many others), a conceptual model is a model of how the world is understood - it is then a model of the terms and concepts
of the world, not a data model. An "instance" of a person in a conceptual model is a real person - not data. A logical model is then a technology independent data model about the world where choices are made as to structure and representation. An "instance"
of a person in a logical model is data. An initial activity of a conceptual/logical model subcommittee could be to define the purpose, scope and appropriate level of abstraction.
Of course the model activity is just as relevant to the exchange schema and can help make them more understandable as well as provide a basis for support of other technologies (essentially a model driven architecture approach). This works best when the
models are the normative definition and technology schema are generated from them. Since this tends to introduce more change (as well as more consistency), it would best be coupled with the second phase.
There has already been work on conceptual models this direction seems consistent with the communities direction. With the above in mind we may want to consider a conceptual and/or logical model subcommittee.
Sent: Wednesday, June 24, 2015 7:06 AM
To: Eric Burger
Subject: Re: [cti] Database Subcommittee
I wonder if providing consumer-oriented XQuery examples (maybe with the STIX idioms) would help providing guidance and test/validation cases
Jerome (as he often does) gets this right in one (how about that - use a British colloquialism instead of a US one!).
We just submitted a paper for publication at MILCOM looking at STIX/TAXII/CybOX versus IODEF/RID from the perspective of humans versus machines doing the processing. My guess is you can guess the end of the story: STIX/TAXII/CybOX is much better for machines.
IODEF/RID is much better for people. Since the goal is for inter-machine communication, you get the point.
It does mean there is a lot riding on VERY clear, implementable, interoperable specifications. Debugging this stuff is going to be a nightmare, more especially if the language is so nuanced there are dozens of ways of saying the same thing.
To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at:
To unsubscribe from this mail list, you must leave the OASIS TC that
generates this mail. Follow this link to all your TCs in OASIS at: