cti message

Subject: Re: [cti] Database Subcommittee / conceptual/logical model subcommittee
From: Jerome Athias <athiasjerome@gmail.com>
To: Patrick Maroney <Pmaroney@specere.org>
Date: Mon, 6 Jul 2015 17:00:32 +0300
Just want to point out a good glossary found in Appendix A of DRM 2
https://www.whitehouse.gov/sites/default/files/omb/assets/egov_docs/DRM_2_0_Final.pdf


2015-07-06 8:03 GMT+03:00 Patrick Maroney <Pmaroney@specere.org>:
> [+1]   "I have not been comfortable with calling this group the “database
> subcommittee” specifically because it is the data model, not the data model
> implementation, that needs focus."
>
> [+1]  "...once people start looking at terms and concepts from a model
> perspective instead of XML (or SQL, etc) data structures they discover
> issues, complexities, simplifications and opportunities that are not very
> apparent looking at schema. This activity represents a different viewpoint
> that when combined with the more “bottom up” implementation and
> representation concerns makes the specification that much better. For this
> reason it would be my suggestion that such a viewpoint should drive the
> vocabulary and semantics and work in concert with but not be the same as the
> team that focuses on the best representation and implementation in XML or a
> DBMS. "
>
> Patrick Maroney
> Office: (856)983-0001
> Cell: (609)841-5104
> pmaroney@specere.org
> ________________________________
> From: cti@lists.oasis-open.org <cti@lists.oasis-open.org> on behalf of Eric
> Burger <Eric.Burger@georgetown.edu>
> Sent: Sunday, July 5, 2015 3:17:55 AM
> To: cti@lists.oasis-open.org
>
> Subject: Re: [cti] Database Subcommittee / conceptual/logical model
> subcommittee
>
> I have not been comfortable with calling this group the “database
> subcommittee” specifically because it is the data model, not the data model
> implementation, that needs focus. Cory nails it in one (second paragraph
> below). In order to build real data migration tools, you really need to
> understand what you are migrating. I would offer the first task (as opposed
> to a parallel sub-subcommittee) is to do the modeling.
>
> That is why we have been working on an OWL model for STIX/CybOX at
> Georgetown. Our purpose was for a different goal, but the result could be
> generally useful.
>
> On Jun 24, 2015, at 4:45 PM, Cory Casanave <cory-c@MODELDRIVEN.COM> wrote:
>
> Team,
> I purposely did not suggest a particular language for expressing the
> conceptual/logical model as that is a worthy topic of discussion for the
> group. In the related OMG activity we are using a profile of UML that adds
> more semantic capabilities but has the tooling, established base and graphic
> support of UML. This profile is currently going through the standards
> process and is then able to generate OWL. You can say 90% of what you can
> say in OWL with less complexity. We have also used OWL for other projects as
> it also has some valuable features, but is also far from perfect. This is a
> good topic for discussion. But we get ahead of ourselves; the purpose and
> scope should drive such choices.
>
> What I have found in every similar activity is that once people start
> looking at terms and concepts from a model perspective instead of XML (or
> SQL, etc) data structures they discover issues, complexities,
> simplifications and opportunities that are not very apparent looking at
> schema. This activity represents a different viewpoint that when combined
> with the more “bottom up” implementation and representation concerns makes
> the specification that much better. For this reason it would be my
> suggestion that such a viewpoint should drive the vocabulary and semantics
> and work in concert with but not be the same as the team that focuses on the
> best representation and implementation in XML or a DBMS.
>
> In the best scenario the former would then generate the latter based on
> transformation rules that map the terms, structure and semantics onto the
> technology framework of choice. The existing schema provide a valuable
> resource to start with whereas the models provide a better way to evolve and
> certainly a better way to support multiple technologies. This then can be
> considered a candidate strategy for phase 2, as it is a different SDLC than
> starting with XML schema. Coming to consensus on our approach and SDLC
> should, perhaps, precede forming subcommittees to start the work.
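
As a rough illustration of the "generate the latter from the former" idea described above, here is a minimal Python sketch. It assumes a tiny technology-independent model and two transformation rule tables, one per target technology. The entity `Indicator`, its attributes, and the helper names `to_sql` and `to_xsd` are hypothetical and are not taken from the STIX or CybOX specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    type: str  # technology-independent type: "string" or "integer"


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple


# Transformation rules: one type mapping per target technology (assumed, simplified).
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER"}
XSD_TYPES = {"string": "xs:string", "integer": "xs:integer"}


def to_sql(entity):
    """Map the model entity onto a relational table definition."""
    cols = ", ".join(f"{a.name} {SQL_TYPES[a.type]}" for a in entity.attributes)
    return f"CREATE TABLE {entity.name} ({cols});"


def to_xsd(entity):
    """Map the same model entity onto an XML Schema complex type."""
    elems = "".join(
        f'<xs:element name="{a.name}" type="{XSD_TYPES[a.type]}"/>'
        for a in entity.attributes
    )
    return (
        f'<xs:complexType name="{entity.name}">'
        f"<xs:sequence>{elems}</xs:sequence></xs:complexType>"
    )


# Hypothetical model element, defined once and rendered for two technologies.
indicator = Entity(
    "Indicator",
    (Attribute("title", "string"), Attribute("confidence", "integer")),
)

print(to_sql(indicator))
print(to_xsd(indicator))
```

The model is written once; supporting another technology would mean adding another rule table and renderer rather than editing each schema by hand.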
>
> Regards,
> Cory Casanave
>
> From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf
> Of Jane Ginn - jg@ctin.us
> Sent: Wednesday, June 24, 2015 2:05 PM
> To: sbarnum@mitre.org; Jerome Athias; Cory Casanave
> Cc: Eric.Burger@georgetown.edu; cti@lists.oasis-open.org
> Subject: Re: [cti] Database Subcommittee / conceptual/logical model
> subcommittee
>
>
> All:
>
> Building on Cory's suggestion... Jerome's observations... and Sean's note
> about using OWL or RDFS....
>
> Would it make sense to establish a Sub-Committee that combines some of the
> issues associated with database design that have been discussed previously
> (RDBMS vs. NoSQL) with this need for clarification at the abstract level
> (conceptual & logical)?
>
> If so.... would the scope of such a Sub-Committee also cover implementation
> and tooling issues as was earlier suggested by Patrick?
>
> Further, what would be the tangible outputs, and how would they map to the
> STIX, TAXII and CybOX Sub-Committees?
>
> Jane Ginn, MSIA, MRP
> Cyber Threat Intelligence Network, Inc.
> jg@ctin.us
>
>
>
> -------- Original Message --------
> From: "Barnum, Sean D." <sbarnum@mitre.org>
> Sent: Wednesday, June 24, 2015 10:41 AM
> To: Jerome Athias <athiasjerome@gmail.com>,Cory Casanave
> <cory-c@modeldriven.com>
> Subject: Re: [cti] Database Subcommittee / conceptual/logical model
> subcommittee
> CC: Eric Burger <Eric.Burger@georgetown.edu>,"cti@lists.oasis-open.org "
> <cti@lists.oasis-open.org>
>
> I just wanted to add a note of clarification here for the intent/scope of
> STIX and CybOX to date.
> STIX and CybOX are intended to be Languages for expressing cyber threat
> information and cyber observable information respectively.
> As such, they are more than simple data models or schemas. They also involve
> the conceptual model for their scope.
> To date, the emergent and exploratory nature of this community seeking not
> only to formalize expressive representations for cyber threat information
> but to work collaboratively and iteratively to even figure out what that
> meant led to some necessary choices to work from the bottom up.
>
> This is why the language has initially been developed, refined and defined
> in the form of XML schema. The schematic level of abstraction gave us
> something concrete to discuss, model specific technical details and to
> experiment with real world data and implementations in order to iterate and
> improve. XML schema was chosen not because it is some magical answer that
> everyone everywhere should use but rather because it is ubiquitous,
> supported by a mature body of tooling and synergistic standards (XPath,
> XPointer, XQuery, etc.) and provides a powerful formal schema language to
> explicitly constrain syntax while enabling necessary flexibility. All of
> these things were needed to model and evolve a representation of an emergent
> knowledge space among a very diverse set of players.
>
> This approach served us well to successfully get us where we are today but
> it has always been recognized that specifying the language at this level of
> abstraction has significant downsides. First, it is difficult to define
> semantics and high level concepts effectively at this level and choosing any
> particular technical implementation (XML, JSON, etc.) inherently introduces
> technology-specific characteristics that really are not part of the more
> generalized language.
>
> In recognition of this, it has always been the plan to move the
> specification of the languages to a more general form once an appropriate
> level of maturity and stability had been reached (very similar to the plan
> to move to a formal standards body at the appropriate time). The first steps
> toward this were put into motion several months back when work began on an
> implementation independent specification for STIX and a separate but related
> one for CybOX. It was decided that based on community needs and maturity the
> appropriate first step in generalization would be to capture language
> structure and syntax in the form of a UML model that would be accompanied by
> a set of textual specifications to explain and characterize the UML model in
> a more human consumable form. The draft set of these specifications for STIX
> 1.1.1 is currently available in the STIXProject on GitHub and the updated
> versions to STIX 1.2 should be completed within the next couple weeks. This
> will be the primary normative contribution to the CTI TC. There is a UML
> model for CybOX also available but the set of accompanying full textual
> specs similar to STIX will not be created before transition to the CTI TC so
> that work will likely fall to the CybOX SC.
>
> While UML models are formal and are abstracted from particular syntactic
> implementations (XML, JSON, etc.), they are not in all honesty really built
> to convey high-level conceptual models or explicit semantics of knowledge.
> They can be somewhat twisted to serve this purpose (as we have done in the
> implementation independent specs) but the fact that they were designed to
> serve a systems engineering rather than knowledge engineering purpose leads
> to some shortcomings. The inability of UML models to effectively convey
> high-level conceptual models and explicit knowledge semantics in a formal
> fashion is one of the key reasons the textual specification documents are
> required in addition to the UML. They not only provide more human-consumable
> characterizations of what is in the UML but they are also needed to explain
> semantics that cannot effectively be expressed in the UML. The upside is
> that some of these semantics can now be explicit in the documents but it is
> in an informal form and still open to human interpretation. What is
> ultimately needed for the language specs is a way to formally express the
> full range of language semantics and structure.
>
> I have personally asserted for a long while, and I know many in the
> community agree, that the long term solution for specifications of the
> languages is to define and express them using mechanisms purpose built to
> define languages like this. That is, utilizing semantic forms of
> specification such as OWL and RDFS. These forms while less familiar to many
> (part of the reason we decided to work from the bottom up) provide a way to
> clearly, explicitly, unambiguously and formally specify the high-level
> conceptual model for the languages, directly map it to any number of more
> detailed conceptual models, and then directly map it to specific
> syntactic/schematic representations (logical models).
> Many members of the community have been eager to begin working at this level
> but it was deemed important to first complete the abstraction work to the
> UML/textual specification level to serve as an XML-bias-free basis for
> initial semantic modeling. I propose that some of the CTI TC's early work
> should be focused on these activities. In fact, I would fairly strongly
> assert that many of the refactoring issues on the table for STIX 2.0 (e.g.,
> abstraction of several embedded structures (relationships, sources, assets,
> victims, etc.) to separate constructs) will require semantic modeling in
> order to fully understand and get right. I think the semantic discussions
> and modeling as part of these activities could serve as some great initial
> steps towards more formal specifications for the languages that serve not
> only better integration for each language across abstraction levels
> (conceptual to logical) but also better alignment and integration with
> related information representations within the cyber security sphere (MAEC,
> CAPEC, CVRF, OVAL, OpenIOC, etc.) and outside the cyber security sphere.
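
To make the point above about semantic forms such as RDFS a little more concrete, the following is a minimal Python sketch of why explicit semantics matter: once a subclass relationship is stated formally, a machine can derive facts that were never written down. It uses plain tuples rather than an RDF library, and every `ex:` name is hypothetical rather than drawn from STIX or CybOX. The only rule applied is the standard RDFS one: an instance of a class is also an instance of that class's superclasses.

```python
# Triples are (subject, predicate, object); all ex: names are hypothetical.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    ("ex:Ransomware", SUBCLASS, "ex:Malware"),
    ("ex:Malware", SUBCLASS, "ex:ThreatConcept"),
    ("ex:sample1", TYPE, "ex:Ransomware"),
}


def infer_types(triples):
    """Propagate rdf:type along rdfs:subClassOf until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for cls, p2, parent in list(inferred):
                if p2 == SUBCLASS and cls == o and (s, TYPE, parent) not in inferred:
                    inferred.add((s, TYPE, parent))
                    changed = True
    return inferred


types_of_sample = {
    o for s, p, o in infer_types(triples) if s == "ex:sample1" and p == TYPE
}
print(sorted(types_of_sample))
```

Only one type was asserted for `ex:sample1`; the other two follow from the model itself, which is the kind of semantics a schema or informal text cannot carry on its own.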
>
>
> So, that was a long contextual way of saying that I strongly agree with the
> need to understand and specify these languages across the abstraction
> spectrum (conceptual to lexical) but strongly feel that this should/must be
> done within the context of each language (i.e., within the STIX and CybOX SCs
> with cross coordination via the TC) rather than as a separate activity.
>
> Sean
>
> On 6/24/15, 11:39 AM, "Jerome Athias" <athiasjerome@gmail.com> wrote:
>
>
> I'm a great fan of conceptual models!
>
> I skipped this step while reading the specifications and went directly to
> a relational data model, but I can see a lot of benefits in producing a
> CMap, especially for new adopters (simply because one picture is worth a
> thousand words). It's also easy to share (e.g. with CmapTools).
>
> The issue that I think we would encounter is not so much about the
> level of abstraction (multiple CMaps could resolve that), since there
> are not that many concepts there (in CTI). (I used to do CMaps for complex
> systems.)
> It is mainly, AGAIN, related to the taxonomy.
> You can see that when dealing with the extension points, figuring
> out what would be the most appropriate standard/specification to map
> CTI to. These are things that are around CTI and that you have to deal with,
> such as Assets, Vulnerabilities, Exploits, Shellcodes, etc.
> But I assure you that it's fascinating ;)
> And since all these things are somehow linked together, it is quite
> difficult to choose how to -split- this into multiple models.
> (you could look at it in many ways, like asset-centric, risk-based,
> vulnerability-based, etc.)
>
> My 2c
>
>
> 2015-06-24 18:18 GMT+03:00 Cory Casanave <cory-c@modeldriven.com>:
>
> There is certainly a value in a DBMS capability, perhaps one that can be
> implemented across multiple technologies. This may then also relate to the
> "conceptual model" initiatives which have already started. A conceptual
> model can bridge the exchange and repository viewpoints and also allow for
> greater flexibility in implementation technologies. We have had great
> success in generating schema as well as transformations between them from
> models.
>
> With this in mind perhaps a conceptual and/or logical model subcommittee
> should be considered. Depending on the approach this could provide some of
> the value that is being sought for the database. A separation of concerns
> would allow for the definition of the database in models with implementation
> in one or more chosen technologies. Such implementation would probably be
> another activity.
>
> There is some grey area in what people call conceptual and logical models
> and the levels of abstraction each represents. For me (and many others), a
> conceptual model is a model of how the world is understood - it is then a
> model of the terms and concepts of the world, not a data model. An
> "instance" of a person in a conceptual model is a real person - not data. A
> logical model is then a technology independent data model about the world
> where choices are made as to structure and representation. An "instance" of
> a person in a logical model is data. An initial activity of a
> conceptual/logical model subcommittee could be to define the purpose, scope
> and appropriate level of abstraction.
>
> Of course the model activity is just as relevant to the exchange schema and
> can help make them more understandable as well as provide a basis for
> support of other technologies (essentially a model driven architecture
> approach).  This works best when the models are the normative definition and
> technology schema are generated from them. Since this tends to introduce
> more change (as well as more consistency), it would best be coupled with the
> second phase.
>
> There has already been work on conceptual models, and this direction seems
> consistent with the community's direction. With the above in mind we may
> want to consider a conceptual and/or logical model subcommittee.
>
> Regards,
> Cory Casanave
> Representing OMG
>
>
> -----Original Message-----
> From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf
> Of Jerome Athias
> Sent: Wednesday, June 24, 2015 7:06 AM
> To: Eric Burger
> Cc: cti@lists.oasis-open.org
> Subject: Re: [cti] Database Subcommittee
>
> I wonder if providing consumer-oriented XQuery examples (maybe with the STIX
> idioms) would help provide guidance and test/validation cases.
>
>
> 2015-06-22 14:20 GMT+03:00 Eric Burger <Eric.Burger@georgetown.edu>:
>
> Jerome (as he often does) gets this right in one (how about that - use a
> British colloquialism instead of a US one!).
>
> We just submitted a paper for publication at MILCOM looking at
> STIX/TAXII/CybOX versus IODEF/RID from the perspective of humans versus
> machines doing the processing. My guess is you can guess the end of the
> story: STIX/TAXII/CybOX is much better for machines. IODEF/RID is much
> better for people. Since the goal is for inter-machine communication, you
> get the point.
>
> It does mean there is a lot riding on VERY clear, implementable,
> interoperable specifications. Debugging this stuff is going to be a
> nightmare, especially if the language is so nuanced that there are dozens of
> ways of saying the same thing.
>
>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail.  Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>
>
>