Thanks for the feedback. The tear line I am trying to figure out is where this is a specification issue and where it is an implementation issue.
One idea that we have tossed around on Slack is that each top-level object (TLO) would have a field called "lang": the language the object is written in. This would enable tools to select and filter by language.
Then, given that only a handful of fields on a given TLO can be translated (you are not going to translate an IP address, for example), we have tossed around the idea of creating a "translation" object that could be sent either with the original TLO or separately.
Going down this path, though I am not yet advocating that it is the best way, would allow organizations to do some very interesting things with CTI data:
1) A threat intel provider could issue TLOs in language specific versions if they wanted.
2) A threat intel provider could produce language translations and attach them to the TLO.
3) End users could augment or add their own translations without needing to re-release the entire TLO and thus avoid versioning issues.
There was some initial concern that this model might have issues with versioning. But I do not think so, as you would not want translated objects to automatically point to a new version. They would be tied at the hip to the version they were created for.
The reason for looking at something like this is to avoid the need to turn every string field in the serialization into an array of objects.
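A minimal sketch of how the "lang" field and a separate translation object could fit together. Every field name here (`lang`, `version`, `ip_address`, and the translation's `object_ref`, `object_version` and `fields`) is a hypothetical placeholder, not settled STIX 2.0 naming. The sketch uses `ja`, the ISO 639-1 language code for Japanese (`JP` is the ISO 3166-1 country code for Japan). It shows a translation pinned to one version of a TLO and applied only to free-text fields, so values such as an IP address pass through untouched.

```python
from copy import deepcopy

# Hypothetical indicator TLO carrying the proposed "lang" field.
indicator = {
    "type": "indicator",
    "id": "indicator--example-1",   # illustrative ID, not a real STIX ID format
    "version": 1,
    "lang": "en",
    "title": "C2 server for example campaign",
    "description": "Traffic to this address indicates compromise.",
    "ip_address": "198.51.100.7",   # hypothetical field; never translated
}

# Hypothetical translation object, sent with the TLO or separately.
translation = {
    "type": "translation",
    "object_ref": "indicator--example-1",
    "object_version": 1,            # tied to the exact version it was made for
    "lang": "ja",
    "fields": {
        "title": "例示キャンペーンのC2サーバ",
        "description": "このアドレスへの通信は侵害の兆候を示す。",
    },
}

# Assumption: only free-text fields are translatable.
TRANSLATABLE_FIELDS = {"title", "description"}


def apply_translation(tlo, tr):
    """Return a localized copy of tlo, or None if tr was made for another object/version."""
    if tr["object_ref"] != tlo["id"] or tr["object_version"] != tlo["version"]:
        return None  # translations do not automatically follow a newer version
    localized = deepcopy(tlo)  # the original TLO is left unchanged
    for name, text in tr["fields"].items():
        if name in TRANSLATABLE_FIELDS:
            localized[name] = text
    localized["lang"] = tr["lang"]
    return localized


localized = apply_translation(indicator, translation)
print(localized["lang"], localized["ip_address"])  # ja 198.51.100.7

# A re-released version 2 of the indicator does not pick up the old translation.
v2 = dict(indicator, version=2)
print(apply_translation(v2, translation))  # None
```

Keeping the translation as its own object lets an end user publish one without re-releasing the TLO; the trade-off is that a consumer has to fetch and match translations by object reference and version.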
What would you think of something like this?
Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."
I guess it depends. But what I see are scenarios like the following:
- A Japanese entity receives pieces of CTI in English. The entity determines that some of them are important or critical and worth translating into Japanese, adds descriptions in Japanese, and redistributes them to other Japanese entities (if redistribution is allowed). The CTIM (CTI Management System) of a receiving party displays the Japanese description whenever possible, while allowing access to the original English descriptions.
- Japanese entities produce CTI in Japanese (not in English, surprise!). An entity decides that some of it is important or critical and worth translating into English, adds descriptions in English, and redistributes them to other countries (if redistribution is allowed). The CTIM of a receiving party displays the English description if so configured, while allowing access to the original (likely more accurate) Japanese descriptions.
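The display behaviour in these scenarios (show the reader's language when a translation exists, otherwise the original, and always keep the original reachable) can be sketched as a small selection function. The data layout, a mapping from language code to text alongside the original language, and the names `Display` and `choose_text` are assumptions for illustration, not an agreed STIX structure. When no preferred language is available, for example content in a language the reader does not speak, it falls back to the original and flags that.

```python
from typing import Dict, List, NamedTuple


class Display(NamedTuple):
    text: str           # what the CTIM shows by default
    lang: str           # language of the shown text
    original: str       # original text, always kept accessible
    original_lang: str
    is_fallback: bool   # True when none of the preferred languages was available


def choose_text(original_lang: str, original: str,
                translations: Dict[str, str], preferred: List[str]) -> Display:
    """Pick the first preferred language available, else fall back to the original."""
    # The original always wins for its own language code.
    available = {**translations, original_lang: original}
    for lang in preferred:
        if lang in available:
            return Display(available[lang], lang, original, original_lang, False)
    return Display(original, original_lang, original, original_lang, True)


d = choose_text("en", "Phishing campaign targeting banks",
                {"ja": "銀行を狙ったフィッシングキャンペーン"}, ["ja", "en"])
print(d.lang, d.is_fallback)  # ja False

h = choose_text("hu", "Adathalász kampány", {}, ["ja", "en"])
print(h.lang, h.is_fallback)  # hu True
```

Whether this selection runs in the receiving CTIM or earlier in the distribution chain is the specification-versus-implementation question this thread is trying to settle.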
- Will organizations producing threat intelligence produce one incident for each language?
- Or will they produce one big incident that contains all of the languages?
- For an indicator with a localized title / description, would a TAXII server just send you the jp version rather than the en_us version?
- Or would you expect the TAXII server to send you both?
- What would be the expected behavior if you got a version in a language that you did not speak, say Hungarian?
Not the UTF-8 thing (I understand most modern programming languages and other standards deal with that correctly). It is about having text fields in multiple languages, for example descriptions of a package in English and Japanese. The system will pick which language to display based on the language code (“en” or “jp”) in the field. Is this something already discussed on Slack?

I would really like to understand this... Do you mean making sure the text fields are not ASCII so that you can put in other character sets? JSON gives us UTF-8 by default, so this alone should make things easier for our international friends. If this is not what you mean, please explain and give us some context. We have had some passionate debates on Slack about this recently, but I now feel that we do not really understand the problem we were trying to solve. Can you help us understand the problem? What works, what does not work, what do you need it to do and why? I really want to make sure our baby works for everyone. But as I said on Slack, "I do not want to engineer a space ship when all we need is a bike to run to the corner store and get a coke".

Is there a place for “Internationalization” of text fields? I would very much like to see it in STIX 2.0 (or CTI Common?) and I am willing to contribute.
As discussed at the face-to-face meeting, and briefly on the TC monthly call and the list, we plan to work toward our aggressive July target date for draft STIX 2.0, TAXII 2.0 and CybOX 3.0 specs using a more product-management approach, with roughly monthly tranches, each focused on resolving all in-scope identified issues relevant to a particular capability area.
- February 29th – Run Indicators to ground. Get these fundamentals worked through so that we can talk to vendors about them on the RSA show floor and have something to show them.
- March 31st – Run remaining cross-cutting issues to ground. Run Identity-based Victim, Source and Actor top level abstractions to ground.
- April 30th – Run Incidents (investigations) to ground. Run Asset top level abstraction to ground. Run Campaign to ground.
- May 31st – Run controlled vocabularies to ground. Run automated COA default extension to ground. Run analytic support (opinions, assertions, hypotheses, etc.) to ground.
- June 30th – All other remaining top level elements. Review pass for consistency (field name choices, naming conventions, structure patterns, etc.) and quality.
This should cover the existing in-scope issues in a coherent, dependency-aware, iterative fashion. For CybOX, the Indicator tranche is likely to cover patterning and key object support decisions, with the remaining tranches focused on key object refactoring based on decisions from the Indicator tranche.

The first tranche (Indicators) is the most relevant for now, as it begins today. This is a very aggressive plan considering the number of issues to discuss and decide and the limited time to do it. We will strive to achieve it and encourage active collaboration from everyone to help us accomplish it. If you see any issues with this tranche plan, or have comments or feedback on it, please let us know so that we may adapt as appropriate.
To discuss and reach consensus on all in-scope tracker issues for STIX 2.0 that are required to support common indicator use cases.
Target completion date: February 29, 2016
- Raise and describe the issue with a brief wiki writeup
- Discuss the issue on the list and/or Slack (with summaries made on the list). Anyone with a proposed solution may add details of their proposal (proposed normative text, examples, diagrams, schema, etc., clearly marked as a proposal) to the wiki writeup and announce it to the list.
- Discuss, debate, review proposals, comment as appropriate within defined time window to work towards consensus.
- Discuss key issues on weekly working call.
- If consensus (unanimous, or at least no strong objections) is reached:
  - Capture normative language in the pre-draft spec document
  - Capture consensus changes in the JSON Schema implementation
  - Capture consensus changes in the UML model
  - Capture a statement of consensus in the issue tracker
  - Mark the issue in the tracker as “Consensus Achieved”
  - Clearly mark relevant issue wiki pages as “Consensus Achieved”, or potentially move them to a separate Consensus repo to avoid confusion
- If consensus is not achieved (strong objection exists) within the allowed time window:
  - Discuss and decide whether the issue is absolutely necessary for the MVP and, if not, decide to postpone it
  - Capture the current consensus status in the issue tracker, mark it as “Consensus Stalled”, move on to other issues, and revisit it during the last week of the tranche
  - Decide whether to hold a formal vote (more likely for core critical issues)
Proposed prioritization/plan for dealing with Indicator tranche issues (as laid out below):
- Very brief comment window (1 week) on all “Consensus asserted” items below and then tie them off
- Tackle CTI Common “Partial consensus asserted” items below
  - IDable construct fields
  - Source reference approach and fields
- Tackle General STIX & CybOX “Partial consensus asserted” items below
- Tackle Sightings and Indicator structure
- Tackle Patterning (Thinking on this is currently occurring and will not stop. This is only a time set aside for focused discussion.)
- Tackle Versioning (Likely okay if we don’t completely tie this one off)
- Tackle Time range format, Indicator_Type vocab and the ability to assert that an indicator is a false positive
- Object ID format and requirement (STIX #301, 221)
- Remove abstract base types for “top-level” objects (STIX #311, 386) (F2F consensus)
- Remove Short_Description (STIX #194) (F2F consensus)
- External_IDs property on all IDable constructs (STIX #358, 187) (F2F consensus)
- Controlled Vocabularies (STIX #141)
- Simplify structure for Controlled Vocabularies (F2F consensus)
- Refactor report object (STIX #385) (F2F consensus)
- Data Markings (STIX #8, 231, 379, 378, 185)
- Discrete Timestamp format (STIX #294)
- Key constructs all extend from a common IDable construct base type (STIX #148)
  - Consensus on approach
  - Open questions on which fields and names of fields
- Relationships (STIX #291, 201, 139)
- Develop one or more vocabularies for RelationshipType/Relationship (STIX #4)
- Separate Source construct (STIX #233, 263)
  - Consensus on approach
  - Open questions on how to relate it to content
  - Which fields belong on Source?
- Time range format: separate fields, or leverage the ISO 8601 use of “/” as an extension of the consensus discrete timestamp approach.
- Separate patterns and instances (STIX #375)
- Add capability for variable substitution in CybOX for patterning (CybOX #317)
- Add capability to incorporate temporal context and ordering into CybOX patterns (CybOX #316)
- Lists in CybOX object fields (CybOX #380)
- Separate Patterns and Instances in CybOX Observables and Objects (CybOX #381)
- Create Separate Patterning Syntax/Language (CybOX #420)
- Determine Patterning Language Operators (CybOX #421)
- Determine Patterning Language Syntax (CybOX #422)
- Indicator Composition (STIX #200)
- Refactor/Deprecate Base DataTypes (CybOX #416)
- Issues around Object Subclassing (CybOX #411)
- Common object refactoring complete
- Flatten all aggregating list layers (STIX #262)
- Flatten all the list types in STIXType (STIX #382)
- Refactor TTP (STIX #360) (F2F consensus)
- Kill Chains (STIX #47, 117, 241, 208, 190, 191)
- Consensus asserted
- Partial consensus asserted (some open questions remain)
- Open topics
- Sightings (STIX #306, 359, 240, 198)
  - 2-ended-Relationship or 1-ended-assertion?
- Indicator structure (refactoring so that Observable and Test Mechanism are integrated into a single approach)
  - Indicator structure simplification (STIX #376)
- Indicator_Type vocab (STIX #243)
- Ability to assert that an indicator is a false positive (STIX #307)
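For the time range item, the ISO 8601 option writes a range as two timestamps joined by "/" (for example `2016-02-01T00:00:00Z/2016-02-29T23:59:59Z`), reusing the discrete timestamp format rather than adding separate start and end fields. A minimal sketch of parsing that form follows. It handles only the start/end variant, not ISO 8601's duration-based forms such as `start/P1M`, and the requirement for an explicit UTC offset (or trailing `Z`) is an assumption of the sketch, not a decided rule. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Tuple


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten since older Pythons reject it."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    ts = datetime.fromisoformat(value)
    if ts.tzinfo is None:
        raise ValueError(f"timestamp lacks a UTC offset: {value!r}")
    return ts.astimezone(timezone.utc)


def parse_time_range(value: str) -> Tuple[datetime, datetime]:
    """Parse a 'start/end' ISO 8601 interval into a (start, end) pair."""
    start_text, sep, end_text = value.partition("/")
    if not sep or "/" in end_text:
        raise ValueError(f"expected exactly one '/': {value!r}")
    start, end = parse_timestamp(start_text), parse_timestamp(end_text)
    if end < start:
        raise ValueError("range ends before it starts")
    return start, end


start, end = parse_time_range("2016-02-01T00:00:00Z/2016-02-29T23:59:59Z")
print(end - start)  # 28 days, 23:59:59
```

A single string keeps the range in one field; the cost is that consumers need interval-aware parsing instead of reading two plain timestamps.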