Re: [cti] Idea for Internationalization

Further, you lose the ability to track the original source. Also, every time we require some 3rd party to rev the original TLO, we run the risk of losing data markings and other key things.

Option 3 allows for the creation of tiny add-on objects that contain just the fields that one might translate. Option 4 allows you to create effectively a dict of the object and translate any fields you want.

Thanks,

Bret

Bret Jordan CISSP

Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Feb 2, 2016, at 05:25, Wunder, John A. <jwunder@mitre.org> wrote:

I think this option is what Bret was describing as #1. As he says, there are a few downsides: first, it’s more complicated for single language content (not a huge deal) and, second, it doesn’t allow for translations by someone other than the individual producer (since they can’t revise the object). They’d have to use some relationship approach, which would give us two ways of doing it and is exactly what we want to avoid. I initially thought this might be uncommon, but it’s exactly what Chris just outlined as a use case.

For that reason I prefer option #3 from Bret’s e-mail. It supports both original translations from the producer as well as shared third-party translations. It doesn’t complicate things for single language content and it doesn’t have the relationship tracking downsides that Terry mentions below because it’s not a separate TLO (well, it is, but of a special type).

It would look something like this:

[
  {
    "type": "indicator",
    "id": "indicator--UUID",
    "lang": "en-US",
    "title": "English title"
  },
  {
    "type": "indicator-translation",
    "id": "indicator-translation--UUID",
    "object_ref": "indicator--UUID",
    "lang": "jp",
    "title": "Japanese Title"
  }
]

The second object (the translation) would also need a way to point to a specific revision (if we use a different approach than relationships) and the producer, but we don’t have consensus on that so I omitted it. Option 4 would look the same, by the way, except the type would be “translation” and we wouldn’t have individual schemas specifying which fields are translatable.

Because the indicator is the only indicator TLO there, all domain relationships would go through it as the original.
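
A minimal sketch of how a consumer might overlay such a translation object on the original for display, assuming the field names from the example above ("object_ref", "lang", "title"). The helper name apply_translation is hypothetical and not part of any spec; the original object is left untouched, so relationships still point at it.

```python
import copy

# Keys that identify the translation object itself and must not overwrite the original.
STRUCTURAL_KEYS = ("type", "id", "object_ref")

def apply_translation(original, translations, preferred_lang):
    """Return a display copy of `original` with translated fields overlaid.

    Falls back to the original object when no matching translation exists.
    """
    for tr in translations:
        if tr.get("object_ref") == original["id"] and tr.get("lang") == preferred_lang:
            view = copy.deepcopy(original)
            for key, value in tr.items():
                if key not in STRUCTURAL_KEYS:
                    view[key] = value
            return view
    return original

indicator = {
    "type": "indicator",
    "id": "indicator--UUID",
    "lang": "en-US",
    "title": "English title",
}
translation = {
    "type": "indicator-translation",
    "id": "indicator-translation--UUID",
    "object_ref": "indicator--UUID",
    "lang": "jp",
    "title": "Japanese Title",
}

view = apply_translation(indicator, [translation], "jp")
print(view["title"])       # Japanese Title
print(view["id"])          # indicator--UUID (identity of the original is kept)
print(indicator["title"])  # English title (original is unchanged)
```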

John

From: <cti@lists.oasis-open.org> on behalf of Terry MacDonald <terry@soltra.com>
Date: Tuesday, February 2, 2016 at 5:37 AM
To: "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, Chris Ricard <cricard@fsisac.us>
Cc: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: RE: [cti] Idea for Internationalization

Hi Bret / Ryu / All,

Embedding the different translations within a single TLO object (as Ryu has suggested) is my preferred option. If we use a relationship to join the two different translations of the same information, then we end up with two bits of information that are effectively the same thing. This causes problems when third-parties link their threat intel to the two translations of the same information.

Let’s imagine OrgA creates IncidentA(en) and IncidentA(jp) and relates them using a translation-of relationship. OrgX has CampaignX(en) that they want to relate to the Incidents that they’ve seen.
- It significantly increases the amount of data stored and moved. Both objects carry duplicate fields, and a separate relationship object is required to link them. That is one more relationship to walk, and extra storage consumed by duplicate information.
- Which object do they relate to their threat intel? The English version or the translated version? Both?
- If OrgA updates IncidentA(en) and IncidentA(jp) isn’t updated, then which object is considered the “truth”?
- What if a consumer only receives IncidentA(en), yet a third party has published a relationship linking IncidentA(jp) to a different Campaign? If the translations were embedded within the same single object, then the single relationship would cover all translations.
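
The last point can be sketched in a few lines. This is only an illustration: the identifiers and the "source_ref" / "target_ref" field names are assumptions made for the example, not agreed spec fields.

```python
def dangling(relationships, held_ids):
    """Return relationships that reference an object the consumer does not hold."""
    return [
        r for r in relationships
        if r["source_ref"] not in held_ids or r["target_ref"] not in held_ids
    ]

# Separate object per language: the consumer only received the English incident,
# but the third party related its campaign to the Japanese one.
held_split = {"incident-a-en", "campaign-x"}
rels_split = [{"source_ref": "campaign-x", "target_ref": "incident-a-jp"}]
print(len(dangling(rels_split, held_split)))    # 1

# Single object with embedded translations: one id, so the relationship resolves.
held_single = {"incident-a", "campaign-x"}
rels_single = [{"source_ref": "campaign-x", "target_ref": "incident-a"}]
print(len(dangling(rels_single, held_single)))  # 0
```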

This has bigger implications on the object lifecycle than many people realize. Anytime that we separate the same logical ‘thing’ into separate versions of that thing we are potentially opening ourselves up to this problem. If we are describing an Incident, then there should be one Incident object with the data relating to that object within that one object. New revisions of that object should update that same object, and updates to that object should not affect any relationships pointing to that object.

I’m worried that we are potentially making a future problem for ourselves if we head down this path.

Cheers

Terry MacDonald
Senior STIX Subject Matter Expert
SOLTRA | An FS-ISAC and DTCC Company
+61 (407) 203 206 | terry@soltra.com

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Masuoka, Ryusuke
Sent: Tuesday, 2 February 2016 6:02 PM
To: Jordan, Bret <bret.jordan@bluecoat.com>; Chris Ricard <cricard@fsisac.us>
Cc: cti@lists.oasis-open.org
Subject: RE: [cti] Idea for Internationalization

Hi, Bret, Ricard,

Thank you for the feedback. I am replying to this thread, as its subject is
more appropriate.

The use case scenarios are not all about translating existing
(text) objects. In some cases:

- Some major text fields (title, description, etc.) are produced
in multiple languages at the time of package creation.

For example, a Japanese entity creates a new CTI package
and gives its title in Japanese as well as English so that
at least someone, who gets interested in the title and
does not read Japanese, can contact the original producer
for further details. (It is often the case for papers written
in Japanese. They provide titles and abstracts in English.)

It is not always clear which is original and which is translation.

I know that we might lose, to some degree, traceability of which text is the original,
but how about making it possible for any text field to
have its text values in as many languages as necessary,
each specified in the object by a “lang” tag? Something like the following.

-----
{
  "type": "stix-package",
  "id": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
  "title": [
    {
      "lang": "en",
      "value": "Some really neat campaign that we found"
    },
    {
      "lang": "ja",
      "value": "我々が見つけた、なかなか渋いキャンペーン"
    }
  ]
}
-----
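
A rough sketch of how a consuming tool might pick the text to display from such a field, assuming the "lang" / "value" structure shown above. The function name pick_text and the English fallback are illustrative choices, not part of the proposal.

```python
def pick_text(field, preferred, fallback="en"):
    """Choose a display string from a multi-language text field.

    `field` is either a plain string or a list of {"lang", "value"} entries.
    """
    if isinstance(field, str):
        return field
    by_lang = {entry["lang"]: entry["value"] for entry in field}
    for lang in (preferred, fallback):
        if lang in by_lang:
            return by_lang[lang]
    # No preferred or fallback language: show whatever comes first, if anything.
    return field[0]["value"] if field else None

package = {
    "type": "stix-package",
    "id": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
    "title": [
        {"lang": "en", "value": "Some really neat campaign that we found"},
        {"lang": "ja", "value": "我々が見つけた、なかなか渋いキャンペーン"},
    ],
}

print(pick_text(package["title"], "ja"))  # the Japanese title
print(pick_text(package["title"], "hu"))  # falls back to the English title
```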

What do you say?

Regards,

Ryu

From:cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Tuesday, February 02, 2016 12:30 PM
To: Chris Ricard
Cc: cti@lists.oasis-open.org
Subject: Re: [cti] Idea for Internationalization

Those are great use cases and match what Ryu brought up. We have not yet heard from the majority of the community, but I believe from conversations we have had on Slack that we have some general consensus around the idea that:
All TLOs should have a field called "lang" that defines the language of the object (e.g., en_us)
What we do not yet know is whether this should be required or optional.
Beyond that we have a few different options:

1. Translated content is embedded inside the original TLO
   - I believe this is riddled with problems, as individuals start making translations of a TLO and need to re-issue the TLO even if they did not originally produce it.
   - We will get into all sorts of versioning problems, similar to what we have today in STIX 1.2
2. Translated versions of a TLO will be represented as a new TLO
   - This could work, if there was a relationship object or a translation object that could connect them.
   - There might be some weirdness in the workflow that we have yet to identify.
3. Another option would be to create a translation object that just contains the fields that can be translated. You would then have the parent TLO written in lang=foo and the translations that are written in lang=fr, lang=de, lang=jp etc.
   - This is very similar to #2. However, this object contains just a subset of the fields.
   - This would allow disparate organizations to produce translations of a TLO independently of the original producers and then release them without needing to re-release the original TLO
   - You could schema validate this method
   - This would be super easy for consumers and parsers to deal with
4. The last option, as I see it, is something similar to #3, but where the fields are not defined in the spec. A translator can flag any fields they want to translate and include them in the object. This object will be tied to the original in the same way as #3.
   - On the consuming side, you could do really interesting things in software with merging the data in your database.
   - The problem with this is that you would not be able to schema validate the translation object, as you would have no way of knowing ahead of time which fields would be included.
   - This is very flexible, but may produce problems for consumers or parsers.
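
The validation difference between the last two options can be sketched as follows. This is only an illustration: the TRANSLATABLE table, the structural field names, and the helper unexpected_fields are hypothetical, since no list of translatable fields has been agreed.

```python
# Hypothetical per-type list of translatable fields; under option 3 the spec would define it.
TRANSLATABLE = {"indicator": {"title", "description"}}

# Hypothetical fields every translation object carries regardless of target type.
STRUCTURAL = {"type", "id", "object_ref", "lang"}

def unexpected_fields(translation, target_type):
    """Return the fields a translation object may not carry for `target_type`."""
    allowed = TRANSLATABLE.get(target_type, set()) | STRUCTURAL
    return sorted(set(translation) - allowed)

good = {
    "type": "indicator-translation",
    "id": "indicator-translation--UUID",
    "object_ref": "indicator--UUID",
    "lang": "fr",
    "title": "Titre",
}
bad = dict(good, pattern="not a translatable field")

print(unexpected_fields(good, "indicator"))  # []
print(unexpected_fields(bad, "indicator"))   # ['pattern']
```

With the last option there is no such table to check against, so a parser would have to accept whatever fields the translator chose to include.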

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Feb 1, 2016, at 20:02, Chris Ricard <cricard@fsisac.us> wrote:

Five minutes before this email, I was CCed on a Japanese translation of an advisory we published earlier today.

Background: We have FS-ISAC members in Japan, as well as a partner org in Japan that we work with.

We have a Japanese translator on staff, who identifies the important advisories, and translates enough of them into Japanese so the recipients can evaluate the applicability and importance. The original, English version is also included, so if the recipient deems it applicable, he/she can translate the rest.

This is all for human-readable reports, but the use case seems similar.

Proposal:

1. Add an optional language tag in all top level constructs.
2. Add an optional Alternate Language tag to Relationship objects.
3. Producers can create multiple language-specific versions of whatever top-level objects they wish.
4. Producers can create Alternate Language relationships between these alternate language objects.
5. Consumers can choose to discard the alternate language versions of the objects, or can choose to maintain some or all of them.
6. If the consumer chooses to maintain Alternate Languages, the Alternate Language relationship objects would support the relationship between the alternate language versions of the same object.

Use Cases:

1. I produce content in English, but have Japanese constituents. I publish everything in English, and a subset of the info in Japanese. For those objects that I also publish in Japanese, I link the English and Japanese versions together with an Alternate Language relationship.
2. I consume the content in #1, and English is my primary language. Upon receipt, I discard the Japanese versions and the alternate language relationships.
3. I consume the content in #1, but Japanese is my primary language. Upon receipt, I consume both the English and Japanese versions. When both exist for a given item, I display the Japanese version first, and provide a link to the English version. When only an English version exists, I display the English version.
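
Consumer use cases 2 and 3 could be sketched roughly as below. The object ids, the "ja" language code, and the "source_ref" / "target_ref" fields on the Alternate Language relationship are assumptions for illustration, and the sketch handles only one alternate per object.

```python
def discard_alternates(objects, alt_links, primary):
    """Use case 2: drop linked alternate-language versions not in the primary language."""
    linked = {ref for link in alt_links for ref in (link["source_ref"], link["target_ref"])}
    return [o for o in objects if not (o["id"] in linked and o["lang"] != primary)]

def display_plan(objects, alt_links, primary):
    """Use case 3: list (id to show first, id of linked alternate or None) per item."""
    by_id = {o["id"]: o for o in objects}
    alternate_of = {}
    for link in alt_links:
        alternate_of[link["source_ref"]] = link["target_ref"]
        alternate_of[link["target_ref"]] = link["source_ref"]
    plan, seen = [], set()
    for obj in objects:
        if obj["id"] in seen:
            continue
        seen.add(obj["id"])
        other = by_id.get(alternate_of.get(obj["id"]))
        if other is None:
            plan.append((obj["id"], None))
            continue
        seen.add(other["id"])
        if other["lang"] == primary and obj["lang"] != primary:
            plan.append((other["id"], obj["id"]))
        else:
            plan.append((obj["id"], other["id"]))
    return plan

objects = [
    {"id": "advisory-1-en", "lang": "en"},
    {"id": "advisory-1-ja", "lang": "ja"},
    {"id": "advisory-2-en", "lang": "en"},
]
links = [{"source_ref": "advisory-1-en", "target_ref": "advisory-1-ja"}]

print([o["id"] for o in discard_alternates(objects, links, "en")])
# ['advisory-1-en', 'advisory-2-en']
print(display_plan(objects, links, "ja"))
# [('advisory-1-ja', 'advisory-1-en'), ('advisory-2-en', None)]
```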

For consideration,

Chris Ricard
FS-ISAC

From: Jordan, Bret
Sent: Monday, February 1, 2016 9:02 PM
To: Masuoka, Ryusuke
Cc: cti@lists.oasis-open.org
Subject: Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

Thanks for the feedback. The tear line I am trying to figure out is where this is a specification issue and where it is an implementation issue.

One idea that we have tossed around on Slack is the idea that each top level object (TLO) would have a field called "lang". This would be the language that the object is written in. This would enable tools to select and filter by a language.

Then, given that only a handful of fields for a given TLO can be translated (you are not going to translate an IP address, for example), we have tossed around the idea of creating a "translation" object that could be sent either with the original TLO or separately.

Going down the path this way, though I am not yet advocating that is the best way, would allow organizations to do very interesting things with CTI data:

1) A threat intel provider could issue TLOs in language specific versions if they wanted.

2) A threat intel provider could produce language translations and attach them to the TLO.

3) End users could augment or add their own translations without needing to re-release the entire TLO and thus avoid versioning issues.

There was some initial concern about this model, as some believe it might have issues with versioning. But I do not think so, as you would not want translated objects to automatically point to a new version. They would be tied at the hip to the version that they were created for.

The reason for looking at doing something like this is to avoid the need to turn every String field in the serialization into an array of objects.

What would you think of something like this?

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Feb 1, 2016, at 18:29, Masuoka, Ryusuke <masuoka.ryusuke@jp.fujitsu.com> wrote:

Hi, Bret,

I guess it depends. But what I see is scenarios like the following:

- A Japanese entity receives CTI information pieces in English.
  The entity determines that some of them are important/critical
  and worth translating into Japanese, adds descriptions in Japanese,
  and redistributes them to other Japanese entities (if redistribution is allowed).
  The CTIM (CTI Management System) of a receiving party displays
  the Japanese description whenever possible, while allowing access to
  the original English descriptions.

- Japanese entities produce CTI in Japanese (not in English, surprise!).
  An entity decides that some of them are important/critical and worth
  translating into English, adds descriptions in English,
  and redistributes them to other countries (if redistribution is allowed).
  The CTIM of a receiving party displays the English description if so set,
  while allowing access to the original Japanese (likely more accurate)
  descriptions.

Regards,

Ryu

From: Jordan, Bret [mailto:bret.jordan@bluecoat.com]
Sent: Tuesday, February 02, 2016 10:17 AM
To: Masuoka, Ryusuke/益岡竜介; cti@lists.oasis-open.org
Subject: Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

Some questions:
Will organizations producing threat intelligence produce one incident for each language?
Or will they produce one big incident that contains all of the languages?
For an indicator with a localized title / description, would a TAXII server just send you the jp version vs the en_us version?
Or would you expect the TAXII server to send you both?
What would be the expected behavior if you got a version in a language that you did not speak, say Hungarian?

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Feb 1, 2016, at 18:07, Masuoka, Ryusuke <masuoka.ryusuke@jp.fujitsu.com> wrote:

Hi,

It is not a UTF-8 thing (I understand that most modern programming languages
and other standards deal with it correctly).

It is about having text fields in multiple languages.
For example, descriptions of a package in English and Japanese.
The system will pick which language to display based on
the language code (“en” or “jp”) in the field.

Is it something already discussed in Slack?
(Sorry if so.)

Regards,

Ryu

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Tuesday, February 02, 2016 9:59 AM
To: Masuoka, Ryusuke/益岡竜介
Cc: Barnum, Sean D.; cti@lists.oasis-open.org
Subject: Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

I would really like to understand this. Do you mean making sure the text fields are not restricted to ASCII so that you can use other character sets? JSON gives us UTF-8 by default, so this alone should make things easier for our international friends.

If this is not what you mean, please explain and give us some context. We have had some passionate debates on Slack about this recently, but I feel now that we do not really understand the problem that we were trying to solve. Can you help us understand the problem? What works, what does not work, what do you need it to do, and why?

I really want to make sure our baby works for everyone. But as I said on Slack, "I do not want to engineer a space ship when all we need is a bike to run to the corner store and get a coke".

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Feb 1, 2016, at 17:46, Masuoka, Ryusuke <masuoka.ryusuke@jp.fujitsu.com> wrote:

Hi,

Is there a place for “Internationalization” of text fields?
I would like very much to see it in STIX 2.0 (or CTI Common?)
and I am willing to contribute.

Regards,

Ryu

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Barnum, Sean D.
Sent: Tuesday, February 02, 2016 3:49 AM
To: cti@lists.oasis-open.org
Subject: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

All,

As discussed at the face to face meeting and briefly on the TC monthly call and the list we plan to work toward our aggressive July target date for draft STIX 2.0, TAXII 2.0 and CybOX 3.0 specs utilizing a more product management approach with roughly monthly tranches focused on resolving all in-scope identified issues relevant to a particular capability area.

A proposed tranche plan is:
February 29th - Run Indicators to the ground. Get these fundamentals worked through to enable us to talk to vendors on the RSA show floor about it and have something to show them.
March 31st – Run remaining cross-cutting issues to ground. Run Identity-based Victim, Source and Actor top level abstractions to ground.
April 30th – Run Incidents (investigations) to ground. Run Asset top level abstraction to ground. Run Campaign to ground.
May 31st – Run controlled vocabularies to ground. Run automated COA default extension to ground. Run analytic support (opinions, assertions, hypotheses, etc.) to ground.
June 30th – All other remaining top level elements. Review pass for consistency (field name choices, naming conventions, structure patterns, etc) and quality.
This should cover the existing in-scope issues in a coherent and dependency-aware iterative fashion. For CybOX, the Indicator tranche is likely to cover patterning and key object support decisions with remaining tranches focused on key object refactoring based on decisions from the Indicator tranche.
Please let us know if you see any issues with this tranche plan.

The first tranche (Indicators) is the most relevant for now as it begins today.
Below is a draft plan for the Indicator tranche. This draft Indicator Tranche plan is also in the wiki.
This is a very aggressive plan considering the number of issues to discuss and decide and the limited time to do it.
We will strive to achieve this plan and encourage active collaboration from everyone to help us accomplish it.
If you have comments, feedback or issues with this draft plan please let us know so that we may adapt as appropriate.

Indicator tranche plan
Objective:
To discuss and reach consensus on all in-scope tracker issues for STIX 2.0 that are required to support common indicator use cases.
Target completion date:
February 29, 2016
Proposed workflow:
- Raise and describe the issue with a brief wiki writeup
- Discuss issue on list and/or Slack (with summaries made on list). Anyone with a proposed solution may add details of their proposal (proposed normative text, examples, diagrams, schema, etc., clearly marked as a proposal) to the wiki writeup and announce it to the list.
- Discuss, debate, review proposals, comment as appropriate within defined time window to work towards consensus.
- Discuss key issues on weekly working call.
- If consensus (unanimous or at least no strong objections) reached:
  - Capture normative language in pre-draft spec document
  - Capture consensus changes in JSON Schema implementation
  - Capture consensus changes in UML model
  - Capture statement of consensus in issue tracker
  - Mark issue tracker as “Consensus Achieved”
  - Clearly mark relevant issue wiki pages as “Consensus Achieved” or potentially move them to a separate Consensus repo to avoid confusion
- If consensus not achieved (strong objection exists) within allowed time window:
  - Discuss and decide whether issue is absolutely necessary for MVP and, if not, decide to postpone
  OR
  - Capture current consensus status in issue tracker, mark as “Consensus Stalled”, move on to other issues and revisit the issue during last week of tranche
