RE: [cti] i18n (RE: MVP Discussion)

Hi, Patrick,

Thank you for the JSON-LD info.

I was not aware of it and I will check it out.

(I quickly read JSON-LD’s “6.9 String Internationalization”,

but I could not get the idea in my first reading.)

In general, I think we should adopt available standards (of course, good ones)

as much as possible.

Regards,

Ryu

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Patrick Maroney
Sent: Sunday, April 10, 2016 10:29 PM
To: Masuoka, Ryusuke/益岡竜介; Jordan, Bret; cti@lists.oasis-open.org
Subject: Re: [cti] i18n (RE: MVP Discussion)

We can also consider adopting or including the JSON-LD methods for solving common global challenges like internationalization:

http://www.cetis.org.uk/inloc/JSON-LD#JSON-LD-RepresentingstringsandlanguagesinJSONLD

Representing strings and languages in JSON-LD

There are strict rules for representing strings in JSON. See e.g. RFC 4627. In particular, new lines and carriage returns must be escaped, so that a string in JSON cannot be split across several lines. Thus it is not an ideal format for human reading.

For potentially multilingual strings, as illustrated and explained in the examples above, this JSON binding is more restrictive than the XML binding. In general, equivalent strings in more than one language are represented by a language map, in which each string has its language specified.

The string to display in any situation should be:

  if the language map has only one key:value pair, then that string value;

  or if there is more than one key:value pair, then

    the string corresponding to the desired language; or if this is not present

    the string corresponding to the default language, if defined; or

    any of the strings in the language map.
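
As a rough illustration of the selection rules above, here is a minimal Python sketch. The function name, its parameters, and the sample map are assumptions made for this example only; they are not defined by JSON-LD or by the binding quoted above.

```python
def pick_display_string(language_map, desired=None, default=None):
    """Choose the string to display from a {language_code: text} map."""
    if not language_map:
        raise ValueError("language map is empty")
    # Only one key:value pair: use that string value.
    if len(language_map) == 1:
        return next(iter(language_map.values()))
    # More than one pair: prefer the desired language ...
    if desired in language_map:
        return language_map[desired]
    # ... then the default language, if one is defined ...
    if default in language_map:
        return language_map[default]
    # ... otherwise any of the strings in the map.
    return next(iter(language_map.values()))


# Hypothetical language map for a campaign title.
title = {
    "en": "Dridex Campaign - Botnet 121",
    "ja": "Dridex キャンペーン - ボットネット 121",
}

print(pick_display_string(title, desired="ja"))
print(pick_display_string(title, desired="fr", default="en"))
```

A viewer that prefers Japanese would call it with desired="ja"; a viewer asking for a language that is missing falls back to the default and, failing that, to any available string.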

Patrick Maroney
President
Integrated Networking Technologies, Inc.
Desk: (856)983-0001
Cell: (609)841-5104
Email: pmaroney@specere.org

On Sun, Apr 10, 2016 at 2:15 AM -0700, "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com> wrote:

Hi, Bret, all,

Thank you for discussing i18n. Sorry about my out-of-sync responses

(due to time differences).

I should have done this earlier (but I have been crazy busy): I have included

an updated use case list at the end of this message.

I have tried to list use cases according to (my own) priority and

(1) and (2) are the most important ones.

Giving “lang: en” at the top level does not, I am afraid, address

the use case (1) in which a CTI file has titles/descriptions in

multiple languages from the start. (You can give “lang: en” at

the top level, but we also need to be able to mark a narrower scope as being in

a different language.)

Other thoughts:

(1) Regardless of whether we implement it or not (there are practical issues),

it took me a while, but I realized that the correct way to conceptualize a language code

is as a property of the text. Translations are properties not of the object, but of the text.

This is, as I understand it, how XLIFF - http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html

does it.

(2) If we want to ensure that translations can be added later on (use cases (2) and (4)),

we need a way to identify a textual element from outside the CTI file.

I thought about possibilities:

(1) Give every textual field a unique ID

(2) Use mechanisms like

JSON Reference - http://json-spec.readthedocs.org/en/latest/reference.html

JSON Pointer - http://json-spec.readthedocs.org/en/latest/pointer.html
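
To make option (2) a little more concrete, here is a minimal Python sketch of how a JSON Pointer (RFC 6901) could address one text field from outside the CTI file. The resolver is hand-written for illustration, and the shape of the external translation record ("target", "lang", "text") is purely an assumption, not an agreed format.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against parsed JSON data."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = document
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: '~1' becomes '/', then '~0' becomes '~'.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


# Trimmed-down package, loosely based on Example (1) in this thread.
package = {
    "type": "package",
    "campaigns": [
        {
            "type": "campaign",
            "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
            "title": "Dridex Campaign - Botnet 121",
        }
    ],
}

# Hypothetical external translation record pointing at the title field.
translation = {
    "target": "/campaigns/0/title",
    "lang": "ja",
    "text": "Dridex キャンペーン - ボットネット 121",
}

original_text = resolve_pointer(package, translation["target"])
print(original_text, "->", translation["text"])
```

One trade-off worth noting: a pointer depends on the position of the object inside the file, whereas a unique ID per text field (option (1)) would survive reordering.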

Regards,

Ryu

------------------------------

Internationalization Use Cases

------------------------------

CN: Chinese

DE: German

EN: English

FR: French

JA: Japanese

------------------------------

(1) Providing an object's texts in multiple languages simultaneously at the time of creation.

------------------------------

[ja/en (in case of Japan), en/fr/de (in case of EU countries), etc.]

This is the most likely use case (for me). The original CTI has titles/descriptions in

multiple languages from the start. When you create a CTI file, you include

both English and Japanese titles/descriptions for major objects in it

so that non-Japanese speaking people can at least find out what it is at the top level.

------

Example (1)

------

{
  "type": "package",
  ...
  "campaigns": [
    {
      "type": "campaign",
      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
      "spec_version": "stix-2.0",
      "created_at": "2015-12-03T13:13Z",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "title": "Dridex Campaign - Botnet 121",          <- Title in EN
               "Dridex キャンペーン - ボットネット 121"   <- Title in JA
      "descriptions": [
        "Dridex-based campaign leveraging Botnet 121",                 <- Description in EN
        "ボットネット 121 を活用する Dridex を元にしたキャンペーン"   <- Description in JA
      ],
      "intended_effects": [
        {"value": "theft-identity-theft"}
      ],
      "status": "Ongoing"
    }
    ...
  ]
}

------

------------------------------

(2) CTI Database Receiving CTI from Multiple CTI Sources in Different Languages

------------------------------

This is a case where you receive CTI from an English CTI source and

from another CTI source in Japanese.

You put all the CTI into MongoDB or some other NoSQL database and

would like to mix and match. I would like the CTI database to still

be able to track the language code of textual fields.
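
One way this could work, sketched in Python under stated assumptions: on ingest, the language given at the top level of each incoming object is pushed down onto its text fields, so the language survives once objects from different sources sit in the same store. The field list, the helper name, and the language-map shape are hypothetical; "und" is the BCP 47 tag for an undetermined language.

```python
import copy

# Assumed set of free-text fields; a real schema would define this list.
TEXT_FIELDS = ("title", "description")


def tag_text_fields(obj, fallback_lang="und"):
    """Return a copy of obj whose text fields are {lang: text} maps."""
    lang = obj.get("lang", fallback_lang)
    tagged = copy.deepcopy(obj)
    for field in TEXT_FIELDS:
        value = tagged.get(field)
        if isinstance(value, str):
            tagged[field] = {lang: value}
    return tagged


en_report = {"type": "report", "lang": "en", "title": "Attack on X"}
ja_report = {"type": "report", "lang": "ja", "title": "X への攻撃"}

# Stand-in for a database collection holding CTI from both sources.
store = [tag_text_fields(en_report), tag_text_fields(ja_report)]
for doc in store:
    print(doc["title"])
```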

------------------------------

(2) EN CTI received by a Japanese entity, which provides JA translation

(Or vice versa, JA CTI received by a US entity, which provides EN translation)

------------------------------

A Japanese entity receives CTI information pieces in English.

The entity determines that some of them are important/critical

and worth translating into Japanese, adds descriptions in Japanese,

and redistributes them to other Japanese entities (if redistribution is allowed).

The CTIM (CTI Management System) of a receiving party displays

the Japanese description whenever possible, while allowing access to

the original English descriptions.

Work Flow:

1. Company 1 in EN creates an Indicator and TTP and shares them with Company 2 in JP.

   It is important to note that the flow may be direct or may pass through a series of brokers and other entities.

   1. This Indicator and TTP have a producer of Company 1 and a version of 1.

2. Company 2 builds a translated version of the TTP and Indicator and releases it.

   1. This new Indicator and TTP have a producer of Company 2 and a version of 2.

   2. It is unrealistic to think that Company 2 can or will share the translated object back to Company 1, or that Company 1, if it gets the translated object, will do anything with it. Their legal departments will probably prohibit accepting third-party translations and using them in their offerings.

------------------------------

(3) An English CTI report describing attacks against Japanese entities in EN

------------------------------

An English report on Cyber Attacks on Japan.

There are filenames of lure attachments in Japanese (original/real) and their

translations in English. Another similar report in English might have an email title along with

its translation in English next to it. That report also has a Windows pathname

in Chinese (not Japanese) found in a binary along with its translation in English.

These Japanese texts can be found in descriptions, not just

[Ex. Original File Name (JA): "医療費通知", Translated File Name (EN): "Medical expenses notice"]

Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.

------------------------------

(4) Email subject/body, supposed to be in JP, but includes CN characters (by mistake of the attackers)

------------------------------

This can happen due to Chinese/Japanese/Korean sharing Unicode characters

(CJK characters - https://en.wikipedia.org/wiki/CJK_characters.)

This can be a very important clue as to the attackers.

Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.

------------------------------

(5) CTI translation service

------------------------------

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Friday, April 08, 2016 7:02 AM
To: Masuoka, Ryusuke/益岡竜介
Cc: cti@lists.oasis-open.org
Subject: [cti] Re: MVP Discussion

Ryu,

Would something like this work for you? We have already defined this "translation-of" relationship. So it would be trivial to add some text to describe how it should work.

The first report is the "original" report... You will then see a second report at the bottom and a relationship object with a type of "translation-of" that links them together. Doing it this way will allow people other than the object creator to write a translation.

[
{
  "id": "report--cbf7a3eb-5ef0-42ef-a30c-14be2a14cc1d",
  "type": "report",
  "lang": "en",
  "created_at": "2016-01-29T21:18:33Z",
  "title": "Hi, this text is in English",
  "description": "So is this"
},
{
  "id": "relationship--7f3fcd28-9a4b-480b-852b-77e7b33db237",
  "type": "relationship",
  "source_ref": "report--cbf7a3eb-5ef0-42ef-a30c-14be2a14cc1d",
  "target_ref": "report--5cfa580e-ea87-4d11-b0cc-600af7a64968",
  "bidirectional": true,
  "value": "translation-of"
},
{
  "id": "report--5cfa580e-ea87-4d11-b0cc-600af7a64968",
  "type": "report",
  "lang": "es",
  "created_at": "2016-01-29T21:18:33Z",
  "title": "Hola, este texto es español",
  "description": "Asi es esto"
}
]
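
As a minimal sketch of how a consumer might follow these "translation-of" relationships, the Python below looks up the translations of a given report. It uses only the field names from the example above; the function name and the way "bidirectional" is interpreted are assumptions for illustration.

```python
def translations_of(objects, report_id):
    """Return objects linked to report_id by a 'translation-of' relationship."""
    by_id = {obj["id"]: obj for obj in objects}
    found = []
    for obj in objects:
        if obj.get("type") != "relationship" or obj.get("value") != "translation-of":
            continue
        if obj["source_ref"] == report_id:
            other = obj["target_ref"]
        elif obj.get("bidirectional") and obj["target_ref"] == report_id:
            # Assumption: a bidirectional link may be followed from either end.
            other = obj["source_ref"]
        else:
            continue
        if other in by_id:
            found.append(by_id[other])
    return found


EN_ID = "report--cbf7a3eb-5ef0-42ef-a30c-14be2a14cc1d"
ES_ID = "report--5cfa580e-ea87-4d11-b0cc-600af7a64968"

bundle = [
    {"id": EN_ID, "type": "report", "lang": "en",
     "title": "Hi, this text is in English"},
    {"id": "relationship--7f3fcd28-9a4b-480b-852b-77e7b33db237",
     "type": "relationship", "source_ref": EN_ID, "target_ref": ES_ID,
     "bidirectional": True, "value": "translation-of"},
    {"id": ES_ID, "type": "report", "lang": "es",
     "title": "Hola, este texto es español"},
]

for report in translations_of(bundle, EN_ID):
    print(report["lang"], report["title"])
```

Because the translation is a separate object with its own ID, a party other than the original creator can publish it without touching the original report.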

Thanks,

Bret

Bret Jordan CISSP

Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Apr 7, 2016, at 04:45, Masuoka, Ryusuke <masuoka.ryusuke@jp.fujitsu.com> wrote:

Hi, Bret,

> 5) If the feature is not used in mass today,

> then it probably does not warrant being an MVP item.

> Not used == not used. I am sure between Soltra and EclecticIQ

> they can give us some great metrics.

I am a bit concerned with “used in mass today.”

Yes, I am thinking about Internationalization.

Without it, I can neither start accumulating CTI nor

implement CTI systems seriously.

I am afraid that those who use a language

other than English as the primary language for their work

are probably a minority on this TC.

On the other hand, I cannot come up with very good and fair

criteria as to which to pick as MVP items...

Regards,

Ryu

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Wednesday, April 06, 2016 1:30 AM
To: cti@lists.oasis-open.org
Subject: [cti] MVP Discussion

All,

I have a few concerns with the current MVP items as discussed on the call today:

1) We need a statistically significant number of people to vote, before we can decide if it is in or out.

2) I feel that some of the items in the list are not well understood, and thus we got mixed voting.

3) I think this voting needs to be moved to a SurveyMonkey and we need to add the options of "abstain" and "I do not know what this means".

4) Things that have 100% of the votes should be in, and we should do those first.

5) If the feature is not used in mass today, then it probably does not warrant being an MVP item. Not used == not used. I am sure between Soltra and EclecticIQ they can give us some great metrics.

6) The current list represents a LOT of stuff. Keep in mind that it may take groups 2-5 years to fully support everything in that list. That means in the meantime you will have a lot of products that are NOT compatible with each other. Can you imagine the conformance issues that this will cause? Keep in mind that even Soltra Edge does not fully support STIX 1.2, and how long ago did that come out?

7) If the 2.0 MVP does not have everything that a group needs, say the USG, then they can keep using STIX 1.2 until such a time that the 2.x tree does have what they need. I do not believe any of us are saying that people need to switch from STIX 1.2 to STIX 2.0 on day one.

8) For orgs that are currently using STIX 1.2, you will probably not want to switch to the 2.x family until about 2.2 or 2.3; that would be my finger-in-the-wind guess.

9) For orgs that are not yet doing anything with STIX, what is the bare minimum that you need to make a solution work?

10) Things we do not understand well or that are not really used should be pushed to a 2.x release.

Thanks,

Bret

