Subject: RE: [cti] i18n (RE: MVP Discussion)
Hi, Patrick,

Thank you for the JSON-LD info. I was not aware of it and I will check it out. (I quickly read JSON-LD’s “6.9 String Internationalization”, but I could not get the idea on my first reading.)

In general, I think we should adopt available standards (good ones, of course) as much as possible.
Regards,
Ryu

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Patrick Maroney

We can also consider adoption/inclusion of the JSON-LD methods for solving common global challenges like internationalization:

Representing strings and languages in JSON-LD

There are strict rules for representing strings in JSON. See e.g. RFC 4627. In particular, new lines and carriage returns must be escaped, so that a string in JSON cannot be split across several lines. Thus it is not an ideal format for human reading.

For potentially multilingual strings, as illustrated and explained in the examples above, this JSON binding is more restrictive than the XML binding. In general, equivalent strings in more than one language are represented by a language map, in which each string has its language specified. The string to display in any situation should be:

1. if the language map has only one key:value pair, then that string value; or
2. if there is more than one key:value pair, then the string corresponding to the desired language; or
3. if this is not present, the string corresponding to the default language, if defined; or
4. any of the strings in the language map.

Patrick Maroney
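The selection rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_string`, its parameters, and the sample titles (borrowed from Example (1) later in this thread) are hypothetical and not part of JSON-LD or any CTI specification. "Any of the strings" is implemented here as "the first one in the map".

```python
def pick_string(language_map, desired=None, default=None):
    """Choose the string to display from a language map (hypothetical helper).

    language_map: dict of language code -> string, e.g. {"en": "...", "ja": "..."}
    desired:      language code the reader would like to see, if any
    default:      default language code of the document, if defined
    """
    if not language_map:
        raise ValueError("language map is empty")
    # Only one key:value pair: use that string, whatever its language.
    if len(language_map) == 1:
        return next(iter(language_map.values()))
    # More than one pair: prefer the desired language.
    if desired in language_map:
        return language_map[desired]
    # Otherwise fall back to the default language, if one is defined.
    if default is not None and default in language_map:
        return language_map[default]
    # Otherwise any string will do; this sketch takes the first one.
    return next(iter(language_map.values()))


titles = {
    "en": "Dridex Campaign - Botnet 121",
    "ja": "Dridex キャンペーン - ボットネット 121",
}
print(pick_string(titles, desired="ja"))
print(pick_string(titles, desired="fr", default="en"))
```

The fallback order is the only design decision here; a consumer that wants a deterministic last resort could sort the keys instead of relying on insertion order.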
On Sun, Apr 10, 2016 at 2:15 AM -0700, "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com> wrote:

Hi, Bret, all,

Thank you for discussing i18n. Sorry about my out-of-sync responses (due to time differences).

I should have done this earlier (but I have been crazy busy): I have included an updated use case list at the end of this message. I have tried to list the use cases according to (my own) priority, and (1) and (2) are the most important ones.

Giving “lang: en” at the top level does not, I am afraid, address use case (1), in which a CTI file has titles/descriptions in multiple languages from the start. (You can give “lang: en” at the top level, but we also need to be able to give a different language a narrower scope.)

Other thoughts:

(1) Regardless of whether we implement it or not (there are practical issues), and although it took me a while, I realized that the correct way to conceptualize a language code is as a property of the text. Translations are not properties of the object, but of the text. This is, as I understand it, how XLIFF (http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html) does it.
(2) If we are to ensure that translations can be given later on (use cases (2) and (4)), we need a way to identify a textual element from outside the CTI file. I thought about possibilities:

    (1) Give every textual field a unique ID
    (2) Use mechanisms like
        JSON Reference - http://json-spec.readthedocs.org/en/latest/reference.html
        JSON Pointer - http://json-spec.readthedocs.org/en/latest/pointer.html

Regards,
Ryu

------------------------------
Internationalization Use Cases
------------------------------

CN: Chinese
DE: German
EN: English
FR: French
JA: Japanese

------------------------------
(1) Providing an object's texts in multiple languages simultaneously at the time of creation.
------------------------------

[ja/en (in case of Japan), en/fr/de (in case of EU countries), etc.]

This is the most likely use case (for me). The original CTI has titles/descriptions in multiple languages from the start. When you create a CTI file, you include both English and Japanese titles/descriptions for major objects in it so that non-Japanese-speaking people can at least find out what it is at the top level.

------ Example (1) ------
{
  "type": "package",
  ...
  "campaigns": [
    {
      "type": "campaign",
      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
      "spec_version": "stix-2.0",
      "created_at": "2015-12-03T13:13Z",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "title": "Dridex Campaign - Botnet 121",                   <- Title in EN
               "Dridex キャンペーン - ボットネット 121"             <- Title in JP
      "descriptions": [
        "Dridex-based campaign leveraging Botnet 121"            <- Description in EN
        "ボットネット 121 を活用する Dridex を元にしたキャンペーン"   <- Description in JP
      ],
      "intended_effects": [ {"value": "theft-identity-theft"} ],
      "status": "Ongoing"
    }
  ],
  ...
}
------

------------------------------
(2) CTI Database Receiving CTI from Multiple CTI Sources in Different Languages
------------------------------

This is a case where you receive CTI from an English CTI source and another CTI source in Japanese.
You put all the CTI into MongoDB or some other NoSQL database and would like to mix and match. I would like the CTI database to still be able to track the language code of textual fields.

------------------------------
(2) EN CTI received by a Japanese entity, which provides JA translation (or vice versa, JA CTI received by a US entity, which provides EN translation)
------------------------------

A Japanese entity receives CTI information pieces in English. The entity determines that some of them are important/critical and worth translating into Japanese, adds descriptions in Japanese, and redistributes them to other Japanese entities (if redistribution is allowed). The CTIM (CTI Management System) of a receiving party displays the Japanese description whenever possible, while allowing access to the original English descriptions.

Work Flow:

1. Company 1 in EN creates an Indicator and TTP and shares them with Company 2 in JP. It is important to note that the flow may be direct or may go through a series of brokers and other entities.
   1. This Indicator and TTP have a producer of Company 1 and a version of 1.
2. Company 2 builds a translated version of the TTP and Indicator and releases it.
   1. This new Indicator and TTP have a producer of Company 2 and a version of 2.
   2. It is unrealistic to think that Company 2 can or will share the translated object back to Company 1, or that Company 1, if it gets the translated object, will do anything with it. Their legal departments will probably prohibit accepting 3rd party translations and then using them in their offerings.

------------------------------
(3) An English CTI report describing attacks against Japanese entities in EN
------------------------------

An English report on cyber attacks on Japan. There are filenames of lure attachments in Japanese (original/real) and their translations in English. Another similar report in English might have an email title along with its translation in English next to it. That report also has a Windows pathname in Chinese (not Japanese) found in a binary, along with its translation in English. These Japanese texts can be found in descriptions, not just

[Ex. Original File Name (JA): "医療費通知", Translated File Name (EN): "Medical expenses notice"]

Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.

------------------------------
(4) Email subject/body, supposed to be in JP, but includes CN characters (by mistake of the attackers)
------------------------------

This can happen due to Chinese/Japanese/Korean sharing Unicode characters (CJK characters - https://en.wikipedia.org/wiki/CJK_characters). This can be a very important clue as to the attackers.

Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.

------------------------------
(5) CTI translation service
------------------------------

From:
cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret

Ryu,

Would something like this work for you? We have already defined this "translation-of" relationship. So it would be trivial to add some text to describe how it should work.

The first report is the "original" report... You will then see a second report at the bottom and a relationship object with a type of "translation-of" that links them together. Doing it this way will allow people other than the object creator to write a translation.
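
A consumer could follow such a relationship roughly as sketched below. This is a minimal Python illustration only: the field names `relationship_type`, `source_ref`, `target_ref`, and `lang`, the placeholder IDs, and the helper `find_translations` are all assumptions made for the example. The message above only states that a relationship object with a type of "translation-of" links the two reports.

```python
def find_translations(objects, original_id):
    """Return the objects that a "translation-of" relationship links to original_id.

    Assumes (hypothetically) that each relationship points from the
    translation (source_ref) to the original (target_ref).
    """
    by_id = {obj["id"]: obj for obj in objects}
    translations = []
    for obj in objects:
        if (obj.get("type") == "relationship"
                and obj.get("relationship_type") == "translation-of"
                and obj.get("target_ref") == original_id):
            translated = by_id.get(obj.get("source_ref"))
            if translated is not None:
                translations.append(translated)
    return translations


# Placeholder objects: an original report, a translation written by a
# different party, and the relationship that links them.
bundle = [
    {"type": "report", "id": "report--original", "lang": "en",
     "title": "Dridex Campaign - Botnet 121"},
    {"type": "report", "id": "report--translation", "lang": "ja",
     "title": "Dridex キャンペーン - ボットネット 121"},
    {"type": "relationship", "id": "relationship--example",
     "relationship_type": "translation-of",
     "source_ref": "report--translation",
     "target_ref": "report--original"},
]

for report in find_translations(bundle, "report--original"):
    print(report["lang"], report["title"])
```

Because the link lives in a separate relationship object, the original report is never modified, which is what lets someone other than its creator publish the translation.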
Thanks,
Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."