

Subject: Re: [cti] i18n (RE: MVP Discussion) - Yet Another Updated Proposal


Hi Ryu/all,

It seems like we’re at a point where a call would be helpful to finalize things, either for this approach or for a single top-level language tag.

Ryu, you’re currently in Japan, right? Given who has been most active, I set up a Doodle poll with good options for Japan (morning) and OK options for the US (evening). I realize it’s no good for Europe; we could have a separate call to bring you in.

Poll is here: http://doodle.com/poll/hi59bdbw87bximva



There’s a link on the right side of the page to change your time zone. I don’t consider myself critical for this, by the way.

Thanks,
John

On 4/26/16, 6:12 AM, "cti@lists.oasis-open.org on behalf of Masuoka, Ryusuke" <cti@lists.oasis-open.org on behalf of masuoka.ryusuke@jp.fujitsu.com> wrote:

>Hi, 
>
>Please find below yet another updated proposal for i18n.
>I think it is simple, minimal, coherent, consistent, self-contained, and context-free,
>that it offers only one way of doing things, and that it adds a lot of value
>to the standards. 
>
>As you already know, there are two major design decisions:
>
>(1) Language code for every text field
>(2) Direct reference to text 
>
>Both actually come from the same design principle: avoid dependence on
>object structure and keep everything self-contained in the text itself.
>Being self-contained, without dependence on object structure:
>
>- It survives revisions and other changes made to objects so that it 
>  protects investment in quality translations.
>
>- The parser is simple to implement (the information is always right there;
>  you do not have to traverse object structures or relations to find
>  the language code or resolve references).
>
>- It can accommodate many use cases and increases the standards' utility,
>  because self-containedness allows flexibility (for example, multiple 
>  languages in a single STIX file).  
>
>For (2), I was able to do away with text_id for text fields 
>by using the (hash of the) text itself, but I could not do away with (1).
>However, (1) costs only 7 extra bytes, and its utility is, from my point of
>view, huge. 
>
>Other minor design decisions:
>
>- {"en": ...} is used instead of {"lang": "en", "text_value": ...} for both 
>  text fields and translations, for efficiency and consistency. 
>
>- I added to translations the necessary fields that core provides.
>  (Thanks, Jeffrey)
>
>- I changed the encoding of the MD5 hash of the original text from Base64
>  to hexadecimal.
>
>Regards,
>
>Ryu
>
>------------------------------------------------------------
>Internationalization - Another Updated Proposal - 20160426
>------------------------------------------------------------
>
>- STIX/CybOX should be UTF-8 encoded.
>
>- Always use the language code as the key of every text field.
>  Each text field holds exactly one text, in a single language. 
>
>- Every translation always has a "text_ref" key and a language-code key.
>  The "text_ref" value is the hexadecimal-encoded MD5 hash of the original
>  text in UTF-8, and it identifies the original text.
>
>- A translation may also refer to one of the translated texts instead of
>  the original text.
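The two rules above (one language-code key per text field, and a text_ref that is the hex MD5 of the UTF-8 text) can be checked mechanically. The following is a minimal sketch, assuming that language keys are simple BCP 47-style tags matched by a regular expression; the helper names `text_ref` and `check_text_field` are hypothetical, not part of any STIX API.

```python
import hashlib
import re

# Assumption: language keys are simple BCP 47-style tags such as "en", "ja", "zh-Hant".
LANG_RE = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def text_ref(text: str) -> str:
    """Hex-encoded MD5 hash of the UTF-8 bytes of the original text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def check_text_field(field):
    """Validate a text field and return (language_code, text).

    A text field must have exactly one key, a language code, whose value is a string.
    """
    if not isinstance(field, dict) or len(field) != 1:
        raise ValueError("a text field must have exactly one language-code key")
    lang, text = next(iter(field.items()))
    if not LANG_RE.match(lang):
        raise ValueError(f"not a language code: {lang!r}")
    if not isinstance(text, str):
        raise ValueError("the text must be a string")
    return lang, text


# Usage: validate a title and compute the text_ref a translation would carry.
lang, text = check_text_field({"en": "Dridex Campaign - Botnet 121"})
ref = text_ref(text)  # a 32-character lowercase hex string
```

Hashing the UTF-8 bytes, rather than any in-memory representation, is what makes the reference reproducible across implementations and platforms.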
>
>
>-----
>- Pattern A - Translation given inside the same original package
>-----
>
>{
>  "type": "package",
>  ...
>  "campaigns": [
>    {
>      "type": "campaign",
>      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>      "revision": 1,
>      "spec_version": "stix-2.0",
>      "created_at": "2015-12-03T13:13Z",
>      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>      "title": {"en": "Dridex Campaign - Botnet 121"}, 
>      "descriptions": {"en": "Dridex-based campaign leveraging Botnet 121"}, 
>      "intended_effects": [
>        {"value": "theft-identity-theft"}
>      ],
>      "status": "Ongoing"
>    }
>  ],
>  "translations": [
>    {"id":"trans-1",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", 
>     "ja": "Dridex キャンペーン - ボットネット 121"}, 
>
>    {"id":"trans-2",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234", 
>     "ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"}, 
>
>    {"id":"trans-3",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", 
>     "de": "Some German Title"}, 
>
>    {"id":"trans-4",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234", 
>     "de": "Some German Description"}
>  ]
>  ...
>}
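A consumer of a Pattern A package needs to look up translations by the hash of each original text and choose which language to display. The sketch below shows one way to do that, assuming that every key of a translation object other than the core fields listed in the examples is a language code; `index_translations`, `localize`, and `CORE_KEYS` are hypothetical names. It computes text_ref from the text instead of reusing the literal hashes in the example, and it does not follow chains of translations of translations.

```python
import hashlib
from collections import defaultdict

# Core (non-language) keys of a translation object, as listed in Patterns A and B.
CORE_KEYS = {"id", "type", "created_at", "created_by_refs", "version", "text_ref"}


def text_ref(text):
    """Hex-encoded MD5 of the UTF-8 bytes of a text (the proposal's text_ref)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def index_translations(translations):
    """Map text_ref -> {language_code: translated_text}.

    Assumption: every key that is not a core key is a language code.
    If several translations share a text_ref and language, the last one wins.
    """
    index = defaultdict(dict)
    for t in translations:
        for key, value in t.items():
            if key not in CORE_KEYS:
                index[t["text_ref"]][key] = value
    return index


def localize(field, index, preferred):
    """Pick the text to display for a text field such as {"en": "..."}.

    Walks the preferred languages in order; the original text counts as
    available in its own language. Falls back to the original text.
    """
    orig_lang, orig_text = next(iter(field.items()))
    available = index.get(text_ref(orig_text), {})
    for lang in preferred:
        if lang == orig_lang:
            return orig_lang, orig_text
        if lang in available:
            return lang, available[lang]
    return orig_lang, orig_text
```

Because the lookup key is derived from the text alone, the same index can serve any text field in any object, which is the context-free property the proposal argues for.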
> 
>-----
>- Pattern B - Translation given by a third-party in some external database
>-----
>
>{
>  "translations": [
>    {"id":"trans-A",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", 
>     "es": "Some Spanish Title"}, 
>
>    {"id":"trans-B",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234", 
>     "es": "Some Spanish Description"}, 
>
>    {"id":"trans-C",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", 
>     "fr": "Some French Title"}, 
>
>    {"id":"trans-D",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234", 
>     "fr": "Some French Description"}
>  ]
>}
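When translations from the original package are combined with translations from third-party databases, as in Pattern B, a consumer needs a rule for id collisions. The following is a sketch only: it assumes ids identify a translation uniquely across sources and that a higher "version" supersedes a lower one. The proposal does not state either rule, and `merge_translations` is a hypothetical helper.

```python
def merge_translations(*sources):
    """Combine translation lists from several sources into one list.

    Assumptions: "id" identifies one translation across all sources, and a
    higher "version" supersedes a lower one. Two different objects with the
    same id and the same version are treated as an error.
    """
    by_id = {}
    for source in sources:
        for t in source:
            prev = by_id.get(t["id"])
            if prev is None or t["version"] > prev["version"]:
                by_id[t["id"]] = t
            elif t["version"] == prev["version"] and t != prev:
                raise ValueError(
                    f"two different translations share id {t['id']!r} "
                    f"and version {t['version']}"
                )
    return list(by_id.values())
```

Exact duplicates, such as the same translation received through two brokers, collapse silently; only conflicting objects raise an error.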
>
>-----
>- Pattern C - A Japanese CTI Provider creates CTI with its title in English and description in Japanese
>-----
>
>{
>  "type": "package",
>  ...
>  "campaigns": [
>    {
>      "type": "campaign",
>      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>      "revision": 1,
>      "spec_version": "stix-2.0",
>      "created_at": "2015-12-03T13:13Z",
>      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>      "title": {"en": "Dridex Campaign - Botnet 121"}, 
>      "descriptions": {"ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"}, 
>      "intended_effects": [
>        {"value": "theft-identity-theft"}
>      ],
>      "status": "Ongoing"
>    }
>  ], 
>  ...
>}
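Pattern C mixes languages within one object, so a consumer cannot assume a single language per object. A generic walker that reports each text field together with its language code handles this case. The following is a minimal sketch, assuming a text field can be recognized structurally as a dict with one language-code key mapping to a string. This is a heuristic (a one-key object such as {"id": ...} would also match), and a real implementation would more likely rely on the schema. `text_fields` and `is_text_field` are hypothetical names.

```python
import re

# Assumption: language keys are simple BCP 47-style tags ("en", "ja", "zh-Hant").
LANG_RE = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def is_text_field(value):
    """Heuristic: a dict with exactly one language-code key mapping to a string."""
    if not isinstance(value, dict) or len(value) != 1:
        return False
    key, text = next(iter(value.items()))
    return bool(LANG_RE.match(key)) and isinstance(text, str)


def text_fields(obj, path=()):
    """Yield (path, language_code, text) for every text field anywhere in obj."""
    if is_text_field(obj):
        key, text = next(iter(obj.items()))
        yield path, key, text
    elif isinstance(obj, dict):
        for k, v in obj.items():
            yield from text_fields(v, path + (k,))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from text_fields(v, path + (i,))
```

Applied to the Pattern C campaign, this walker reports "title" as "en" and "descriptions" as "ja", and it skips {"value": ...}, because "value" is not a language code.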
>
>------------------------------
>Notes - Simple, minimal, coherent, consistent, self-contained, context-free, future-proofed
>------------------------------
>
>- Only seven additional bytes (excluding whitespace) for each text field.
>
>- Because it refers to the text itself, it does not break when objects
>  are revised, as long as the text stays the same. 
>
>- Because its scope is limited to text fields, it is self-contained:
>
>  - It is very unlikely to impact other parts of STIX or other standards. 
>
>  - Little (if any) consideration will be needed 
>    for future development of or changes to the standards. 
>
>  - It is easy to implement, because the same context-free code can 
>    handle any text field. 
>
>- There is only one way to express text fields and translations.
>
>- Resources spent on translation are not wasted as long as the text stays the same.
>
>  - Even if someone else reuses the same text, its translations are still applicable.
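Two of the claims above can be checked directly: the seven-byte overhead of {"en": ...} in compact JSON, and the stability of text_ref across object revisions. The sketch below assumes compact serialization (no whitespace) and ASCII language tags; `tag_overhead` is a hypothetical helper. It also shows the flip side of "as long as the text stays the same": the text must be byte-identical, so even a trailing space produces a new text_ref.

```python
import hashlib
import json


def text_ref(text):
    """Hex-encoded MD5 of the UTF-8 bytes of a text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def tag_overhead(text, lang="en"):
    """Characters added by wrapping a JSON string as {"<lang>": ...} with no whitespace.

    Language tags are ASCII, so this is also the number of added UTF-8 bytes.
    """
    bare = json.dumps(text, ensure_ascii=False)
    tagged = json.dumps({lang: text}, ensure_ascii=False, separators=(",", ":"))
    return len(tagged) - len(bare)


# Bumping an object's revision leaves its text, and therefore its text_ref, unchanged.
v1 = {"type": "campaign", "revision": 1, "title": {"en": "Dridex Campaign - Botnet 121"}}
v2 = {**v1, "revision": 2}
same_ref = text_ref(v1["title"]["en"]) == text_ref(v2["title"]["en"])

# Any edit to the text itself, even trailing whitespace, yields a different text_ref.
edited_ref = text_ref("Dridex Campaign - Botnet 121 ") != text_ref("Dridex Campaign - Botnet 121")
```

For a two-letter code, tag_overhead returns 7, which matches the figure above. A longer tag such as "zh-Hant" costs 12.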
> 
>------------------------------
>Internationalization Use Cases
>------------------------------
>
>CN: Chinese
>DE: German
>EN: English
>FR: French
>JA: Japanese
>
>------------------------------
>(1) Providing text fields in multiple languages simultaneously at the time of creation.
>------------------------------
>
>  [ja/en (in case of Japan), en/fr/de (in case of EU countries), etc.]
>
>This is the most likely use case (for me). The original CTI has titles/descriptions in 
>multiple languages from the start. When you create a CTI file, you include 
>both English and Japanese titles/descriptions for its major objects
>so that non-Japanese-speaking readers can at least find out what it is at the top level.
>
>Another use case is a CTI provider in Japan writing a CTI file with its
>title in English and its description in Japanese. Many Japanese readers
>can read short English titles but have difficulty understanding
>long, detailed descriptions in English. 
>
>------------------------------
>(2) CTI Database Receiving CTI from Multiple CTI Sources in Different Languages
>------------------------------
>
>This is a case where you receive CTI from an English CTI source and 
>from another CTI source in Japanese. 
>You put all the CTI into MongoDB or another NoSQL database and 
>want to mix and match. The CTI database should still be able 
>to track the language code of each text field.
>
>------------------------------
>(3) EN CTI received by a Japanese entity, which provides a JA translation
>  (or vice versa, JA CTI received by a US entity, which provides an EN translation)
>------------------------------
>
>  A Japanese entity receives pieces of CTI in English.
>  The entity determines that some of them are important/critical
>  and worth translating into Japanese, adds descriptions in Japanese,
>  and redistributes them to other Japanese entities (if redistribution is allowed).
>  The CTIM (CTI Management System) of a receiving party displays
>  the Japanese description whenever possible, while still allowing access to
>  the original English description.
>
>  Workflow:
>  1. Company 1 (English) creates an Indicator and a TTP and shares them with Company 2 (Japan).  
>    It is important to note that the flow may be direct or may go through a series of brokers and other entities.  
>    1. This Indicator and TTP have Company 1 as their producer and a version of 1.
>  2. Company 2 builds a translated version of the TTP and Indicator and releases it.
>    1. The new Indicator and TTP have Company 2 as their producer and a version of 2.  
>    2. It is unrealistic to expect that Company 2 can or will share the translated objects back to Company 1, or that Company 1 would do anything with them if it received them. Company 1's legal department would probably prohibit accepting third-party translations and using them in its offerings.
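One way this use case could map onto the proposal is for Company 2 to publish standalone translation objects that point at Company 1's text through text_ref, like the third party in Pattern B, rather than re-issuing the Indicator and TTP as version 2. The sketch below builds such an object using the field names from Patterns A and B. The helper name `make_translation` and the "translation--<uuid>" id format are hypothetical assumptions. The proposal's examples use ids like "trans-1".

```python
import hashlib
import uuid


def make_translation(original_text, lang, translated_text, creator, created_at, version=1):
    """Build a translation object shaped like those in Patterns A and B.

    The id format "translation--<uuid4>" is a hypothetical choice for this sketch.
    """
    return {
        "id": f"translation--{uuid.uuid4()}",
        "type": "translation",
        "created_at": created_at,
        "created_by_refs": [creator],
        "version": version,
        # Reference the original text by the hex MD5 of its UTF-8 bytes.
        "text_ref": hashlib.md5(original_text.encode("utf-8")).hexdigest(),
        lang: translated_text,
    }
```

Company 1's objects stay untouched, so the question of Company 1 accepting third-party content does not arise. Receivers who want the Japanese text can fetch Company 2's translation objects separately.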
>
>------------------------------
>(4) An English (EN) CTI report describing attacks against Japanese entities
>------------------------------
>
>  An English report on cyber attacks on Japan contains filenames of lure
>  attachments in Japanese (the original, real names) along with their English
>  translations. Another, similar report in English might have an email subject
>  with its English translation next to it. That report also has a Windows pathname 
>  in Chinese (not Japanese), found in a binary, along with its English translation.
>
>  These Japanese texts can be found in descriptions, not just 
>
>  [Ex. Original File Name (JA): "医療費通知", Translated File Name (EN): "Medical expenses notice"]
>
>  Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.
>
>------------------------------
>(5) Email subject/body supposed to be in Japanese but including Chinese characters (by the attackers' mistake)
>------------------------------
>
>  This can happen because Chinese, Japanese, and Korean share Unicode characters
>  (CJK characters - https://en.wikipedia.org/wiki/CJK_characters).
>
>  This can be a very important clue about the attackers.
>
>  Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.
>
>------------------------------
>(6) CTI translation service
>------------------------------
>
>  A CTI translation service provider maintains translations into target languages
>  of text fields from publicly available and/or commercial/private CTI sources.
>  The service is available through some kind of online API.
>  Consumers of this service use that API to translate the text fields
>  in their own CTI systems.
>
>------------------------------
>(7) CTI provider
>------------------------------
>
>  A CTI provider (in English) plans to enter the Japanese and other APAC markets
>  and needs a standard way to add translations of its text fields.
>  The CTI provider gives its customers either a CTI package with all the translations in it
>  or a CTI package with translations into the languages of each customer's choosing.  
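Producing a package "with translations into the languages of each customer's choosing" amounts to filtering the package's translation list by language key. The following is a minimal sketch, assuming each translation object carries one language-code key besides the core fields used in Patterns A and B. `select_languages` is a hypothetical helper, and the original package is left unmodified.

```python
# Core (non-language) keys of a translation object, as listed in Patterns A and B.
CORE_KEYS = {"id", "type", "created_at", "created_by_refs", "version", "text_ref"}


def select_languages(package, languages):
    """Return a shallow copy of a package keeping only translations in the given languages.

    Assumption: any key of a translation that is not a core key is a language code.
    """
    wanted = set(languages)
    kept = [
        t for t in package.get("translations", [])
        if (set(t) - CORE_KEYS) & wanted
    ]
    return {**package, "translations": kept}
```

Since the text fields themselves are untouched, the filtered package remains valid. Only the optional translation objects differ from customer to customer.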
>
>------------------------------------------------------------
>
>