OASIS Mailing List Archives

 



cti message



Subject: RE: [cti] i18n (RE: MVP Discussion) - Yet Another Updated Proposal


Hi, John,

> Ryu, you’re currently in Japan, right?

Yes, I am. And Golden Week in Japan starts on Apr. 29.

  The Golden Week
  https://en.wikipedia.org/wiki/Golden_Week_(Japan)

So the earliest date I could mark in the poll was May 6 (Fri). 
(Also I will be offline for Apr. 29 - May 5.)

Regards,

Ryu

-----Original Message-----
From: Wunder, John A. [mailto:jwunder@mitre.org] 
Sent: Tuesday, April 26, 2016 10:48 PM
To: Masuoka, Ryusuke/益岡 竜介; cti@lists.oasis-open.org
Subject: Re: [cti] i18n (RE: MVP Discussion) - Yet Another Updated Proposal

Hi Ryu/all,

It seems like we’re at a point where a call would be helpful to finalize things, either for this approach or for a single top-level language tag.

Ryu, you’re currently in Japan, right? Given the people who have been most active, I set up a Doodle poll with good options for Japan (morning) and OK options for the US (evening). I realize it’s no good for Europe; we could have a separate call to bring you in.

Poll is here: http://doodle.com/poll/hi59bdbw87bximva



There’s a link on the right side of the page to change your time zone. I don’t consider myself critical for this, by the way.

Thanks,
John

On 4/26/16, 6:12 AM, "cti@lists.oasis-open.org on behalf of Masuoka, Ryusuke" <cti@lists.oasis-open.org on behalf of masuoka.ryusuke@jp.fujitsu.com> wrote:

>Hi,
>
>Please find below yet another updated proposal for i18n.
>I think that it is simple, minimal, coherent, consistent, 
>one-way-of-doing-things, self-contained, and context-free and that it 
>adds a lot of value to the standards.
>
>As you already know, there are two major design decisions.
>
>(1) Language code for every text field
>(2) Direct reference to text
>
>Actually, they come from the same design principle: to avoid 
>dependence on object structure and to make the text self-contained. 
>Being self-contained, without dependence on object structure, brings these benefits:
>
>- It survives revisions and other changes made to objects, so it
>  protects the investment in quality translations.
>
>- The parser is simple to implement (the information is always right
>  there; you do not have to traverse object structures or relations to
>  find the language code or resolve references).
>
>- It can accommodate many use cases and increases the standards' utility,
>  as its self-containedness allows flexibility (for example, multiple
>  languages in a single STIX file).  
>
>As for (2), I was able to do away with text_id for text fields by using the 
>(hash of the) text itself, but I could not do away with (1).
>Still, (1) costs only 7 extra bytes per text field, and its utility is, from my 
>point of view, huge.
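A quick way to check the "7 extra bytes" figure, as a minimal sketch assuming compact JSON (no whitespace) written as UTF-8: wrapping a string as {"en": ...} always adds seven bytes, whatever the text or its language, while the {"lang": "en", "text_value": ...} style adds 27. The helper name `compact_size` is hypothetical.

```python
import json


def compact_size(value) -> int:
    """Byte length of `value` serialized as compact JSON (no whitespace), UTF-8 encoded."""
    text = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


title = "Dridex Campaign - Botnet 121"

plain = compact_size({"title": title})                                  # untagged string
keyed = compact_size({"title": {"en": title}})                          # language code as key
verbose = compact_size({"title": {"lang": "en", "text_value": title}})  # separate "lang" field

print("extra bytes, {\"en\": ...}:", keyed - plain)                           # 7
print("extra bytes, {\"lang\": ..., \"text_value\": ...}:", verbose - plain)  # 27
```

The overhead is constant because only the wrapper `{"en":` plus the closing `}` is added; the text itself is unchanged.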
>
>Other minor design decisions:
>
>- {"en": ...} is used instead of {"lang": "en", "text_value": ...} for both
>  text fields and translations, for efficiency and consistency. 
>
>- I added the necessary fields that core provides for translations.
>  (Thanks, Jeffrey)
>
>- Changed the encoding of the MD5 hash of the original text from Base64
>  to hexadecimal.
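For reference, the two encodings of the same 16-byte MD5 digest differ as follows: hexadecimal gives 32 characters from [0-9a-f] (the form of the "text_ref" values in the examples below), while Base64 gives 24 characters that can include "+", "/" and "=" padding. A minimal sketch using only the Python standard library:

```python
import base64
import hashlib

original = "Dridex Campaign - Botnet 121"
digest = hashlib.md5(original.encode("utf-8")).digest()  # 16 raw bytes

hex_ref = digest.hex()                              # 32 chars, lowercase 0-9a-f
b64_ref = base64.b64encode(digest).decode("ascii")  # 24 chars, ends with "==" padding

print(len(digest), len(hex_ref), len(b64_ref))      # 16 32 24
```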
>
>Regards,
>
>Ryu
>
>------------------------------------------------------------
>Internationalization - Another Updated Proposal - 20160426
>------------------------------------------------------------
>
>- STIX/CybOX should be UTF-8 encoded.
>
>- Always use the language code as the key for every text field.
>  Each text field holds exactly one text, under a single language code. 
>
>- Every translation always has "text_ref" and uses the language code as the key
>  for the translated text. The "text_ref" value refers to the original text and is
>  the hexadecimal-encoded MD5 hash of that text in UTF-8.
>
>- A translation may also be made from one of the translated texts rather than
>  from the original text.
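The "text_ref" rule above can be sketched as follows. `text_ref` and `make_translation` are hypothetical helper names; the property names mirror the translation objects in the patterns below, and nothing here is part of any published STIX API.

```python
import hashlib


def text_ref(original_text: str) -> str:
    """Hex-encoded MD5 hash of the original text's UTF-8 bytes."""
    return hashlib.md5(original_text.encode("utf-8")).hexdigest()


def make_translation(trans_id, original_text, lang, translated_text,
                     creator, created_at, version=1):
    """Build a translation object shaped like those in Patterns A and B (hypothetical)."""
    return {
        "id": trans_id,
        "type": "translation",
        "created_at": created_at,
        "created_by_refs": [creator],
        "version": version,
        "text_ref": text_ref(original_text),  # points at the text, not at an object
        lang: translated_text,                # language code is the key
    }


t = make_translation("trans-1", "Dridex Campaign - Botnet 121", "ja",
                     "Dridex キャンペーン - ボットネット 121",
                     "CTI-Provider-1", "2016-04-19")
print(t["text_ref"], t["ja"])
```

Because "text_ref" depends only on the text, a translation made from a translated text would simply pass that translated text as `original_text`.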
>
>
>-----
>- Pattern A - Translation given inside the same original package
>-----
>
>{
>  "type": "package",
>  ...
>  "campaigns": [
>    {
>      "type": "campaign",
>      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>      "revision": 1,
>      "spec_version": "stix-2.0",
>      "created_at": "2015-12-03T13:13Z",
>      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>      "title": {"en": "Dridex Campaign - Botnet 121"}, 
>      "descriptions": {"en": "Dridex-based campaign leveraging Botnet 121"}, 
>      "intended_effects": [
>        {"value": "theft-identity-theft"}
>      ],
>      "status": "Ongoing"
>    }
>  ],
>  "translations": [
>    {"id":"trans-1",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", 
>     "ja": "Dridex キャンペーン - ボットネット 121"},
>
>    {"id":"trans-2",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234", 
>     "ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"},
>
>    {"id":"trans-3",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", 
>     "de": "Some German Title"},
>
>    {"id":"trans-4",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234", 
>     "de": "Some German Description"}
>  ]
>  ...
>}
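With Pattern A, a consumer can find a translation for any text field without walking the object graph: hash the field's text and look for a translation object with the same "text_ref" and the wanted language key. A minimal sketch; `translate_field` is a hypothetical name, and translations made from other translations are not followed.

```python
import hashlib


def text_ref(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def translate_field(field, target_lang, translations):
    """Return the text of `field` in `target_lang`, or None if no translation exists.

    `field` is a single-language dict such as {"en": "..."}; `translations` is a
    list of translation objects shaped like those in Pattern A.
    """
    (source_lang, source_text), = field.items()  # ValueError unless exactly one language
    if source_lang == target_lang:
        return source_text
    ref = text_ref(source_text)
    for t in translations:
        if t.get("text_ref") == ref and target_lang in t:
            return t[target_lang]
    return None


title = {"en": "Dridex Campaign - Botnet 121"}
translations = [{
    "id": "trans-1",
    "type": "translation",
    "text_ref": text_ref(title["en"]),  # computed here rather than copied from the example
    "ja": "Dridex キャンペーン - ボットネット 121",
}]
print(translate_field(title, "ja", translations))  # Dridex キャンペーン - ボットネット 121
print(translate_field(title, "de", translations))  # None
```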
> 
>-----
>- Pattern B - Translation given by a third party in some external database
>-----
>
>{
>  "translations": [
>    {"id":"trans-A",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", 
>     "es": "Some Spanish Title"},
>
>    {"id":"trans-B",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234", 
>     "es": "Some Spanish Description"},
>
>    {"id":"trans-C",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", 
>     "fr": "Some French Title"},
>
>    {"id":"trans-D",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234", 
>     "fr": "Some French Description"}
>  ]
>}
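A consumer merging translations from an external database probably wants to sanity-check them first. A minimal sketch, assuming the non-language properties used in the examples above are the complete set and that hashes are lowercase hex as shown; `check_translations` is hypothetical. It flags, for instance, two translation objects that share an id.

```python
import re

# Non-language properties of a translation object, as used in the examples (assumed complete).
TRANSLATION_FIELDS = {"id", "type", "created_at", "created_by_refs", "version", "text_ref"}
HEX_MD5 = re.compile(r"[0-9a-f]{32}")  # lowercase hex, matching the examples


def check_translations(translations):
    """Return a list of human-readable problems found in a list of translation objects."""
    problems = []
    seen_ids = set()
    for t in translations:
        tid = t.get("id")
        if tid in seen_ids:
            problems.append(f"duplicate id {tid!r}")
        seen_ids.add(tid)
        ref = t.get("text_ref")
        if not isinstance(ref, str) or not HEX_MD5.fullmatch(ref):
            problems.append(f"{tid!r}: text_ref is not a 32-digit hex MD5")
        langs = [key for key in t if key not in TRANSLATION_FIELDS]
        if len(langs) != 1:
            problems.append(f"{tid!r}: expected one language key, found {langs}")
    return problems


db = [
    {"id": "trans-C", "type": "translation",
     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", "fr": "Some French Title"},
    {"id": "trans-C", "type": "translation",
     "text_ref": "e8465d411f6580e8b67d778f25a78234", "fr": "Some French Description"},
]
print(check_translations(db))  # ["duplicate id 'trans-C'"]
```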
>
>-----
>- Pattern C - A Japanese CTI Provider creates CTI with its title in English and description in Japanese
>-----
>
>{
>  "type": "package",
>  ...
>  "campaigns": [
>    {
>      "type": "campaign",
>      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>      "revision": 1,
>      "spec_version": "stix-2.0",
>      "created_at": "2015-12-03T13:13Z",
>      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>      "title": {"en": "Dridex Campaign - Botnet 121"}, 
>      "descriptions": {"ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"}, 
>      "intended_effects": [
>        {"value": "theft-identity-theft"}
>      ],
>      "status": "Ongoing"
>    }
>  ],
>  ...
>}
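Since each text field carries its own language code, a consumer can see field by field which language a mixed object like Pattern C uses, without any package-level language setting. A sketch assuming the text-field names are known in advance (only "title" and "descriptions" here); `field_languages` is hypothetical.

```python
def field_languages(obj, text_fields=("title", "descriptions")):
    """Map each text field present in `obj` to its language code."""
    languages = {}
    for name in text_fields:
        if name in obj:
            (lang,) = obj[name]  # unpacks the dict's keys; ValueError unless exactly one
            languages[name] = lang
    return languages


campaign = {
    "type": "campaign",
    "title": {"en": "Dridex Campaign - Botnet 121"},
    "descriptions": {"ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"},
}
print(field_languages(campaign))  # {'title': 'en', 'descriptions': 'ja'}
```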
>
>------------------------------
>Notes - Simple, minimal, coherent, consistent, self-contained, context-free, future-proofed
>------------------------------
>
>- Only seven additional bytes (excluding whitespace) for each text field.
>
>- Because it refers to the text itself, it does not break when objects
>  are revised, as long as the text stays the same. 
>
>- Because its scope is limited to text fields, it is self-contained:
>
>  - It is very unlikely to affect other parts of STIX or other standards. 
>
>  - Little (if any) consideration will be needed for future standard
>    developments/changes. 
>
>  - It is easy to implement, as the same context-free code can 
>    handle any text field. 
>
>- There is only one way to express text fields and translations.
>
>- Resources spent on translation will not be wasted as long as the text stays the same.
>
>  - Even if someone else reuses the same text, its translations still apply.
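The claim that translations survive revisions follows from hashing only the text. The sketch below, using the hexadecimal MD5 "text_ref" from the proposal, also shows the flip side: any byte-level change, including added whitespace or a different Unicode normalization form, yields a different "text_ref", so existing translations no longer match. `title_ref` is a hypothetical helper.

```python
import hashlib


def text_ref(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def title_ref(obj) -> str:
    return text_ref(obj["title"]["en"])


v1 = {"revision": 1, "title": {"en": "Dridex Campaign - Botnet 121"}}
v2 = {**v1, "revision": 2, "status": "Ongoing"}                # revised, title unchanged
v3 = {**v1, "title": {"en": "Dridex Campaign - Botnet 121 "}}  # one trailing space added

print(title_ref(v1) == title_ref(v2))  # True: translations keyed on v1's title still apply
print(title_ref(v1) == title_ref(v3))  # False: any byte-level change breaks the link
```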
> 
>------------------------------
>Internationalization Use Cases
>------------------------------
>
>DE: German
>EN: English
>FR: French
>JA: Japanese
>ZH: Chinese
>
>------------------------------
>(1) Providing text fields in multiple languages simultaneously at the time of creation.
>------------------------------
>
>  [ja/en (in the case of Japan), en/fr/de (in the case of EU countries), etc.]
>
>This is the most likely use case (for me). The original CTI has 
>titles/descriptions in multiple languages from the start. When you 
>create a CTI file, you include both English and Japanese 
>titles/descriptions for major objects so that people who do not speak Japanese can at least tell, at the top level, what it is about.
>
>Another use case: a CTI provider in Japan writes a CTI file with its 
>title in English and its description in Japanese. This is because many 
>Japanese readers can read short English titles but have difficulty 
>understanding long, detailed descriptions in English.
>
>------------------------------
>(2) CTI Database Receiving CTI from Multiple CTI Sources in Different Languages
>------------------------------
>
>This is a case where you receive CTI from an English CTI source and 
>another CTI source in Japanese.
>You put all the CTI into MongoDB or some other NoSQL database and would 
>like to mix and match it. The CTI database should still be able to track 
>the language code of each text field.
>
>------------------------------
>(3) EN CTI received by a Japanese entity, which provides a JA translation
>  (or vice versa: JA CTI received by a US entity, which provides an EN translation)
>
>  A Japanese entity receives CTI information pieces in English.
>  The entity determines that some of them are important/critical and worth
>  translating into Japanese, adds descriptions in Japanese, and
>  redistributes them to other Japanese entities (if redistribution is allowed).
>  The CTIM (CTI Management System) of a receiving party displays the
>  Japanese description whenever possible, while allowing access to the
>  original English descriptions.
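The CTIM behaviour described above (show Japanese when available, otherwise the original English) could look like the sketch below. It assumes translations have already been indexed by (text_ref, language code); `display_text` and that index shape are hypothetical.

```python
import hashlib


def text_ref(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def display_text(field, lookup, preferred_langs):
    """Return (lang, text): the first preferred language available, else the original.

    `field` is a single-language dict such as {"en": "..."}; `lookup` maps
    (text_ref, language code) to translated text (a hypothetical index shape).
    """
    (orig_lang, orig_text), = field.items()
    ref = text_ref(orig_text)
    for lang in preferred_langs:
        if lang == orig_lang:
            return lang, orig_text
        if (ref, lang) in lookup:
            return lang, lookup[(ref, lang)]
    return orig_lang, orig_text  # the original text remains the fallback


desc = {"en": "Dridex-based campaign leveraging Botnet 121"}
lookup = {(text_ref(desc["en"]), "ja"): "ボットネット 121 を活用する Dridex を元にしたキャンペーン"}

lang, text = display_text(desc, lookup, ["ja"])
print(lang, text)  # ja ボットネット 121 を活用する Dridex を元にしたキャンペーン
lang, text = display_text(desc, {}, ["ja"])
print(lang, text)  # en Dridex-based campaign leveraging Botnet 121
```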
>
>  Work Flow:
>  1. Company 1 (EN) creates an Indicator and a TTP and shares them with Company 2 (JP).
>     Note that the flow may be direct or may pass through a series of brokers and other entities.
>     1. This Indicator and TTP have Company 1 as their producer and a version of 1.
>  2. Company 2 builds a translated version of the TTP and Indicator and releases it.
>     1. The new Indicator and TTP have Company 2 as their producer and a version of 2.
>     2. It is unrealistic to expect that Company 2 can or will share the translated objects back to Company 1, or that Company 1 would do anything with them if it received them. Their legal departments will probably prohibit accepting third-party translations and using them in their offerings.
>
>------------------------------
>(4) An English (EN) CTI report describing attacks against Japanese entities
>------------------------------
>
>  An English report on cyber attacks on Japan.
>  It contains file names of lure attachments in Japanese (the original/real
>  names) and their translations in English. Another similar report in English
>  might have an email title along with its English translation next to it.
>  That report also has a Windows pathname in Chinese (not Japanese), found in
>  a binary, along with its English translation.
>
>  These Japanese texts can be found in descriptions, not just
>
>  [Ex. Original File Name (JA): "医療費通知", Translated File Name (EN): "Medical expenses notice"]
>
>  Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.
>
>------------------------------
>(5) Email subject/body supposed to be in Japanese but including Chinese
>characters (by the attackers' mistake)
>------------------------------
>
>  This can happen because Chinese, Japanese, and Korean share Unicode
>  characters (CJK characters: https://en.wikipedia.org/wiki/CJK_characters).
>
>  This can be a very important clue about the attackers.
>
>  Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.
>
>------------------------------
>(6) CTI translation service
>------------------------------
>
>  A CTI translation service provider maintains translations, into target
>  languages, of text fields from publicly available and/or commercial/private
>  CTI sources. The service is available through some kind of online API.
>  Consumers use the service to translate text fields in their CTI systems
>  through the API provided by the translation service provider.
>
>------------------------------
>(7) CTI provider
>------------------------------
>
>  An English-language CTI provider plans to enter the Japanese and other
>  APAC markets and needs a standard way to add translations of its text fields.
>  The CTI provider gives its customers a CTI package with all the
>  translations in it, or a CTI package with translations only into the languages each customer chooses.
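For the second option, a provider could trim a Pattern A style package down to the customer's languages. A minimal sketch, assuming every non-metadata key of a translation object is a language code (as in the examples above); `select_languages` is hypothetical.

```python
# Non-language properties of a translation object, as used in Pattern A (assumed complete).
TRANSLATION_FIELDS = {"id", "type", "created_at", "created_by_refs", "version", "text_ref"}


def select_languages(package, wanted):
    """Return a shallow copy of `package` keeping only translations into `wanted` languages."""
    wanted = set(wanted)
    kept = [
        t for t in package.get("translations", [])
        if any(key in wanted for key in t if key not in TRANSLATION_FIELDS)
    ]
    return {**package, "translations": kept}


package = {
    "type": "package",
    "translations": [
        {"id": "trans-1", "type": "translation",
         "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", "ja": "Dridex キャンペーン - ボットネット 121"},
        {"id": "trans-3", "type": "translation",
         "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9", "de": "Some German Title"},
    ],
}
print([t["id"] for t in select_languages(package, ["ja"])["translations"]])  # ['trans-1']
```

The original text fields are untouched, so a customer who picks no languages still receives the complete original CTI.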
>
>------------------------------------------------------------
>
>
>

