Subject: Re: [cti] i18n (RE: MVP Discussion) - Yet Another Updated Proposal
Hi Ryu/all,

It seems like we’re at a point where a call would be helpful to finalize things, either for this approach or for a single top-level language tag.

Ryu, you’re currently in Japan, right? Given the people who have been more active, I set up a Doodle poll with good options for Japan (morning) and OK options for the US (evening). I realize it’s no good for Europe; we could have a separate call to bring you in.

Poll is here: http://doodle.com/poll/hi59bdbw87bximva

There’s a link on the right side of the page to change your time zone. I don’t consider myself critical for this, by the way.

Thanks,
John

On 4/26/16, 6:12 AM, "cti@lists.oasis-open.org on behalf of Masuoka, Ryusuke" <cti@lists.oasis-open.org on behalf of masuoka.ryusuke@jp.fujitsu.com> wrote:

>Hi,
>
>Please find below yet another updated proposal for i18n.
>I think that it is simple, minimal, coherent, consistent, one-way-of-doing-things,
>self-contained, and context-free and that it adds a lot of value
>to the standards.
>
>As you already know, there are two major design decisions.
>
>(1) Language code for every text field
>(2) Direct reference to text
>
>Actually they come from the same design principle, that is,
>to avoid dependence on object structure and to make it self-contained
>in the text itself. Being self-contained without dependence on object structure:
>
>- It survives revisions and other changes made to objects so that it
>  protects investment in quality translations.
>
>- The parser is simple to implement (the information is always available there;
>  you do not have to go through object structures or relations to find
>  the language code or resolve references).
>
>- It can accommodate many use cases and increase the standards' utility
>  as its self-containedness allows flexibility (for example, multiple
>  languages in a single STIX file).
>
>As for (2), I could do away with text_id for text fields
>by using the (hash of) text itself, but could not do away with (1).
>But it is only 7 bytes extra and its utility is, from my point of
>view, huge.
>
>Other minor design decisions
>
>- {"en": ...} is used instead of {"lang": "en", "text_value": ...} both for
>  text fields and translations for efficiency and consistency.
>
>- I added to translations the necessary fields that core provides.
>  (Thanks, Jeffrey)
>
>- Changed Base64 encoding to hexadecimal encoding for the MD5 hash of the original text
>
>Regards,
>
>Ryu
>
>------------------------------------------------------------
>Internationalization - Another Updated Proposal - 20160426
>------------------------------------------------------------
>
>- STIX/CybOX should be UTF-8 encoded.
>
>- Always give the language code as the keyword for every text field.
>  Only one text in a single language code is allowed.
>
>- Always give "text_ref" and the language code as the keyword for every translation.
>  Use the hexadecimal-encoded MD5 hash of the original text in UTF-8 as the "text_ref" value to
>  refer to the original text.
>
>- One can also provide a translation of one of the translated texts rather than of the original text.
>
>
>-----
>- Pattern A - Translation given inside the same original package
>-----
>
>{
>  "type": "package",
>  ...
>  "campaigns": [
>    {
>      "type": "campaign",
>      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>      "revision": 1,
>      "spec_version": "stix-2.0",
>      "created_at": "2015-12-03T13:13Z",
>      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>      "title": {"en": "Dridex Campaign - Botnet 121"},
>      "descriptions": {"en": "Dridex-based campaign leveraging Botnet 121"},
>      "intended_effects": [
>        {"value": "theft-identity-theft"}
>      ],
>      "status": "Ongoing"
>    }
>  ],
>  "translations": [
>    {"id":"trans-1",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9",
>     "ja": "Dridex キャンペーン - ボットネット 121"},
>
>    {"id":"trans-2",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234",
>     "ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"},
>
>    {"id":"trans-3",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9",
>     "de": "Some German Title"},
>
>    {"id":"trans-4",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["CTI-Provider-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234",
>     "de": "Some German Description"}
>  ]
>  ...
>}
>
>-----
>- Pattern B - Translation given by a third party in some external database
>-----
>
>{
>  "translations": [
>    {"id":"trans-A",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9",
>     "es": "Some Spanish Title"},
>
>    {"id":"trans-B",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234",
>     "es": "Some Spanish Description"},
>
>    {"id":"trans-C",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9",
>     "fr": "Some French Title"},
>
>    {"id":"trans-D",
>     "type":"translation",
>     "created_at":"2016-04-19",
>     "created_by_refs":["Translator-1"],
>     "version":1,
>     "text_ref": "e8465d411f6580e8b67d778f25a78234",
>     "fr": "Some French Description"}
>  ]
>}
>
>-----
>- Pattern C - A Japanese CTI Provider creates CTI with its title in English and description in Japanese
>-----
>
>{
>  "type": "package",
>  ...
>  "campaigns": [
>    {
>      "type": "campaign",
>      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>      "revision": 1,
>      "spec_version": "stix-2.0",
>      "created_at": "2015-12-03T13:13Z",
>      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>      "title": {"en": "Dridex Campaign - Botnet 121"},
>      "descriptions": {"ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"},
>      "intended_effects": [
>        {"value": "theft-identity-theft"}
>      ],
>      "status": "Ongoing"
>    }
>  ],
>  ...
>}
>
>------------------------------
>Notes - Simple, minimal, coherent, consistent, self-contained, context-free, future-proofed
>------------------------------
>
>- Only seven additional bytes (without white spaces) for each text field.
>
>- As it refers to the text itself, it does not break if there are
>  revisions of the objects as long as the text stays the same.
>
>- As its scope is limited to text fields, it is self-contained:
>
>  - It is very unlikely that this impacts other parts of STIX and other standards.
>
>  - Very few (if any) considerations will be necessary
>    for future standard developments/changes.
>
>  - It would be easy to implement, as the same context-free code can
>    handle any text field.
>
>- There is only one way to express text fields and translations.
>
>- Resources spent on translation will not be wasted as long as the text stays the same.
>
>  - Even if someone else reuses the same text, its translations are still applicable.
>
>------------------------------
>Internationalization Use Cases
>------------------------------
>
>CN: Chinese
>DE: German
>EN: English
>FR: French
>JA: Japanese
>
>------------------------------
>(1) Providing text fields in multiple languages simultaneously at the time of creation.
>------------------------------
>
>  [ja/en (in case of Japan), en/fr/de (in case of EU countries), etc.]
>
>This is the most likely use case (for me). The original CTI has titles/descriptions in
>multiple languages from the start. When you create a CTI file, you include
>both English and Japanese titles/descriptions for major objects in it
>so that non-Japanese-speaking people can at least find out what it is at the top level.
>
>Another use case is a CTI provider in Japan writing a CTI file with its
>title in English and description in Japanese. This is because many Japanese
>can read short English titles, but many Japanese have difficulty understanding
>long and detailed descriptions in English.
>
>------------------------------
>(2) CTI Database Receiving CTI from Multiple CTI Sources in Different Languages
>------------------------------
>
>This is a case where you receive CTI from an English CTI source and
>another CTI source in Japanese.
>You put all CTI into MongoDB or some other NoSQL database and
>would like to mix and match.
>I would like the CTI database to still be able to
>track the language code of text fields.
>
>------------------------------
>(3) EN CTI received by a Japanese entity, which provides JA translation
>    (Or vice versa, JA CTI received by a US entity, which provides EN translation)
>------------------------------
>
>  A Japanese entity receives CTI information pieces in English.
>  The entity determines that some of them are important/critical
>  and worth translating into Japanese, adds descriptions in Japanese,
>  and redistributes them to other Japanese entities (if redistribution is allowed).
>  The CTIM (CTI Management System) of a receiving party displays
>  the Japanese description whenever possible, while allowing access to
>  the original English descriptions.
>
>  Work Flow:
>    1. Company 1 in EN creates an Indicator and TTP and shares them with Company 2 in JP.
>       It is important to note that the flow may be direct or may be through a series of brokers and other entities.
>       1. This Indicator and TTP has a producer of Company 1 and a version of 1.
>    2. Company 2 builds a translated version of the TTP and Indicator and releases it.
>       1. This new Indicator and TTP has a producer of Company 2 and a version of 2.
>       2. It is unrealistic to think that Company 2 can or will share the translated object back to Company 1, or that Company 1 will do anything with the translated object if it gets it. Their legal departments will probably prohibit accepting third-party translations and then using them in their offerings.
>
>------------------------------
>(4) An English CTI report describing attacks against Japanese entities in EN
>------------------------------
>
>  An English report on cyber attacks on Japan.
>  There are filenames of lure attachments in Japanese (original/real) and their
>  translations in English. Another similar report in English might have an email title along with
>  its translation in English next to it.
>  That report also has a Windows pathname
>  in Chinese (not Japanese) found in a binary along with its translation in English.
>
>  These Japanese texts can be found in descriptions, not just
>
>  [Ex. Original File Name (JA): "医療費通知", Translated File Name (EN): "Medical expenses notice"]
>
>  Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.
>
>------------------------------
>(5) Email subject/body, supposed to be in JA, but includes CN characters (by mistake of the attackers)
>------------------------------
>
>  This can happen because Chinese, Japanese, and Korean share Unicode characters
>  (CJK characters - https://en.wikipedia.org/wiki/CJK_characters).
>
>  This can be a very important clue as to who the attackers are.
>
>  Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.
>
>------------------------------
>(6) CTI translation service
>------------------------------
>
>  A CTI translation service provider keeps translations, into target languages, of text fields
>  from publicly available and/or commercial/private CTI sources.
>  The service is available through some kind of online API.
>  Consumers use this API to translate the text fields
>  in their CTI systems.
>
>------------------------------
>(7) CTI provider
>------------------------------
>
>  A CTI provider (in English) plans to penetrate the Japanese and other APAC markets
>  and needs a standard way to add translations of its text fields.
>  The CTI provider gives its customers a CTI package with all the translations in it
>  or a CTI package with translations into the languages of the user's choosing.
>
>------------------------------------------------------------
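
For readers who want to see how the two design decisions in the proposal fit together, here is a minimal Python sketch. It assumes only what the proposal states: a text field is a one-key object such as {"en": ...}, and "text_ref" is the hexadecimal MD5 digest of the original text encoded as UTF-8. The function names (text_ref, split_text_field, index_translations, localize) and the CORE_KEYS set are hypothetical, chosen for illustration; they are not part of STIX or of the proposal. The text_ref values are computed at run time rather than copied from the examples above.

```python
from __future__ import annotations

import hashlib

# Assumption: these are the non-language keys of a translation object,
# taken from the example translations in the proposal.
CORE_KEYS = {"id", "type", "created_at", "created_by_refs", "version", "text_ref"}


def text_ref(text: str) -> str:
    """Hex-encoded MD5 digest of the text's UTF-8 bytes."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def split_text_field(field: dict) -> tuple[str, str]:
    """Return (language_code, text) from a one-key field such as {"en": "..."}."""
    if len(field) != 1:
        raise ValueError("a text field must carry exactly one language code")
    (lang, text), = field.items()
    return lang, text


def index_translations(translations: list[dict]) -> dict:
    """Map (text_ref, language_code) to the translated text."""
    index = {}
    for translation in translations:
        for key, value in translation.items():
            if key not in CORE_KEYS:  # any remaining key is a language code
                index[(translation["text_ref"], key)] = value
    return index


def localize(field: dict, index: dict, preferred: str) -> str:
    """Return the text in the preferred language, else the original text."""
    lang, text = split_text_field(field)
    if lang == preferred:
        return text
    return index.get((text_ref(text), preferred), text)


title = {"en": "Dridex Campaign - Botnet 121"}
translations = [
    {
        "id": "trans-1",
        "type": "translation",
        "text_ref": text_ref(title["en"]),
        "ja": "Dridex キャンペーン - ボットネット 121",
    }
]
index = index_translations(translations)
print(localize(title, index, "ja"))  # Japanese translation found via text_ref
print(localize(title, index, "fr"))  # no French entry, falls back to the original
```

Because the lookup key is derived from the text alone, the same index works whether the translations arrive inside the package (Pattern A) or from a third-party source (Pattern B), and it keeps working after the enclosing object is revised as long as the text is unchanged.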