OASIS Mailing List Archives

 




Subject: RE: [cti] i18n (RE: MVP Discussion) - An Updated Proposal


Hi, John, Bret,

 

-----

Use Case (1) Second Paragraph

-----

> Or another use case is the CTI provider in Japan writes a CTI file with its

> title in English and description in Japanese. This is because many Japanese

> can read short English titles, but many Japanese have difficulties to

> understand long and detailed descriptions in English.

 

In the above case, what I am talking about is something like the following.

 

-----

{
  "type": "package",
  "indicators": [
    {
      "type": "indicator",
      "id": "indicator--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
      "revision": 1,
      "created_at": "2015-12-03T13:13Z",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "title": {"en": "Dridex Campaign - Botnet 121"},
      "description": {"ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"}
    },
    ...
  ]
}

-----

 

It is often the case with academic papers that we (Japanese) provide the title and
abstract in English, but the main body in Japanese.

 

Title and Abstract of the paper -> Title in the CTI example above

Main body of the paper -> Description in the CTI example above

 

-----

 

Bret:

> It is also important that we have one way of doing things, that is after all one of our core design goals.

John:

> That seems to violate our one way of doing things design rule

 

I completely agree with this design goal, and I took great care to make sure there is only one
way to do it in my last updated proposal.  I gave the example of the multiple-language mapping case
in my last email just to illustrate how complex the language situation outside the US can be.

 

My recommendation is (as in my last updated proposal):

 

- Always give the language code as the keyword for every text field

  (So that anyone can give translations to the field later, knowing

  which language it is in.)

 

So that it is

 

   "title": {"en": "Dridex Campaign - Botnet 121"},

 

or

 

   "title": {"ja": "Dridex キャンペーン - ボットネット 121"},

 

never like

 

   "title": {"en": "Dridex-based campaign leveraging Botnet 121",
     "ja": "Dridex キャンペーン - ボットネット 121",
     "fr": "Some French Title"},

 

even if it is difficult to tell which one came first.
If there are texts in multiple languages for a single text field,
always pick the text in one language for the text field and give the texts
in the other languages as its translations.
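
As a rough illustration of the rule above, here is a minimal Python sketch that accepts a text field only if it is a language map with exactly one entry. The function name check_text_field is made up for this illustration and is not part of any STIX draft.

```python
def check_text_field(value):
    """Return (lang, text) if value maps exactly one language code to its text."""
    if not isinstance(value, dict) or len(value) != 1:
        raise ValueError("a text field must map exactly one language code to its text")
    # Unpack the single (language code, text) pair.
    (lang, text), = value.items()
    if not isinstance(lang, str) or not isinstance(text, str):
        raise ValueError("language code and text must both be strings")
    return lang, text


# Accepted: one language per text field.
print(check_text_field({"en": "Dridex Campaign - Botnet 121"}))

# Rejected: several languages in the same field (the "never like" case).
try:
    check_text_field({"en": "Some English Title", "fr": "Some French Title"})
except ValueError as err:
    print("rejected:", err)
```

Texts in other languages would then go into translations rather than into the field itself.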

 

Regards,

 

Ryu

 

From: Jordan, Bret [mailto:bret.jordan@bluecoat.com]
Sent: Friday, April 22, 2016 11:47 PM
To: Wunder, John A.
Cc: Masuoka, Ryusuke/益岡 竜介; Mates, Jeffrey CIV DC3/DCCI; cti@lists.oasis-open.org
Subject: Re: [cti] i18n (RE: MVP Discussion) - An Updated Proposal

 

Oops, let me fix that JSON typo.

 

{
  "type": "package",
  "indicators": [
    {
      "type": "indicator",
      "id": "indicator--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
      "revision": 1,
      "created_at": "2015-12-03T13:13Z",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "lang": "ja",
      "title": "Dridex キャンペーン - ボットネット 121",
      "description": "「これは偽のメッセージであるが、それは怖いかもしれません」"
    }
  ],
  "translations": [
    {
      "type": "translation",
      "id": "trans--a1201df6-c352-4a81-9c7c-5a6f896afffffff",
      "created_at": "2016-04-19",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "revision": 1,
      "text_hash": "c2ef404a88425119ab8b9528627398d0",
      "data": {
        "en-us": "Text in English",
        "de": "Text in German"
      }
    }
  ]
}

 

Thanks,

 

Bret

 

 

 

Bret Jordan CISSP

Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

 

On Apr 22, 2016, at 08:35, Jordan, Bret <bret.jordan@BLUECOAT.COM> wrote:

 

Yes, the design that Wunder has is pretty close to the one I would propose.  It is also important that we have one way of doing things, that is after all one of our core design goals.  And this does not prevent someone from doing multiple translations at the same time.  Remember the product / tool / UI will hide all of this from the user...  Yes, we should use either ISO 639 or RFC 5646.

 

 

{
 "type": "package",
 "indicators": [
 {
   "type": "indicator",
   "id": "indicator--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
   "revision": 1,
   "created_at": "2015-12-03T13:13Z",
   "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
   "lang": "ja",
   "title": "Dridex キャンペーン - ボットネット 121",
   "description": "「これは偽のメッセージであるが、それは怖いかもしれません」"
 },
 "translations": [
   {
     "type": "translation",
     "id": "trans--a1201df6-c352-4a81-9c7c-5a6f896afffffff",
     "created_at": "2016-04-19",
     "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
     "revision": 1,
     "text_hash": "c2ef404a88425119ab8b9528627398d0",
     "data": [
       "en-us": "Text in English"
       "de", "text": "Text in German"
     ]
   }
 ]
]

 

Thanks,

 

Bret

 

 

 

Bret Jordan CISSP

Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

 

On Apr 22, 2016, at 06:25, Wunder, John A. <jwunder@mitre.org> wrote:

 

Hi Ryu/all,

Wouldn’t you be able to meet that requirement by just issuing the original object with a single language and using separate translation objects for the other one(s)? Something like this:

{
  "type": "package",
  "indicators": [
    {
      "type": "indicator",
      "id": "indicator--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
      "revision": 1,
      "spec_version": "2.0",
      "created_at": "2015-12-03T13:13Z",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "title": "Dridex キャンペーン - ボットネット 121",
      "description": "「これは偽のメッセージであるが、それは怖いかもしれません」",
      "lang": "ja"
    }
  ],
  "translations": [
    {
      "id": "trans-1",
      "type": "translation",
      "spec_version": "2.0",
      "created_at": "2016-04-19",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "revision": 1,
      "text_ref": "c2ef404a88425119ab8b9528627398d0",
      "lang": "en",
      "text": "We detected a potentially malicious message with the text 'this is a fake message but it might look scary' sent from your IP."
    },
    {
      "id": "trans-2",
      "type": "translation",
      "created_at": "2016-04-19",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "version": 1,
      "text_ref": "c2ef404a88425119ab8b9528627398d0",
      "lang": "en",
      "text": "This is a quick and dirty shell not an actual object type"
    }
  ]
}


The concern I have with the approach you proposed is that it has two ways of doing translations: either via different languages in the language map, or via the separate translation object. That seems to violate our “one way of doing things” design rule…even if it’s not perfect for every use case, it would seem to me to be better to pick one way of doing it and do that in all cases rather than have two different ways. Scenarios much like this one were how we ended up with so many different ways of representing indicator composition in STIX 1.x, so we have to be careful now to not repeat the same mistakes.

Do you think it would be an acceptable compromise (understanding it’s not always the best solution) to accept just always doing translations via the translation object? That approach seems to be the more powerful one.

John




On 4/22/16, 7:04 AM, "cti@lists.oasis-open.org on behalf of Masuoka, Ryusuke" <cti@lists.oasis-open.org on behalf of masuoka.ryusuke@jp.fujitsu.com> wrote:


Hi,

Having the language code at the top level of object does not meet
the requirement of the second paragraph of the Use Case,
"(1) Providing text fields in multiple languages simultaneously at the time of creation."


> Or another use case is the CTI provider in Japan writes a CTI file with its
> title in English and description in Japanese. This is because many Japanese
> can read short English titles, but many Japanese have difficulties to
> understand long and detailed descriptions in English.


Actually, I realized that the US has a neighbor to the north with two
official languages. In that country and many others, texts in multiple
languages may be produced at the time of CTI creation, and it may not be
possible to tell which is the translation of which.
In that case, we can use multiple language mapping like

"title": {"en": "Dridex-based campaign leveraging Botnet 121",
  "ja": "Dridex キャンペーン - ボットネット 121"}

I believe it should also be much easier to implement parsing code if the language
code is always available locally along with the text.
With an object-level language code, it may be just a matter of keeping the language code
as you parse an object from the top (assuming the language code is given before
any texts in the object). However, if you refer to some text in an object from outside
using a Relation, you need to go through the object structure to determine
the language code.
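
To make the parsing argument concrete, here is a small Python sketch contrasting the two layouts discussed in this thread. Both helper names are hypothetical, and the two objects are simplified stand-ins for the examples above.

```python
def field_with_lang_local(field_value):
    """Per-field language map: the language code travels with the text."""
    (lang, text), = field_value.items()
    return lang, text


def field_with_lang_top_level(obj, field_name):
    """Object-level "lang": the enclosing object is needed to find the code."""
    return obj["lang"], obj[field_name]


mapped = {"title": {"ja": "Dridex キャンペーン - ボットネット 121"}}
top_level = {"lang": "ja", "title": "Dridex キャンペーン - ボットネット 121"}

# Both calls yield the same (language, text) pair, but only the second
# one needs access to the whole object that contains the field.
print(field_with_lang_local(mapped["title"]))
print(field_with_lang_top_level(top_level, "title"))
```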


> I don't know if that's a deal breaker,


It may come down to politics and deals in the end, but I really would not make it a "deal."
I would like it to be decided by rational arguments based on requirements.
To that end, I have tried to come up with reasonable use cases (including
those provided by others), given serious thought to this issue, and even
tried really hard to make the change as minimal as possible.
I just wish for STIX and/or CybOX to be truly international standards.
Toward that goal, as far as I can see now, dropping (1) the language code for every text field
and (2) the use of direct references to texts would be, I am afraid, a "deal breaker" for
future-proof, truly international STIX and/or CybOX.

Regards,

Ryu

-----Original Message-----
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Mates, Jeffrey CIV DC3/DCCI
Sent: Thursday, April 21, 2016 10:48 PM
To: Masuoka, Ryusuke/益岡 竜介; cti@lists.oasis-open.org
Subject: [cti] RE: i18n (RE: MVP Discussion) - An Updated Proposal

I definitely appreciate the importance of being able to extract language
information, and as a monolinguist I can't appreciate all of the nuances of
this field.  That said, what if, instead of each text block having a language
key, the entire object supported an array of language keys?

You would lose the ability to immediately tell if a single field was
primarily written in one language, but you could tell all of the languages
used in the object.  I don't know if that's a deal breaker, but you would
end up with something like:

{
"type": "package",
...
"random-type": [
  {
    "type": "sample-type",
    "id": "sample--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
    "revision": 1,
    "spec_version": "stix-2.0",
    "created_at": "2015-12-03T13:13Z",
    "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
    "title": "This is a quick and dirty shell not an actual object type",
    "log_message_body": "We detected a potentially malicious message with the text 「これは偽のメッセージであるが、それは怖いかもしれません」sent from your IP.",
    "langs": ["en", "ja"]
  }
]
}

Jeffrey Mates, Civ DC3/DCCI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computer Scientist
Defense Cyber Crime Institute
jeffrey.mates@dc3.mil
410-694-4335


-----Original Message-----
From: Masuoka, Ryusuke [mailto:masuoka.ryusuke@jp.fujitsu.com]
Sent: Thursday, April 21, 2016 2:56 AM
To: Mates, Jeffrey CIV DC3/DCCI; cti@lists.oasis-open.org
Subject: [Non-DoD Source] RE: i18n (RE: MVP Discussion) - An Updated
Proposal

Hi, Jeffrey, all,

If you ask me, I think it is not a good idea to drop the language code.
It may look superfluous/wasteful from the point of view of the English-only
world, but it is a critical clue to many things.

One of the things is that it is not immediately apparent which language is
used from just looking at the characters due to CJK unification of Unicode.

- CJK characters - https://en.wikipedia.org/wiki/CJK_characters
- Some interesting examples are found at "What words have the same kanji
  in China and Japan but different meanings?"
  - http://www.sljfaq.org/afaq/cj-false-friends.html

Of course, a human being can determine which language is used given a long
enough sentence, but that is not necessarily the case for automatic
translation.

It may be apparent to the creator of CTI which language is used, but for
translation providers to give translations later (Use Cases (3), (6)), it
would be a great help to know which language is used.

There are cases where multiple languages are used in a single package (Use
Case (1) Second Paragraph). Again it is an important clue which language is
used.

-----
As for examples of mixed languages like,

"log_message_body": "We detected a potentially malicious message with the text 「これは偽のメッセージであるが、それは怖いかもしれません」sent from your IP.",

It is similar to the Use Case (5) "Email subject/body, supposed to be in JP,
but includes CN characters (by mistake of the attackers)".

If you are serious about dealing with such cases, there is really a simple
and straightforward way to deal with it.

"log_message_body":
  [{"en": "We detected a potentially malicious message with the text"},
   {"ja": "「これは偽のメッセージであるが、それは怖いかもしれません」"},
   {"en": "sent from your IP."}]

Actually in this case, we can have Japanese translations for two English
sentences

"We detected a potentially malicious message with the text"
"sent from your IP."

only once for potentially many different middle Japanese sentences, making
provision of translations much easier.
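
A minimal Python sketch of how a consumer might handle such a segmented field, assuming each segment is a one-entry language map as in the example above. The helper name join_segments and the single-space separator are assumptions made for illustration only.

```python
def join_segments(segments, sep=" "):
    """Rebuild the full text and list the languages used, in order of first use."""
    texts, langs = [], []
    for segment in segments:
        # Each segment is a one-entry language map, e.g. {"en": "..."}.
        (lang, text), = segment.items()
        texts.append(text.strip())
        if lang not in langs:
            langs.append(lang)
    return sep.join(texts), langs


segments = [
    {"en": "We detected a potentially malicious message with the text"},
    {"ja": "「これは偽のメッセージであるが、それは怖いかもしれません」"},
    {"en": "sent from your IP."},
]

full_text, langs = join_segments(segments)
print(langs)      # ['en', 'ja']
print(full_text)
```

Because each English segment is a separate text, each one could be translated once and reused regardless of the Japanese text in the middle.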
-----

Overall, dropping the language code from the text field would, I am afraid,
make STIX and other standards NOT future-proof.

Regards,

Ryu

-----Original Message-----
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf
Of Mates, Jeffrey CIV DC3/DCCI
Sent: Wednesday, April 20, 2016 9:29 PM
To: Masuoka, Ryusuke/益岡 竜介; cti@lists.oasis-open.org
Subject: [cti] RE: i18n (RE: MVP Discussion) - An Updated Proposal

Ryu,

One of the concerns raised in the Face to Face is that language code might
not be necessary for text entries given the verbosity of the translation
object.  This would also avoid some issues that might appear when dealing
with mixed language text blocks where one of the two languages would need to
be arbitrarily chosen as the primary one.  Would something like this be
acceptable?

{
"type": "package",
...
"random-type": [
  {
    "type": "sample-type",
    "id": "sample--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
    "revision": 1,
    "spec_version": "stix-2.0",
    "created_at": "2015-12-03T13:13Z",
    "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
    "title": "This is a quick and dirty shell not an actual object type",
    "log_message_body": "We detected a potentially malicious message with the text 「これは偽のメッセージであるが、それは怖いかもしれません」sent from your IP."
  }
],
"translations": [
  {"id": "trans-1",
    "type": "translation",
    "created_at": "2016-04-19",
    "created_by_refs": ["translator-1"],
    "version": 1,
    "text_ref": "c2ef404a88425119ab8b9528627398d0",
    "lang": "en",
    "text": "We detected a potentially malicious message with the text 'this is a fake message but it might look scary' sent from your IP."
  }
]
}

Jeffrey Mates, Civ DC3/DCCI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computer Scientist
Defense Cyber Crime Institute
jeffrey.mates@dc3.mil
410-694-4335


-----Original Message-----
From: Masuoka, Ryusuke [mailto:masuoka.ryusuke@jp.fujitsu.com]
Sent: Wednesday, April 20, 2016 6:33 AM
To: Mates, Jeffrey CIV DC3/DCCI; cti@lists.oasis-open.org
Subject: [Non-DoD Source] RE: i18n (RE: MVP Discussion) - An Updated
Proposal

Hi, Jeffrey,

Thank you very much for your valuable insights and input.


> Tweaking your earlier example:

I am quite good with this tweaked example, but I think we need to hear
others' comments and thoughts on this.
Please find my comments on your inputs below.

> I think the fact that this lets a translation be used across any
> number of objects with the same text field value is a really strong
> point,

I agree.

> but I am slightly concerned about the additional backend requirements
> this would add for anyone who wanted to use translations as no id
> based linking exists.
> This isn't a huge issue, but it would mean that anyone using
> translations would need to keep one or two additional indices.


I do not think that is always the case. In use case (6), the consumer of
the CTI translation service can calculate the MD5 hash of the original text
on the fly to obtain its translation from the service.
In general, the linking can be done on the fly, as needed, in most cases.
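
A minimal Python sketch of the on-the-fly lookup described above. It assumes the hexadecimal MD5 form of text_ref and the "lang"/"text" fields from the tweaked example in this thread; the function names are made up for illustration, and a real translation service would sit behind some API rather than an in-memory list.

```python
import hashlib


def text_ref(text):
    """Hex MD5 of the UTF-8 bytes of the text (32 characters)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_translations(original_text, lang, translations):
    """Compute the reference from the text itself, then filter by it.

    No stored link between objects and translations is required.
    """
    ref = text_ref(original_text)
    return [t["text"] for t in translations
            if t["text_ref"] == ref and t["lang"] == lang]


title = "Dridex Campaign - Botnet 121"
translations = [
    {"text_ref": text_ref(title), "lang": "ja",
     "text": "Dridex キャンペーン - ボットネット 121"},
]

print(find_translations(title, "ja", translations))
```

A consumer that performs many lookups could still build a dictionary keyed by (text_ref, lang), but that would be an optimization rather than a requirement.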


> I'm also a bit uncertain about having the text field in the
> translation object change depending on the language.  I think it might
> make more sense to have two fields one for the text and a second for
> the language.  That way any code reading a translation can always
> point to the same target instead of resorting to a loop.


I agree that is a better practice. In my previous example, I favored the
compactness of the description (only seven additional bytes) and consistency
between the text and translation fields, but I am open to a "lang" keyword, too.

I believe that it would meet all the requirements that I see as long as it
is based on references to text instead of references using object structure.
I think requiring "lang", ID, created_at, created_by_refs and revision fields at a
minimum, along with whatever else core provides for translations, would not
break it.
Using the common hexadecimal encoding of the MD5 for the text_ref instead of
Base64 would not break it either (I picked Base64 over hexadecimal due to its
shorter length).

Regards,

Ryu

-----Original Message-----
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf
Of Mates, Jeffrey CIV DC3/DCCI
Sent: Tuesday, April 19, 2016 8:25 PM
To: Masuoka, Ryusuke/益岡 竜介; cti@lists.oasis-open.org
Subject: [cti] RE: i18n (RE: MVP Discussion) - An Updated Proposal

I think the fact that this lets a translation be used across any number of
objects with the same text field value is a really strong point, but I am
slightly concerned about the additional backend requirements this would add
for anyone who wanted to use translations as no id based linking exists.
This isn't a huge issue, but it would mean that anyone using translations
would need to keep one or two additional indices.

I'm also a bit uncertain about having the text field in the translation
object change depending on the language.  I think it might make more sense
to have two fields, one for the text and a second for the language.  That way
any code reading a translation can always point to the same target instead
of resorting to a loop.

A minor nitpick is that translations will still need an ID, created_at,
created_by_refs and revision fields at a minimum along with whatever else
core provides.  Otherwise there would be no good way to differentiate when a
translation service updated the translation for a string or when two
different providers translated the same text differently.
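
As a rough sketch of why those fields matter, the Python below keeps the newest translation per provider for each text and language. It assumes the "created_by_refs", "version", "text_ref", "lang" and "text" fields used in the examples in this thread; the function name and sample values are hypothetical.

```python
def latest_by_provider(translations):
    """Map (text_ref, lang, provider) to that provider's highest-version translation."""
    latest = {}
    for t in translations:
        for provider in t["created_by_refs"]:
            key = (t["text_ref"], t["lang"], provider)
            if key not in latest or t["version"] > latest[key]["version"]:
                latest[key] = t
    return latest


ref = "41cb32a0d74d5d07f5362b3e66f245c9"
translations = [
    {"id": "trans-1", "created_by_refs": ["translator-1"], "version": 1,
     "text_ref": ref, "lang": "ja", "text": "first attempt"},
    {"id": "trans-1", "created_by_refs": ["translator-1"], "version": 2,
     "text_ref": ref, "lang": "ja", "text": "updated attempt"},
    {"id": "trans-2", "created_by_refs": ["bad-translator"], "version": 1,
     "text_ref": ref, "lang": "ja", "text": "a different rendering"},
]

for (text_ref, lang, provider), t in latest_by_provider(translations).items():
    print(provider, lang, t["version"], t["text"])
```

Without the provider and version fields, the three entries above would be indistinguishable alternatives for the same text.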

You might have excluded them from your example to save space, but I wanted
to throw that out there just in case.

Finally would it make more sense to use the common 32 character version of
the MD5 for the text_ref instead of the base64?  The base64 saves space, but
every tool that supports base64ing an MD5 also supports printing it as a
string, while I have run across a number of tools that don't support a
base64 output.
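
For reference, the two encodings carry the same 16 bytes, so converting between them is mechanical. A small Python sketch (function names are illustrative only), using the text_ref values that appear in this thread:

```python
import base64


def b64_to_hex(ref):
    """Convert a Base64-encoded digest to its hexadecimal form."""
    return base64.b64decode(ref).hex()


def hex_to_b64(ref):
    """Convert a hexadecimal digest to its Base64 form."""
    return base64.b64encode(bytes.fromhex(ref)).decode("ascii")


print(b64_to_hex("QcsyoNdNXQf1Nis+ZvJFyQ=="))          # 41cb32a0d74d5d07f5362b3e66f245c9
print(hex_to_b64("e8465d411f6580e8b67d778f25a78234"))  # 6EZdQR9lgOi2fXePJaeCNA==
```

The Base64 form is 24 characters and the hexadecimal form is 32, which is the space trade-off being discussed.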

Tweaking your earlier example:

{
"type": "package",
...
"campaigns": [
  {
    "type": "campaign",
    "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
    "revision": 1,
    "spec_version": "stix-2.0",
    "created_at": "2015-12-03T13:13Z",
    "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
    "title": {"en": "Dridex Campaign - Botnet 121"},
    "descriptions": {"en": "Dridex-based campaign leveraging Botnet 121"},

    "intended_effects": [
      {"value": "theft-identity-theft"}
    ],
    "status": "Ongoing"
  }
],
"translations": [
  {"id": "trans-1",
    "type": "translation",
    "created_at": "2016-04-19",
    "created_by_refs": ["translator-1"],
    "version": 1,
    "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9",
    "lang": "ja",
    "text": "Dridex キャンペーン - ボットネット 121"
  },
  {"id": "trans-2",
    "type": "translation",
    "created_at": "2016-04-19",
    "created_by_refs": ["bad-translator"],
    "version": 1,
    "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9",
    "lang": "ja",
    "text": "Dridex キャンペーン - ボットネット101"
  },
  {"id": "trans-3",
    "type": "translation",
    "created_at": "2016-04-19",
    "created_by_refs": ["translator-1"],
    "version": 1,
    "text_ref": "e8465d411f6580e8b67d778f25a78234",
    "lang": "ja",
    "text": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"
  },
  {"id": "trans-4",
    "type": "translation",
    "created_at": "2016-04-19",
    "created_by_refs": ["bad-translator"],
    "version": 1,
    "text_ref": "e8465d411f6580e8b67d778f25a78234",
    "lang": "ja",
    "text": "ボットネット 101 を活用する Dridex を元にしたキャンペーン"
  },
  {"id": "trans-5",
    "type": "translation",
    "created_at": "2016-04-19",
    "created_by_refs": ["google-translate"],
    "version": 1,
    "text_ref": "41cb32a0d74d5d07f5362b3e66f245c9",
    "lang": "de",
    "text": "Dridex -basierte Kampagne nutzt Botnet 121"
  }
]
...
}

Jeffrey Mates, Civ DC3/DCCI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computer Scientist
Defense Cyber Crime Institute
jeffrey.mates@dc3.mil
410-694-4335

-----Original Message-----
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf
Of Masuoka, Ryusuke
Sent: Monday, April 18, 2016 10:09 PM
To: cti@lists.oasis-open.org
Subject: [Non-DoD Source] [cti] i18n (RE: MVP Discussion) - An Updated
Proposal

Hi, All,

Thank you very much for all the discussions.
I have come up with an updated proposal, which I believe is much more
acceptable for many, while meeting most of the requirements I have seen.
Please let me start a new thread on this issue with the details on the
updated proposal at the end of this message.

The difference from the last one is...
I realized that the text itself can serve as its text_id and that we can do
away with text_id. (This realization came to me during my sleep last night
around 3:30 am JST.) If we use the Base64-encoded MD5 hash of the string in
UTF-8, it would be like:

-----
{
...
"title": {"en": "Dridex Campaign - Botnet 121"},
...
"translations": [
  {"text_ref": "QcsyoNdNXQf1Nis+ZvJFyQ==",
   "ja": "Dridex キャンペーン - ボットネット 121"},
  ...
]
}

It adds only 7 bytes (excluding whitespace) to each text field if
there is no translation. It is self-contained, so it is very unlikely that this
impacts other parts of STIX and other standards.
Any hashing algorithm works as long as it is reasonably collision-free and
we make the hashing algorithm a part of the standard so everyone knows which
one is being used.
(By the way, I used https://quickhash.com/ to calculate the hash.)
------
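
A minimal Python sketch of the text_ref computation described above: the Base64 encoding of the MD5 digest of the text's UTF-8 bytes. The function name is an assumption for illustration, and the thread does not say whether any Unicode normalization should be applied first, so none is done here.

```python
import base64
import hashlib


def text_ref(text):
    """Base64-encoded MD5 digest of the UTF-8 encoding of text."""
    digest = hashlib.md5(text.encode("utf-8")).digest()  # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")      # 24 characters


ref = text_ref("Dridex Campaign - Botnet 121")
print(ref, len(ref))
```

Since an MD5 digest is always 16 bytes, the Base64 form is always 24 characters including the two trailing "=" padding characters.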

Regards,

Ryu

------------------------------------------------------------
Internationalization - An Updated Proposal
------------------------------------------------------------

- Always give the language code as the keyword for every text field
  (So that anyone can give translations to the field later, knowing
  which language it is in.)

- Always give "text_ref" and the language code as the keyword for every
  translation.
  Use the Base64-encoded MD5 hash of the original text in UTF-8 as the
  "text_ref" value to refer to the original text.

- One can provide a translation of one of the translated texts rather than
  of the original text.
  (Example: A CTI text field is created in Japanese, then it is given an
  English translation. Then German and French translations are produced
  based on the English translation.)
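
The third rule implies that translations can form chains. Here is a hedged Python sketch of how a consumer might collect every translation reachable from an original text, assuming the {"text_ref": ..., "<lang>": ...} entry shape proposed here; the function names and sample texts are illustrative only.

```python
import base64
import hashlib


def text_ref(text):
    """Base64-encoded MD5 digest of the UTF-8 encoding of text."""
    return base64.b64encode(hashlib.md5(text.encode("utf-8")).digest()).decode("ascii")


def collect_translations(original_text, translations):
    """Return {lang: text} for translations reachable from original_text.

    Follows chains, e.g. a German text that translates the English
    translation of a Japanese original.
    """
    by_ref = {}
    for entry in translations:
        by_ref.setdefault(entry["text_ref"], []).append(entry)

    found, seen, queue = {}, set(), [original_text]
    while queue:
        ref = text_ref(queue.pop(0))
        if ref in seen:
            continue
        seen.add(ref)
        for entry in by_ref.get(ref, []):
            for key, value in entry.items():
                if key != "text_ref" and key not in found:
                    found[key] = value
                    queue.append(value)  # this translation may itself be translated
    return found


ja = "Dridex キャンペーン - ボットネット 121"
en = "Dridex Campaign - Botnet 121"
de = "Some German Title"
translations = [
    {"text_ref": text_ref(ja), "en": en},  # English translated from Japanese
    {"text_ref": text_ref(en), "de": de},  # German translated from English
]

print(collect_translations(ja, translations))
```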

-----
- Pattern A - Translation given inside the same original package
-----

{
"type": "package",
...
"campaigns": [
  {
    "type": "campaign",
    "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
    "revision": 1,
    "spec_version": "stix-2.0",
    "created_at": "2015-12-03T13:13Z",
    "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
    "title": {"en": "Dridex Campaign - Botnet 121"},
    "descriptions": {"en": "Dridex-based campaign leveraging Botnet 121"},

    "intended_effects": [
      {"value": "theft-identity-theft"}
    ],
    "status": "Ongoing"
  }
],
"translations": [
  {"text_ref": "QcsyoNdNXQf1Nis+ZvJFyQ==",
   "ja": "Dridex キャンペーン - ボットネット 121"},
  {"text_ref": "6EZdQR9lgOi2fXePJaeCNA==",
   "ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"},
  {"text_ref": "QcsyoNdNXQf1Nis+ZvJFyQ==",
   "de": "Some German Title"},
  {"text_ref": "6EZdQR9lgOi2fXePJaeCNA==",
   "de": "Some German Description"}
]
...
}

-----
- Pattern B - Translation given by a third-party in some external database
-----

{
"translations": [
  {"text_ref": "QcsyoNdNXQf1Nis+ZvJFyQ==",
   "es": "Some Spanish Title"},
  {"text_ref": "6EZdQR9lgOi2fXePJaeCNA==",
   "es": "Some Spanish Description"},
  {"text_ref": "QcsyoNdNXQf1Nis+ZvJFyQ==",
   "fr": "Some French Title"},
  {"text_ref": "6EZdQR9lgOi2fXePJaeCNA==",
   "fr": "Some French Description"}
]
}
-----

------------------------------
Notes - Simple, minimal, coherent, consistent, self-contained, context-free,
future-proofed
------------------------------

- Only seven additional bytes (excluding whitespace) for each text field.

- As it refers to the text itself, it does not break when objects are
  revised, as long as the text stays the same.

- Its scope is limited to text fields, and it is therefore self-contained:

  - It is very unlikely that this impacts other parts of STIX and other
    standards.

  - There will be very little (if any) consideration necessary
    for future standard developments/changes.

  - It would be easy to implement, as the same context-free code can
    handle any text field.

- There is only one way to express text fields and translations.

- Resources spent on translation will not be wasted as long as the text
  stays the same.

- Even if someone else reuses the same text, its translations are still
  applicable.

------------------------------
Internationalization Use Cases
------------------------------

CN: Chinese
DE: German
EN: English
FR: French
JA: Japanese

------------------------------
(1) Providing text fields in multiple languages simultaneously at the time
of creation.
------------------------------

[ja/en (in case of Japan), en/fr/de (in case of EU countries), etc.]

This is the most likely use case (for me). The original CTI has
titles/descriptions in multiple languages from the start. When you create a
CTI file, you include both English and Japanese titles/descriptions for
major objects in it so that non-Japanese speaking people can at least find
out what it is at the top level.

Or another use case is the CTI provider in Japan writes a CTI file with its
title in English and description in Japanese. This is because many Japanese
can read short English titles, but many Japanese have difficulties to
understand long and detailed descriptions in English.

------------------------------
(2) CTI Database Receiving CTI from Multiple CTI Sources in Different
Languages
------------------------------

This is a case where you receive CTI from an English CTI source and another
CTI source in Japanese.
You put all the CTI into MongoDB or some other NoSQL database and would like to
mix and match. I would like the CTI database to still be able to track the
language code of textual fields.

------------------------------
(3) EN CTI received by a Japanese entity, which provides JA translation
(Or vice versa, JA CTI received by a US entity, which provides EN
translation)

A Japanese entity receives CTI information pieces in English.
The entity determines that some of them are important/critical
and worth translating into Japanese, adds descriptions in Japanese,
and redistributes them to other Japanese entities (if redistribution is
allowed).
The CTIM (CTI Management System) of a receiving party displays
the Japanese description whenever possible, while allowing access to
the original English descriptions.

Work Flow:
1. Company 1 in EN creates an Indicator and TTP and shares them with
   Company 2 in JP.
   It is important to note that the flow may be direct or may be through a
   series of brokers and other entities.
   1. This Indicator and TTP has a producer of Company 1 and a version of 1.
2. Company 2 builds a translated version of the TTP and Indicator and
   releases it.
   1. This new Indicator and TTP has a producer of Company 2 and a version
      of 2.
   2. It is unrealistic to think that Company 2 can or will share the
      translated object back to Company 1, or that Company 1, if it gets
      the translated object, will do anything with it.  Their legal
      departments will probably prohibit accepting 3rd-party translations
      and then using them in their offerings.
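
The display behavior in this use case (show Japanese whenever possible, otherwise fall back to the original) could look roughly like the Python sketch below. It assumes a text field expressed as a one-entry language map and a {lang: text} dictionary of available translations; display_text is a hypothetical helper, not part of any CTIM product.

```python
def display_text(field_value, translations, preferred_langs):
    """Pick the text to show: first preferred language available, else the original."""
    (orig_lang, orig_text), = field_value.items()
    available = dict(translations)
    available[orig_lang] = orig_text  # the original always represents its own language
    for lang in preferred_langs:
        if lang in available:
            return lang, available[lang]
    return orig_lang, orig_text


description = {"en": "Dridex-based campaign leveraging Botnet 121"}
translations = {"ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"}

# A Japanese user sees the translation when one exists...
print(display_text(description, translations, ["ja", "en"]))
# ...and the original English text when it does not.
print(display_text(description, {}, ["ja", "en"]))
```

The original text stays in the field itself, so it remains accessible whichever language is displayed.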

------------------------------
(4) An English CTI report describing attacks against Japanese entities in EN
------------------------------

An English report on cyber attacks on Japan.
There are filenames of lure attachments in Japanese (original/real) and
their translations in English.  Another similar report in English might
have an email title along with its translation in English next to it.
That report also has a Windows pathname in Chinese (not Japanese) found
in a binary along with its translation in English.

These Japanese texts can be found in descriptions, not just

[Ex. Original File Name (JA): "医療費通知", Translated File Name (EN):
"Medical expenses notice"]

Note: This should probably be okay as long as the standards require use of
UTF-8 for encoding.

------------------------------
(5) Email subject/body, supposed to be in JP, but includes CN characters (by
mistake of the attackers)
------------------------------

This can happen due to Chinese/Japanese/Korean sharing Unicode characters
(CJK characters - https://en.wikipedia.org/wiki/CJK_characters.)

This can be a very important clue as to the attackers.

Note: This should probably be okay as long as the standards require use of
UTF-8 for encoding.

------------------------------
(6) CTI translation service
------------------------------

A CTI translation service provider keeps translations to target languages
of text fields from publicly available and/or commercial/private CTI sources.
The service is available through some kind of online API.
Consumers of this translation service will use it to translate
text fields in their CTI system through the API provided by the
translation service provider.

------------------------------
(7) CTI provider
------------------------------

A CTI provider (in English) plans to penetrate the Japanese and other APAC
markets and needs a standard way to add translations of their text fields.
The CTI provider gives its customer a CTI package with all the
translations in it, or a CTI package with translations into the languages
of the user's choosing.

------------------------------------------------------------


---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php



 

 


