
Subject: RE: [cti] i18n (RE: MVP Discussion) - An Updated Proposal

From: "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
To: John-Mark Gurney <jmg@newcontext.com>
Date: Wed, 27 Apr 2016 11:53:27 +0000

Hi, John-Mark,

Please note that my proposal uses no lang TLO; translations refer 
directly to the texts. (Please check my latest one, 
"i18n (RE: MVP Discussion) - Yet Another Updated Proposal".) 
It also proposes that each translation be given separately. 
That would make your example look something like the following.

-----
{ "type": "obj",
  "id": "obj--1",
  "title": { "en": "en title" },
  "description": {"ja": "ja desc" }
}

{ "type": "translation",
  "text_ref": "f0b6ca37c6f20f19bb9cffe11b1dbfd0", 
  "ja": "ja title"
}

{ "type": "translation",
  "text_ref": "17bac829a4c793566d3a581cf1e28ede", 
  "en": "en desc"
}
-----

The database would look like

-----
obj--1 -> title -> en -> "en title"
obj--1 -> description -> ja -> "ja desc"

translation -> "f0b6ca37c6f20f19bb9cffe11b1dbfd0" -> ja -> "ja title"
translation -> "17bac829a4c793566d3a581cf1e28ede" -> en -> "en desc"
-----
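The translation side of this layout can be sketched in Python as a dictionary keyed by (hash, language). This is a minimal sketch under stated assumptions: `text_ref` and `build_translation_index` are hypothetical helper names, the text is assumed to be UTF-8 encoded before hashing (the proposal only says "MD5 hash value of the text"), and every key of a translation object other than "type" and "text_ref" is assumed to be a language code. The example computes the hashes rather than reusing the hex strings above, since those were not checked here.

```python
import hashlib

def text_ref(text: str) -> str:
    # Assumption: the text is encoded as UTF-8 before hashing; the proposal
    # itself only says "the MD5 hash value of the text".
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def build_translation_index(translations):
    """Return a dict mapping (text_ref, lang) -> translated text, mirroring
    the 'translation -> <hash> -> <lang> -> <text>' layout."""
    index = {}
    for t in translations:
        ref = t["text_ref"]
        for key, value in t.items():
            # Assumption: every key other than "type" and "text_ref"
            # is a language code.
            if key in ("type", "text_ref"):
                continue
            index[(ref, key)] = value
    return index

translations = [
    {"type": "translation", "text_ref": text_ref("en title"), "ja": "ja title"},
    {"type": "translation", "text_ref": text_ref("ja desc"), "en": "en desc"},
]
index = build_translation_index(translations)
print(index[(text_ref("en title"), "ja")])  # ja title
```

Keying on the pair means a lookup needs only the original text and the wanted language, with no reference back to the object id.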

One possible display algorithm for showing each text in the user's preferred language would be:

-----
In the case of user_preferred_lang = ja and field "obj--1 -> title":

- Find the language code for "obj--1 -> title" 
  -> "en"
- Because it is not "ja", 
  calculate the MD5 hash value of the text for "obj--1 -> title" 
  -> "f0b6ca37c6f20f19bb9cffe11b1dbfd0"
- Among the translation objects with "f0b6ca37c6f20f19bb9cffe11b1dbfd0",
  find the one with "ja" and pick its text if there is one. 
  -> Yes, there is one with that hash value and "ja"; its text is "ja title".
-----
In the case of user_preferred_lang = ja and field "obj--1 -> description":

- Find the language code for "obj--1 -> description" 
  -> "ja"

- Simply use the text, "ja desc".
-----
In the case of user_preferred_lang = fr and field "obj--1 -> title":

- Find the language code for "obj--1 -> title" 
  -> "en"
- Because it is not "fr", 
  calculate the MD5 hash value of the text for "obj--1 -> title" 
  -> "f0b6ca37c6f20f19bb9cffe11b1dbfd0"
- Among the translation objects with "f0b6ca37c6f20f19bb9cffe11b1dbfd0",
  find the one with "fr" and pick its text if there is one. 
  -> No, there is none with that hash value and "fr".

- Display the original text, "en title", with an indication that it is in English.

-----
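The three cases above can be sketched as one Python function. This is an illustrative sketch, not a reference implementation: `display_text` and `text_ref` are hypothetical names, each field is assumed to hold exactly one language/text pair (as in the example object), translations are assumed to be pre-indexed as (hash, lang) -> text, and UTF-8 encoding before MD5 is an assumption the proposal does not state.

```python
import hashlib

def text_ref(text: str) -> str:
    # Assumption: the proposal does not say how the text is encoded
    # before MD5; UTF-8 is used here.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def display_text(field, preferred_lang, translations):
    """Pick the text to show for one field.

    field: a single-entry dict lang -> text, e.g. {"en": "en title"}.
    translations: dict mapping (text_ref, lang) -> translated text.
    Returns (text, lang); when lang != preferred_lang the caller should
    indicate which language the text is in.
    """
    [(lang, text)] = field.items()  # raises ValueError if not exactly one entry
    if lang == preferred_lang:
        # Case 2: the original is already in the preferred language.
        return text, lang
    translated = translations.get((text_ref(text), preferred_lang))
    if translated is not None:
        # Case 1: a translation keyed by the original text's hash exists.
        return translated, preferred_lang
    # Case 3: no translation into the preferred language; fall back to the original.
    return text, lang

translations = {
    (text_ref("en title"), "ja"): "ja title",
    (text_ref("ja desc"), "en"): "en desc",
}
print(display_text({"en": "en title"}, "ja", translations))  # ('ja title', 'ja')
print(display_text({"ja": "ja desc"}, "ja", translations))   # ('ja desc', 'ja')
print(display_text({"en": "en title"}, "fr", translations))  # ('en title', 'en')
```

Returning the language alongside the text is what lets the caller show the "it is in English" indication in the fallback case.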

I think a decently sane programmer can implement this. 

Regards,

Ryu

Note: 
- jp is the country code for Japan and the language code for Japanese is ja.

-----Original Message-----
From: John-Mark Gurney [mailto:jmg@newcontext.com] 
Sent: Wednesday, April 27, 2016 4:17 AM
To: Masuoka, Ryusuke/益岡 竜介
Cc: cti@lists.oasis-open.org
Subject: Re: [cti] i18n (RE: MVP Discussion) - An Updated Proposal

Masuoka, Ryusuke wrote this message on Tue, Apr 26, 2016 at 04:59 +0000:
> > I would say that the lang id is not the language of the text, but who it is intended for.  
> > If the "Japanese" translation happens to be in English, then it is 
> > fine to label it as jp if that is the way they want it presented...
> 
> > So, your example would have jp as the language for the title too.
> 
> I am not exactly sure what you mean, but "en" means the title is in the 
> English language; it does not indicate its intended audience/consumers.
> Title (as a short text) in English would be understood by many 
> Japanese and English speaking people. (That is the reason why title 
> can be English even though it is produced by a Japanese CTI producer 
> in this use case.)

I'm thinking about how implementers are going to work w/ this.

It's all fine and dandy to say each field has its own language, but the question is, if you get a jp TLO w/ a field in en that doesn't have a jp field, what are you supposed to do?  What happens if en isn't in their list of desired/preferred languages?  Do they just not display the en field?

Then you get into database management.  It'd be much easier to simply index your database by the tuple of id/lang w/ a field to denote whether this row is a translation [object] or the original TLO.
When you query the db for what to display, you query by id, w/ an order by the user's lang preference, and take the first one...

Then what happens when you encounter the following object (btw, I'm not proposing this method, just using it as an example):
{ "type": "obj",
  "id": "obj--1",
  "title": { "en": "en title" },
  "description": {"jp": "jp desc" }
}

{ "type": "translation",
  "of_ref": "obj--1",
  "title": { "jp": "jp title" },
  "description": {"en": "en desc" }
}

Do you really think it's sane to always present a mix of the two objects?  The above looks crazy to me, but it would be legal, and would have to be dealt w/ by products...

IMO, if we don't strike a sane balance between implementability and i18n, we'll end up with few products deciding that i18n is worth the trouble of implementing unless it's a strong requirement.

> -----Original Message-----
> From: John-Mark Gurney [mailto:jmg@newcontext.com]
> Sent: Tuesday, April 26, 2016 9:17 AM
> To: Masuoka, Ryusuke/益岡 竜介
> Cc: Jordan, Bret; Wunder, John A.; Mates, Jeffrey CIV DC3/DCCI; 
> cti@lists.oasis-open.org
> Subject: Re: [cti] i18n (RE: MVP Discussion) - An Updated Proposal
> 
> Masuoka, Ryusuke wrote this message on Mon, Apr 25, 2016 at 10:21 +0000:
> > Hi, John, Bret,
> > 
> > -----
> > Use Case (1) Second Paragraph
> > -----
> > > Or another use case is the CTI provider in Japan writes a CTI file 
> > > with its title in English and description in Japanese. This is 
> > > because many Japanese can read short English titles, but many 
> > > Japanese have difficulties to understand long and detailed descriptions in English.
> > 
> > In the above case, I am talking about is something like the following.
> > 
> > -----
> > {
> > "type": "package",
> > "indicators": [
> > {
> >    "type": "indicator",
> >    "id": "indicator--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
> >    "revision": 1,
> >    "created_at": "2015-12-03T13:13Z",
> >    "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
> >    "title": {"en": "Dridex Campaign - Botnet 121"},
> >    "description": {"ja": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"}
> > },
> > ...
> > }
> > -----
> > 
> > It is often the case for academic papers that we (Japanese) provide 
> > their title and abstract in English, but their main body in Japanese.
> 
> I would say that the lang id is not the language of the text, but who it is intended for.  If the "Japanese" translation happens to be in English, then it is fine to label it as jp if that is the way they want it presented...
> 
> So, your example would have jp as the language for the title too.
> 
> There wouldn't be anything that would prevent someone creating a new translation object that completes the jp translation of the title...
> 
> I do like the proposal of being able to override the lang on a per-field basis, but IMO it's unneeded, per above.
> 
> Another way to state it is that the lang id field says who to present the text of this object to...  If you speak jp, then present these fields to you, etc.  It seems odd/crazy that we should allow an object to be created where parts of the object are missing in the language we need to present it in to the user...
> 
> In your example, is there a reason the program needs to know that the en title is in English?  It clearly doesn't need the info for rendering it, as that is already encoded via the Unicode characters.

--
John-Mark