
cti message



Subject: Re: [cti] i18n (RE: MVP Discussion)


You assume that people will reuse known IDs for text. Going down the path you described, we should just make the titles and descriptions controlled vocabularies.


Bret 

Sent from my Commodore 64

On Apr 18, 2016, at 6:24 AM, Jason Keirstead <Jason.Keirstead@ca.ibm.com> wrote:

The text ID is not referring to the field. It is referring to a piece of text that has been translated.

Example - say I publish an observable with a title "Dridex". That is a pretty common name and is likely to be used all over the place, with many instances of observables and sightings. You don't want to have to re-translate that over and over and over for each one. You want one translation, and for that to be able to be used everywhere that text occurs.

This is exactly the way translation works in almost all programming languages. You translate a piece of text once, and from then on you can refer to that text by an ID, in many different contexts.
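A minimal sketch of that catalog idea, assuming a simple in-memory mapping from text ID to per-language strings. The names `CATALOG`, `lookup`, and the `title_id` field are hypothetical and not part of any STIX proposal; the text ID and strings are taken from examples elsewhere in this thread.

```python
from typing import Dict, Optional

# Hypothetical catalog: text ID -> {language code -> translated text}.
CATALOG: Dict[str, Dict[str, str]] = {
    "text-a1b2c3": {
        "en": "Dridex Campaign - Botnet 121",
        "ja": "Dridex キャンペーン - ボットネット 121",
    },
}


def lookup(text_id: str, lang: str, default_lang: str = "en") -> Optional[str]:
    """Return the text for text_id in lang, falling back to default_lang."""
    entry = CATALOG.get(text_id, {})
    return entry.get(lang, entry.get(default_lang))


# Two unrelated objects reuse one translated title through the same text ID.
# "title_id" is an illustrative field name only.
campaign = {"type": "campaign", "title_id": "text-a1b2c3"}
sighting = {"type": "sighting", "title_id": "text-a1b2c3"}

for obj in (campaign, sighting):
    print(obj["type"], "->", lookup(obj["title_id"], "ja"))
```

The text is translated once and every object that carries the same ID picks up the translation, which is the reuse argument made above.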

-
Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown



From: Mark Davidson <mdavidson@soltra.com>
To: "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, John-Mark Gurney <jmg@newcontext.com>
Cc: Jason Keirstead/CanEast/IBM@IBMCA, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Date: 04/18/2016 08:19 AM
Subject: Re: [cti] i18n (RE: MVP Discussion)
Sent by: <cti@lists.oasis-open.org>





I’m trying to catch up to the thread – please accept my apologies if my thoughts/questions are under-informed.

How does the text_id help? The field is already uniquely identified by its location in the representation (i.e., indicator=2222.title). If I wanted to add a title, I’d insert a new key into the Title object/dictionary. Updates would be handled the same way.

To me, a reasonably minimal way of going about language on each field would be a structure like this:

"field-name": {"lang-id": "text", "lang-id2": "texto"}

However, in my mind, having the lang as a root attribute (and nowhere else) seems to be the simplest solution so I’d like to prod a little in that direction.

I’d like to ask about:
> For example, the titles are given in EN, but the descriptions are given in JP for CTI from a Japanese CTI provider.

Why is this the case? If it’s that the English title is suitable for the Japanese translation, then I’d say “lang=jp” is suitable for the whole object, even though not all text fields necessarily have a “lang=jp” entry.

I guess I’d be OK with a top-level lang field AND a dictionary of "language-id": "text" for each text field. It feels like roughly the simplest mechanism for achieving i18n. I think that would also handle revisions and updates fairly seamlessly: a translation provider could send back an update to the content owner with the new languages for particular field(s).
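A rough sketch of how a consumer might read that structure, assuming each text field is a dictionary of language code to text and the object carries a top-level `lang`. The helper names `pick_text` and `add_translation`, and the language codes "en", "es", and "de", are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional


def pick_text(obj: Dict[str, Any], field: str, preferred: List[str]) -> Optional[str]:
    """Return the field text in the first preferred language that exists,
    falling back to the object's own top-level language."""
    texts = obj.get(field, {})
    for lang in preferred + [obj.get("lang", "")]:
        if lang in texts:
            return texts[lang]
    return None


def add_translation(obj: Dict[str, Any], field: str, lang: str, text: str) -> None:
    """Add or update one language entry; other languages are left untouched."""
    obj.setdefault(field, {})[lang] = text


# Illustrative object only; not a real STIX indicator.
indicator = {"lang": "en", "title": {"en": "text", "es": "texto"}}

print(pick_text(indicator, "title", ["es"]))   # Spanish entry exists
print(pick_text(indicator, "title", ["ja"]))   # falls back to the object's lang
add_translation(indicator, "title", "de", "Text")
print(pick_text(indicator, "title", ["de"]))
```

An update from a translation provider then amounts to one more key in the field's dictionary, with no change to the rest of the object.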

Thank you.
-Mark

From: <cti@lists.oasis-open.org> on behalf of "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
Date: Monday, April 18, 2016 at 5:34 AM
To: "Jordan, Bret" <bret.jordan@bluecoat.com>, John-Mark Gurney <jmg@newcontext.com>
Cc: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: RE: [cti] i18n (RE: MVP Discussion)

Hi, Bret, all,

I know I am loud, but ...

By putting everything (text_id, language code) into the text itself, like

"title": {"text_id": "text-a1b2c3",
          "en": "Dridex Campaign - Botnet 121"},

it does not affect other parts of STIX, and it is not
affected by changes in other parts of STIX (like versioning
and object structure, or how deep the object is).

As such, a single, simple piece of code to produce and parse
the text will do.
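A minimal sketch of what such produce-and-parse code could look like, assuming each text value holds a "text_id" key plus exactly one language key. That one-language restriction, and the function names `make_text` and `parse_text`, are assumptions made for illustration.

```python
from typing import Dict, Tuple


def make_text(text_id: str, lang: str, text: str) -> Dict[str, str]:
    """Build a self-describing text value: its ID plus one language entry."""
    return {"text_id": text_id, lang: text}


def parse_text(value: Dict[str, str]) -> Tuple[str, str, str]:
    """Return (text_id, lang, text) from a self-describing text value."""
    text_id = value["text_id"]
    langs = [key for key in value if key != "text_id"]
    if len(langs) != 1:
        # Assumption of this sketch: one language per text value.
        raise ValueError("expected exactly one language key")
    lang = langs[0]
    return text_id, lang, value[lang]


title = make_text("text-a1b2c3", "en", "Dridex Campaign - Botnet 121")
print(parse_text(title))
```

Because the value describes itself, the same two functions work wherever the text sits in an object, regardless of nesting depth or versioning.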

Regards,

Ryu

From: Jordan, Bret [mailto:bret.jordan@bluecoat.com]
Sent: Saturday, April 16, 2016 8:23 AM
To: John-Mark Gurney
Cc: Masuoka, Ryusuke/益岡竜介; Jason Keirstead; cti@lists.oasis-open.org
Subject: Re: [cti] i18n (RE: MVP Discussion)

What I would like to know (and yes, I am partial to this design) is how this design will NOT work for people. With some workflow analysis, and keeping with the identifier and versioning designs in STIX, I believe this, as represented by John-Mark, is the simplest and most straightforward solution.

But if I am missing some really key point, please call it out.

Personally, I believe this design is straightforward and simple enough that we could get it into the Summer 2016 release of STIX. However, as I said, if it is missing some key thing, please let me know so I can better understand.

This solution does the following:

1) Provides a solution for a single producer: a UI in a product can easily allow translations to be created under the covers for an object.

2) Provides a solution for third party translators.

3) Allows translations to be sent separate from the original file.

4) It does not require us to pre-identify every field that we want to translate and thus add UUIDs to all of the objects causing significant bloat on the wire and increased demands on processing.

There are a few elements that this design does not cover...

a) Mixing languages in the same JSON object. This can be a good thing from a graph database standpoint and really simplifies the consumption of objects or request of the objects from a TAXII server.

b) It is not very clean for translating deeply nested objects. I am sure we can figure out a solution for this, but the current example does not show one. However, given our design criteria for STIX 2.x, we are desperately trying to flatten the objects as much as possible, so this may or may not end up being an issue.


Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."
      On Apr 15, 2016, at 16:31, John-Mark Gurney <jmg@newcontext.com> wrote:

      Masuoka, Ryusuke wrote this message on Fri, Apr 15, 2016 at 02:20 +0000:

      - Always give "text_id" and "lang" for every text field
      (So that anyone can give translations to the field later, knowing
      which language it is in.)

      - Always give "text_ref", "text_id" and "lang" for every translation
      ("text_id" lets someone base a translation on a text other than the one in the original language.
      Example: A CTI text field created in Japanese, then it is given an English translation.
      Then German and French translations are produced based on the English translation.)


      A big issue with this is that EVERY text field (that is
      translatable) will now have a UUID. For descriptions, this isn't a
      big issue, but when we are talking about titles and the like, it's
      possible that the UUID will be longer than the translation itself.

      I much prefer to handle translations by pointing to the object id,
      and then the fields that you want to translate..

      This is what I'm talking about:
      {
        "type": "package",
        ...
        "campaigns": [
          {
            "type": "campaign",
            "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
            "lang": "en",
            "revision": 1,
            "spec_version": "stix-2.0",
            "created_at": "2015-12-03T13:13Z",
            "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
            "title": "Dridex Campaign - Botnet 121",
            "descriptions": "Dridex-based campaign leveraging Botnet 121",
            "intended_effects": [
              {"value": "theft-identity-theft"}
            ],
            "status": "Ongoing"
          }
        ],
        "translations": [
          {
            "obj_ref": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
            "type": "translation",
            "lang": "ja",
            "text_id": "text-a1b2c3-ja-1",
            "title": "Dridex キャンペーン - ボットネット 121",
            "descriptions": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"
          },
          {
            "obj_ref": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
            "type": "translation",
            "lang": "de",
            "title": "Some German Title",
            "descriptions": "Some German Description"
          }
        ]
        ...
      }

      This is much simpler. It can be handled more easily in code by
      overlaying objects by language preferences, etc...

      As Bret pointed out, this does mean you can't have a base object w/
      mixed languages, but I don't see a strong value in that, as those
      other languages can be provided via translations...
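A minimal sketch of the overlay idea, assuming translation objects carry "obj_ref" and "lang" as in the example above and that every other key on a translation is a translated field. The function name `localize`, the `META_KEYS` set, and the preference-list behaviour are assumptions for illustration, not part of the proposal.

```python
from typing import Any, Dict, List

# Keys on a translation object that describe the translation itself
# and should not be copied onto the base object (an assumption).
META_KEYS = {"obj_ref", "type", "lang", "text_id"}


def localize(obj: Dict[str, Any],
             translations: List[Dict[str, Any]],
             preferred: List[str]) -> Dict[str, Any]:
    """Return a copy of obj with the best-matching translation overlaid."""
    for lang in preferred:
        if obj.get("lang") == lang:
            break  # the base object is already in a preferred language
        for tr in translations:
            if tr.get("obj_ref") == obj.get("id") and tr.get("lang") == lang:
                result = dict(obj)
                result.update({k: v for k, v in tr.items() if k not in META_KEYS})
                result["lang"] = lang
                return result
    return dict(obj)  # no usable translation: fall back to the base object


campaign = {
    "type": "campaign",
    "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
    "lang": "en",
    "title": "Dridex Campaign - Botnet 121",
}
translations = [
    {
        "obj_ref": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
        "type": "translation",
        "lang": "ja",
        "title": "Dridex キャンペーン - ボットネット 121",
    },
]

print(localize(campaign, translations, ["ja", "en"])["title"])
```

Fields that a translation does not supply keep their base-language value, so a partial translation still yields a complete object.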

      --
      John-Mark


