OASIS Mailing List Archives

 




Subject: RE: [cti] Idea for Internationalization


Hi,

 

I have included updated internationalization use cases at the end of this message.

 

When I was going home last night (2/8 Evening JST) after sending my last message,

I came to realize two things, (A) and (B).

 

(A) A translation should be a property of the original text, not a property of an object.

  I think this is the semantically correct way and that is why XLIFF is the way it is now.

  It is also more practical, since one object can have multiple text properties.

  So it is more like

  -----

  {
    title: {
      en: “…”,
      es: “…”
    }
  }

  -----

  rather than

  -----

  {
    title_en: “…”,
    title_es: “…”,
    description_en: [“…”, “…”],
    description_es: [“…”, “…”]
  }

  -----
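
A minimal sketch of how a consumer might read the per-property layout, assuming Python and plain dicts. The helper name pick_text and the sample strings are hypothetical and come from neither XLIFF nor STIX:

```python
def pick_text(prop, preferred, fallback="en"):
    """Return the text in the preferred language, else the fallback, else None."""
    if preferred in prop:
        return prop[preferred]
    return prop.get(fallback)


# Hypothetical object using the "translations as a property of the text" layout.
obj = {
    "title": {"en": "Some title", "es": "Un título"},
}

spanish = pick_text(obj["title"], "es")   # language is present
japanese = pick_text(obj["title"], "ja")  # language is missing, falls back to "en"
```

With the flattened layout the reader would instead have to build key names such as title_es by string concatenation for every text property.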

 

  XLIFF Version 1.2

  http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html

  Example translated into JSON would be like:

  {trans-unit: {
    source-language: en,
    target-language: fr,
    source: Hello world,
    target: Bonjour le monde,
    alt-trans: [{lang: es, target: Hola mundo}]
  }}

 

(B) (As Bret mentioned) We cannot support adding translations later

  WITHOUT changing/modifying the original CTI. There would be legal and security

  implications. So in that sense, we should allow references (using relationships)

from the outside of the original CTI to add translations later.

(I am still for keeping all translations in one object, to support use cases like (1) in practice,
but it is a trade-off between consistency and other considerations.)

 

However, there is one practical issue to deal with. To support adding translations
later from outside the original CTI, we ALWAYS need to give an ID
to every text in advance, at the time of creation.

 

Given:

-----

{
  title: {trans-unit: {
    ID: 1234-title,
    source-language: en,
    target-language: fr,
    source: Some sophisticated cyber attack,
    target: Certains cyberattaque sophistiquée,
  }},
  description: [
    {text: {
      ID: 1234-desc-1,
      source-language: en,
      source: We find some sophisticated cyber attack by accident...,
    }},
    ...
  ]
}

  -----

  Then something like the following will be given later independently:

  -----

  {trans-unit: {
    ID: 1234-title-ja,
    source-ID: 1234-title,
    target-language: ja,
    target: ある高度なサイバー攻撃,
  }}
  -----
  {trans-unit: {
    ID: 1234-desc-1-ja,
    source-ID: 1234-desc-1,
    target-language: ja,
    target: 我々はたまたまある高度なサイバー攻撃を見つけた...,
  }}

-----
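
A rough sketch of how a receiving system might join such late-arriving translation units to the original texts, assuming Python, plain dicts, and the key names used in the examples above (ID, source-ID, target-language, target). The function names are hypothetical:

```python
from collections import defaultdict


def index_translations(units):
    """Group external trans-units by the ID of the text they translate."""
    by_source = defaultdict(dict)
    for unit in units:
        tu = unit["trans-unit"]
        by_source[tu["source-ID"]][tu["target-language"]] = tu
    return by_source


def display_text(text, translations, lang):
    """Prefer a translation in lang; otherwise show the original source text."""
    tu = translations.get(text["ID"], {}).get(lang)
    return tu["target"] if tu else text["source"]


# Original text, created with an ID up front.
title = {"ID": "1234-title", "source-language": "en",
         "source": "Some sophisticated cyber attack"}

# Translation delivered later and independently of the original CTI.
later_units = [
    {"trans-unit": {"ID": "1234-title-ja", "source-ID": "1234-title",
                    "target-language": "ja", "target": "ある高度なサイバー攻撃"}},
]

index = index_translations(later_units)
shown_ja = display_text(title, index, "ja")  # the Japanese translation
shown_de = display_text(title, index, "de")  # no German unit, so the English source
```

The join only works because every text was given an ID when it was created, which is the practical issue raised above.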

 

  I guess we need IDs for translations as well, in case someone wants to
  base their additional intelligence on the translations.

 

What do you think?

 

Regards,

 

Ryu

 

------------------------------

Internationalization Use Cases

------------------------------

CN: Chinese

DE: German

EN: English

FR: French

JA: Japanese

 

(1) Providing an object's texts in multiple languages simultaneously at the time of creation.

  [ja/en (in case of Japan), en/fr/de (in case of EU countries), etc.]

 

(2) EN CTI received by a Japanese entity, which provides a JA translation
  (or vice versa, JA CTI received by a US entity, which provides an EN translation)

 

  A Japanese entity receives pieces of CTI in English.
  The entity determines that some of them are important/critical
  and worth translating into Japanese, adds descriptions in Japanese,
  and redistributes them to other Japanese entities (if redistribution is allowed).
  The CTIM (CTI Management System) of a receiving party displays
  the Japanese description whenever possible, while allowing access to
  the original English descriptions.

 

 

  Work Flow:

  1. Company 1 in EN creates an Indicator and TTP and shares them with Company 2 in JP.
    It is important to note that the flow may be direct or may be through a series of brokers and other entities.
    1. This Indicator and TTP have a producer of Company 1 and a version of 1.
  2. Company 2 builds a translated version of the TTP and Indicator and releases it.
    1. This new Indicator and TTP have a producer of Company 2 and a version of 2.
    2. It is unrealistic to think that Company 2 can or will share the translated object back with Company 1, or that Company 1 will do anything with the translated object if it gets it. Its legal department will probably prohibit accepting 3rd party translations and then using them in its offerings.

 

 

(3) iSIGHT Partners reports describing attacks against Japanese entities in EN

 

  Report by iSIGHT Partners in EN on Cyber Attacks on Japan.

  There are filenames of lure attachments in Japanese (original/real) and their

  translations in English.  Another similar report in English has an email title along with

  its translation in English next to it. That report also has a Windows pathname

  in Chinese (not Japanese) found in a binary along with its translation in English.

  (If you have access to iSIGHT Partners reports, they are 15-00007028 and 15-00009810.)

  [Ex. Original File Name (JA): "医療費通知", Translated File Name (EN): "Medical expenses notice"]

 

(4) Email subject/body, supposed to be in JP, but includes CN characters (by mistake of the attackers)

 

(5) CTI translation service

 

------------------------------

 

 

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Tuesday, February 09, 2016 7:18 AM
To: Terry MacDonald
Cc: John A. Wunder; cti@lists.oasis-open.org
Subject: Re: [cti] Idea for Internationalization

 

I agree, a simple translation tag is probably the way to go, at least for the initial versions of STIX 2.x.

 

With regard to embedded translations, I just do not see them working in practice.  I think the relationship explosion, the potential loss of the originating source, and the lack of information source integrity and relationship integrity will mean nothing but problems.

 

So Ryu's first example of work flow is this:

 

"- A Japanese entity receives CTI information pieces in English.

  The entity determines some of them are important/critical
  and worth translating them into Japanese, add descriptions in Japanese
and redistribute them to other Japanese entities (if redistribution is allowed).
  The CTIM (CTI Management System) of a receiving party displays
  the Japanese description whenever possible, while allowing access to
  the original English descriptions."

 

 

Work Flow:

  1. Company 1 in EN creates an Indicator and TTP and shares them with Company 2 in JP.  It is important to note that the flow may be direct or may be through a series of brokers and other entities.
    1. This Indicator and TTP have a producer of Company 1 and a version of 1.
  2. Company 2 builds a translated version of the TTP and Indicator and releases it.
    1. This new Indicator and TTP have a producer of Company 2 and a version of 2.
    2. It is unrealistic to think that Company 2 can or will share the translated object back with Company 1, or that Company 1 will do anything with the translated object if it gets it.  Its legal department will probably prohibit accepting 3rd party translations and then using them in its offerings.

You begin to see how existing relationships will fall apart and how the originating source will be lost, just by creating a simple translation.  Embedded translations only work if the producer of the objects is the only one creating translations.  However, third parties will create translated objects.  It is going to happen.  And when it does, we will lose all of the existing relationships and information source integrity if we embed the translations.

 

I really think that we need to consider a "translation" object that can ONLY be linked against the original so that people do not make arbitrary relationships against it.  We do not want people making new TTPs and new Threat Actors and linking against a Translation, you want them linked against the "original" object.  
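
One way such a restriction could be checked is sketched here, assuming Python and plain dicts. The object type name "translation", the relationship name "translation-of", and the field names id/type/source/target are all hypothetical and not taken from any STIX draft:

```python
def invalid_relationships(objects, relationships):
    """Return the relationships that break the assumed rule.

    Assumed rule: a translation may only be the source of a "translation-of"
    link pointing at a non-translation object; nothing else may reference it.
    """
    types = {o["id"]: o["type"] for o in objects}
    bad = []
    for rel in relationships:
        src_is_tr = types.get(rel["source"]) == "translation"
        tgt_is_tr = types.get(rel["target"]) == "translation"
        if rel["type"] == "translation-of":
            if not src_is_tr or tgt_is_tr:
                bad.append(rel)
        elif src_is_tr or tgt_is_tr:
            bad.append(rel)
    return bad


objects = [
    {"id": "ttp-1", "type": "ttp"},
    {"id": "tr-1", "type": "translation"},
    {"id": "actor-1", "type": "threat-actor"},
]
relationships = [
    {"id": "r1", "type": "translation-of", "source": "tr-1", "target": "ttp-1"},
    {"id": "r2", "type": "uses", "source": "actor-1", "target": "ttp-1"},
    {"id": "r3", "type": "uses", "source": "actor-1", "target": "tr-1"},
]

rejected = [r["id"] for r in invalid_relationships(objects, relationships)]
```

Here only r3 is rejected, because it links a Threat Actor against the translation instead of against the original TTP.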

 

Thanks,

 

Bret

 

 

 

Bret Jordan CISSP

Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

 

On Feb 8, 2016, at 14:26, Terry MacDonald <terry@soltra.com> wrote:

 

Hi Bret,

For internationalisation, the W3C recommendation is that language tags follow BCP47.

https://www.w3.org/International/techniques/developing-specs

BCP47  is made up from RFC5646 (http://tools.ietf.org/html/rfc5646) and RFC4647 (http://tools.ietf.org/html/rfc4647). Both have some good recommendations on selecting standard language tags.

They both point to the IANA language registry here: http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

We need to decide how detailed we want to go with language tags. We can go extremely detailed if we want (e.g. sl-rozaj-biske) or we can go simple (e.g. de). My personal preference is that we go extremely simple for this first version of STIX 2.

I would suggest we just use the primary language subtags, and that we do not allow extended language tags at this time. This will make it far easier for implementers to implement the initial version of language support, without (hopefully) losing too much detail.
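
A small sketch of what "primary language subtags only" could mean for an implementer, assuming Python. The function names are hypothetical. The check is purely syntactic: it does not consult the IANA registry and does not handle private-use or grandfathered tags:

```python
import re

# Assumption: a "simple" tag is just a 2- or 3-letter primary language subtag.
PRIMARY_ONLY = re.compile(r"[A-Za-z]{2,3}")


def primary_subtag(tag):
    """Return the lowercased first subtag of a BCP 47-style language tag."""
    return tag.split("-", 1)[0].lower()


def is_simple_tag(tag):
    """True if the tag consists of a primary language subtag and nothing else."""
    return PRIMARY_ONLY.fullmatch(tag) is not None


reduced = primary_subtag("sl-rozaj-biske")  # "sl"
```

A producer could call primary_subtag before publishing, and a consumer could use is_simple_tag to reject anything more detailed.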

Regarding the open questions:

1) What happens when a system can not or chooses to not keep the extra language stuff?

In order for a system to be STIX compliant we need to ensure that it at least accepts the languages inbound. It also will need to be able to forward on the object to others exactly as it received the object (if allowed by data marking). This doesn't mean it has to use or display all the languages described within the STIX object... It only needs to ensure they aren't lost.
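
A sketch of that separation between what is displayed and what is forwarded, assuming Python, plain dicts, and language maps embedded under hypothetical text fields such as "title":

```python
import copy


def display_copy(obj, lang, text_fields=("title",)):
    """Return a copy for display with a single language per text field.

    The stored object is left untouched so it can be forwarded as received.
    """
    view = copy.deepcopy(obj)
    for field in text_fields:
        texts = view.get(field)
        if isinstance(texts, dict):
            # Fall back to any available language if lang is missing.
            view[field] = texts.get(lang, next(iter(texts.values()), None))
    return view


received = {"id": "indicator-1", "title": {"en": "Hello", "ja": "こんにちは"}}
snapshot = copy.deepcopy(received)

view = display_copy(received, "en")  # what this system shows to its users
```

After building the view, received still carries both languages and can be passed on unchanged.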



2) How do you keep the original producer when you need to republish a TLO with an additional translation?


The TWIGS proposal had a rule that only the creator of an object can update that object. This would mean that only the original object creator could publish an updated revision of the original object.

TWIGS does have a suggested-update relationship type which could help here. A third party who created a translation could link it back to the original with a suggested-update relationship, and send that through to the original creator. The creator could then issue an update.

Or, we could allow for suggested updates to be sent using the STIX request/response mechanism I've discussed before. A new translation could be made by a third party. The new translation object could be sent to the original creator using that STIX request/response mechanism, and the original creator could then review it, modify the original object into a new revision, and send it out.

Or we can just have a translation-of relationship, and have to handle the transitive relationship problem.

Or we just don't allow third party translations within STIX.




3) Will this open up threat intel to attacks by eliminating the chain of ownership?


There wouldn't be an elimination of ownership. The object creator is still the object creator and is still responsible for the lifecycle of the objects they create.



4) What is going to happen to the versioning aspect of TLOs and how will we track that and all of their relationships?

The TWIGS proposal recommended we mandate the use of incremental versioning. This is where the object keeps its object id for its entire life, and there is a version attribute that allows consumers to recognise that there is an updated version. If extra translations can be 'suggested' back to the original creator using the STIX request/response mechanism, then the original object will be updated, meaning that all other relationships associated with that object id remain valid.
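
A sketch of the consumer side of incremental versioning, assuming Python, plain dicts, and an integer version attribute. The field names and sample objects are hypothetical:

```python
def latest_versions(objects):
    """Keep only the highest-numbered version seen for each object id."""
    latest = {}
    for obj in objects:
        current = latest.get(obj["id"])
        if current is None or obj["version"] > current["version"]:
            latest[obj["id"]] = obj
    return latest


feed = [
    {"id": "indicator-1", "version": 1, "title": {"en": "Some title"}},
    # Revision issued by the original creator after accepting a suggested translation.
    {"id": "indicator-1", "version": 2,
     "title": {"en": "Some title", "ja": "あるタイトル"}},
]
relationship = {"source": "ttp-9", "target": "indicator-1"}

store = latest_versions(feed)
resolved = store[relationship["target"]]  # the same id now yields version 2
```

Because the relationship refers to the id only, it keeps resolving after the translation is added.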

I think this could actually work.

Cheers
Terry MacDonald

On 9/02/2016 4:17 AM, "Jordan, Bret" <bret.jordan@bluecoat.com> wrote:


The one concern I have is that it represents two ways of doing a translation.  And we get into all kinds of weird issues with intel if we allow 3rd party translations, which I think is going to happen and was part of Ryu's original work flow.  Further, if a 3rd party wants to do a translation, then that will break the original producer chain.

If we are going to embed the language support then I would argue that this is the better way to go:

{
 “title”: {
   “en_us”: “…”,
   “es”: “…”
 },
 “description”: [
   {
     “en_us”: “…”,
     “es”: “…”
   },
   {
     “en_us”: “…”,
     “es”: “…”
   }
 ]
}


Some open questions with embedding:

1) What happens when a system can not or chooses to not keep the extra language stuff?

2) How do you keep the original producer when you need to republish a TLO with an additional translation?

3) Will this open up threat intel to attacks by eliminating the chain of ownership?

4) What is going to happen to the versioning aspect of TLOs and how will we track that and all of their relationships?




Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 


On Feb 8, 2016, at 07:09, Wunder, John A. <jwunder@mitre.org> wrote:

So to explore this a bit, were you imagining something like this:

{
 “title”: {
   “en”: “…”,
   “es”: “…”
 },
 “description”: [
   {
     “en”: “…”,
     “es”: “…”
   },
   {
     “en”: “…”,
     “es”: “…”
   }
 ]
}

It’s a bit more indirect but like I said earlier, while it looks uglier I don’t think the code to read/write is much worse. There could also be a more flattened approach:

{
 “title_en”: “…”,
 “title_es”: “…”,
 “description_en”: [“…”, “…”],
 “description_es”: [“…”, “…”]
}

We would need to specify what those keys would be, and probably standardize on whether we use country-specific codes or not, i.e. will the keys be “en-US” and “en-GB” or just “en”? And we would need to define a relationship for “translation” to support third party translations; “translation-of” would make sense to me.
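
To compare the two layouts, here is a sketch that folds the flattened keys into per-property language maps, assuming Python and a known set of language keys. The function name is hypothetical:

```python
def nest_flat_keys(flat, languages):
    """Turn {"title_en": ...} into {"title": {"en": ...}} for known languages."""
    nested = {}
    for key, value in flat.items():
        name, sep, lang = key.rpartition("_")
        if sep and lang in languages:
            nested.setdefault(name, {})[lang] = value
        else:
            nested[key] = value  # not a language-suffixed key; copy as-is
    return nested


flat = {
    "id": "x",
    "title_en": "a",
    "title_es": "b",
    "description_en": ["p1", "p2"],
}
nested = nest_flat_keys(flat, {"en", "es"})
```

Note that a flattened description becomes one list per language, which is not the same shape as the list of per-paragraph language maps in the first layout. Splitting on the last underscore also depends on the agreed key set: a key such as title_en_us would not be recognized unless the helper were changed.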


Can anybody see any major issues with an approach like this? The biggest one I see is that if you have third-party translations you’ll run into the mess of relationships only pointing to the original or any given translation and need to work through that. (I.e. People create relationships to my translation of your object rather than directly to your object, bifurcating our intelligence.) Anybody producing or consuming third party translations concerned about that?

John


On 2/8/16, 8:55 AM, "cti@lists.oasis-open.org on behalf of Coderre, Robert" <cti@lists.oasis-open.org on behalf of rcoderre@verisign.com> wrote:


It makes complete sense to have translations available for top level objects, and I agree with Ryu that it also makes sense to include the translations in the same object.  In most cases (my subjective view) the translations will come from the same producer.  If an independent third party is translating content, then it should be a separate object and referenced back to the original.

As for CybOX observables, I think these would be independent objects, primarily for the reasons Trey mentions, which is they are specific to a particular region/language and will have enough subtle differences as to warrant that distinction.

Rob

-----Original Message-----
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Trey Darley
Sent: Monday, February 08, 2016 6:43 AM
To: Masuoka, Ryusuke
Cc: cti@lists.oasis-open.org
Subject: Re: [cti] Idea for Internationalization

On 08.02.2016 08:00:55, Masuoka, Ryusuke wrote:



Be it a title, a description, a filename, a subject of email,
etc., treating a translation as another property of the same object or
subproperty of the text object would be simpler and more natural than
treating the translation as another object.

For example, if it is a file object, it would be

-----
Case (A)
-----
File Object:
 ID: A123
 File Name (Original - JA): “医療費通知”
 File Name (Translation - EN): “Medical expenses notice”
 File Name (Translation - FR): “Frais médicaux Notez”
 File Extension: PDF
 Size in Bytes: 410,314
 Hashes:
    Hash Name: SHA1
    Hash Value: 1234567890123456789012345678901234567890
-----



I was tracking along with this I18N discussion right up until now.
Does it make sense to provide translations of CybOX observables?

Taking Ryusuke's example, assume that I'm a threat actor using an identical malicious payload to target victims in multiple languages.
If I send out a phishing mail entitled "医療費通知", then the payload will be in Japanese. If I'm also targeting French-speakers, 1) the odds are minimal that I'll translate the file name exactly as "Frais médicaux Notez", and 2) even supposing that I do translate the filename exactly that way, the payload is going to be in French and so there's no chance in hell of the file hashes matching.

I18N makes total sense to me at the level of STIX TLOs with fields humans are likely to read. I don't see it providing much value at the CybOX observable level compared to the amount of complexity it will introduce.

We want to cater to humans, obviously, but if we make observables so complex as to practically preclude machine-parsing of them, then why not just send an old-fashioned email instead of using STIX/CybOX?

--
Cheers,
Trey
--
Trey Darley
Senior Security Engineer
4DAA 0A88 34BC 27C9 FD2B  A97E D3C6 5C74 0FB7 E430
Soltra | An FS-ISAC & DTCC Company
www.soltra.com
--
"In protocol design, perfection has been reached not when there is nothing left to add, but when there is nothing left to take away."
--RFC 1925

 

 


