Subject: Re: Spec Version


Splitting thread..

If the spec_version field is giving people issues, we could easily do this in TAXII land instead and let each backend system keep track of it in its own implementation-specific way.  The TAXII client could tell the endpoint, "I am asking for, or sending you, stix-2.0 content."

The beauty of this method is that you could then configure a TAXII server to support more than just one format, and a client could simply say, "I want content in format XYZ."

This would be a huge value add for companies producing TAXII servers.  It would enable them to differentiate themselves from their competitors and provide "extra" value.
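
A minimal sketch of that idea, assuming a hypothetical server that keeps a registry of serializers keyed by a format label such as "stix-2.0". The registry, the `serve` function, and the label strings are illustrative assumptions and are not taken from the TAXII specification.

```python
import json

# Hypothetical registry: format label -> function that renders stored content.
# The labels and the registry itself are assumptions for illustration only.
SERIALIZERS = {
    "stix-2.0": lambda obj: json.dumps(obj, sort_keys=True),
}


def serve(stored_object, requested_format):
    """Return content in the format the client asked for, or raise if unsupported."""
    try:
        render = SERIALIZERS[requested_format]
    except KeyError:
        supported = ", ".join(sorted(SERIALIZERS))
        raise ValueError(
            f"unsupported format {requested_format!r}; supported: {supported}"
        )
    return render(stored_object)


# The stored object carries no spec_version; the format is stated in the request.
campaign = {"type": "campaign", "title": "Dridex Campaign - Botnet 121"}
payload = serve(campaign, "stix-2.0")
```

Supporting another format would then mean registering one more entry, which is where a server vendor could add its "extra" value.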


Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

On Apr 15, 2016, at 11:42, Wunder, John A. <jwunder@mitre.org> wrote:

I would frame it this way:

Let’s understand which capabilities we actually need and how much we need them. We need to do this because there’s a cost to adding capabilities…I had a discussion yesterday with someone who was upset that we had `spec_version` on all top level objects, so adding 2 extra properties to all text fields would blow his mind. Now, we can say that in support of all these necessary use cases that it's an OK tradeoff to make, but, we need to acknowledge that it’s a tradeoff. People will not adopt STIX if we make it too complicated, and on the other hand people will not adopt STIX if we make it too simple and are unable to support even the basics. The trick is finding the balance.

So…do we need to support third-party translations? Is it OK to, instead, just let third-party translation services issue new objects? Similarly, do we need to track individual text strings rather than translations of complete TLOs? Maybe we want to, but is it worth the tradeoff when the answer is just to automatically reissue translations when the text strings don’t change?

So I’d suggest people look through the great writeups that Ryu has provided and think about that balance. Where’s the sweet spot between supporting everything and ensuring mass adoption?

My personal preference (feel free to ignore this part of the e-mail) is to support either:
  1. Language maps: {“description”: {“en”: “English”, “es”: “Spanish”}}. It’s easy to understand, and while it doesn’t allow for exact tracking of third-party translations, you could still do them just by reissuing the object with your added maps. OTOH, that person who was upset at spec_version would probably not do STIX.
  2. A lang tag at the TLO level: {“type”: “indicator”, “lang”: “en”}. Yes, we lose the ability to have a single TLO to point things to. Maybe eventually we add a translation object to help with that use case (or maybe we spend the time and do it now). If we don’t have that translation object, people doing translations just create a new object. The advantage is that, aside from adding (yet another) overhead field to TLOs we don’t have any additional abstraction.
I think either of those would strike a good balance between capability and simplicity. We’d probably lose a few people in the simplicity direction for #1 and a few people in the capability direction for #2 but in the end we’re never going to make everyone happy. It’s about finding solutions that we can all live with.
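
To make the difference between the two options concrete, here is a rough sketch of how a consumer might pick a description under each one. The function names, the fallback to "en", and the "description" property on the option 2 objects are assumptions made for illustration.

```python
def describe_with_language_map(obj, preferred, fallback="en"):
    """Option 1: each text property maps language codes to strings."""
    texts = obj["description"]
    return texts.get(preferred, texts.get(fallback))


def describe_with_object_lang(objects, preferred, fallback="en"):
    """Option 2: each object is in one language; a translation is a separate object."""
    by_lang = {o["lang"]: o["description"] for o in objects}
    return by_lang.get(preferred, by_lang.get(fallback))


# Option 1: one object, one map per text property.
mapped = {"type": "indicator", "description": {"en": "English", "es": "Spanish"}}

# Option 2: one object per language (the "description" values are placeholders).
separate = [
    {"type": "indicator", "lang": "en", "description": "English"},
    {"type": "indicator", "lang": "es", "description": "Spanish"},
]

spanish_from_map = describe_with_language_map(mapped, "es")
spanish_from_objects = describe_with_object_lang(separate, "es")
```

With option 1 the consumer always has a single object to point to; with option 2 it first has to know which objects are translations of each other, which is the gap a translation object would fill.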

John

From: <cti@lists.oasis-open.org> on behalf of "Jordan, Bret" <bret.jordan@bluecoat.com>
Date: Friday, April 15, 2016 at 11:38 AM
To: "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
Cc: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, John-Mark Gurney <jmg@newcontext.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: Re: [cti] i18n (RE: MVP Discussion)

Some comments:

1) The one problem with this is we would need to pre-identify every field that someone might want to translate. This would make translations very rigid and prevent arbitrary fields from being translated.  

2) If you are the original creator of the document, are you really going to write some of the text fields in one language and other text fields in a different language?  I am thinking that a JP producer of original STIX content would write all of their text fields in either JP or EN, but they would not write the title in English, the description in JP, and subfield X in DE.

3) Given #2, I believe it would be sufficient for the document to contain the "lang" tag, and not do it on every single field level.  

4) With what you have done, Ryu, I think there is an assumption being made that if people update the title, that they will ALSO update that field level ID.  This will not be guaranteed.  So you will still need to check to see if the description has changed so you can know if the translation is still valid.  
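
One way to do the check in point 4 without trusting the field-level ID is to store a digest of the source text alongside the translation. This is only a sketch of that idea: the "source_digest" property and the helper names are hypothetical and are not part of either proposal.

```python
import hashlib


def text_digest(text):
    """SHA-256 hex digest of the UTF-8 encoded source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_translation(source_text, lang, value):
    # "source_digest" is a hypothetical property, not part of any proposal here.
    return {"lang": lang, "value": value, "source_digest": text_digest(source_text)}


def is_still_valid(translation, current_source_text):
    """True if the source text is exactly what was translated."""
    return translation["source_digest"] == text_digest(current_source_text)


original = "Dridex-based campaign leveraging Botnet 121"
translation = make_translation(original, "de", "Some German Description")

unchanged = is_still_valid(translation, original)
changed = is_still_valid(translation, original + " (updated)")
```

The comparison holds regardless of whether the producer remembered to change the ID when editing the text.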

Ryu, I think we are getting close to something that might work.  The differences between your proposal and my proposal are very small.  In my proposal you would put all of the translated fields into a single object and send it.  In your proposal you would split out each field into its own object and send it.  My proposal also allows any arbitrary text to be translated, not just fields that we identify.

I hope you will be at the F2F so we can flesh this out. If not, we should try and tackle this on a weekly call.  


Thanks,

Bret




On Apr 14, 2016, at 20:20, Masuoka, Ryusuke <masuoka.ryusuke@jp.fujitsu.com> wrote:

Hi, Bret, all,
 
Sorry that it took me some time.
I think that I have a fairly simple and straightforward solution.
It is Bret's idea, but giving a reference to the text itself
instead of a reference through the object structure.
I also added one more use case, (7) CTI provider, and added
a description to use case (6) CTI translation service.
 
I know time is limited, but I thought that with this new set of
standards we are taking big steps to lay a foundation for the future.
If we aspire to take these CTI standards to ISO eventually,
I believe that i18n is a strategic and important element and
that it should not be handled as afterthought, ad-hoc patches here and there.
(Especially if it is not a simple fix.)
 
Regards,
 
Ryu
 
----------
Internationalization Rules
-----------
- Always give "text_id" and "lang" for every text field
  (So that anyone can give translations to the field later, knowing
  which language it is in.)
 
- Always give "text_ref", "text_id" and "lang" for every translation
  ("text_id" lets someone translate from a translation rather than from the original-language text.
  Example: A CTI text field is created in Japanese and then given an English translation.
    German and French translations are then produced based on the English translation.)
 
- Use the same text_id for the text field if it stays the same over different versions
- Always use a translation entry to give a translation.
  If there are multiple translations for a text, provide multiple translation entries.
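
The rules above could be checked mechanically. The following is a rough sketch, assuming the property names used in the patterns in this message ("text_id", "text_ref", "lang", "value"); the function names are made up for illustration, and treating "value" as required is an assumption based on the examples.

```python
def check_text_field(field):
    """Return a list of rule violations for an original text field."""
    required = ("text_id", "lang", "value")
    return [f"missing {key}" for key in required if key not in field]


def check_translation(translation):
    """Return a list of rule violations for a translation entry."""
    required = ("text_ref", "text_id", "lang", "value")
    return [f"missing {key}" for key in required if key not in translation]


title = {"text_id": "text-a1b2c3", "lang": "en", "value": "Dridex Campaign - Botnet 121"}

# A translation without its own text_id cannot itself be translated further.
incomplete = {"text_ref": "text-d4e5f6", "lang": "de", "value": "Some German Description"}

title_problems = check_text_field(title)
translation_problems = check_translation(incomplete)
```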
 
-----
- Pattern A - Translation given inside the same package
{
  "type": "package",
  ...
  "campaigns": [
    {
      "type": "campaign",
      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
      "revision": 1,
      "spec_version": "stix-2.0",
      "created_at": "2015-12-03T13:13Z",
      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
      "title": {
        "text_id": "text-a1b2c3",
        "lang": "en",
        "value": "Dridex Campaign - Botnet 121"},
      "descriptions": {
        "text_id": "text-d4e5f6",
        "lang": "en",
        "value": "Dridex-based campaign leveraging Botnet 121"},
      "intended_effects": [
        {"value": "theft-identity-theft"}
      ],
      "status": "Ongoing"
    }
  ],
  "translations": [
    {
      "text_ref": "text-a1b2c3",
      "lang": "ja",
      "text_id": "text-a1b2c3-ja-1",
      "value": "Dridex キャンペーン - ボットネット 121"
    },
    {
      "text_ref": "text-d4e5f6",
      "lang": "ja",
      "text_id": "text-d4e5f6-ja-1",
      "value": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"
    },
    {
      "text_ref": "text-a1b2c3",
      "lang": "de",
      "text_id": "text-a1b2c3-de-1",
      "value": "Some German Title"
    },
    {
      "text_ref": "text-d4e5f6",
      "lang": "de",
      "text_id": "text-d4e5f6-de-1",
      "value": "Some German Description"
    }
  ]
  ...
}
-----
- Pattern B - Translation given in some external sources
{
  "translations": [
  {
    "text_ref": "text-a1b2c3",
    "lang": "es",
    "text_id": "text-a1b2c3-es-1", 
    "value": "Some Spanish Title"
  },
  {
    "text_ref": "text-d4e5f6",
    "lang": "es",
    "text_id": "text-d4e5f6-es-1",
    "value": "Some Spanish Description"
  },
  {
    "text_ref": "text-a1b2c3",
    "lang": "fr", 
    "text_id": "text-a1b2c3-fr-1",
    "value": "Some French Title"
  },
  {
    "text_ref": "text-d4e5f6",
    "lang": "fr", 
    "text_id": "text-d4e5f6-fr-1",
    "value": "Some French Description"
  }
  ]
}
-----
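
Since both patterns use the same translation entries, a consumer can treat them identically once they are loaded. A minimal sketch, assuming entries shaped like the ones above; `index_translations` and `localize` are hypothetical helper names, and falling back to the original text is an assumed policy.

```python
def index_translations(translations):
    """Group translation entries by the text_id they refer to."""
    index = {}
    for entry in translations:
        index.setdefault(entry["text_ref"], []).append(entry)
    return index


def localize(field, index, lang):
    """Return the field's value in `lang` if available, else the original value."""
    if field["lang"] == lang:
        return field["value"]
    for entry in index.get(field["text_id"], []):
        if entry["lang"] == lang:
            return entry["value"]
    return field["value"]


title = {"text_id": "text-a1b2c3", "lang": "en", "value": "Dridex Campaign - Botnet 121"}

in_package = [  # Pattern A: shipped inside the same package
    {"text_ref": "text-a1b2c3", "lang": "de",
     "text_id": "text-a1b2c3-de-1", "value": "Some German Title"},
]
external = [  # Pattern B: fetched from some external source
    {"text_ref": "text-a1b2c3", "lang": "es",
     "text_id": "text-a1b2c3-es-1", "value": "Some Spanish Title"},
]

index = index_translations(in_package + external)
german = localize(title, index, "de")
spanish = localize(title, index, "es")
french = localize(title, index, "fr")  # no French entry, so the English original
```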
 
----------
Notes - Simplicity, coherency, consistency
----------
- Only one way to express a text field
- Only one way to give translations
  (It can be inside the package or sitting in external sources.)
- Additional ~50 bytes + byte length of text-id for each text field
- Resources spent for translation will not be wasted
  as long as the text stays the same.
- Use Case (1) -> Pattern A
- Use Case (2) -> Pattern A or B
  The text always has its language code
- Use Case (3) -> Pattern B
- Use Case (4) -> Using UTF-8
- Use Case (5) -> Using UTF-8
- Use Case (6) -> Pattern B
- Use Case (7) -> Pattern A (Dynamically produced from its DB)
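
The per-field overhead noted above can be measured rather than estimated. This is a small sketch that compares a bare JSON string with the wrapped form, assuming compact separators and UTF-8 on the wire; the function name is made up for illustration.

```python
import json


def wrapper_overhead(text_id, lang, value):
    """Extra UTF-8 bytes of the wrapped form over a bare JSON string."""
    compact = {"separators": (",", ":"), "ensure_ascii": False}
    bare = json.dumps(value, **compact).encode("utf-8")
    wrapped = json.dumps(
        {"text_id": text_id, "lang": lang, "value": value}, **compact
    ).encode("utf-8")
    return len(wrapped) - len(bare)


overhead = wrapper_overhead("text-a1b2c3", "en", "Dridex Campaign - Botnet 121")
```

For the values shown this comes to 46 bytes: 35 bytes of fixed wrapper for a two-letter language code plus the 11-byte text_id. Any whitespace or indentation in the serialized form adds to that.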
 
----------
 
------------------------------
Internationalization Use Cases
------------------------------
CN: Chinese
DE: German
EN: English
FR: French
JA: Japanese
 
------------------------------
(1) Providing an object's texts in multiple languages simultaneously at the time of creation.
------------------------------
 
  [ja/en (in case of Japan), en/fr/de (in case of EU countries), etc.]
 
This is the most likely use case (for me). The original CTI has titles/descriptions in
multiple languages from the start. When you create a CTI file, you include
both English and Japanese titles/descriptions for major objects in it
so that non-Japanese speaking people can at least find out what it is at the top level.
 
------------------------------
(2) CTI Database Receiving CTI from Multiple CTI Sources in Different Languages
------------------------------
 
This is a case where you receive CTI from an English CTI source and
another CTI source in Japanese.
You put all the CTI into MongoDB or some other NoSQL database and
would like to mix and match. I would like the CTI database to still
be able to track the language code of textual fields.
 
------------------------------
(3) EN CTI received by a Japanese entity, which provides JA translation
  (Or vice versa, JA CTI received by a US entity, which provides EN translation)
------------------------------
 
  A Japanese entity receives CTI information pieces in English.
  The entity determines that some of them are important/critical
  and worth translating into Japanese, adds descriptions in Japanese,
  and redistributes them to other Japanese entities (if redistribution is allowed).
  The CTIM (CTI Management System) of a receiving party displays
  the Japanese description whenever possible, while allowing access to
  the original English descriptions.
 
  Work Flow:
  1. Company 1 in EN creates an Indicator and TTP and shares them with Company 2 in JP.
    It is important to note that the flow may be direct or may be through a series of brokers and other entities. 
    1. This Indicator and TTP has a producer of Company 1 and a version of 1
  2. Company 2 builds a translated version of the TTP and Indicator and releases it.
    1. This new Indicator and TTP has a producer of Company 2 and a version of 2. 
    2. It is unrealistic to think that Company 2 can or will share the translated object back to Company 1 and that if Company 1 gets the translated object that they will do anything with it.  Their legal departments will probably prohibit accepting 3rd party translations and then using them in their offerings.
 
------------------------------
(4) An English CTI report describing attacks against Japanese entities in EN 
------------------------------
 
  An English report on Cyber Attacks on Japan.
  There are filenames of lure attachments in Japanese (original/real) and their
  translations in English.  Another similar report in English might have an email title along with
  its translation in English next to it. That report also has a Windows pathname
  in Chinese (not Japanese) found in a binary along with its translation in English.
 
  These Japanese texts can be found in descriptions, not just
 
  [Ex. Original File Name (JA): "医療費通知", Translated File Name (EN): "Medical expenses notice"]
 
  Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.
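
A quick sketch of why UTF-8 is enough for this use case: the original Japanese file name and its English translation survive a round trip through JSON unchanged. The property names "original_file_name" and "translated_file_name" are placeholders, not proposed fields.

```python
import json

observed = {
    "original_file_name": "医療費通知",                  # as seen in the attack (JA)
    "translated_file_name": "Medical expenses notice",  # analyst's translation (EN)
}

# ensure_ascii=False keeps the Japanese characters literal instead of \uXXXX
# escapes; the bytes put on the wire are then plain UTF-8.
wire_bytes = json.dumps(observed, ensure_ascii=False).encode("utf-8")

# The receiving side decodes the same bytes back into identical text.
received = json.loads(wire_bytes.decode("utf-8"))
```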
 
------------------------------
(5) Email subject/body, supposed to be in JP, but includes CN characters (by mistake of the attackers)
------------------------------
 
  This can happen because Chinese, Japanese, and Korean share Unicode characters.
 
  This can be a very important clue as to the attackers.
 
  Note: This should probably be okay as long as the standards require use of UTF-8 for encoding.
 
------------------------------
(6) CTI translation service
------------------------------
 
  A CTI translation service provider keeps translations to target languages of text fields
  from publicly available and/or commercial/private CTI sources.
  The service is available through some kind of online API.
  Consumers of this translation service will use this service to translate text fields
  in their CTI system through the API provided by the translation service provider.
 
------------------------------
(7) CTI provider
------------------------------
 
  A CTI provider (in English) plans to penetrate the Japanese and other APAC markets
  and needs a standard way to add translations of their text fields.
  The CTI provider gives its customer a CTI package with all the translations in it
  or a CTI package with translations to the languages of user's choosing. 
 
------------------------------
 
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Wednesday, April 13, 2016 2:40 AM
To: Jason Keirstead
Cc: John-Mark Gurney; Masuoka, Ryusuke/益岡 竜介; cti@lists.oasis-open.org
Subject: Re: [cti] RE: i18n (RE: MVP Discussion)
 
Given this debate, and the fact that it appears that we will not resolve this with a simple fix, I would suggest that this NOT be MVP.  I was hoping for an easy solution.  But apparently that is not possible.  
 
I also want to make sure we do not mix up the way a product might do this with how it needs to be conveyed across the wire.  A tool could use the text once, and reference it all over the place.  But that does NOT mean it needs to go over the wire that way. 

 

Thanks,
 
Bret
 
 
 
 
On Apr 12, 2016, at 10:33, Jason Keirstead <Jason.Keirstead@ca.ibm.com> wrote:
 
I think what Ryusuke is pointing out is that having a translation point to a specific revision does not solve his use case, which is a common one when i18n'ing software. As I know first hand, translation en masse is very expensive, and therefore you do not want to re-translate the same text multiple times. If you have the same title text used in 100 objects, you would want to translate that text only once and reference it 100 times.

This is what he is asking for - a way to tie the translation not to an object version, or even an object instance - but to the text property itself.

See Java resource bundles, or Gettext PO files, for example. His ask is for the equivalent to resource bundles and keys for STIX.
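
As a rough illustration of the "translate once, reference 100 times" point, here is a sketch of a catalog keyed by text ID and language, in the spirit of a message catalog. The class, its method names, and the placeholder translator are all assumptions made for the example.

```python
class TranslationCatalog:
    """Caches translations by (text_id, lang) so each text is translated once."""

    def __init__(self, translate):
        self._translate = translate  # callable(text, lang) -> str; the costly step
        self._cache = {}
        self.calls = 0               # how many times the costly step actually ran

    def get(self, text_id, text, lang):
        key = (text_id, lang)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._translate(text, lang)
        return self._cache[key]


def fake_translate(text, lang):
    # Placeholder for a human or machine translation step.
    return f"[{lang}] {text}"


catalog = TranslationCatalog(fake_translate)

# The same title text, keyed by one text_id, referenced by 100 objects.
results = [
    catalog.get("text-a1b2c3", "Dridex Campaign - Botnet 121", "ja")
    for _ in range(100)
]
```

Here `catalog.calls` ends up at 1, because the key is the text itself rather than any object or object revision that happens to use it.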

-
Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown 



From: John-Mark Gurney <jmg@newcontext.com>
To: "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
Cc: "Jordan, Bret" <bret.jordan@bluecoat.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Date: 04/12/2016 01:25 PM
Subject: Re: [cti] RE: i18n (RE: MVP Discussion)
Sent by: <cti@lists.oasis-open.org>




Masuoka, Ryusuke wrote this message on Tue, Apr 12, 2016 at 10:05 +0000:
> Thank you for your suggestion.
> I think it works and meets my needs.
> 
> One thing I am afraid of is that it might be brittle against versioning.
> It refers to an object and it is not useful once the object gets updated
> even if the object uses the same texts.

This is why the translation should point to a specific revision of the
object...  See minor modification below...

I believe I addressed this in my example that I sent a while back...

> From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
> Sent: Monday, April 11, 2016 7:09 AM
> To: Masuoka, Ryusuke/益岡 竜介
> Cc: cti@lists.oasis-open.org
> Subject: [cti] Re: i18n (RE: MVP Discussion)
> 
> Ryu,
> 
> From your example #1, it would probably need to look something like this.  This would enable a tool to:
> a) translate any text, not just certain fields
> b) allow you or anyone else to create a translation at any point in time.
> 
> Please let me know if this would not meet your needs... In reading through your use-cases, which are good BTW, I believe this would address all of your points.
> 
> 
> 
> {
>   "type": "package",
>   ...
>   "campaigns": [
>     {
>       "type": "campaign",
>       "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>       "lang": "en",
       "revision": 1,
>       "spec_version": "stix-2.0",
>       "created_at": "2015-12-03T13:13Z",
>       "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>       "title": "Dridex Campaign - Botnet 121",
>       "descriptions": "Dridex-based campaign leveraging Botnet 121",
>       "intended_effects": [
>         {"value": "theft-identity-theft"}
>       ],
>       "status": "Ongoing"
>     }
>   ],
>   "translations": [
>     {
>       "type": "translation",
>       "id": "translation--a1201df6-c352-4a81-9c7c-5a6f896a1111",
>       "lang": "ja",
>       "spec_version": "stix-2.0",
>       "created_at": "2015-12-03T13:13Z",
>       "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
       "translated_ref": [ "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31", 1 ],
>       "translated_text": {
>         "title": "Dridex キャンペーン - ボットネット 121",
>         "description": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"
>       }
>     },
>     {
>       "type": "translation",
>       "id": "translation--a1201df6-c352-4a81-9c7c-5a6f896a2222",
>       "lang": "de",
>       "spec_version": "stix-2.0",
>       "created_at": "2015-12-03T13:13Z",
>       "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
       "translated_ref": [ "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31", 1 ],
>       "translated_text": {
>         "title": "Some German Title",
>         "description": "Some German Description"
>       }
>     }
> 
>   ]
>   ...
> }
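
A small sketch of what the revision-pinned "translated_ref" in the example above implies for a consumer, assuming the reference is a two-element list of object ID and revision as shown; the function name is hypothetical.

```python
def translation_applies(translation, obj):
    """True if the translation was made for exactly this id and revision."""
    ref_id, ref_revision = translation["translated_ref"]
    return ref_id == obj["id"] and ref_revision == obj["revision"]


campaign_id = "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31"
campaign_r1 = {"id": campaign_id, "revision": 1, "title": "Dridex Campaign - Botnet 121"}
# Revision 2 keeps the same title text but is a new revision of the object.
campaign_r2 = {"id": campaign_id, "revision": 2, "title": "Dridex Campaign - Botnet 121"}

translation = {
    "type": "translation",
    "lang": "de",
    "translated_ref": [campaign_id, 1],
    "translated_text": {"title": "Some German Title"},
}

applies_to_r1 = translation_applies(translation, campaign_r1)
applies_to_r2 = translation_applies(translation, campaign_r2)
```

The translation matches revision 1 only. It no longer applies to revision 2 even though the title text is unchanged, which is the brittleness raised in the quoted message.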

-- 
John-Mark
