OASIS Mailing List Archives

 



csaf message



Subject: Re: Notes on localization - analysis of STIX and applicability to CSAF


FYI, I did some further research on localization, specifically by conferring with our globalization team.

Several insights, and some conclusions.
Conclusion / proposals:
These three conclusions seem radically simpler than trying to support a fully multi-lingual JSON format, don't reinvent the XLIFF model, and will probably work well with existing practices.

Eric.



On Mon, Jun 4, 2018 at 6:21 PM, Eric Johnson <eric@tibco.com> wrote:
In furthering our knowledge of how best to do localization for CSAF, I've reviewed:
https://docs.google.com/document/d/1ShNq4c3e1CkfANmD9O--mdZ5H0O_GLnjN28a_yrEaco/edit#heading=h.cfz5hcantmvx

("Language content" - section 4.2 of STIX version 2.1 - last updated 27 April, 2018)

As noted in section 4.2, the language content must not have any relationships with other "data objects" of the STIX document. In fact, it is represented in such a way that it could be stored completely separately from the rest of a STIX document. The file representation for STIX (including language-content) is apparently defined in TAXII, which I've not reviewed for this updated STIX 2.1 version. So the STIX approach could definitely be similar to what we've discussed in the CSAF TC: that of having an external translation document.

The effort I've been working on is the JSON Schema representation of a CVRF 1.2 equivalent, whereas STIX defines an object model using JSON which is then separately mapped to a file format. It is definitely possible that some discrepancies could emerge based on my limited analysis. I checked, but did not find any "language-content" examples in the GitHub repository for the stix2 schema that might help clarify.

By way of example, supposing this is the STIX content to be localized:

{
  "type": "campaign",
  "id": "campaign--12a111f0-b824-4baf-a224-83b80237a094",
  "lang": "en",
  "created": "2017-02-08T21:31:22.007Z",
  "modified": "2017-02-08T21:31:22.007Z",
  "name": "Bank Attack",
  "description": "More information about bank attack"
}

... then this is the "language-content" object providing a possible localization:

{
  "type": "language-content",
  "id": "language-content--b86bd89f-98bb-4fa9-8cb2-9ad421da981d",
  "created": "2017-02-08T21:31:22.007Z",
  "modified": "2017-02-08T21:31:22.007Z",
  "object_ref": "campaign--12a111f0-b824-4baf-a224-83b80237a094",
  "object_modified": "2017-02-08T21:31:22.007Z",
  "contents": {
    "de": {
      "name": "Bank Angriff 1",
      "description": "Weitere Informationen über Banküberfall"
    },
    "fr": {
      "name": "Attaque Bank 1",
      "description": "Plus d'informations sur la crise bancaire"
    }
  }
}

Several design choices to note:
  • STIX items apparently have a unique identifier (such as "campaign--12a..." in the example above)
  • The modified time of the object is apparently critical for determining the appropriateness of a given translation
  • There's a lot of metadata here about the source document, and the tracking of the translation itself
  • Different languages are optionally combined into a single translation.
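
To make the role of "object_ref" and "object_modified" concrete, here is a minimal Python sketch of how a consumer might apply a language-content object like the one above. The function name apply_language_content and the rule that a translation only applies when the timestamps match exactly are my assumptions for illustration, not something taken from the STIX specification.

```python
import copy


def apply_language_content(obj, language_content, lang):
    """Return a localized copy of obj, or None if the translation does not apply.

    Hypothetical helper: the exact-match rule on the modified timestamp is an
    assumption made for this sketch.
    """
    # The translation must point at this exact object ...
    if language_content.get("object_ref") != obj.get("id"):
        return None
    # ... and at the revision that was actually translated.
    if language_content.get("object_modified") != obj.get("modified"):
        return None
    fields = language_content.get("contents", {}).get(lang)
    if fields is None:
        return None  # no translation for the requested language
    localized = copy.deepcopy(obj)  # leave the source object untouched
    localized.update(fields)
    localized["lang"] = lang
    return localized
```

Under this rule, a translation made against an older revision of the object is rejected outright rather than partially applied.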
How does that align with CSAF?
  • As mentioned above, the design allows for the possibility of storing translations externally from the document itself.
  • There's a need to be able to identify that the translation corresponds to what was translated.
  • Multiple languages supported
For context, here's what we might want to translate in CSAF JSON:
  • /vulnerabilities[]/acknowledgements[]/description
  • /vulnerabilities[]/id/text
  • /vulnerabilities[]/involvements[]/description
  • /vulnerabilities[]/notes[]/text
  • /vulnerabilities[]/remediations[]/description
  • /vulnerabilities[]/remediations[]/entitlements[]
  • /vulnerabilities[]/threats[]/description
  • /vulnerabilities[]/title
  • /document_notes[]/text
  • /document_title
  • /document_references[]/description
(note that the above refer to "document_notes" and "document_title", although our intent is to change this to "document/notes", and "document/title" - written this way to be consistent with the current JSON schema available on GitHub.)
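
As a rough illustration of what "finding the translatable strings" involves, the following Python sketch walks a CSAF-like JSON document using the path patterns listed above, treating a trailing "[]" as "iterate every array item". The names TRANSLATABLE_PATHS and translatable_strings are hypothetical, and the pattern syntax is just the informal notation from this message, not JSON Pointer.

```python
# Path patterns from the list above; "[]" means "each element of this array".
TRANSLATABLE_PATHS = [
    "/vulnerabilities[]/acknowledgements[]/description",
    "/vulnerabilities[]/id/text",
    "/vulnerabilities[]/involvements[]/description",
    "/vulnerabilities[]/notes[]/text",
    "/vulnerabilities[]/remediations[]/description",
    "/vulnerabilities[]/remediations[]/entitlements[]",
    "/vulnerabilities[]/threats[]/description",
    "/vulnerabilities[]/title",
    "/document_notes[]/text",
    "/document_title",
    "/document_references[]/description",
]


def _walk(node, segments):
    """Yield every string reachable from node by following segments."""
    if not segments:
        if isinstance(node, str):
            yield node
        return
    head, rest = segments[0], segments[1:]
    is_array = head.endswith("[]")
    key = head[:-2] if is_array else head
    if not isinstance(node, dict) or key not in node:
        return  # optional property is absent; nothing to translate here
    child = node[key]
    if is_array:
        if isinstance(child, list):
            for item in child:
                yield from _walk(item, rest)
    else:
        yield from _walk(child, rest)


def translatable_strings(doc, paths=TRANSLATABLE_PATHS):
    """Yield translatable strings in doc, grouped in the order of paths."""
    for path in paths:
        yield from _walk(doc, path.strip("/").split("/"))
```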

I have a proposal for a very simple approach that perhaps radically simplifies the problem of localization. The challenge:
  • Referencing an object requires: adding a unique identifier on an object, leveraging an existing unique characteristic (such as a CVE # on a vulnerability), or using the JSON pointer references. All of these approaches have their failure modes.
  • As STIX highlights - the time at which the source document was snapshotted for translation may differ from when the document was last modified.
Suggested simplified approach:
  • Capture the *exact* strings that need to be localized, and one to N localized versions of the same.
  • Don't worry about anything else.
So rather than trying to define a localization for a title (from the sample document) such as "Cisco IOS and IOS XE Software Smart Install Remote Code Execution Vulnerability" by identifying some sort of reference to the "document/title" property, just list the string. That is, a localization document might look like this:

[
  {
    "original": "Cisco IOS and IOS XE Software Smart Install Remote Code Execution Vulnerability",
    "translations": {
      "de": "Remotecodeausführung Sicherheitsanfälligkeit durch Cisco IOS- und IOS XE-Software Install"
    }
  }
]

(Sorry, translation courtesy of Google Translate plus my edits)
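
A minimal Python sketch of how a consumer might apply such a localization document follows. The function names build_lookup and localize are hypothetical. As a simplification, this version replaces every matching string value anywhere in the document, rather than only at the known translatable locations.

```python
def build_lookup(localization, lang):
    """Map each original string to its translation for lang, where one exists."""
    return {
        entry["original"]: entry["translations"][lang]
        for entry in localization
        if lang in entry.get("translations", {})
    }


def localize(node, lookup):
    """Return a copy of node with exactly matching string values translated."""
    if isinstance(node, str):
        # Exact match or nothing: unmatched strings stay in the source language.
        return lookup.get(node, node)
    if isinstance(node, list):
        return [localize(item, lookup) for item in node]
    if isinstance(node, dict):
        # Property names are never translated, only their values.
        return {key: localize(value, lookup) for key, value in node.items()}
    return node
```

Usage would be along the lines of localize(csaf_doc, build_lookup(localization, "de")), where csaf_doc is the parsed CSAF JSON instance.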

Possible issues with this approach:
  • Applying the localization then means that anything processing the CSAF JSON instance needs to be aware of where translatable strings might appear, and check for the translations.
  • Failure-mode - two text strings in a CSAF document that share the same original representation, but translate differently.
  • Only possible to catch post-translation snapshot changes by way of identifying portions of the document not translated.
Benefits of this approach:
  • Document revisions don't matter - either the string to be translated matches, or it doesn't.
  • No need for references to the point in the original document where the translation is supposed to apply.
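
Catching source changes made after the translation snapshot could then be a simple coverage check: list the strings in the current document that have no translation. A hedged Python sketch, with a hypothetical function name and the assumption that the caller has already collected the document's translatable strings:

```python
def untranslated(strings, localization, lang):
    """Return the strings that have no translation for lang.

    Order of first appearance is preserved and duplicates are dropped.
    """
    translated = {
        entry["original"]
        for entry in localization
        if lang in entry.get("translations", {})
    }
    seen = set()
    missing = []
    for text in strings:
        if text not in translated and text not in seen:
            seen.add(text)
            missing.append(text)
    return missing
```

An empty result suggests the translation still covers the document; anything returned is either new or edited text. This check cannot detect the failure mode where two identical source strings need different translations.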
Thoughts?

Eric.



