Re: [EXT] Re: [csaf] [CSAF JSON Schema] Follow-on discussion about langu

Bret

Sent from my Commodore 64

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050

On May 7, 2018, at 9:23 PM, Allan Thomson <athomson@lookingglasscyber.com> wrote:

Eric –

I disagree that a translation will take place after the original vulnerability was defined. It's entirely possible that orgs publishing vulnerabilities are capable of producing the text in multiple languages simultaneously.
The exact requirements you are listing for localization for CSAF were (pretty much) the same set of requirements that STIXv2 had/has.

I suggest not inventing a new mechanism and instead using the approach already shared in a prior email regarding how it was done in STIXv2.

Vendors and orgs that will support both STIXv2 and CSAF will appreciate this consistency.

Allan

From: <csaf@lists.oasis-open.org> on behalf of Eric Johnson <eric@tibco.com>
Date: Monday, May 7, 2018 at 11:36 AM
To: "csaf@lists.oasis-open.org" <csaf@lists.oasis-open.org>
Subject: [csaf] [CSAF JSON Schema] Follow-on discussion about language support

Several further comments about language support in the schema.

I believe we should state the language in the document in some form. In my original email, I gave option (A), which presumes the question is simply out-of-scope. Nobody else has come forward in furtherance of that approach, so I suspect we should discard that option. However, I do think it is reasonable to state a default value for a language choice - so the schema may relay the default value, even if the value does not appear in the instance.

For completeness, here's where I see translation actually being a question:

/vulnerabilities[]/acknowledgments[]/description
/vulnerabilities[]/involvements[]/description
/vulnerabilities[]/notes[]
/vulnerabilities[]/remediations[]/description
/vulnerabilities[]/threats[]/description
/vulnerabilities[]/title
/document_title
/document_distribution
/document/notes[]

Based on what I see and understand, I suspect that nothing in the "product_tree" should be localized. Anyone disagree?

As for performing localization, I come down on the side of localization data being outside of the CSAF document. I think this makes sense simply because it has a different life-cycle from the vulnerability data in the existing documents. That is, translations of vulnerability descriptions and notes about vulnerabilities will not be available at the same time as the vulnerabilities themselves. Translations are also likely to be incomplete, as not all information might be translated to all languages. However, it behooves us to specify how it works, so that we can guarantee localization is handled consistently.

------

The requirements for localization, that I can put my finger on:

Provide the ability to translate any of the relevant strings (enumerated above)
Allow the life-cycle of translations to be different from the vulnerability document itself
Allow for translation to arbitrary languages
Allow for third-parties to provide translations
Tool / process / data format / specification defined for validating the translation of a CSAF document in a new language. This may include generating a new version of the CSAF document in the translated form.
Work well with the current ecosystem of translation tooling.

Any other requirements? (I confess I've not been involved in localization in a long time, so I suspect I'm missing something.)

------

To externalize the translation implies one of two approaches:

Use a pointer into the document to identify a specific translation. Obvious candidate for this is JSON Pointer.
Labels in the document identifying a thing to be translated - those could be referenced outside the document. There are two options for this:

    The text itself that needs to be localized (could be long!)
    An extra property that establishes a label for the data to be localized.
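
To make the pointer option concrete, here is a minimal sketch of why array indices in a JSON Pointer are fragile. The resolver is a hand-written approximation of RFC 6901, and the document shape (a "text" property on each note) is only an assumption for illustration, not something the schema defines.

```python
def resolve_pointer(doc, pointer):
    """Resolve a JSON Pointer (RFC 6901 style) against nested dicts and lists."""
    if pointer == "":
        return doc
    current = doc
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" is "/", "~0" is "~" (in that order).
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical advisory fragment; the "text" property is an assumption.
advisory = {
    "vulnerabilities": [
        {"notes": [{"text": "Summary of the issue."},
                   {"text": "Workaround details."}]}
    ]
}

# An external translation file refers to the second note by index.
translation_target = "/vulnerabilities/0/notes/1/text"
before = resolve_pointer(advisory, translation_target)

# A later revision of the advisory inserts a new note at the front.
advisory["vulnerabilities"][0]["notes"].insert(0, {"text": "Legal disclaimer."})
after = resolve_pointer(advisory, translation_target)

print(before)  # Workaround details.
print(after)   # Summary of the issue.
```

The same pointer now selects a different string, so a translation keyed by it would silently attach to the wrong text.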

My recommendations:

Separate document for translation data (format and contents defined by TC).
Use the "label" approach to associate translation with the original text. (Using the text itself is prone to breakage. Using JSON pointer means accepting array indices in the pointer, which is also prone to breaking.)
Define a property in the schema that puts a label near what needs to be translated, so that translation can refer to those labels. To make that even better, define defaults - easy ones like "document_title", but also "vul-CVE-2018-XXXX-note-description", "vul-CVE-2018-XXXX-note-summary", where the latter can work if the defaults are unique. Where the labels are not unique, or would require defaulting to array indices, require the label property to have a unique value within the document.
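
As a rough sketch of how the label approach might look to a consumer (the function names, the shape of the translation data, and the German string are all hypothetical; the TC would still have to define the real format):

```python
from collections import Counter


def duplicate_labels(labels):
    """Return labels that occur more than once; these would need explicit unique values."""
    return sorted(label for label, count in Counter(labels).items() if count > 1)


def localize(labeled_strings, translations, language):
    """Map each label to its translation, falling back to the original text."""
    catalog = translations.get(language, {})
    return {label: catalog.get(label, text)
            for label, text in labeled_strings.items()}


# Original strings keyed by label, using the default labels suggested above.
original = {
    "document_title": "Example advisory",
    "vul-CVE-2018-XXXX-note-summary": "Summary of the issue.",
    "vul-CVE-2018-XXXX-note-description": "Detailed description.",
}

# Separate translation data, possibly from a third party and incomplete.
translations = {
    "de": {
        "document_title": "Beispiel-Sicherheitshinweis",
    }
}

localized = localize(original, translations, "de")
print(localized["document_title"])
print(localized["vul-CVE-2018-XXXX-note-description"])  # falls back to the original
print(duplicate_labels(list(original)))
```

Because the lookup is by label rather than by position, reordering or inserting notes does not disturb existing translations, and untranslated strings simply fall back to the source text.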

Comments?

Eric.
