This week’s TC meeting includes agenda item 4.3.2, “Review localizability proposal [#84]”. Here is the proposal. I have highlighted open design questions where we might choose among multiple options.

1. Introduce a “message” object

Many SARIF objects include both a message property containing plain text and a richMessage property containing rich text. When we enable localizability, these objects will need additional properties to serve as “message identifiers”, which point to strings that are stored externally (or possibly stored elsewhere in the SARIF file).

Rather than repeat all these properties on every object that holds a message, we will introduce a message object to hold them. For instance, the message property on an annotatedCodeLocation within a codeFlow might look like this:

    "message": {
      "text": "Variable 'a' is aliased to variable 'b'.",
      "richText": "Variable `a` is aliased to variable `b`.",
      "messageId": "MSG0001",
      "richMessageId": "RICH0001",
      "arguments": [ "a", "b" ]
    }

Some message strings will require parameters, so the message object has an arguments property. In this example, the message string associated with id "MSG0001" might be:

"Variable '{0}' is aliased to variable '{1}'."

2. Allow a SARIF file to contain message strings

Even if the message strings are never translated into another language, a SARIF producer might choose to store all the message strings it uses in a message dictionary embedded within the SARIF file. This avoids repeating the same message string in many places throughout the SARIF file. For example, we might find this message object…

    "message": {
      "messageId": "MSG0001",
      "arguments": [ "a", "b" ]
    }

… somewhere in this SARIF file:

    {
      "runs": [
        {
          "tool": {
            "name": "SecurityScanner",
            "language": "en-US"
          },
          "results": [
            {
              "ruleId": "SEC2021",
              ...
            }
          ],
          "resources": {
            "messageStrings": {
              "MSG0001": "Variable '{0}' is aliased to variable '{1}'."
            }
          }
        }
      ]
    }

In this example, the messageStrings dictionary contains strings for "en-US". Note that the message object contains only a message identifier; it doesn’t contain the message text at all.

NOTE: The messageStrings dictionary is nested within a resources property because, as we will see later, we will also include rule metadata in the resources.
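To illustrate how a consumer might resolve such a message using only the SARIF file, here is a Python sketch. The function name resolve_embedded_message is hypothetical, and the choice to prefer inline text over a dictionary lookup is an assumption of this sketch, not something the proposal states.

```python
def resolve_embedded_message(run, message):
    """Return display text for a message object using only the SARIF file.

    Assumption: inline "text" wins if present; otherwise messageId is
    looked up in run["resources"]["messageStrings"] and the arguments
    are substituted by position. Returns None if neither is available.
    """
    if "text" in message:
        return message["text"]
    strings = run.get("resources", {}).get("messageStrings", {})
    template = strings.get(message.get("messageId"))
    if template is None:
        return None
    return template.format(*message.get("arguments", []))


run = {
    "tool": {"name": "SecurityScanner", "language": "en-US"},
    "resources": {
        "messageStrings": {
            "MSG0001": "Variable '{0}' is aliased to variable '{1}'."
        }
    },
}
message = {"messageId": "MSG0001", "arguments": ["a", "b"]}
print(resolve_embedded_message(run, message))
# prints: Variable 'a' is aliased to variable 'b'.
```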

3. Allow multiple languages in a SARIF file?

An open design question is whether to allow a SARIF file to bundle message strings for multiple languages. That might look like this:

    {
      "runs": [
        {
          "tool": {
            "name": "SecurityScanner",
            "language": "en-US"
          },
          ...
          "resources": {
            "en-US": {
              "messageStrings": {
                "MSG0001": "Variable '{0}' is aliased to variable '{1}'."
              }
            },
            "fr-FR": {
              "messageStrings": {
                "MSG0001": "..."
              }
            }
          }
        }
      ]
    }

4. Locating message strings stored outside the SARIF file

In general, a tool wouldn’t want to store the message strings for all languages within every SARIF file. And a tool vendor would want to be able to add support for new languages just by shipping a new message string file. Therefore it must be possible for a SARIF consumer to look up message strings in an external file.

4.1 Message string root directory and file format

The proposed design requires that the tool store message strings for all languages under a single directory. The SARIF construct that specifies that directory is as follows:

    {
      "runs": [
        {
          "tool": {
            "name": "SecurityScanner",
            "language": "en-US",
            "version": "2.0.1",
            "resourceLocation": {                # A fileLocation object
              "uri": ".",
              "uriBaseId": "RESOURCES"
            }
          },
          "originalUriBaseIds": {
            "RESOURCES": "https://www.example.com/tools/SecurityScanner/resources/2.0.1"
          },
          ...
        }
      ]
    }

In this example, the resources are found at location "." relative to the URI specified by the URI base id "RESOURCES", which by default points to a web address associated with the tool.

The proposed design requires that the base URI for the message strings be of the form resources/toolVersion.

NOTE: This SARIF sample uses a fileLocation object as well as the property run.originalUriBaseIds. Both of those constructs are new, and we will vote on them before we discuss the localization design.

4.2 Message string file naming rule

The proposed design requires that the message string files for each language be stored under that directory, in separate files named lang.messageStrings.json, where lang is the language tag. For example:

fr-FR.messageStrings.json: message strings for French as spoken in France.
fr.messageStrings.json: message strings for region-neutral French.
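Combining the root directory from Section 4.1 with this naming rule, a consumer could compute the URI of a message string file roughly as sketched below in Python. The function name message_strings_uri is hypothetical. The sketch also assumes that a URI base id names a directory, so it appends a trailing "/" when one is missing; without it, standard relative-reference resolution would drop the final "2.0.1" segment.

```python
from urllib.parse import urljoin


def message_strings_uri(run, language):
    """Build the URI of the message string file for one language.

    Assumption: base URIs name directories, so a trailing "/" is added
    when missing before resolving tool.resourceLocation.uri against it.
    """
    location = run["tool"]["resourceLocation"]
    base = run["originalUriBaseIds"][location["uriBaseId"]]
    if not base.endswith("/"):
        base += "/"
    directory = urljoin(base, location["uri"])
    if not directory.endswith("/"):
        directory += "/"
    return urljoin(directory, f"{language}.messageStrings.json")


run = {
    "tool": {
        "name": "SecurityScanner",
        "version": "2.0.1",
        "resourceLocation": {"uri": ".", "uriBaseId": "RESOURCES"},
    },
    "originalUriBaseIds": {
        "RESOURCES": "https://www.example.com/tools/SecurityScanner/resources/2.0.1"
    },
}
print(message_strings_uri(run, "fr-FR"))
# prints: https://www.example.com/tools/SecurityScanner/resources/2.0.1/fr-FR.messageStrings.json
```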

NOTE: The SARIF specification must specify the rules for locating message strings. It can’t leave that up to each tool to decide, because a SARIF consumer such as a viewer must be able to locate message strings for any language, regardless of the tool that provides the strings.

NOTE: The examples I give here, and the message string file search order in Section 4.3 below, assume the simple “language-Region” naming convention specified in RFC 3066. But RFC 3066 has been superseded twice, most recently by RFC 5646, which defines a more complicated naming convention that includes, among other things, “variant subtags” (e.g., "de-CH-1996") and “script subtags” (e.g., "cmn-Hans-CN"). It is an open design question whether we should support this full complexity. If the answer is yes, it will complicate the “message strings file search order” defined below. Note that the SARIF spec defines the tool.language property in terms of RFC 5646.

4.3 Message string file search order

The message strings for the language and region specified by the end user might not always be available. The proposed design therefore requires a SARIF consumer to look up message string files in a particular order.

Suppose the end user’s chosen language is "fr-FR", and suppose the user displays a SARIF file whose declared language is "de-DE". Then the message string file search order is:

    fr-FR.messageStrings.json   # First, try to find exactly what they asked for.
    fr.messageStrings.json      # Fall back to the region-neutral language.
    embedded message            # If no matching message string file is found, use the message or resources embedded in the SARIF file.

The proposed design requires that every SARIF file be stand-alone.
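The search order can be sketched in Python as follows. Both function names are hypothetical, and the code assumes the simple "language-Region" tags discussed in Section 4.2; tags with script or variant subtags are not handled. The set of available file names stands in for whatever the consumer finds in the message string directory.

```python
def message_string_search_order(user_language):
    """List message string file names to try, most specific first.

    Assumes simple "language-Region" tags; fuller RFC 5646 tags
    (script or variant subtags) are not handled.
    """
    candidates = [f"{user_language}.messageStrings.json"]
    language, _, region = user_language.partition("-")
    if region:
        candidates.append(f"{language}.messageStrings.json")
    return candidates


def find_message_strings(user_language, available_files):
    """Return the first candidate present in available_files, or None.

    None means: fall back to what is embedded in the SARIF file.
    """
    for name in message_string_search_order(user_language):
        if name in available_files:
            return name
    return None


print(message_string_search_order("fr-FR"))
# prints: ['fr-FR.messageStrings.json', 'fr.messageStrings.json']
print(find_message_strings("fr-FR", {"fr.messageStrings.json"}))
# prints: fr.messageStrings.json
print(find_message_strings("fr-FR", {"de-DE.messageStrings.json"}))
# prints: None
```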

5. Message string file format

Having located the message string file, the SARIF consumer needs to be able to read it. Again, the SARIF specification must define the format; it can’t leave it up to each tool to decide. It is an open design question whether to define the message string file format as a simple JSON dictionary, like this:

    {
      "MSG0001": "...",
      "RICH0001": "..."
    }

… or whether to make the message string file format more self-describing, for example, by adding version and language properties, like this:

    {
      "version": "1.0.0",
      "language": "es-ES",
      "messageStrings": {
        "MSG0001": "...",
        "RICH0001": "..."
      }
    }
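To make the difference between the two candidate formats concrete, here is a Python sketch of a reader that accepts either one. The function name load_message_strings is hypothetical, and telling the formats apart by checking whether "messageStrings" holds a nested dictionary is a heuristic assumed for this sketch only; the proposal would settle on one format.

```python
import json


def load_message_strings(text):
    """Parse a message string file and return its id-to-string dictionary.

    Accepts both candidate formats: a bare dictionary, or an object with
    a nested "messageStrings" dictionary (heuristic assumed here).
    """
    data = json.loads(text)
    nested = data.get("messageStrings")
    if isinstance(nested, dict):
        return nested
    return data


simple = '{"MSG0001": "plain", "RICH0001": "rich"}'
described = ('{"version": "1.0.0", "language": "es-ES", '
             '"messageStrings": {"MSG0001": "plain", "RICH0001": "rich"}}')
print(load_message_strings(simple) == load_message_strings(described))
# prints: True
```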

6. Result messages

Finally, there is the question of messages in result objects. This is complicated because the SARIF format already defines a format for embedding “rule metadata”, including message strings, within a SARIF file. As the spec stands today, it looks like this:

    {
      "runs": [
        {
          "results": [
            {
              "ruleId": "SEC2021",
              "message": "...",
              "richMessage": "...",
              "templatedMessage": {
                "templateId": "default",
                "arguments": [ "C:\\password.txt" ]
              },
              "richTemplatedMessage": {
                "templateId": "default",
                "arguments": [ "C:\\password.txt" ]
              }
            }
          ],
          "rules": {
            "SEC2021": {                         # A rule object.
              "id": "SEC2021",
              "name": "DoNotStorePlainTextPasswords",
              "shortDescription": "...",
              "fullDescription": "...",
              "richDescription": "...",
              "messageTemplates": {
                "default": "Plain text passwords found in file '{0}'."
              }
            }
          }
        }
      ]
    }

I propose to modify the rule object so it looks like this:

    {                             # A rule object
      "id": "SEC2021",
      "name": "DoNotStorePlainTextPasswords",

      "shortDescription": {       # Now a message object: it can have both a plain text and a rich text version.
      },                          # As the spec stands today, rule.shortDescription is intentionally plain-text
                                  # only, but if we use message objects everywhere, there’s no point in
                                  # prohibiting shortDescription from offering both forms.

      "fullDescription": {        # Now a message object. As such, it encompasses both of the two former
      },                          # string-valued properties fullDescription (replaced by fullDescription.text)
                                  # and richDescription (replaced by fullDescription.richText).

      "messageStrings": {         # Optional. Permits a rule message to be looked up in the rule metadata
                                  # before doing a full message string lookup.
                                  # Replaces the messageTemplates property.
        "default": "Plain text passwords found in file '{0}'.",
        "special": "..."
      },

      "richMessageStrings": {     # Replaces the richMessageTemplates property.
        "default": "Plain text passwords found in file '{0}'",
        "special": "..."
      }
    }

A result object would look like this:

    {                             # A result object
      "ruleId": "SEC2021",
      "ruleMessageId": "default",
      "message": {                # A message object.
        "messageId": "SEC2021_default",
        "richMessageId": "SEC2021_RICH_default",
        "arguments": [ "C:\\password.txt" ]
      }
    }

To allow rule metadata to be localized, and to allow rule metadata and message strings to be looked up in a uniform way, we do two things:

Push the run.rules property down into the run.resources property, like this:

    {
      "runs": [
        {
          "tool": {
            "name": "SecurityScanner",
            "language": "en-US"
          },
          ...
          "resources": {
            "en-US": {
              "messageStrings": {
                "MSG0001": "Variable '{0}' is aliased to variable '{1}'."
              },
              "rules": {
                "SEC2021": {
                  "id": "SEC2021",
                  "fullDescription": {
                  },
                  "messageStrings": {
                    "default": "...",
                    "special": "..."
                  }
                }
              }
            },
            "fr-FR": {
              "messageStrings": {
                "MSG0001": "..."
              },
              "rules": {
                "SEC2021": {
                  ...
                }
              }
            }
          }
        }
      ]
    }

Rule message lookup would work as follows:

If result.ruleMessageId is present, then look for the rule message string designated by the combination of result.ruleId and result.ruleMessageId, for the end user’s selected UI language.
If result.ruleMessageId is not present, or if there is no rule message string for the specified ruleId, ruleMessageId, and selected language, then look up a “normal” message string according to this table:

Scenario	End user’s language matches SARIF file language	Message text is embedded in SARIF file	Consumer can display rich text	Result
1	Yes	Yes	Yes	Display result.message.richText, if present; otherwise display result.message.text.
2	Yes	Yes	No	Display result.message.text.
3	Yes	No	Yes	Look up and display result.message.richMessageId if present; otherwise look up and display result.message.messageId.
4	Yes	No	No	Look up and display result.message.messageId.
5	No	N/A	Yes	Look up and display result.message.richMessageId if present; otherwise look up and display result.message.messageId.
6	No	N/A	No	Look up and display result.message.messageId.

In any scenario that specifies a message id lookup, look for the message strings for the end user’s selected UI language, using the message string file search order specified in Section 4.3 above.

NOTE: In Scenarios 5 and 6, the lookup order defined in Section 4.3 above might result in the consumer displaying the string embedded in the SARIF file, even if its language does not match the user’s UI language.
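The lookup rules above can be sketched in Python as two small functions. Both names are hypothetical. The sketch reads "message text is embedded" as "message.text is present", searches only the resources embedded in the SARIF file (external message string files are left out), and returns a plan rather than performing the message id lookup itself.

```python
def find_rule_message(resources, language, rule_id, rule_message_id):
    """Look up a rule message string in run.resources, or return None.

    Assumes the per-language layout shown earlier:
    resources[language]["rules"][rule_id]["messageStrings"][rule_message_id].
    """
    if rule_message_id is None:
        return None
    rule = resources.get(language, {}).get("rules", {}).get(rule_id, {})
    return rule.get("messageStrings", {}).get(rule_message_id)


def plan_normal_message(message, language_matches, can_show_rich):
    """Apply the six-scenario table to a message object.

    Returns ("display", text) when embedded text is used directly, or
    ("lookup", message_id) when a message string lookup is needed.
    """
    if language_matches and "text" in message:
        if can_show_rich and "richText" in message:
            return ("display", message["richText"])   # scenario 1
        return ("display", message["text"])           # scenarios 1 and 2
    if can_show_rich and "richMessageId" in message:
        return ("lookup", message["richMessageId"])   # scenarios 3 and 5
    return ("lookup", message["messageId"])           # scenarios 3 to 6


resources = {
    "en-US": {
        "rules": {
            "SEC2021": {
                "messageStrings": {
                    "default": "Plain text passwords found in file '{0}'."
                }
            }
        }
    }
}
result = {
    "ruleId": "SEC2021",
    "ruleMessageId": "default",
    "message": {
        "messageId": "SEC2021_default",
        "richMessageId": "SEC2021_RICH_default",
        "arguments": ["C:\\password.txt"],
    },
}

print(find_rule_message(resources, "en-US", result["ruleId"], result.get("ruleMessageId")))
# prints: Plain text passwords found in file '{0}'.
print(find_rule_message(resources, "fr-FR", result["ruleId"], result.get("ruleMessageId")))
# prints: None
print(plan_normal_message(result["message"], language_matches=False, can_show_rich=True))
# prints: ('lookup', 'SEC2021_RICH_default')
```

When find_rule_message returns None, a consumer would fall through to plan_normal_message and then apply the file search order from Section 4.3 to any message id it returns.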

Thanks,

Larry
