Subject: SARIF must be UTF-8
We’ve previously discussed whether SARIF has to be encoded in UTF-8. The answer, it turns out, is “yes.”
As Stefan has correctly pointed out, the ECMA 404 JSON spec does not specify a text encoding for JSON. However, it does say the following:
It is expected that other standards will refer to this one, strictly adhering to the JSON syntax, while imposing semantics interpretation and restrictions on various encoding details. Such standards may require specific behaviours. JSON itself specifies no behaviour.
RFC 8259, the IETF's JSON specification, is more explicit. Section 8.1 says:
JSON text exchanged between systems that are not part of a closed
ecosystem MUST be encoded using UTF-8 [RFC3629].
Previous specifications of JSON have not required the use of UTF-8
when transmitting JSON text. However, the vast majority of JSON-based
software implementations have chosen to use the UTF-8 encoding,
to the extent that it is the only encoding that achieves
interoperability.
Since SARIF documents are JSON text that is exchanged between systems that are not part of a closed ecosystem, RFC 8259 requires them to be encoded in UTF-8.
Even if RFC 8259 did not exist, the language in ECMA 404 means that SARIF, as a standard that refers to ECMA 404, would be entitled to require a specific encoding. But the existence of RFC 8259 makes it clear that we should do that.
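For anyone who wants to see what enforcing this could look like in a consumer, here is a minimal Python sketch. It is not taken from the SARIF spec or from any existing tool; the function name load_sarif_utf8 is hypothetical, and the sketch only checks that the bytes are valid UTF-8 and parse as JSON, not that the result is valid SARIF.

```python
import json


def load_sarif_utf8(data: bytes) -> dict:
    """Parse raw bytes as JSON, accepting only UTF-8 input.

    Hypothetical helper for illustration; it does not validate the
    result against the SARIF schema.
    """
    # Decode strictly so that invalid byte sequences raise
    # UnicodeDecodeError instead of being replaced or guessed at.
    text = data.decode("utf-8", errors="strict")
    return json.loads(text)


if __name__ == "__main__":
    good = '{"version": "2.1.0", "runs": []}'.encode("utf-8")
    print(load_sarif_utf8(good)["version"])

    bad = '{"version": "2.1.0", "runs": []}'.encode("utf-16")
    try:
        load_sarif_utf8(bad)
    except UnicodeDecodeError:
        print("rejected: not UTF-8")
```

The explicit decode step is deliberate: Python's json.loads, when handed bytes directly, detects UTF-16 and UTF-32 as well as UTF-8, so it would quietly accept input that this requirement rules out.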