Subject: RE: Determining file encoding
I might be able to finesse this point. I could remove the whole part of the “Text regions” section that presents this (old and busted) way of determining encoding. Then I could say something like this:
A SARIF producer SHALL only emit text-related region properties if it knows the character encoding of the file, in which case it SHALL also emit file.encoding (§3.17.9) or run.defaultFileEncoding (§3.11.17).
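To make the proposed producer rule concrete, here is a minimal Python sketch of a conformance check over a run loaded as a dict. It is a sketch under stated assumptions, not spec text. The text-related property names (`startLine`, `startColumn`, `endLine`, `endColumn`, `charOffset`, `charLength`), the `physicalLocation.fileLocation.uri` path, `run["files"]` keyed by URI, and the helpers `resolve_encoding` and `check_producer_rule` are all assumptions modeled loosely on the draft. It also assumes that file.encoding takes precedence over run.defaultFileEncoding.

```python
# Hypothetical check for the proposed producer rule. Property names and the
# log layout are assumptions modeled loosely on the SARIF draft.
TEXT_REGION_PROPERTIES = {
    "startLine", "startColumn", "endLine", "endColumn",
    "charOffset", "charLength",
}


def resolve_encoding(run, uri):
    """Return the file's encoding, or None if the producer did not state one.

    Assumes file.encoding overrides run.defaultFileEncoding.
    """
    file_entry = run.get("files", {}).get(uri, {})
    return file_entry.get("encoding") or run.get("defaultFileEncoding")


def check_producer_rule(run):
    """Return URIs whose regions use text properties without a known encoding."""
    violations = []
    for result in run.get("results", []):
        for location in result.get("locations", []):
            phys = location.get("physicalLocation", {})
            uri = phys.get("fileLocation", {}).get("uri")
            region = phys.get("region", {})
            uses_text = any(p in region for p in TEXT_REGION_PROPERTIES)
            if uses_text and resolve_encoding(run, uri) is None:
                violations.append(uri)
    return violations


example_run = {
    "files": {"src/a.c": {"encoding": "utf-8"}, "src/b.c": {}},
    "results": [
        {"locations": [{"physicalLocation": {
            "fileLocation": {"uri": "src/a.c"},
            "region": {"startLine": 3}}}]},
        {"locations": [{"physicalLocation": {
            "fileLocation": {"uri": "src/b.c"},
            "region": {"startLine": 7}}}]},
    ],
}
print(check_producer_rule(example_run))  # ['src/b.c']
```

Adding `"defaultFileEncoding": "utf-8"` to the run would clear the violation for src/b.c, because under the assumed precedence the run-level default fills in for files that do not state their own encoding.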
In the section on fixes, I’d say something like:
If a SARIF consumer does not know the character encoding of a file, it SHALL NOT apply a fix unless the deletedRegion contains binary-related properties.
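The proposed consumer rule reduces to a small predicate. Here is a minimal Python sketch. `byteOffset` and `byteLength` are assumed names for the binary-related region properties, and `may_apply_fix` is a hypothetical helper. A result of `True` only means this particular rule does not forbid applying the fix.

```python
# Hypothetical sketch of the proposed consumer rule for applying fixes.
# "byteOffset"/"byteLength" are assumed names for the binary-related
# region properties.
BINARY_REGION_PROPERTIES = {"byteOffset", "byteLength"}


def may_apply_fix(deleted_region, encoding):
    """Return False if the proposed rule forbids applying the fix.

    `encoding` is the file's encoding as resolved by the consumer,
    or None if the consumer does not know it.
    """
    if encoding is not None:
        return True
    # With no known encoding, only byte-based regions can be located safely.
    return any(p in deleted_region for p in BINARY_REGION_PROPERTIES)


print(may_apply_fix({"startLine": 4, "startColumn": 1}, None))    # False
print(may_apply_fix({"byteOffset": 120, "byteLength": 8}, None))  # True
print(may_apply_fix({"startLine": 4, "startColumn": 1}, "utf-8")) # True
```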
The spec is inconsistent in how it tells a consumer to determine a file’s encoding.
The sections on file.encoding and run.defaultFileEncoding say:
The section on “Text regions” (which was written before we introduced file.encoding and run.defaultFileEncoding) takes a different approach. That section cares about encoding because it wants consumers to know how many bytes each character occupies, so they can correctly identify (and highlight) a text region:
(NOTE: Step 3 doesn’t actually identify an encoding, but it gives the consumer a best guess as to how to identify the region.)
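To illustrate why bytes per character matter for locating a text region, here is a minimal Python sketch. The helper `char_span_to_byte_span` is hypothetical. It assumes the consumer already has the decoded file text and maps a character-based span to byte offsets. The same character span lands on different bytes depending on the encoding, so a consumer that guesses the encoding wrong highlights the wrong bytes.

```python
def char_span_to_byte_span(text, char_offset, char_length, encoding):
    """Map a character-based region in decoded `text` to (byte_offset, byte_length)."""
    byte_offset = len(text[:char_offset].encode(encoding))
    byte_length = len(text[char_offset:char_offset + char_length].encode(encoding))
    return byte_offset, byte_length


source = "café = 1\n"
# Highlight "= 1", which is characters 5 through 7.
print(char_span_to_byte_span(source, 5, 3, "latin-1"))    # (5, 3)
print(char_span_to_byte_span(source, 5, 3, "utf-8"))      # (6, 3)
print(char_span_to_byte_span(source, 5, 3, "utf-16-le"))  # (10, 6)
```

The example uses "utf-16-le" rather than "utf-16" because Python's "utf-16" codec prepends a byte order mark, which would shift every offset by two bytes.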
We need to rationalize these. It might look like this:
A couple of things: