OASIS Mailing List Archives

sarif message


Subject: Closed #200: "Require everything to be UTF-8"

I added the following comment to the issue, explaining why the spec already says as much as it is able to about file encoding:

I agree that the spec says as much as it can about encoding:

· The SARIF log file must be encoded in UTF-8 (§3.1).

· As a result, embedded file content (fileContent.text, §3.2.2) must be UTF-8 (transcoded from the original file encoding if necessary).

· file.encoding (§3.19.9) is optional, and if absent, the original file encoding is taken to be unknown.

I believe it's that last point that @katrinaoneil's colleague objects to, but it's unavoidable in some cases. For example, Semmle takes a snapshot of a code base, saves the snapshot in UTF-8, and then analyzes the snapshot. Once the snapshot is taken, Semmle does not remember the original file encoding.
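To make the snapshot scenario concrete, here is a minimal Python sketch. The key names ("files", "contents", "text", "encoding") and the make_file_entry helper are illustrative assumptions loosely modeled on the properties named above, not the exact SARIF schema. It shows embedded text being transcoded so the log itself can be UTF-8, and the encoding property simply being omitted when the original encoding was never recorded.

```python
import json
from typing import Optional


def make_file_entry(raw: bytes, decode_as: str, original_encoding: Optional[str]) -> dict:
    """Build an illustrative file entry (hypothetical key names, not the schema)."""
    # Embedded text is held as a Unicode string; it becomes UTF-8 when the log is written.
    entry = {"contents": {"text": raw.decode(decode_as)}}
    # Only record the original encoding when the tool actually knows it.
    if original_encoding is not None:
        entry["encoding"] = original_encoding
    return entry


# Case 1: the tool reads the original file and knows its encoding.
known = make_file_entry("naïve".encode("latin-1"), "latin-1", "ISO-8859-1")

# Case 2: the tool only sees a UTF-8 snapshot and no longer knows the original encoding.
unknown = make_file_entry("naïve".encode("utf-8"), "utf-8", None)

# The log as a whole is serialized as UTF-8.
log_bytes = json.dumps({"files": [known, unknown]}, ensure_ascii=False).encode("utf-8")
```

In the second case the absence of the encoding key is the signal: the consumer should treat the original encoding as unknown rather than assume UTF-8.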

Saving the snapshot in UTF-8 might seem to imply that the encoding in this case is UTF-8. The problem is that if the SARIF file includes fix objects, those fixes might refer to the wrong portion of the original file if that file is in any other encoding, because positions computed against the UTF-8 snapshot need not line up with the bytes of the original. In this scenario, the SARIF log file needs to record the fact that it just doesn't know the original file encoding.
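The mismatch is easy to reproduce. The following Python sketch assumes a hypothetical one-line source file whose original encoding is Latin-1 and a fix located by byte offset. Neither detail comes from the issue; they are chosen only to illustrate the problem.

```python
# Hypothetical source line containing a single non-ASCII character.
text = 'name = "café"; total = 0'

original = text.encode("latin-1")  # assumed on-disk encoding of the original file
snapshot = text.encode("utf-8")    # what the analyzer sees after the snapshot

target = b"total"
offset_in_snapshot = snapshot.find(target)  # 'é' takes two bytes in UTF-8
offset_in_original = original.find(target)  # 'é' takes one byte in Latin-1

# Applying the snapshot's offset to the original bytes selects the wrong span.
wrong_span = original[offset_in_snapshot:offset_in_snapshot + len(target)]

print(offset_in_original, offset_in_snapshot, wrong_span)
# prints: 15 16 b'otal '
```

A single accented character earlier on the line is enough to shift the fix by one byte, so a log that claimed UTF-8 for the original file would send such a fix to the wrong place.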

I noted the closure in the Editor’s Report.



