Subject: RE: Change draft for #286: source language

Good question. Let’s open this up to the DL.


My preference would be to specify that sourceLanguage SHALL be the host language; in your example, that would be "sourceLanguage": "html".


My rationale is that a syntax colorizer for the host language will in general understand the syntax of a language that it expects to embed. For example, the VS 2017 IDE does this:




This simple approach has a drawback: If a SARIF snippet consists entirely of the embedded language, then the syntax colorizer might not recognize it. For instance, in VS 2017:




That would probably be a common scenario for HTML analysis tools.


One alternative is to make sourceLanguage an array, for example: "sourceLanguages": [ "html", "javascript" ]. If the syntax colorizer somehow recognized that it couldn’t parse the snippet in the first language, it could try the second.
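To make the fallback idea concrete, here is a rough sketch of how a viewer might walk such an array. Everything in it is an assumption for illustration: the function names are hypothetical, and the two "recognizers" are crude regular-expression heuristics standing in for whatever parser a real colorizer would use.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical, deliberately crude recognizers. A real syntax colorizer
# would rely on its own parser rather than on regular expressions.
def looks_like_html(snippet: str) -> bool:
    # Accept the snippet if it contains something shaped like a tag.
    return re.search(r"</?[A-Za-z][^<>]*>", snippet) is not None

def looks_like_javascript(snippet: str) -> bool:
    # Accept the snippet if it contains a common JavaScript keyword or "=>".
    return re.search(r"\b(function|var|let|const|return)\b|=>", snippet) is not None

RECOGNIZERS: Dict[str, Callable[[str], bool]] = {
    "html": looks_like_html,
    "javascript": looks_like_javascript,
}

def pick_language(snippet: str, source_languages: List[str]) -> Optional[str]:
    """Return the first listed language whose recognizer accepts the snippet."""
    for language in source_languages:
        recognizer = RECOGNIZERS.get(language)
        if recognizer is not None and recognizer(snippet):
            return language
    return None  # No listed language matched; the viewer would skip colorization.

languages = ["html", "javascript"]
mixed_choice = pick_language("<script>var x = 1;</script>", languages)
embedded_choice = pick_language("var x = 1;", languages)
```

With the array ordered host language first, a snippet containing markup is still treated as the host language, and only a snippet the host recognizer rejects falls through to the embedded one.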


Another alternative, making sourceLanguage an array of structures, each specifying the source language for one textual region, is too horrible to contemplate.


Again, my preference is to leave it as it is, but add the rule that the host language wins.




So, if a file contains a mix of languages (e.g. html and javascript), then how should sourceLanguage be specified?



I pushed a change draft for Issue #286: “Specify optional property file.sourceLanguage to guide in syntax-driven colorization of snippets”:




We will move its adoption at TC #29 on Wednesday, December 12th.




