
sarif message


Subject: RE: [sarif] RE: Please comment on #125

This is right, and it really does illustrate what an odd corner case this is. You have to use a uriBaseId to construct the reference because the absolute URLs are otherwise identical and so cannot serve as unique keys in the files table.


Btw – this example sheds light on a need for some general guidance for how to handle embedded file contents. In this corner case, you must absolutely prefer the embedded content because it might not exist anywhere else (i.e., it was never in source control or it was on disk previously but was subsequently overwritten).


But does it make sense as a rule to tell people to prefer the embedded file contents if they exist? The more I think about it, the more I think it does. In cases where you are certain to have access to files properly matched to your SARIF results (such as if you just completed a local analysis or if your absolute URLs are versioned and point to accessible copies of the file), producers should not generate the embedded file contents. As a rule, injecting the file contents is a post-processing step that is explicitly completed because the log file is being prepared for ingestion into a results management system (attached to a work item, persisted to a common remote store, etc.).
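A minimal consumer-side sketch of that "prefer embedded contents" rule, in Python. The function name `resolve_text` and the caller-supplied `fetch` callback are hypothetical and do not come from the SARIF draft; only the `fileContent.text` shape is taken from the example in this thread.

```python
from typing import Callable, Optional


def resolve_text(file_entry: Optional[dict], uri: str,
                 fetch: Callable[[str], str]) -> str:
    """Return the text for a file, preferring embedded contents.

    file_entry is the run.files entry for the file (or None if absent).
    fetch is a hypothetical callback that loads the text from the URI.
    """
    if file_entry:
        text = file_entry.get("fileContent", {}).get("text")
        if text is not None:
            # Embedded contents win: the on-disk copy may have been
            # overwritten or may never have been under source control.
            return text
    # No embedded contents, so fall back to the location itself.
    return fetch(uri)
```

The fallback is only reached when the producer chose not to embed contents, which matches the case where the consumer is expected to have properly matched files available.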



From: sarif@lists.oasis-open.org <sarif@lists.oasis-open.org> On Behalf Of Larry Golding (Comcast)
Sent: Tuesday, April 10, 2018 9:41 AM
To: Michael Fanning <Michael.Fanning@microsoft.com>; 'James A. Kupsch' <kupsch@cs.wisc.edu>; sarif@lists.oasis-open.org
Subject: [sarif] RE: Please comment on #125


I see!  A SARIF producer enables consumers to access previous versions of an overwritten file not just by mentioning each version in the run.files dictionary, but by persisting their contents there. It seems so obvious now 😊 I can write the text for this now.


Editorial consideration: Explaining this, including an example, will take up a medium amount of space. And it’s not obvious where it goes in the spec (in the run.files section? in the uriBaseId section?). So I propose to add a new non-normative Appendix to explain this corner case.


Example below. Note the interplay between originalUriBaseIds, result.location, and the property names in run.files. It’s actually kind of elegant. It gives me faith in our format that it can represent this corner case in such a natural way.




{                                      # A run object
  "originalUriBaseIds": {
    "generated-1": "file:///dev-machine/c:/project/out/obj",
    "generated-2": "file:///dev-machine/c:/project/out/obj"
  },
  "results": [
    {
      "ruleId": "CA4567",
      "location": {
        "physicalLocation": {
          "fileLocation": {
            "uri": "MainWindow.xaml.g.cs",
            "uriBaseId": "generated-1"
          },
          "region": {
            "startLine": 42
          }
        }
      }
    }
  ],
  "files": {
    "#generated-1#MainWindow.xaml.g.cs": {
      "fileContent": {                 # THIS IS WHAT MAKES IT WORK
        "text": "..."
      }
    },
    "#generated-2#MainWindow.xaml.g.cs": {
      "fileContent": {
        "text": "..."
      }
    }
  }
}
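To make the interplay concrete, here is a rough Python sketch of how a consumer might go from a result's fileLocation to the matching run.files entry. The helper names `files_key` and `embedded_text` are hypothetical, the sample texts are placeholders, and the rule that a location without a uriBaseId is keyed by its bare uri is an assumption; only the "#uriBaseId#uri" key form is taken from the example above.

```python
def files_key(file_location: dict) -> str:
    """Build the run.files property name for a fileLocation object."""
    uri = file_location["uri"]
    base_id = file_location.get("uriBaseId")
    if base_id is None:
        # Assumption: without a uriBaseId the uri itself is the key.
        return uri
    # Key form used in the example: "#<uriBaseId>#<uri>".
    return f"#{base_id}#{uri}"


def embedded_text(run: dict, result: dict):
    """Return the embedded text for a result's file, or None if absent."""
    loc = result["location"]["physicalLocation"]["fileLocation"]
    entry = run.get("files", {}).get(files_key(loc), {})
    return entry.get("fileContent", {}).get("text")


# Sample run mirroring the example; the texts are placeholders.
run = {
    "originalUriBaseIds": {
        "generated-1": "file:///dev-machine/c:/project/out/obj",
        "generated-2": "file:///dev-machine/c:/project/out/obj",
    },
    "results": [{
        "ruleId": "CA4567",
        "location": {"physicalLocation": {
            "fileLocation": {"uri": "MainWindow.xaml.g.cs",
                             "uriBaseId": "generated-1"},
            "region": {"startLine": 42},
        }},
    }],
    "files": {
        "#generated-1#MainWindow.xaml.g.cs":
            {"fileContent": {"text": "// first version"}},
        "#generated-2#MainWindow.xaml.g.cs":
            {"fileContent": {"text": "// second version"}},
    },
}

first = embedded_text(run, run["results"][0])
```

Because both base ids resolve to the same absolute URL, the uriBaseId is the only thing that distinguishes the two keys, and so the two versions.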


From: Michael Fanning <Michael.Fanning@microsoft.com>
Sent: Monday, April 9, 2018 7:59 PM
To: Larry Golding (Comcast) <larrygolding@comcast.net>; 'James A. Kupsch' <kupsch@cs.wisc.edu>; sarif@lists.oasis-open.org
Subject: RE: Please comment on #125


I’ve thought about this issue a bit. We should be thinking about an analysis that provides a hit in any generated file that isn’t under source control. For example, a generated XAML code-behind file. The corner case covers something even more problematic, a single analysis run where generated files are, for example, overwritten on a per-project basis (to a common location in some build intermediates folder). To answer your questions:


  1. This isn’t tool specific. It relates to scan targets which are themselves generated content not under source control (and which are fluid/overwritten even while some larger build analysis is taking place).
  2. The file is a valid scan target, whatever that means. A PCH file or other intermediate. A header file that is generated by some perl script. Etc.
  3. Producers SHOULD persist to run.files all files that aren’t managed by a version control system. This is just good general guidance.
  4. It may be necessary to represent multiple versions of this re-written file in the run.files dictionary, if multiple result instances exist that point to different versions of the generated content.
  5. Ditto, a viewer will need to access any version of the file referenced by any result.
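
On the producer side, points 3 to 5 might be sketched roughly as follows in Python. The helper `add_generated_version` and the "generated-N" naming scheme are hypothetical; the "#uriBaseId#uri" key form and the fileContent.text shape follow the example elsewhere in this thread, not any final spec text.

```python
def add_generated_version(run: dict, base_uri: str,
                          relative_uri: str, text: str) -> dict:
    """Embed one version of a generated file; return a fileLocation for it.

    Each call allocates a fresh uriBaseId, so versions of a file that is
    overwritten at the same absolute URL still get distinct run.files keys.
    """
    base_ids = run.setdefault("originalUriBaseIds", {})
    files = run.setdefault("files", {})
    # Hypothetical naming scheme: generated-1, generated-2, ...
    base_id = f"generated-{len(base_ids) + 1}"
    base_ids[base_id] = base_uri
    files[f"#{base_id}#{relative_uri}"] = {"fileContent": {"text": text}}
    return {"uri": relative_uri, "uriBaseId": base_id}


# Usage: capture the file each time it is about to be analyzed.
run = {}
obj_dir = "file:///dev-machine/c:/project/out/obj"
loc_a = add_generated_version(run, obj_dir, "MainWindow.xaml.g.cs", "// project A")
loc_b = add_generated_version(run, obj_dir, "MainWindow.xaml.g.cs", "// project B")
```

A result raised against the first version would carry `loc_a` as its fileLocation, so a viewer can still show the contents that were actually analyzed after the file has been overwritten.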



From: Larry Golding (Comcast) <larrygolding@comcast.net>
Sent: Friday, April 6, 2018 2:31 PM
To: Michael Fanning <Michael.Fanning@microsoft.com>; 'James A. Kupsch' <kupsch@cs.wisc.edu>; sarif@lists.oasis-open.org
Subject: Please comment on #125


#125: Address corner case for generated files in run.files dictionary


This is the scenario where the same physical file is re-written in the course of an analysis. Please see my comments in the issue. What is the scenario here? – that is:


  • What tool is involved?
  • What is the nature of the file that’s being re-written?
  • Is it necessary to represent this file in the run.files dictionary?
  • Is it necessary to represent multiple versions of this re-written file in the run.files dictionary?
  • Would a viewer need access to any version of this file except the last one written?



