
sarif message



Subject: Re: [sarif] Alternatives for embedding links


Hello all,

To add a slightly different perspective on the available options: the approach taken by TOIF uses a handful of semantic elements to identify a “location relevant to understanding the results”.

While the main elements are Finding and Code location, TOIF also introduced two so-called “semantic elements”: Statement and Data Element.

There are at least three important “roles” a statement can play with regard to a finding (this assumes that many findings involve a certain dataflow):
- sink of a finding. A sink statement is part of a “necessary condition” of the finding. For example, in a buffer overflow a “sink” is a statement that accesses a buffer.
- source of a finding. A source statement is part of the “sufficient condition” of the finding. 
- transit statement. A transit statement is part of the dataflow between the source(s) and the sink.

Note that this approach is used in the Juliet test set.

One advantage of this approach is that it makes the roles of code locations explicit as an integral part of the specification. Compliant tools can then safely introduce visual markup for presentation purposes.

Identifying the semantic elements relevant to a finding cannot, in general, be done through a non-intrusive modification of an SCA tool: most “legacy” SCA tools do not output details of the statements involved in a finding, only code locations.

Here are some definitions taken from the TOIF spec, followed by a couple of examples.

Statement

Definition: A basic identifiable unit of behavior in software, such as a source code statement, a basic block, or an operator.

Note: this corresponds to KDM ActionElement class
Note: Defined in Figure 7. UML class diagram Semantic Statement

Statement has code location

Statement is involved in Finding

Synonym: Finding is associated with statement
Possibility: each Finding may be associated with many statements

Statement is part of sink of Finding
Note: It is a stronger form of the fact type Statement is involved in Finding, where the role of Statement with respect to the Logical Weakness Model is known (i.e. sink)

Statement is part of source of Finding
Note: It is a stronger form of the fact type Statement is involved in Finding, where the role of Statement with respect to the Logical Weakness Model is known (i.e. source)

Statement is preceded by Statement


Data element

Definition: A basic identifiable data item in software, such as global and local variables, records, formal parameters, and constants.

Note: This corresponds to the KDM DataElement class
Note: Defined in Figure 8. UML class diagram Semantic Data

Data element is defined at Code Location

Data element is involved in Finding

Data element has name
Data element is involved in Statement


An example of using this “semantic markdown” in TOIF:

<fact xmi:type="toif:Statement" xmi:id="s10"/>
<fact xmi:type="toif:StatementIsInvolvedInFinding" statement="s10" finding="f10"/>
<fact xmi:type="toif:StatementIsSinkOfFinding" statement="s20" finding="f20"/>
<fact xmi:type="toif:StatementIsSourceOfFinding" statement="s30" finding="f30"/>
<fact xmi:type="toif:StatementPrecedesStatement" statement1="s30" statement2="s20"/>
<fact xmi:type="toif:StatementPrecedesStatement" statement1="s10" statement2="s20"/>
<fact xmi:type="toif:StatementHasCodeLocation" statement1="s10" location="loc10"/>
<fact xmi:type="toif:Statement" xmi:id="s20">
  <description text="*pHandler( pData, 0x200 );"/>
</fact>
<fact xmi:type="toif:Statement" xmi:id="s30"/>
<fact xmi:type="toif:Finding" xmi:id="f10"/>
<fact xmi:type="toif:Finding" xmi:id="f20"/>
<fact xmi:type="toif:CodeLocation" xmi:id="loc10">
  <linenumber linenumber="1856"/>
</fact>



<fact xmi:type="toif:DataElement" xmi:id="d10">
  <name name="X"/>
  <description text="struct pData * X[ MAXDATA];"/>
</fact>
<fact xmi:type="toif:DataIsInvolvedInFinding" data="" finding="f10"/>
<fact xmi:type="toif:DataIsInvolvedInFinding" data="" project="f20"/>
<fact xmi:type="toif:DataIsInvolvedInStatement" data="" statement="s20"/>
<fact xmi:type="toif:DataIsInvolvedInStatement" data="" statement="s30"/>
<fact xmi:type="toif:DataIsDefinedAtCodeLocation" data="" location="loc10"/>
<fact xmi:type="toif:Statement" xmi:id="s20"/>
<fact xmi:type="toif:Statement" xmi:id="s30"/>
<fact xmi:type="toif:Finding" xmi:id="f10"/>
<fact xmi:type="toif:Finding" xmi:id="f20"/>
<fact xmi:type="toif:CodeLocation" xmi:id="loc10">
  <linenumber linenumber="1856"/>
</fact>
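
To make the role facts above concrete, here is a minimal Python sketch of how a consumer might group statements by their role in a finding. It is an illustration only, not part of TOIF: the wrapping `<facts>` element, the namespace URI bound to the `xmi:` prefix, and the function name `statement_roles` are assumptions, and the sample attaches the source and the sink to the same finding (f20) so the grouping is visible.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: namespace URI bound to the xmi: prefix; the email does not show it.
XMI = "http://www.omg.org/XMI"

# Hypothetical wrapper element around a subset of the facts from the example.
SAMPLE = f"""
<facts xmlns:xmi="{XMI}">
  <fact xmi:type="toif:Statement" xmi:id="s20"/>
  <fact xmi:type="toif:Statement" xmi:id="s30"/>
  <fact xmi:type="toif:Finding" xmi:id="f20"/>
  <fact xmi:type="toif:StatementIsSinkOfFinding" statement="s20" finding="f20"/>
  <fact xmi:type="toif:StatementIsSourceOfFinding" statement="s30" finding="f20"/>
  <fact xmi:type="toif:StatementPrecedesStatement" statement1="s30" statement2="s20"/>
</facts>
"""

# Fact types that relate a statement to a finding, and the role each implies.
ROLE_BY_FACT_TYPE = {
    "toif:StatementIsInvolvedInFinding": "involved",
    "toif:StatementIsSinkOfFinding": "sink",
    "toif:StatementIsSourceOfFinding": "source",
}


def statement_roles(xml_text):
    """Map finding id -> role -> list of statement ids."""
    roles = defaultdict(lambda: defaultdict(list))
    for fact in ET.fromstring(xml_text).iter("fact"):
        role = ROLE_BY_FACT_TYPE.get(fact.get(f"{{{XMI}}}type"))
        if role is not None:
            roles[fact.get("finding")][role].append(fact.get("statement"))
    return {finding: dict(by_role) for finding, by_role in roles.items()}


print(statement_roles(SAMPLE))
```

A viewer could then use the role ("source", "sink", or merely "involved") to decide how to mark up each code location.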


Cheers,

Nick

On Nov 9, 2017, at 7:02 PM, Larry Golding (Comcast) <larrygolding@comcast.net> wrote:

Hello all,
 
Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.
 
Both options involve using a syntax borrowed from Markdown to specify the link: [link text](link target). They differ in how link target is expressed.

Option 1: Mini-language

The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “physicalLocation” object (see Section 3.19 of the spec), with three properties:
 
  1. A uri property, which is what it sounds like: the URI of the source file.

  2. A region property, which specifies a region within the source file, using properties such as startLine and startColumn. For a full explanation of region, see Section 3.20 – but you don’t need to understand those details to understand this option.
 
  3. A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option.

This option looks like this:
 

{
  "version": "1.0.0",
  "runs": [
    {
      "tool": {
        "name": "TaintTracker"
      },
      "results": [
        {
          "ruleId": "CA2001",
          "locations": [
            {
              "analysisTarget": {
                "uriBaseId": "SRCROOT",
                "uri": "src/db/sql.cs",
                "region": {
                  "startLine": 63,
                  "startColumn": 12,
                  "endColumn": 18
                }
              }
            }
          ],
          "message": "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"
        }
      ]
    }
  ]
}

 
The link text is “here”, and the link target is expressed in the mini-language as follows:
 

$(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data'

 
$(SRCROOT) refers to the uriBaseId, src/ui/input.cs is the uri, and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.
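
To illustrate what a consumer would have to do with such a link target, here is a minimal Python sketch of a parser. The mini-language has no formal grammar in this message, so everything here is an assumption inferred from the single example: an optional `$(ID)` prefix, a URI up to the first `#`, then comma-separated `key=value` pairs in which only `message` is single-quoted and all other values are integers. The function name `parse_link_target` is hypothetical.

```python
import re

# Assumed shape: optional $(uriBaseId), then the uri, then an optional #params part.
TARGET_RE = re.compile(
    r"^(?:\$\((?P<base>[^)]+)\))?(?P<uri>[^#]*)(?:#(?P<params>.*))?$"
)
# A parameter is key=value; a value is either single-quoted or runs to the next comma.
PARAM_RE = re.compile(r"(\w+)=('[^']*'|[^,]*)")


def parse_link_target(target):
    """Return (physicalLocation-like dict, hover message or None)."""
    match = TARGET_RE.match(target)
    if match is None:
        raise ValueError(f"not a recognizable link target: {target!r}")

    location = {"uri": match.group("uri")}
    if match.group("base"):
        location["uriBaseId"] = match.group("base")

    region = {}
    message = None
    for key, value in PARAM_RE.findall(match.group("params") or ""):
        if key == "message":
            message = value.strip("'")  # no escaping is handled in this sketch
        else:
            region[key] = int(value)  # assumes all region properties are integers
    if region:
        location["region"] = region
    return location, message


location, message = parse_link_target(
    "$(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,"
    "message='source of tainted data'"
)
print(location, message)
```

Note that the sketch does not handle a quote character inside the message, and it treats every `#` as the start of the region parameters.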
 
This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:
 

{
  "version": "1.0.0",
  "runs": [
    {
      "tool": {
        "name": "TaintTracker"
      },
      "results": [
        {
          "ruleId": "CA2001",
          "locations": [
            {
              "analysisTarget": "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"
            }
          ],
          "message": "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"
        }
      ]
    }
  ]
}

 

The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other).
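
As a rough illustration of the equivalence between the two representations, the following Python sketch converts a physicalLocation-style object into a mini-language string. It is only a sketch under assumptions: the function name `to_mini_language` is hypothetical, the output follows the form used in the link targets above (no "/" between `$(SRCROOT)` and the URI), and no escaping is attempted.

```python
def to_mini_language(physical_location):
    """Render a physicalLocation-like dict as a mini-language string (sketch)."""
    base = physical_location.get("uriBaseId")
    prefix = f"$({base})" if base else ""
    target = prefix + physical_location["uri"]

    # Region properties become comma-separated key=value pairs after '#'.
    region = physical_location.get("region", {})
    fragment = ",".join(f"{key}={value}" for key, value in region.items())
    return f"{target}#{fragment}" if fragment else target


analysis_target = {
    "uriBaseId": "SRCROOT",
    "uri": "src/db/sql.cs",
    "region": {"startLine": 63, "startColumn": 12, "endColumn": 18},
}
print(to_mini_language(analysis_target))
```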

Option 2: Index into relatedLocations

The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:
 

… an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result.

 
In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this:
 

{
  "version": "1.0.0",
  "runs": [
    {
      "tool": {
        "name": "TaintTracker"
      },
      "results": [
        {
          "ruleId": "CA2001",
          "locations": [
            {
              "analysisTarget": {
                "uriBaseId": "SRCROOT",
                "uri": "src/db/sql.cs",
                "region": {
                  "startLine": 63,
                  "startColumn": 12,
                  "endColumn": 18
                }
              }
            }
          ],
          "message": "Tainted data is used to execute a SQL command. The data entered the system [here](0)",
          "relatedLocations": [
            {
              "message": "source of tainted data",
              "physicalLocation": {
                "uriBaseId": "SRCROOT",
                "uri": "src/ui/input.cs",
                "region": {
                  "startLine": 20,
                  "startColumn": 4
                }
              }
            }
          ]
        }
      ]
    }
  ]
}

 
The link text is “here”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“source of tainted data”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0.
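
For comparison, here is a minimal Python sketch of how a viewer might resolve the embedded links under this option. The function name `resolve_links` and the error handling are assumptions; the only parsing needed beyond the JSON parser is a small pattern for `[link text](index)`.

```python
import re

# [link text](N), where N is a non-negative index into relatedLocations.
LINK_RE = re.compile(r"\[([^\]]+)\]\((\d+)\)")


def resolve_links(result):
    """Return (link text, related location) pairs for each embedded link."""
    related = result.get("relatedLocations", [])
    links = []
    for text, index in LINK_RE.findall(result["message"]):
        position = int(index)
        if position >= len(related):
            raise IndexError(f"link [{text}]({index}) has no related location")
        links.append((text, related[position]))
    return links


result = {
    "ruleId": "CA2001",
    "message": "Tainted data is used to execute a SQL command. "
               "The data entered the system [here](0)",
    "relatedLocations": [
        {
            "message": "source of tainted data",
            "physicalLocation": {
                "uriBaseId": "SRCROOT",
                "uri": "src/ui/input.cs",
                "region": {"startLine": 20, "startColumn": 4},
            },
        }
    ],
}

for text, location in resolve_links(result):
    print(text, "->", location["physicalLocation"]["uri"], "|", location["message"])
```

The location and hover message arrive as ordinary JSON objects, so the viewer never has to interpret a string format for them.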

Comparison of the options

Option 1 (mini-language) has these advantages:
  1. It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF).
  2. It enables someone reading the raw SARIF file to see the link target directly in the context of the message.
 
Option 2 (index into relatedLocations) has these advantages:
  1. It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language?
  2. It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”).
  3. It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(’) which, in the mini-language version, might appear in the “message” parameter of the “fragment”.
  4. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5).
  5. It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.
 
#4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.
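
The parsing concern can be seen even without nested files. The following Python sketch uses a deliberately naive link pattern (an assumption about how a simple viewer might look for `[link text](link target)`; the name `naive_link_targets` is hypothetical) and applies it to the messages from both options.

```python
import re

# Naive pattern: the link target is everything up to the first ')'.
NAIVE_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]*)\)")

OPTION_1_MESSAGE = (
    "The data entered the system [here]($(SRCROOT)src/ui/input.cs"
    "#startLine=20,startColumn=4,message='source of tainted data')"
)
OPTION_2_MESSAGE = "The data entered the system [here](0)"


def naive_link_targets(message):
    """Return (link text, link target) pairs found by the naive pattern."""
    return NAIVE_LINK_RE.findall(message)


# Option 1: the ')' that closes $(SRCROOT) is mistaken for the end of the link.
print(naive_link_targets(OPTION_1_MESSAGE))
# Option 2: the target is a bare index, so the naive pattern is sufficient.
print(naive_link_targets(OPTION_2_MESSAGE))
```

With Option 1, the pattern truncates the target at `$(SRCROOT`, so a consumer needs a parser that understands the mini-language's own use of parentheses; with Option 2, the same pattern recovers the full target.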
 
We will discuss this further at the next TC meeting.
 
Thanks for reading all of this!
Larry


