RE: Proposal for log file deconstruction

Correction.

From: sarif@lists.oasis-open.org <sarif@lists.oasis-open.org> On Behalf Of Larry Golding (Myriad Consulting Inc)
Sent: Friday, August 10, 2018 4:19 PM
To: OASIS SARIF TC Discussion List <sarif@lists.oasis-open.org>
Subject: [sarif] Proposal for log file deconstruction
Importance: High

For the motivation behind this proposal, please see Michael’s Issues #210 and #211. In this mail, I’ll just offer some design choices and a few small changes to Michael’s initial proposal.

We add an optional property externalFiles of type object to the run object. Each property name within this object must be one of the following:

"invocations"
"conversion"
"files"
"logicalLocations"
"graphs"
"resources"
"properties"
"results"

NOTE: Unlike Michael’s proposal, I don’t include tool or versionControlDetails in the list, because they are never large.

NOTE: Unlike Michael’s proposal, I do include results. I don’t know whether that was an oversight on Michael’s part, or whether he reasoned that it was OK for the root file to include one large item as long as every other large item was in a separate file. But I think there’s value in being able to read the entire root file quickly.

NOTE: Following Michael, I do include "invocations", because the invocation object includes the toolNotifications array, which is potentially large, and which you might want to write out in parallel with the results. (invocation.configurationNotifications is unlikely to be large, and is almost certainly populated at the start of the run, but it goes along for the ride to the external “invocations” file.)

NOTE: Following Michael, I do include "conversion", because a conversion object contains an invocation object, and so is potentially large.
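Because the set of externalizable property names is closed, a consumer could reject any other name in externalFiles. The following is a minimal Python sketch under that reading; check_external_files is a hypothetical helper, not part of any SARIF SDK. Note that tool and versionControlDetails are deliberately absent from the set, matching the notes above.

```python
# Property names that, under this proposal, may appear in run.externalFiles.
EXTERNALIZABLE = {
    "invocations",
    "conversion",
    "files",
    "logicalLocations",
    "graphs",
    "resources",
    "properties",
    "results",
}


def check_external_files(run: dict) -> list[str]:
    """Return a list of problems found in run["externalFiles"] (hypothetical helper)."""
    external = run.get("externalFiles", {})
    if not isinstance(external, dict):
        return ["externalFiles must be an object"]
    problems = []
    for name, uri in external.items():
        if name not in EXTERNALIZABLE:
            problems.append(f"{name!r} is not an externalizable property")
        if not isinstance(uri, str):
            problems.append(f"value of {name!r} must be a URI reference string")
    return problems
```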

The value of each property is a URI reference that specifies the location of the corresponding external file. If the URI reference is a relative reference, it is taken to be relative to the location of the root file.

For example, suppose the root file is located at "https://www.example.com/logs/MyLog.sarif":

{
  "version": "2.0.0",
  "runs": [
    {
      "externalFiles": {
        "results": "MyLog.run.sarif"
      }
    }
  ]
}

Note that this run object in the root file has no "results" property.

Then the external results file is located at "https://www.example.com/logs/MyLog.run.sarif".
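Resolving a relative reference against the root file’s location is ordinary URI reference resolution, so a consumer need not invent anything here. A minimal Python sketch using the standard library’s urllib.parse.urljoin; resolve_external_uri is a hypothetical name.

```python
from urllib.parse import urljoin


def resolve_external_uri(root_uri: str, reference: str) -> str:
    """Resolve an externalFiles URI reference against the root file's location.

    Relative references resolve relative to the root file; absolute ones are
    returned unchanged (RFC 3986-style resolution, via urljoin).
    """
    return urljoin(root_uri, reference)


root = "https://www.example.com/logs/MyLog.sarif"
print(resolve_external_uri(root, "MyLog.run.sarif"))
# https://www.example.com/logs/MyLog.run.sarif
```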

The contents of each external file are exactly the same as the value of the property would be if it were inlined in the root file. In this example, MyLog.run.sarif would be a valid JSON file whose root is an array:

[
  {
    "ruleId": "TST1001",
    ...
  }
]
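Because an external file holds exactly the value the property would have had inline, a producer can split a property out of a run mechanically. A minimal Python sketch, assuming the run is held in memory as a dict; externalize is a hypothetical helper, and writing the returned text to a file is left to the caller.

```python
import json


def externalize(run: dict, prop: str, external_name: str) -> str:
    """Move run[prop] out of the run and return the external file's JSON text.

    The run is modified in place: prop is removed, and externalFiles[prop]
    records external_name. Raises KeyError if prop is not present inline.
    """
    value = run.pop(prop)
    run.setdefault("externalFiles", {})[prop] = external_name
    # The external file's root is exactly the value the property had inline.
    return json.dumps(value, indent=2)


run = {"results": [{"ruleId": "TST1001"}]}
text = externalize(run, "results", "MyLog.run.sarif")
print(run)  # {'externalFiles': {'results': 'MyLog.run.sarif'}}
```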

The proposed design also allows for “configuration by convention,” that is, it specifies conventional names for the external files. If the log file is Example.sarif, then the conventional name for the external file containing property <prop> is Example.prop.sarif.
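The naming convention can be stated as a one-line rule. A sketch, assuming the root file name always ends in .sarif (the proposal’s examples do, but it does not say what should happen otherwise); conventional_name is a hypothetical helper.

```python
def conventional_name(log_file_name: str, prop: str) -> str:
    """Return the conventional external-file name for prop.

    For example, ("Example.sarif", "results") -> "Example.results.sarif".
    """
    suffix = ".sarif"
    if not log_file_name.endswith(suffix):
        # Assumption: the convention is only defined for .sarif file names.
        raise ValueError(f"expected a {suffix} file name: {log_file_name!r}")
    stem = log_file_name[: -len(suffix)]
    return f"{stem}.{prop}{suffix}"
```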

The lookup procedure for such an “externalizable” property is as follows:

If the property is present in the root file, use it.

OTHERWISE

If the root file specifies a URL (or an array of URLs, if we decide to allow that) for the property, get the property’s value from there.

OTHERWISE

If there is a resource available at the conventionally named URL, use that.

OTHERWISE

The property is missing, and if the property is required, that’s an error.
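The four-step lookup can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: fetch is a hypothetical caller-supplied function that returns the parsed JSON at a URI or None if nothing is there; a single URI (not an array) is assumed in step 2; and treating an explicitly referenced but unreachable file as an error is our assumption, since the procedure does not say.

```python
from typing import Any, Callable, Optional
from urllib.parse import urljoin


def lookup(
    run: dict,
    prop: str,
    root_uri: str,
    fetch: Callable[[str], Optional[Any]],
    required: bool = False,
) -> Any:
    """Find an externalizable property using the proposed lookup order."""
    # 1. The property is present in the root file: use it.
    if prop in run:
        return run[prop]

    # 2. The root file names an external file for it: get the value from there.
    reference = run.get("externalFiles", {}).get(prop)
    if reference is not None:
        value = fetch(urljoin(root_uri, reference))
        if value is None:
            # Assumption: a dangling explicit reference is an error.
            raise FileNotFoundError(f"external file for {prop!r} not found: {reference}")
        return value

    # 3. Probe the conventionally named file (Example.sarif -> Example.<prop>.sarif).
    if root_uri.endswith(".sarif"):
        value = fetch(root_uri[: -len(".sarif")] + f".{prop}.sarif")
        if value is not None:
            return value

    # 4. The property is missing; that is an error only if it is required.
    if required:
        raise KeyError(f"required property {prop!r} not found")
    return None
```

Step 3 is where Problem 1 below shows up: every absent optional property costs a fetch, even when the producer omitted it on purpose.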

This proposal has three problems I can think of:

Problem 1: If an optional property is missing, the proposed lookup procedure requires a consumer always to probe for the existence of a conventionally named external file, even if the property was intentionally omitted.

Problem 2: How do tools validate the external files? Do we need a separate schema for each, and do we then need to decompose the SARIF schema?

Problem 3: Can a SARIF log file with multiple runs refer to external files?

I can think of a couple of approaches:

Approach 1: Allow externalization only in a single-run log file. You could justify this by arguing that the whole reason we allowed multi-run files in the first place was to allow multiple runs to be shipped over the wire in a single file (which is true), and a log that is split across external files is no longer a single file anyway.

Approach 2: If multiple runs in a single log file use externalization, require them to specify different URIs for their corresponding external files (for example, MyLog.Run0.results.sarif and MyLog.Run1.results.sarif). But that means that you can’t aggregate arbitrary runs into a single file.
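Under the second approach, a producer or validator would need to confirm that no two runs point at the same external file. A minimal sketch, assuming references are compared after resolution against the root file’s location; find_shared_external_files is a hypothetical name.

```python
from collections import defaultdict
from urllib.parse import urljoin


def find_shared_external_files(log: dict, root_uri: str) -> dict[str, list[int]]:
    """Map each resolved external-file URI used by more than one run to those run indices."""
    users: dict[str, list[int]] = defaultdict(list)
    for index, run in enumerate(log.get("runs", [])):
        for reference in run.get("externalFiles", {}).values():
            resolved = urljoin(root_uri, reference)
            if index not in users[resolved]:
                users[resolved].append(index)
    return {uri: runs for uri, runs in users.items() if len(runs) > 1}
```

Configuration by convention alone cannot pass this check: two runs in MyLog.sarif that both rely on the conventional name would both resolve to MyLog.results.sarif.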

We’ve had people question the value of multi-run files in the past. I’d be willing to give them up and change the schema so that a log file can contain only a single run.

Larry
