OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

sarif message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: RE: [sarif] "uri" vs. "location"

Right. And Appendix F also lists run.originalUriBaseIds as containing non-deterministic content. And this conflict is the specification bug that we need to resolve.


  • run.originalUriBaseIds


I apologize for losing track of this detail in discussion when you mentioned the value of using fileLocations to DRY out URI references.

From: Larry Golding (Comcast) <larrygolding@comcast.net>
Sent: Tuesday, May 29, 2018 5:13 PM
To: Michael Fanning <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org
Subject: RE: [sarif] "uri" vs. "location"


TL;DR: When I address #175 by changing the types of some properties from string to fileLocation or vice versa, I’ll also remove the language in §3.3.4 that permits deterministic fileLocation objects.




The spec as it stands today gives explicit guidance on how to use fileLocation objects to represent both deterministic and non-deterministic URIs:


A fileLocation object that represents a non-deterministic URI SHOULD specify a uri property  (§3.3.2) containing a relative URI that is deterministic, for example, the relative path from the root of a source code enlistment to the file. It SHOULD also specify a uriBaseId property (§3.3.3) that captures the non-deterministic portion of the URI, for example, the absolute path to the root of the source code enlistment.


A fileLocation object that represents a deterministic URI MAY specify either a uri property containing an absolute URI, or a uri property containing a relative URI together with a uriBaseId property. The use of a relative URI in this case might make the SARIF log file shorter if there are many deterministic URIs that share a common prefix.


Please take a look at §3.3.4 (Guidance on the use of fileLocation objects) in its entirety to see the pains I took to describe these two distinct uses of fileLocation. We explicitly approved this language, appealing to brevity as the justification for deterministic fileLocation objects.


I am not saying we should keep this language. I’m just pointing out that my notion of using URI base ids for the sake of brevity isn’t just something I was carrying around in my head – it’s actually in the spec.






From: Larry Golding (Comcast) <larrygolding@comcast.net>
Sent: Tuesday, May 29, 2018 2:36 PM
To: 'Michael Fanning' <Michael.Fanning@microsoft.com>; 'sarif@lists.oasis-open.org' <sarif@lists.oasis-open.org>
Subject: RE: [sarif] "uri" vs. "location"


Ouch indeed. In fact, now I remember we even discussed the possibility of adding another property that would function just like run.uriBaseIds, except the values would be required to be deterministic (say, run.deterministicUriBaseIds).


We rejected that option. But apparently we didn’t come to a common understanding of what it meant to “reject” it. You thought it meant “Don’t put deterministic URIs in run.uriBaseIds.” I thought it meant “Don’t define a separate property for deterministic URIs; just shove them in run.uriBaseIds”.


Unfortunately for me, your understanding was the correct one. It has to be, for the reason you point out: A post-processing tool that converts a SARIF file to deterministic form will omit run.uriBaseIds, and then I can’t find my work items any more.


So we agree that we use fileLocation only for non-deterministic URIs. So, as part of this issue (#175), I now have to go through the spec and fix up the properties that incorrectly use fileLocation. I’ll have that change draft ready for TC #19.




From: Michael Fanning <Michael.Fanning@microsoft.com>
Sent: Tuesday, May 29, 2018 2:17 PM
To: Larry Golding (Comcast) <larrygolding@comcast.net>; sarif@lists.oasis-open.org
Subject: RE: [sarif] "uri" vs. "location"


Ouch, you just pointed out an issue. We defined uriBaseIds in part as a mechanism to allow a post-processing tool to elide non-deterministic information. If people simply use this mechanism to DRY out their URLs, they potentially break their absolute URLs without providing an obvious way to restore them (as people can do for version controlled scenarios in some cases by pointing to a local enlistment).


You have a case for this being a useful distinction but it is a non-obvious one in the spec.


I think my personal preference here is that it’s much cleaner to separate things we expect to be absolute (named xxxUri) vs. things we expect to be rebased (fileLocation). Yes, we lose the convenience of reducing log file size but I think it’s a useful trade-off.




From: Larry Golding (Comcast) <larrygolding@comcast.net>
Sent: Tuesday, May 29, 2018 1:32 PM
To: Michael Fanning <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org
Subject: RE: [sarif] "uri" vs. "location"


Yes, we have discussed it before. My counter-argument was that although we invented “URI base ids” to improve determinism, they have the nice side effect of reducing repetition and file size, even for deterministic URIs. For example:


{                                                                  # A run object

  "uriBaseIds": {

    "WORKITEMS": "https://github.com/Microsoft/sarif-sdk/issues"   # Deterministic, but nice to DRY out.



  "results": [


      "workItemLocation": {

        "uriBaseId": "WORKITEMS",

        "uri": "895"






That’s why I think “one/many” is an appropriate distinction. BUT! Even a singleton URI needs to be represented as a fileLocation if it’s non-deterministic. At the moment, there is no example in the spec of a property that’s both a singleton and non-deterministic. But to guard against that possibility, I’m ok with stating the guideline as:


Use fileLocation if the URI describes something of which there are potentially many per run (for example, a GitHub issue), OR if the URI is non-deterministic (it might vary from machine to machine or from run to run).

Use a URI string if the URI describes something of which there is only one per run, AND is deterministic (for example, a GitHub repo).


Note that there is no spec change involved here if we adopt this policy. If we adopt a different policy (one that requires us to change the type of an existing property), then we need a spec change.




From: Michael Fanning <Michael.Fanning@microsoft.com>
Sent: Tuesday, May 29, 2018 12:45 PM
To: Larry Golding (Comcast) <larrygolding@comcast.net>; sarif@lists.oasis-open.org
Subject: RE: [sarif] "uri" vs. "location"


I’m not sure I agree with this distinction. I think the location concepts makes most sense when we expect a URI (particularly a file URI) that might have a non-deterministic root.


We don’t expect any of the things you’ve cited below to be rooted in a non-deterministic way (off an enlistment, for example). So on first blush I’d expect all these things to be URIs.


I think I’ve made this point before, though, and we still have locations here, so perhaps I’m forgetting a line of reasoning someone put forward that’s relevant?


From: sarif@lists.oasis-open.org <sarif@lists.oasis-open.org> On Behalf Of Larry Golding (Comcast)
Sent: Tuesday, May 29, 2018 11:49 AM
To: sarif@lists.oasis-open.org
Subject: [sarif] "uri" vs. "location"


At TC #17, while discussing Issue #163 (“Add result.workItemLocation”), I pointed out that some “URI-valued” properties are represented as strings, while others are represented as fileLocation objects. I took an action to decide on a policy for deciding which type to use, and filed Issue #175, “Decide on policy for fileLocation vs. URI,” where I wrote:

We have two URI-like properties that are represented by URI-valued strings:

·         tool.downloadUri

·         versionControlDetails.uri (the URI of the repo)

... while all the others (14 of them) are represented by fileLocation objects, including:

·         result.workItemLocation

·         rule.helpLocation

Basically I've used fileLocation where there are potentially "many" of something (work items, rules), and URI-valued strings where there's only one of something (tool download location, VCS repo).

Is this a valid distinction? Even if you can rationalize it, is it confusing? Should we use fileLocationeverywhere?


Having thought about this some more, I think the approach we’ve taken so far makes sense: If there are potentially many of something, use fileLocation; if there’s only one of it, use a URI-reference-valued string.


I’m going to propose this at TC #19 (it’s too late to put it on the agenda for tomorrow’s TC #18).



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]