[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: RE: [sarif] "uri" vs. "location"
Right. And Appendix F also lists run.originalUriBaseIds as containing non-deterministic content. And this conflict is the specification bug that we need to resolve.
I apologize for losing track of this detail in discussion when you mentioned the value of using fileLocations to DRY out URI references. From: Larry Golding (Comcast) <larrygolding@comcast.net>
TL;DR: When I address #175 by changing the types of some properties from string to
fileLocation or vice versa, I’ll also remove the language in §3.3.4 that permits deterministic
fileLocation objects. <rant> The spec as it stands today gives explicit guidance on how to use
fileLocation objects to represent both deterministic and non-deterministic URIs: A fileLocation object that represents a
non-deterministic URI
SHOULD specify a uri property (§3.3.2) containing a relative URI that
is deterministic, for example, the relative path from the root of a source code enlistment to the file. It
SHOULD also specify a uriBaseId property (§3.3.3) that captures the non-deterministic portion of the URI, for example, the absolute path to the root of the source code enlistment. … A fileLocation object that represents a
deterministic URI
MAY specify either a uri property containing an absolute URI, or a
uri property containing a relative URI together with a
uriBaseId property.
The use of a relative URI in this case might make the SARIF log file shorter if there are many deterministic URIs that share a common prefix. Please take a look at §3.3.4 (Guidance on the use of fileLocation objects) in its entirety to see the pains I took to describe these two distinct uses of
fileLocation. We explicitly approved this language, appealing to brevity as the justification for deterministic
fileLocation objects. I am not saying we should keep this language. I’m just pointing out that my notion of using URI base ids for the sake of brevity isn’t just something I was carrying around in my head – it’s actually in the spec. </rant> Larry From: Larry Golding (Comcast) <larrygolding@comcast.net>
Ouch indeed. In fact, now I remember we even discussed the possibility of adding another property that would function just like
run.uriBaseIds, except the values would be required to be deterministic (say,
run.deterministicUriBaseIds). We rejected that option. But apparently we didn’t come to a common understanding of what it meant to “reject” it. You thought it meant “Don’t put deterministic URIs in
run.uriBaseIds.” I thought it meant “Don’t define a
separate property for deterministic URIs; just shove them in
run.uriBaseIds”. Unfortunately for me, your understanding was the correct one. It has to be, for the reason you point out: A post-processing tool that converts a SARIF file to deterministic form will omit
run.uriBaseIds, and then I can’t find my work items any more. So we agree that we use fileLocation only for non-deterministic URIs. So, as part of this issue (#175), I now have to go through the spec and fix up the properties that incorrectly use
fileLocation. I’ll have that change draft ready for TC #19. Larry From: Michael Fanning <Michael.Fanning@microsoft.com>
Ouch, you just pointed out an issue. We defined uriBaseIds in part as a mechanism to allow a post-processing tool to elide non-deterministic information. If people simply use this mechanism to DRY out their URLs, they potentially break
their absolute URLs without providing an obvious way to restore them (as people can do for version controlled scenarios in some cases by pointing to a local enlistment). You have a case for
this being a useful distinction but it is a non-obvious one in the spec. I think my personal preference here is that it’s much cleaner to separate things we expect to be absolute (named xxxUri) vs. things we expect to be rebased (fileLocation). Yes, we lose the convenience of reducing log file size but I think
it’s a useful trade-off. Michael From: Larry Golding (Comcast) <larrygolding@comcast.net>
Yes, we have discussed it before. My counter-argument was that although we
invented “URI base ids” to improve determinism, they have the nice side effect of reducing repetition and file size, even for deterministic URIs. For example: { # A run object "uriBaseIds": { "WORKITEMS": "https://github.com/Microsoft/sarif-sdk/issues"
# Deterministic, but nice to DRY out. }, "results": [ { "workItemLocation": { "uriBaseId": "WORKITEMS", "uri": "895" } } ] } That’s why I think “one/many” is an appropriate distinction. BUT! Even a singleton URI needs to be represented as a
fileLocation if it’s non-deterministic. At the moment, there is no example in the spec of a property that’s both a singleton and non-deterministic. But to guard against that possibility, I’m ok with stating the
guideline as:
Use fileLocation if the URI describes something of which there are potentially many per run (for example, a GitHub issue),
OR if the URI is non-deterministic (it might vary from machine to machine or from run to run). Use a URI string if the URI describes something of which there is only one per run,
AND is deterministic (for example, a GitHub repo). Note that there is no spec change involved here if we adopt this policy. If we adopt a different policy (one that requires us to change the type of an existing property),
then we need a spec change. Larry From: Michael Fanning <Michael.Fanning@microsoft.com>
I’m not sure I agree with this distinction. I think the location concepts makes most sense when we expect a URI (particularly a file URI) that might have a non-deterministic root. We don’t expect any of the things you’ve cited below to be rooted in a non-deterministic way (off an enlistment, for example). So on first blush I’d expect all these things to be URIs. I think I’ve made this point before, though, and we still have locations here, so perhaps I’m forgetting a line of reasoning someone put forward that’s relevant? From: sarif@lists.oasis-open.org <sarif@lists.oasis-open.org>
On Behalf Of Larry Golding (Comcast) At TC #17, while discussing
Issue #163 (“Add result.workItemLocation”), I pointed out that some “URI-valued” properties are represented as strings, while others are represented as
fileLocation objects. I took an action to decide on a policy for deciding which type to use, and filed
Issue #175, “Decide on policy for fileLocation vs. URI,” where I wrote:
We have two URI-like properties that are represented by URI-valued strings:
·
tool.downloadUri
·
versionControlDetails.uri (the URI of the repo) ... while all the others (14 of them) are represented by fileLocation objects,
including:
·
result.workItemLocation
·
rule.helpLocation Basically I've used fileLocation where
there are potentially "many" of something (work items, rules), and URI-valued strings where there's only one of something (tool download location, VCS repo).
Is this a valid distinction? Even if you can rationalize it, is it confusing? Should we use fileLocationeverywhere? Having thought about this some more, I think the approach we’ve taken so far makes sense: If there are potentially many of something, use
fileLocation; if there’s only one of it, use a URI-reference-valued string. I’m going to propose this at TC #19 (it’s too late to put it on the agenda for tomorrow’s TC #18). Larry |
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]