[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: RE: [sarif] Alternatives for embedding links
The solution needs to generalize for relatedLocations, stacks, and code flows. I suggest we consider the following:
Re: the name of this property, I see two sides. On the one hand, ‘id’ is a concise, descriptive general purpose name. If we take a feature such as Andrew has proposed (to render several SARIF constructs as graphs), this property could have
other value. Balancing this, those other scenarios do not exist and ‘embeddedLinkId’ is extremely descriptive of its purpose. Michael From: Larry Golding (Comcast) [mailto:larrygolding@comcast.net]
You’ve persuaded me of two things:
Larry From: Michael Fanning [mailto:Michael.Fanning@microsoft.com]
It’s very reasonable to expect that one event or more events from a code flow are of particular interest and for these to be called out. See Paul’s example in issue
#33 (this example, in fact, is what made me realize we have this deficit). PREfast defines a notion of key events. I believe SDV’s output could be improved to take advantage of this feature. Related locations is actually a bit of a dodge, allowing for tools
that don’t provide complete code flows to deliver some supporting information. If you really need to dig into an XSS issue, for example, you need the full code flow from untrusted input to dangerous call (not just a related location that tells you where the
source came in). There are many patterns that obviously could usefully connect to code flows, including complex buffer overrun issues (as in Paul’s example), use of dangling pointer, etc. For stacks, a memory leak detection tool might produce a result that
needs to double-click into the stack itself. There are many code paths that potentially lead to an allocator. Leak detection is often therefore organized around allocations associated with a specific stack. The result is that an embedded link in such an issue
needs to invoke the stack (i.e., there is no one related location that makes sense to call out on its own). The availability of a stack or code flow is a powerful tool for diagnosing root cause (and result validity). My opinion is that we should pursue embedded links that refer to them. I’m comfortable with the costs/complexity of using an ‘id’
property on a physical location to do this. This seems less costly than having to duplicate one or more locations (which provide a double-click that doesn’t connect you to the richer code-flow view experience). Michael From: Larry Golding (Comcast) [mailto:larrygolding@comcast.net]
I don’t see how the fact that a tool (such as the Microsoft “Static Driver Verifier”) which exclusively uses code flows thereby has a need to encode links in their result.message property. SDV does not do that; they use simple messages
like "The dispatch routine has returned without releasing the cancel spinlock" and
"The property is satisfied as the driver defines a AddDevice routine.". If they chose to produce a result.message that referenced one of the locations in the code flow (for example,
"The dispatch routine has returned without releasing the cancel spinlock, which was created [here](0)"), they could use the workaround we have discussed, of duplicating that location into
the result.relatedLocations array. Introducing an emdeddedLinkId property would avoid the need to duplicate the location, at the cost (as you say below) of the additional producer-side complexity of generating that property, and the additional consumer-side complexity of
searching for the location with that property. Larry From: Michael Fanning [mailto:Michael.Fanning@microsoft.com]
Well, any customer that is exclusively using code flows to troubleshoot issues would have an obvious need for it. The MS static driver verifier is such a tool. I’m not yet convinced there isn’t an idea out there that satisfies all concerns. Here’s a suggestion, for example, might trigger other thoughts if you don’t actually buy it: we could keep the format you describe, which is simply a number.
Instead of this number representing strictly the index into related locations, however, it could be an id that is guaranteed to exist as a property value for one and only one physical location which exists either in related locations, a stack or a code flow. Now we still have the simple format for the embedded link that Jim suggested. The additional complexity is a ‘embeddedLinkId’ property that needs to be added to physicalLocation (or just ‘id’). Now imagine what the consumer side looks like: on selecting an issue, a viewer must parse all of related locations, stacks and code flows associated with reach result (tools tend to only populate one of these, but this isn’t really relevant).
On detecting an id associated with a location, the viewer would cache that physical location in order to properly respond to an embedded link. Michael From: Larry Golding (Comcast) [mailto:larrygolding@comcast.net]
I can’t think of a way to do it that does not involve encoding the information about
which code flow you want to link to into the message, which leads to something like the mini-language from your message of this morning (Monday, November 13, 2017 10:21 AM). Lacking information about customer demand for this feature (“click on a link in a message, and navigate directly into the code flow specified by the link”), I would hesitate to complicate the format to support it.
From: Michael Fanning [mailto:Michael.Fanning@microsoft.com]
I definitely think that minimizing parsing is important. But it would be nice if you could click directly into the code-flow experience, in particular. Can you think of a way to do this? We could advise viewers to look for a stack location or code flow location that matches the related location and to make a guess about where the user is jumping to (into a stack or code flow). This seems like a lot of work (and locations
could be duplicated). From: Larry Golding (Comcast) [mailto:larrygolding@comcast.net]
Rather than introducing a JSONPath parser to parse references such as
codeFlow[0].annotatedCodeLocations[2].physicalLocation, I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears
in (for example) a codeFlow, then recapitulate the location in the relatedLocations array. From: Michael Fanning [mailto:Michael.Fanning@microsoft.com]
Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted
on in the TC following). My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame
location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question). [here]{relatedLocations[0]}" [here]{stacks[0].frames[2].physicalLocation}" [here]{codeFlow[0].annotatedCodeLocations[2].physicalLocation}" Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow. Pros:
Cons:
NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to
have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github
issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it. Michael From: sarif@lists.oasis-open.org [mailto:sarif@lists.oasis-open.org]
On Behalf Of Larry Golding (Comcast) Hello all, Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup
such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal. Both options involve using a syntax borrowed from Markdown to specify the link:
[link text](link target). They differ in how
link target is expressed. Option 1: Mini-language
The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “physicalLocation” object (see
Section 3.19 of the spec), with three properties:
This option looks
like this: {
"version":
"1.0.0",
"runs": [ {
"tool": {
"name":
"TaintTracker" },
"results": [ {
"ruleId":
"CA2001",
"locations": [ { "analysisTarget":
{ "uriBaseId":
"SRCROOT", "uri":
"src/db/sql.cs", "region":
{ "startLine":
63, "startColumn":
12, "endColumn":
18 } } } ],
"message":
"Tainted data is used to execute a SQL command. The data entered the system
[here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')" } ] } ] } The link text is “here”, and the
link target is expressed in the mini-language as follows: $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data',
$(SRCROOT) refers to the
uriBaseId,
src/ui/input.cs is the uri, and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your
SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text. This design has an interesting consequence: since the mini-language specifies everything that a
physicalLocation object specifies, we could consider
removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows: {
"version":
"1.0.0",
"runs": [ {
"tool": {
"name":
"TaintTracker" },
"results": [ {
"ruleId":
"CA2001",
"locations": [ { "analysisTarget":
"$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18" } ],
"message":
"Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')" } ] } ] } The analysisTarget property, whose value was previously a
physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not
require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the
physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations
The second option expresses the link target as an index into the
result.relatedLocations array. To understand this option, you need to know that SARIF defines a property
relatedLocations on the
result object. Section 3.17.12 explains that this property contains: … an array of one or more unique (§3.9)
annotatedCodeLocation objects (§3.25), each of which represents
a location relevant to understanding the result. In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks
like this: {
"version":
"1.0.0",
"runs": [ {
"tool": {
"name":
"TaintTracker" },
"results": [ {
"ruleId":
"CA2001",
"locations": [ {
"analysisTarget": {
"uriBaseId":
"SRCROOT",
"uri":
"src/db/sql.cs",
"region": {
"startLine": 63,
"startColumn": 12,
"endColumn": 18 } } } ],
"message":
"Tainted data is used to execute a SQL command. The data entered the system
[here](0)",
"relatedLocations": [ {
"message":
"source of tainted data",
"physicalLocation": {
"uriBaseId":
"SRCROOT",
"uri":
"src/ui/input.cs",
"region": {
"startLine": 20,
"startColumn": 4 } } } ] } ] } ] } The link text is “here”, and the
link target is expressed as an index into the
relatedLocations array. Note that in this option, the “hover message” (“source of tainted data”) appears as the
message property of the
annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options
Option 1 (mini-language) has these advantages:
Option 2 (index into relatedLocations) has these advantages:
#4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in
themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the
run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”. We will discuss this further at the next TC meeting. Thanks for reading all of this! Larry |
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]