Subject: RE: [sarif] Another graph design issue: Do we need node.edgeIds
One more random thought: The first idea (nodes describe their outgoing edges, for example, node.targetNodeIds:string; dispense with the “edge” concept) means that you can’t associate properties like “weight” with an edge.
TL;DR: Let’s take DGML’s route and get rid of node.edgeIds.
I took a few minutes to study the DGML schema and the DOT wiki page. Yes, DGML is similar to our design, with their “link” playing the role of our “edge”.
Yes, I see how you could represent a simple DOT directed graph with an array of array of node ids, and yes, that would preclude adding properties to the edges. In any case, as visually appealing and efficient as the notation “a -> b -> c” is, visual appeal has never been a design goal of SARIF 😉
The “second idea” (which describes a traversal as a sequence of edge ids, as we currently do) does indeed ensure that no “edge traversal” can travel between nodes that aren’t connected. OTOH it doesn’t stop you from writing an invalid traversal [ "e1", "e3" ], where edge e1 links node n1 to node n2, and edge e3 links node n3 to node n4.
This is a good catch! Either nodes can describe their outgoing edges only(by describing all the nodes they link to), or we remove the edge ids from the nodes, as the edge objects define the source/targets for each edge. If we take the former case, a graph traversal could simply consist of the set of node ids in proper order. If the latter, we’d have edge ids.
The first idea is interesting because we dispense with an edge concept. The second idea is compelling because it naturally ensures no graph traversal propose a link between two nodes that isn’t mirrored in the comprehensive definition of the graph.
I see that DGML takes the nodes definitions + link defs approach.
DOT is interesting. Look at the digraph example and how it efficiently defines linkage between nodes. We could play this trick in JSON as well with an array of node id arrays to define the graph <g>. But there’d be no easy way to label the edges with this approach.
Here’s what I wrote about node.edgeIds:
A node object SHALL have a property named edgeIds whose value is an array of unique (§3.6.2) strings, each of which identifies an edge that starts at this node. Thus, each string in the array SHALL be equal to the sourcedNodeId property (§3.27.4) of one of the edge objects (§3.27) in the graph object (§3.25) in which it occurs. Likewise, the value of every such sourceNodeId SHALL occur in the edgeIds array.
That is, you don’t strictly need node.edgeIds; you could derive it by looking at every edge object and selecting those whose sourceNodeId property matched node.id. And then you wouldn’t have to worry about consistency between the node.edgeIds and the edge.sourceNodeIds.
OTOH, if the graph is large, it might be burdensome to compute node.edgeIds from the set of all edge.sourceNodeIds.