Subject: RE: [sarif] Change draft for #113 (hostname guidance)
TL;DR: I believe our thinking so far is valid (not off-track). I will change my recommendation on the form of a file URI with no authority, I will add more examples, and I will scrub the spec’s code samples as you suggest.
With regard to using "//" when the host name is omitted:
The forms "file:/dir/file.c" and "file:///dir/file.c" are both permitted by the grammar of RFC 8089. Appendix A, “Differences from Previous Specifications”, emphasizes this:
According to the definition in [RFC1738], a file URL always started
with the token "file://", followed by an (optionally blank) host name
and a "/". The syntax given in Section 2 makes the entire authority
component, including the double slashes "//", optional.
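One way to see that the two spellings are interchangeable is to parse both with Python's standard `urllib.parse.urlsplit`. In each case the authority (`netloc`) is empty and the path is the same. This only illustrates one parser's behavior; other consumers may differ.

```python
from urllib.parse import urlsplit

# Both spellings are permitted by the RFC 8089 grammar: without and with "//".
forms = ["file:/dir/file.c", "file:///dir/file.c"]

for uri in forms:
    parts = urlsplit(uri)
    # netloc is the authority component; it is empty in both forms.
    print(f"{uri!r}: scheme={parts.scheme!r} netloc={parts.netloc!r} path={parts.path!r}")
```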
But RFC 8089 also says:
As a special case, the "file-auth" rule can match the string
"localhost" that is interpreted as "the machine from which the URI is
being interpreted," exactly as if no authority were present. Some
current usages of the scheme incorrectly interpret all values in the
authority of a file URI, including "localhost", as non-local. Yet
others interpret any value as local, even if the "host" does not
resolve to the local machine. To maximize compatibility with
previous specifications, users MAY choose to include an "auth-path"
with no "file-auth" when creating a URI.
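In grammar terms, an "auth-path" with no "file-auth" is "//" followed by an empty authority and then an absolute path, which yields the form "file:///dir/file.c". So, for compatibility, the spec should prefer that form after all, and I’ll change it to say so.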
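A minimal sketch of what this means in practice, assuming a producer that builds URIs from an already-absolute, already-percent-encoded path: always emit the "//", leaving the authority empty for a local file. A consumer that follows the RFC 8089 special case treats an empty authority and "localhost" alike as the local machine. The names `make_file_uri` and `is_local_authority` are hypothetical and are not part of SARIF or any library.

```python
from urllib.parse import urlsplit


def make_file_uri(path: str, host: str = "") -> str:
    """Hypothetical helper: build a file URI that always includes "//".

    Assumes `path` is absolute and already percent-encoded. An empty `host`
    produces the "file:///..." form for a local file.
    """
    if not path.startswith("/"):
        raise ValueError(f"path must be absolute: {path!r}")
    return "file://" + host + path


def is_local_authority(uri: str) -> bool:
    """Hypothetical helper: apply the RFC 8089 special case, under which an
    empty authority and "localhost" both mean the interpreting machine."""
    host = urlsplit(uri).netloc
    # Host names are case-insensitive (RFC 3986, Section 3.2.2).
    return host == "" or host.lower() == "localhost"


print(make_file_uri("/dir/file.c"))                       # file:///dir/file.c
print(is_local_authority("file:///dir/file.c"))           # True
print(is_local_authority("file://localhost/dir/file.c"))  # True
print(is_local_authority("file://MYMACHINE/dir/file.c"))  # False
```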
With regard to Windows file paths with drive letters:
The blog you cite is from 2006. It states:
The standard for the file scheme doesn’t give specific instructions on how to convert a file system path for a specific operating system into a file URI. While the standard defines the syntax of the file scheme, it leaves the conversion from file system path to file URI up to the implementers.
But RFC 8089, dated February 2017, does offer guidance:
D.2. DOS- and Windows-Like Systems
When mapping a DOS- or Windows-like file path to a file URI, the
drive letter (e.g., "c:") is typically mapped into the first path
segment.
So "file:///c:/dir/file.c" is valid. It’s not just a file name “crammed into a URI”. The grammar in RFC 3986 is explicit on this point:
path-absolute = "/" [ segment-nz *( "/" segment ) ]
segment-nz = 1*pchar
segment = *pchar
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
That is, a path segment can include a colon.
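To make the grammar concrete, here is a small Python translation of these rules into regular expressions, together with "unreserved", "pct-encoded", and "sub-delims" as RFC 3986 Section 2 defines them. It is a sketch for checking individual segments and absolute paths, not a full URI validator.

```python
import re

# RFC 3986, Section 2: character classes used by "pchar".
UNRESERVED = r"[A-Za-z0-9\-._~]"
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"
SUB_DELIMS = r"[!$&'()*+,;=]"
PCHAR = rf"(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS}|[:@])"

# segment-nz = 1*pchar
SEGMENT_NZ = re.compile(rf"{PCHAR}+")
# path-absolute = "/" [ segment-nz *( "/" segment ) ], where segment = *pchar
PATH_ABSOLUTE = re.compile(rf"/(?:{PCHAR}+(?:/{PCHAR}*)*)?")


def is_segment_nz(text: str) -> bool:
    return SEGMENT_NZ.fullmatch(text) is not None


def is_path_absolute(text: str) -> bool:
    return PATH_ABSOLUTE.fullmatch(text) is not None


print(is_segment_nz("c:"))                      # True: ":" is allowed in pchar
print(is_segment_nz("c$"))                      # True: "$" is a sub-delim
print(is_segment_nz("c|"))                      # False: "|" must be percent-encoded
print(is_path_absolute("/c:/root/dir/file.c"))  # True
```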
In summary, any of these are valid:
"file:///c:/root/dir/file.c" # A local file.
"file://MYMACHINE/c:/root/dir/file.c" # A file on another machine.
"file://MYMACHINE/shared/dir/file.c" # A UNC path; "c:\root" was shared out with share name "shared".
Note that a segment such as "c$" (the share name Windows implicitly creates for drive C:) is also valid, because "$" is one of the "sub-delims".
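As an illustration, parsing these examples with Python's `urllib.parse.urlsplit` shows where each piece lands: the host (if any) goes into the authority, and the drive letter or share name becomes the first path segment. The fourth URI, using the administrative share "c$", is a hypothetical addition for the note about "$".

```python
from urllib.parse import urlsplit

examples = [
    "file:///c:/root/dir/file.c",           # A local file.
    "file://MYMACHINE/c:/root/dir/file.c",  # A file on another machine.
    "file://MYMACHINE/shared/dir/file.c",   # A UNC path via the share "shared".
    "file://MYMACHINE/c$/root/dir/file.c",  # Hypothetical: the administrative share "c$".
]

for uri in examples:
    parts = urlsplit(uri)
    host = parts.netloc or "(local)"
    # The first path segment is the drive letter ("c:") or the share name.
    first_segment = parts.path.split("/")[1]
    print(f"{uri}\n  host={host} first segment={first_segment}")
```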
If you omit the host name, you should still include the "//", so in your examples prefer file:///c:/file.cpp over file:/c:/file.cpp.
In practice, your example works.
I am beginning to wonder whether we’re off-track in our thinking. If you are rendering a Windows file path (which, according to the blog below, is not a URI but can be delivered as a URI), then the host name should not be included. The host name is included if the file is actually being referenced as a UNC path.
Consider a file that exists at c:\public\file.cpp on MYMACHINE, and assume that c:\public is shared across the network. These renderings are acceptable:
// No host name. Why? We are referencing the content as a local Windows file path, crammed into a URI.
// An actual file URI. No drive letter; this file comes across the network from a shared location.
// By convention, Windows automatically shares drives as C$, D$, etc. This is a valid URI. If you have access to this implicitly shared location (available to admins), you can access this file.
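The distinction drawn above, between a drive-letter path (no host) and a UNC path (host included), can be sketched as a conversion function. `windows_path_to_file_uri` is a hypothetical helper written for illustration; it assumes absolute paths, uses Python's `urllib.parse.quote` for percent-encoding, and does not handle every Windows path form (for example, extended-length prefixes).

```python
from urllib.parse import quote

# Characters RFC 3986 allows unencoded in a path ("pchar" plus "/"), beyond
# letters, digits, and "-._~", which quote() never encodes.
PATH_SAFE = "/:@!$&'()*+,;="


def windows_path_to_file_uri(path: str) -> str:
    r"""Hypothetical helper: convert an absolute Windows path to a file URI.

    - Drive-letter path (c:\dir\file.c): no host; the drive letter becomes
      the first path segment, as RFC 8089 Appendix D.2 describes.
    - UNC path (\\HOST\share\dir\file.c): the server name becomes the host.
    """
    p = path.replace("\\", "/")
    if p.startswith("//"):
        host, sep, rest = p[2:].partition("/")
        if not host or not sep:
            raise ValueError(f"malformed UNC path: {path!r}")
        return "file://" + host + "/" + quote(rest, safe=PATH_SAFE)
    if len(p) >= 3 and p[0].isascii() and p[0].isalpha() and p[1:3] == ":/":
        return "file:///" + quote(p, safe=PATH_SAFE)
    raise ValueError(f"not an absolute Windows path: {path!r}")


print(windows_path_to_file_uri(r"c:\root\dir\file.c"))             # file:///c:/root/dir/file.c
print(windows_path_to_file_uri(r"\\MYMACHINE\shared\dir\file.c"))  # file://MYMACHINE/shared/dir/file.c
```

The drive-letter case always yields "file:///c:/..." rather than "file:/c:/...", in line with the advice to keep the "//" when the host name is omitted.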
The change draft for Issue #113: “Provide guidance on including a hostname in a uriBaseIdValue” is available:
I’ll move its adoption at the next TC meeting.