OASIS Mailing List Archives

 



sarif-comment message



Subject: SARIF public comments


My comments are in response to the public request for comments on the SARIF 2.0 proposal. I am a former senior developer of the Fortify Static Code Analyzer, though I now work for a different company. At Fortify, my responsibilities included FVDL/FPR output and scalability of the scan engine.

Primary concern:

SARIF defines a “log file viewer” as a user interaction tool that displays scan results to human users as a motivating scenario for a common file format. However, the proposal failed to include scalability as a goal and hence produced a file format that scales poorly to large result sets. User interaction tools will be required to spend minutes to hours parsing and indexing the SARIF file before users can interact with the results, making the tools nearly useless. I suggest either removing user interaction tools as a motivating use case and instead focusing on offline transfer of results between cooperating utilities, or redeveloping the file format to support scalability.

To put concrete numbers on this: When I was a senior developer of Fortify Static Code Analyzer, I saw issue results from a single customer scan in excess of 8 million issues. The uncompressed XML file containing those results was 160 GB in size; it was one-sixth of my hard drive. I regularly saw scan results above 1 million issues. Scan results with multiple hundreds of thousands of issues occurred daily. Opening large result sets in a log file viewer took minutes to hours and dozens of GB of RAM. SARIF compounds this scale by permitting results from multiple scans to appear in the same file.
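As a rough illustration of these figures, the following sketch works out the average size per issue and a sequential read time. The 160 GB and 8 million values come from the message above; the decimal interpretation of GB and the 100 MB/s throughput are assumptions chosen for illustration, not measurements.

```python
# Back-of-envelope arithmetic for the scan sizes quoted in the message.
FILE_BYTES = 160 * 10**9          # 160 GB, assuming 1 GB = 10**9 bytes
ISSUE_COUNT = 8_000_000           # "in excess of 8 million issues"
ASSUMED_THROUGHPUT = 100 * 10**6  # hypothetical 100 MB/s parse rate (assumption)


def bytes_per_issue(file_bytes: int, issue_count: int) -> float:
    """Average serialized size of one issue."""
    return file_bytes / issue_count


def parse_minutes(file_bytes: int, bytes_per_second: float) -> float:
    """Minutes needed to read the whole file once at a fixed rate."""
    return file_bytes / bytes_per_second / 60
```

Under these assumptions each issue averages about 20 KB, and a single sequential pass over the file takes roughly 27 minutes before any indexing work is counted.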

SARIF does not scale to large result sets. It requires full file text parsing from the start “sarifLog {” to the matching “}” at the last character of the file. There is nothing particularly wrong with this if consumer utilities are processing SARIF inputs in offline batch jobs with large memory allocations. However, it is a serious detriment to user interaction tools whose objective is to display structured results to a human analyst. The tools will block or offer no useful interaction while parsing the full file contents.
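The blocking behaviour can be seen with a conventional whole-document JSON parser. The sketch below uses a small, simplified SARIF-like log (the field names are illustrative and not validated against the SARIF schema) and shows that no results are available until the final closing brace has been read. A streaming parser could surface results earlier, but it would still have to scan the file sequentially.

```python
import json

# A tiny SARIF-like log (simplified; not validated against the SARIF schema).
log_text = json.dumps({
    "version": "2.0.0",
    "runs": [{
        "results": [
            {"ruleId": f"RULE{i}", "message": "example"} for i in range(1000)
        ],
    }],
})


def try_parse(text):
    """Return the parsed document, or None if text is not a complete JSON value."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Calling `try_parse(log_text[:-1])` returns `None` even though every result object is present in the text; only the complete string yields the 1000 results.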

If scalability to large issue sets is desired, substantial rewrites to the proposed format would be required. The format would need to be random-access readable: a user interaction tool can then parse out only those objects necessary to satisfy the user’s current view of the issues by seeking to objects and parsing individual objects. It would also be desirable for the format to be more compact in storage than JSON can provide, without resorting to full-file compression, as compression destroys the ability to seek to random offsets when reading.
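One way to picture random-access reading is a file of length-prefixed records followed by an offset table. The layout below is purely hypothetical: it is not part of SARIF, nor a format proposed in this message, and the function names are invented for illustration. Records are still encoded as JSON here, so the sketch addresses seeking only, not compactness.

```python
import io
import json
import struct

FOOTER = struct.Struct("<QQ")  # (offset of index table, number of records)


def write_indexed(stream, records):
    """Write length-prefixed JSON records, then an offset table and a footer."""
    offsets = []
    for rec in records:
        offsets.append(stream.tell())
        data = json.dumps(rec).encode("utf-8")
        stream.write(struct.pack("<I", len(data)))
        stream.write(data)
    index_start = stream.tell()
    for off in offsets:
        stream.write(struct.pack("<Q", off))
    stream.write(FOOTER.pack(index_start, len(offsets)))


def read_record(stream, n):
    """Seek to record n and parse only that record."""
    stream.seek(-FOOTER.size, io.SEEK_END)
    index_start, count = FOOTER.unpack(stream.read(FOOTER.size))
    if not 0 <= n < count:
        raise IndexError(n)
    stream.seek(index_start + 8 * n)          # each table entry is 8 bytes
    (offset,) = struct.unpack("<Q", stream.read(8))
    stream.seek(offset)
    (length,) = struct.unpack("<I", stream.read(4))
    return json.loads(stream.read(length).decode("utf-8"))
```

With this layout a viewer reads the 16-byte footer, one 8-byte table entry, and one record, regardless of how many records the file holds.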

Nits:

Section 1.2. Property Bag. Avoid the implementation-specific “JSON” term in your definitions. Define this as “an unordered set of properties with arbitrary camelCase names”.

Section 3.3. I don’t see any discussion of canonicalization anywhere in this section. How are symlinks, hardlinks, ./, and ../ handled? (Hardlinks cannot be canonicalized.) Perhaps the cited RFC addresses it, but I still think it would be worth calling the reader’s attention to the matter.
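The canonicalization question can be made concrete with a short sketch, assuming a POSIX-like filesystem that supports both symlinks and hardlinks. The file names and the `demo_links` helper are hypothetical. It shows that `./` and `../` can be collapsed lexically, that a symlink can be resolved to its target, and that a hardlink has no canonical path to resolve to.

```python
import os
import posixpath
import tempfile


def lexical_normalize(path):
    """Collapse ./ and ../ textually. This can give the wrong answer when a
    component before '..' is a symlink, because no filesystem lookup is done."""
    return posixpath.normpath(path)


def demo_links():
    """Create a file, a symlink to it, and a hardlink to it, then compare paths."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "a.c")
        with open(target, "w") as f:
            f.write("int x;\n")
        sym = os.path.join(d, "sym.c")
        hard = os.path.join(d, "hard.c")
        os.symlink(target, sym)   # second name that points at the first
        os.link(target, hard)     # second name for the same inode
        return {
            "symlink_resolves": os.path.realpath(sym) == os.path.realpath(target),
            "hardlink_resolves": os.path.realpath(hard) == os.path.realpath(target),
            "hardlink_same_file": os.path.samefile(hard, target),
        }
```

Here `lexical_normalize("src/./lib/../main.c")` gives `src/main.c`. In `demo_links()`, the symlink resolves to the target path, while the hardlink keeps its own path even though `os.path.samefile` reports that both names refer to the same file.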

Thank you for your efforts and your consideration.

Jon

