ScanRE


Static Code Analysis Toolkit for Vulnerability Detection and Mitigation.

Explore the docs »

Features   ·   Demonstration   ·   Performance


Demonstration

https://github.com/SaketThota/ScanRE/assets/52862591/cbbc158d-dc63-4ffb-87a6-dea886d69955

Or view the video on YouTube: ScanRE — Static Code Analysis Toolkit

Motivation

Alert Fatigue!

Frequent alerts about cybersecurity threats can lead to so-called “alert fatigue”, which numbs staff to cyber alerts and results in longer response times or missed alerts. The fatigue, in turn, can create burnout in IT departments, which then leads to more staff turnover. When replacement personnel are hired, the cycle begins again.

That’s according to a recently released report prepared by International Data Corporation (IDC) for Critical Start, a cybersecurity consulting and managed detection and response company. IDC surveyed more than 300 U.S.-based IT executives at companies with 500 or more employees. It found that:

image

Features

image

What is Static Code Analysis?

Static analysis is a method of debugging that automatically examines the source code without executing the program. This gives developers an understanding of their code base and helps ensure that it is compliant, safe, and secure. To find defects and policy violations, checkers perform an analysis on the code.

image

Checkers operate by querying or traversing a model of the code, looking for particular properties or patterns that indicate defects. Sophisticated symbolic execution techniques explore paths through a control-flow graph, the data structure representing the paths that a program might traverse during its execution. A warning is generated if the path exploration notices an anomaly.
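To make the idea of a checker concrete, here is a minimal sketch of one that traverses a model of the code looking for a pattern. It uses Python's standard `ast` module as the model, and the rule set (`DANGEROUS_CALLS`) and function name are hypothetical illustrations. It is not how ScanRE or Semgrep implement their rules.

```python
import ast

# Hypothetical rule set: built-ins that are risky to call on untrusted input.
DANGEROUS_CALLS = {"eval", "exec"}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each flagged call in `source`."""
    tree = ast.parse(source)  # build the model; the code is never executed
    findings = []
    for node in ast.walk(tree):  # traverse every node in the syntax tree
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


SAMPLE = '''
def run(user_input):
    value = eval(user_input)
    return value
'''

for line, name in find_dangerous_calls(SAMPLE):
    print(f"line {line}: call to {name}() on possibly untrusted input")
```

This sketch only matches syntax. It does not explore control-flow paths, so it cannot tell whether the argument is actually attacker-controlled.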

Architecture

In a hackathon, the main tradeoff is between code quality, features, and the time available to deliver a complete solution. We therefore had to decide which features to prioritize and which we could leave until the end. It always helps to architect your solution before you write any code, and that is what we did.

*
├───mh-backend
└───mh-frontend

Frontend

Backend

Detailed instructions on setting up the project can be found in the individual directories.

Have fun :)

A high-level layout of our system is shown below.


image

Individual components


image

We decided to integrate these two tools, Semgrep and ORT, together with ChatGPT (GPT-4) so that we get the best of all worlds: scanning user code, scanning the code of dependencies, and getting suggested mitigations.

And that was the heart of our solution. It reduces alert fatigue, allows the security team to focus on what matters and helps the team better utilize existing resources.

A win-win all around :)
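The integration described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `mh-backend`: it assumes the Semgrep CLI is on `PATH` and that its JSON report has a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, `extra.severity` and `extra.message`. The dependency findings use a made-up, already normalized shape because the ORT report format is not described here, and the prompt text and all function names are hypothetical.

```python
import json
import subprocess

# Hypothetical ordering used to surface the most severe findings first.
SEVERITY_RANK = {"ERROR": 0, "WARNING": 1, "INFO": 2}


def run_semgrep(target_dir: str) -> str:
    """Run the Semgrep CLI and return its raw JSON report (assumes it is installed)."""
    completed = subprocess.run(
        ["semgrep", "scan", "--config", "auto", "--json", target_dir],
        capture_output=True, text=True, check=False,
    )
    return completed.stdout


def normalize_semgrep(raw_json: str) -> list[dict]:
    """Convert a Semgrep JSON report into a flat list of finding dicts."""
    findings = []
    for result in json.loads(raw_json).get("results", []):
        extra = result.get("extra", {})
        path = result.get("path", "?")
        line = result.get("start", {}).get("line", "?")
        findings.append({
            "source": "semgrep",
            "rule": result.get("check_id", "unknown"),
            "location": f"{path}:{line}",
            "severity": extra.get("severity", "INFO"),
            "message": extra.get("message", ""),
        })
    return findings


def merge_findings(code_findings: list[dict], dependency_findings: list[dict]) -> list[dict]:
    """Combine both sources and order them by severity, most severe first."""
    combined = code_findings + dependency_findings
    return sorted(combined, key=lambda f: SEVERITY_RANK.get(f["severity"], 99))


def build_mitigation_prompt(findings: list[dict]) -> str:
    """Build the text that would be sent to the language model for mitigations."""
    lines = ["Suggest a mitigation for each finding below:"]
    for number, f in enumerate(findings, start=1):
        lines.append(
            f"{number}. [{f['source']}] {f['rule']} at {f['location']} "
            f"({f['severity']}): {f['message']}"
        )
    return "\n".join(lines)


# Example input shaped like the assumed Semgrep report; values are invented.
SAMPLE_REPORT = json.dumps({"results": [{
    "check_id": "example.rule.eval-use", "path": "app.py", "start": {"line": 12},
    "extra": {"severity": "WARNING", "message": "Avoid eval on user input."},
}]})
SAMPLE_DEPENDENCY_FINDINGS = [{
    "source": "dependencies", "rule": "example-advisory", "location": "package-x",
    "severity": "ERROR", "message": "Known vulnerable version.",
}]

merged = merge_findings(normalize_semgrep(SAMPLE_REPORT), SAMPLE_DEPENDENCY_FINDINGS)
print(build_mitigation_prompt(merged))
```

Sending one consolidated prompt instead of one alert per finding is one way to keep the volume of notifications down, which is the point of the design.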

Screens

image

image

overview


findings


Findings Fixed

Check the boxes for the vulnerabilities that have been fixed.

Vulnerability Fixed Percentage


image

Performance metrics

It made sense to include benchmarks to show the performance to expect from our system :) The time taken to complete a scan depends on the volume of code being scanned. Since the underlying system is built primarily on top of Semgrep, our performance is mainly determined by the performance of Semgrep. Semgrep is able to outperform GitGuardian and other code analysis tools, both in time taken and in false positives flagged.

image

Tree matching has a nearly negligible cost compared to most deep program analysis techniques, such as pointer analysis or symbolic execution, so this was clearly a winning battle. As Semgrep grew more advanced, features such as taint analysis and constant propagation were added, which moved it closer to the semantic side.

These analyses are not necessarily ones that can be done quickly. Taint analysis, in particular, requires running dataflow analysis over the entire control flow of a program, which can potentially be huge, when considering how large an entire codebase may be, with all of its function calls and tricky control flow logic. To do taint analysis in this way would be to pick a losing battle.

Semgrep succeeds because it only carries out single-file analysis, so the control flow graph never exceeds the size of a file. In addition, taint analysis can be done incrementally. Functions have well-defined points where they begin and end, as well as generally well-defined entrances in terms of the data they accept (the function arguments). Thus, Semgrep collects taint summaries, which (per function) encode what taint may be possible depending on the taint of the inputs that flow in.

image
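The idea of a per-function taint summary can be illustrated with a toy model. The sketch below is not Semgrep's implementation: it assumes a made-up, straight-line representation of functions (no branches or loops), and every name in it is hypothetical. A summary here is simply the set of parameter positions whose taint may reach the return value, so callers can reuse it without re-analyzing the callee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Function:
    name: str
    params: list[str]
    # Each statement is (target, sources, callee):
    #   callee is None  -> target is computed directly from `sources`
    #   callee is a name -> target = callee(*sources)
    body: list[tuple[str, list[str], Optional[str]]]
    returns: str


def summarize(fn: Function, summaries: dict[str, frozenset]) -> frozenset:
    """Return the parameter indices of `fn` whose taint may reach its return value."""
    # reach[var] = indices of fn's parameters that may taint `var`
    reach: dict[str, set] = {p: {i} for i, p in enumerate(fn.params)}
    for target, sources, callee in fn.body:
        if callee is None:
            # Direct computation: taint flows from every source.
            flows = set().union(*(reach.get(s, set()) for s in sources))
        else:
            # Call: only arguments named in the callee's summary pass taint on.
            flows = set()
            for index in summaries[callee]:
                if index < len(sources):
                    flows |= reach.get(sources[index], set())
        reach[target] = flows
    return frozenset(reach.get(fn.returns, set()))


# Callees are listed before their callers so each summary exists when needed.
FUNCTIONS = [
    Function("first", ["a", "b"], [], "a"),
    Function("sanitize", ["x"], [("clean", [], None)], "clean"),
    Function("concat", ["a", "b"], [("joined", ["a", "b"], None)], "joined"),
    Function(
        "handler",
        ["user_input", "config"],
        [
            ("picked", ["user_input", "config"], "first"),
            ("safe", ["picked"], "sanitize"),
            ("out", ["safe", "config"], "concat"),
        ],
        "out",
    ),
]

summaries: dict[str, frozenset] = {}
for function in FUNCTIONS:
    summaries[function.name] = summarize(function, summaries)

for name, summary in summaries.items():
    print(name, sorted(summary))
```

In this toy example the summary of `handler` contains only the index of `config`, because the taint of `user_input` is dropped when it passes through `sanitize`.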

Business Model

pricing

References

Team members


<img src="https://github.com/ScanRE/ScanRE/assets/61280281/ac34b42b-7f1b-49c3-b156-70f3d7a95860" width="30%" />



We’re almost certain we’ve forgotten something 😇