Machine Learning for Static Analysis

Static analysis tools (SATs) are automated programs that review code for rule violations without compiling or executing the underlying code. Rules range from low priority, such as a missing semicolon, to high priority, such as a potential memory leak. While powerful, SATs differ in their specialties and sensitivities, and a common problem is the generation of false-positive alerts: alerts that report an issue in the code where no violation actually exists.

  • In large code bases, SATs may produce thousands of alerts. Such volume is difficult for programmers to navigate effectively.
  • Methodologies are needed to prioritize alerts effectively, giving users a list of the alerts whose investigation and resolution deliver the most value in a short amount of time.
  • In response, we developed a framework for organizing static analysis alerts that draws on information such as the expected time to resolve each alert, the importance of individual code segments, true-positive classification, and prioritization techniques.
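
To make the idea of combining these signals concrete, the following is a minimal sketch, not the framework itself. It assumes each alert already carries a classifier-estimated probability of being a true positive, an importance weight for the affected code segment, and an estimated resolution time. The names `Alert`, `priority_score`, and `prioritize`, the scoring formula (expected value per minute), and the greedy selection under a time budget are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative assumptions."""
    alert_id: str
    p_true_positive: float     # estimated probability the alert is a real issue
    importance: float          # relative importance of the affected code segment
    minutes_to_resolve: float  # expected time to resolve; assumed to be > 0


def priority_score(alert: Alert) -> float:
    """Expected value per minute of effort (an assumed scoring rule)."""
    return alert.p_true_positive * alert.importance / alert.minutes_to_resolve


def prioritize(alerts: list[Alert], time_budget_minutes: float) -> list[Alert]:
    """Greedily pick the highest-scoring alerts that fit in the time budget."""
    ranked = sorted(alerts, key=priority_score, reverse=True)
    selected = []
    remaining = time_budget_minutes
    for alert in ranked:
        # Skip alerts that no longer fit; cheaper ones later may still fit.
        if alert.minutes_to_resolve <= remaining:
            selected.append(alert)
            remaining -= alert.minutes_to_resolve
    return selected


if __name__ == "__main__":
    alerts = [
        Alert("A", p_true_positive=0.9, importance=5.0, minutes_to_resolve=30),
        Alert("B", p_true_positive=0.2, importance=1.0, minutes_to_resolve=10),
        Alert("C", p_true_positive=0.8, importance=3.0, minutes_to_resolve=60),
        Alert("D", p_true_positive=0.6, importance=4.0, minutes_to_resolve=15),
    ]
    for alert in prioritize(alerts, time_budget_minutes=60):
        print(alert.alert_id, round(priority_score(alert), 3))
```

Under this scoring rule, a likely false positive in unimportant code sinks to the bottom of the list even if it is quick to fix, while a probable true positive in a critical segment rises to the top. A greedy pass is only one simple way to respect a time budget; other selection strategies could be substituted without changing the inputs.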