Bug Hunting with Structural Code Search
Searching through source code is key to vulnerability hunting and performing code audits. Regular expression search (e.g., grep) remains the fastest and most accessible way to search for buggy code patterns. But writing and understanding regular expressions is still hard. For example, this pattern was used to find an integer overflow in the libssh2 library (circa 2013): "ALLOC[A-Z0-9_]*\s*\([^,]*,[^;]*[*+-][^>][^;]*\)\s*;". The pattern is useful, but what it does is not obvious. Besides being difficult to read, regular expressions like these are rudimentary: they cannot generally match nested expressions (like code blocks) and can easily lead to noisy or spurious matches. If you are interested in a new way of matching code, its applications for bug hunting, and related static analysis techniques, then this talk is for you.
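To make the pattern above more concrete, here is a minimal Python sketch that runs it over a few C statements. The statements are hypothetical, written only for this illustration and not taken from libssh2, and the helper name `flagged` is likewise an assumption. Read loosely, the pattern looks for an ALLOC-style call whose arguments after the first comma contain `*`, `+`, or `-`, where the character after the operator is not `>` (so `->` is skipped).

```python
import re

# The pattern quoted in the abstract, unchanged.
ALLOC_ARITH = re.compile(
    r"ALLOC[A-Z0-9_]*\s*\([^,]*,[^;]*[*+-][^>][^;]*\)\s*;"
)

# Hypothetical C statements, invented for this illustration.
SAMPLES = [
    "p = LIBSSH2_ALLOC(session, len + 1);",        # arithmetic in the size argument
    "p = LIBSSH2_ALLOC(session, len);",            # no arithmetic
    "p = LIBSSH2_ALLOC(session, pkt->len);",       # '-' is part of '->'
    'p = LIBSSH2_ALLOC(session, strlen("a+b"));',  # '+' only inside a string literal
]


def flagged(lines):
    """Return the lines on which the regular expression finds a match."""
    return [line for line in lines if ALLOC_ARITH.search(line)]


if __name__ == "__main__":
    for line in flagged(SAMPLES):
        print(line)
```

Under these assumptions the first and last statements are flagged. The last one is a spurious match: the only operator sits inside a string literal, which the regular expression cannot tell apart from code.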
This talk presents a new technique and tooling to match code in a way that is simpler and more powerful than regular expressions. Patterns are expressed as declarative syntax templates where the key ideas are that (1) patterns match code structurally (e.g., they can match content inside balanced braces or parentheses, which can nest arbitrarily) and (2) patterns understand the difference between code, data (such as strings), and comments on a per-language basis. Many existing static analyzers and frameworks are more sophisticated, but they require users to program custom checks in order to reason over these code properties (i.e., by accessing the abstract syntax tree), which takes more effort. I will discuss some of these design tradeoffs, and explain and demo how using declarative templates for bug hunting can take less effort compared to existing approaches (e.g., for finding suspicious code like arrays written to inside a loop, or unchecked function return values).
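The two key ideas can be sketched in a few lines of Python. This is a toy illustration and not how comby is implemented: the function names `balanced_end` and `call_arguments` are assumptions, only parentheses and double-quoted strings are handled, and comments are ignored. In comby, a comparable match would be written as a template with a named hole, along the lines of `ALLOC(:[args])`.

```python
def balanced_end(text, open_idx):
    """Return the index just past the ')' that closes text[open_idx] == '('.

    Parentheses inside double-quoted strings are treated as data, not code.
    Returns None when the parentheses never balance.
    """
    depth = 0
    i = open_idx
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Skip the string literal, honoring backslash escapes.
            i += 1
            while i < len(text) and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return None


def call_arguments(source, name):
    """Collect the argument text of every `name(...)` call in source."""
    found = []
    needle = name + "("
    start = 0
    while True:
        idx = source.find(needle, start)
        if idx == -1:
            break
        # Skip hits where `name` is only the tail of a longer identifier.
        if idx > 0 and (source[idx - 1].isalnum() or source[idx - 1] == "_"):
            start = idx + 1
            continue
        open_idx = idx + len(name)
        end = balanced_end(source, open_idx)
        if end is None:
            break
        found.append(source[open_idx + 1 : end - 1])
        start = end
    return found


if __name__ == "__main__":
    # Hypothetical C fragment, invented for this illustration.
    code = 'p = ALLOC(s, strlen("a)b") + (n * 2)); q = ALLOC(s, n);'
    print(call_arguments(code, "ALLOC"))
```

The first call's arguments contain nested parentheses and a `)` inside a string literal, and both are captured as a single unit. A regular expression with no notion of nesting or string data cannot do this in general.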
In this talk you will learn about a new declarative way to search over richer code structures and see practical examples for bug hunting. You will learn about a new open source tool, comby, that implements these ideas and can complement your toolkit for code auditing tasks. You will also walk away with a high-level understanding of tradeoffs and effectiveness in the design of static analysis tools in the context of this work.
Rijnard van Tonder (@rvtond)
Rijnard holds a PhD in Computer Science from Carnegie Mellon University, where his research focused on automated bug finding and bug fixing. He is currently a software engineer at Sourcegraph, where he works on large-scale code search, analysis, and modification. Previously, Rijnard worked on improving analysis reasoning and performance of state-of-the-art static analyzers at Facebook (e.g., Pyre for Python). He continues to have a research interest in the overlap of automated program repair, program analysis, and program transformation, with an emphasis on bringing new advances in these areas to practice.