[Review] MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation
The paper proposes Trace Distillation: extracting the key system calls from the original system call traces without significantly lowering their coverage, and using these distilled sequences as seeds for mutation during fuzzing.
During distillation, MoonShine infers the dependencies between system calls so that every kept call also keeps the calls it relies on. In effect, dependency inference is the real source of MoonShine's improvement.
MoonShine uses lightweight static analysis to support distillation, inferring both explicit and implicit dependencies between system calls.
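The core idea can be pictured as a greedy selection over one trace. The Python below is a simplified illustration, not MoonShine's implementation: it assumes per-call coverage sets (`coverage`) and a precomputed dependency map (`deps`) as inputs, and the function name `distill` is hypothetical. It visits calls from largest to smallest coverage, keeps any call that adds unseen coverage together with everything it transitively depends on, and returns the kept calls in original trace order.

```python
from typing import Dict, List, Set


def distill(coverage: List[Set[int]], deps: Dict[int, Set[int]]) -> List[int]:
    """Return indices of calls to keep, in original trace order.

    coverage[i]: coverage points (e.g. edges) hit by call i (assumed input).
    deps[i]: indices of earlier calls that call i depends on (assumed input).
    """
    kept: Set[int] = set()
    covered: Set[int] = set()
    # Consider the calls with the most coverage first.
    order = sorted(range(len(coverage)), key=lambda i: len(coverage[i]), reverse=True)
    for i in order:
        if coverage[i] - covered:  # call i reaches something not yet covered
            # Keep call i plus the transitive closure of its dependencies.
            stack = [i]
            while stack:
                j = stack.pop()
                if j in kept:
                    continue
                kept.add(j)
                covered |= coverage[j]  # kept dependencies also execute
                stack.extend(deps.get(j, ()))
    return sorted(kept)


# Toy trace: 0=open, 1=read(fd), 2=getpid, 3=close(fd), 4=getpid
coverage = [{1, 2}, {3, 4, 5}, {1}, {6}, {1}]
deps = {1: {0}, 3: {0}}
print(distill(coverage, deps))  # [0, 1, 3]
```

Dependencies are pulled in unconditionally because a call stripped of the calls that set up its arguments or kernel state may no longer reach the same code.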
MoonShine improved Syzkaller’s test coverage of the Linux kernel by 13% and discovered 17 previously undisclosed vulnerabilities in it.
Introduction
Kernel fuzzing is a long-studied topic.
Challenges: dependencies between system calls, and reaching the specific kernel states needed to trigger bugs.
Existing fuzzers rely on hand-coded rules, which are neither scalable nor effective.
Implementation
MoonShine collects system call traces from the Linux Test Project (LTP), the Linux kernel selftests (kselftests), the Open POSIX Test Suite, and the Glibc Testsuite.
It then distills these traces using explicit and implicit dependency inference.
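Explicit dependency inference: link a system call's return value to the arguments of later calls that consume it. For example, read() explicitly depends on the open() call that returned the file descriptor it reads from.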
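A minimal sketch of explicit dependency tracking over a single trace, assuming each call is already annotated with the resource values it consumes (`uses`) and the resource it returns (`produces`). The `Call` type and `explicit_deps` function are hypothetical; deciding which arguments are resources requires type information that this sketch simply takes as given.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Call:
    name: str
    uses: Tuple[int, ...] = ()      # resource values passed in (e.g. fds)
    produces: Optional[int] = None  # resource value returned, if any


def explicit_deps(trace: List[Call]) -> Dict[int, Set[int]]:
    """Map each call index to the earlier calls whose returned resources it consumes."""
    producer: Dict[int, int] = {}  # resource value -> latest call that returned it
    deps: Dict[int, Set[int]] = {}
    for i, call in enumerate(trace):
        for value in call.uses:
            if value in producer:
                deps.setdefault(i, set()).add(producer[value])
        if call.produces is not None:
            # Overwrite so a reused fd number points at its most recent producer.
            producer[call.produces] = i
    return deps


trace = [
    Call("open", produces=3),
    Call("open", produces=4),
    Call("read", uses=(3,)),
    Call("write", uses=(4,)),
    Call("close", uses=(3,)),
]
print(explicit_deps(trace))  # {2: {0}, 3: {1}, 4: {0}}
```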
Implicit dependency inference: if one system call affects another by changing shared kernel state (e.g., assigning a global variable that the other reads), there is an implicit dependency between them. MoonShine finds these dependencies through static analysis of the kernel source, checking the assignment statements in one call's code against the conditional statements in the other's.
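The implicit case can be sketched the same way, assuming static analysis has already produced, for each system call handler, the set of global variables it may assign (`writes`) and the set it reads in conditional statements (`cond_reads`). The syscall and variable names below are placeholders, not real kernel facts, and `implicit_deps` is hypothetical. The sketch conservatively links a call to every earlier call whose writes overlap its conditional reads.

```python
from typing import Dict, List, Set


def implicit_deps(
    trace: List[str],
    writes: Dict[str, Set[str]],
    cond_reads: Dict[str, Set[str]],
) -> Dict[int, Set[int]]:
    """Call j implicitly depends on earlier call i if i's handler may assign a
    global variable that j's handler reads in a conditional statement."""
    deps: Dict[int, Set[int]] = {}
    for j, name in enumerate(trace):
        reads = cond_reads.get(name, set())
        if not reads:
            continue  # no conditional reads, so nothing can influence this call
        for i in range(j):
            if writes.get(trace[i], set()) & reads:
                deps.setdefault(j, set()).add(i)
    return deps


# Placeholder static-analysis results (hypothetical names).
writes = {"sys_a": {"g_mode"}}
cond_reads = {"sys_b": {"g_mode"}}
print(implicit_deps(["sys_b", "sys_a", "sys_c", "sys_b"], writes, cond_reads))  # {3: {1}}
```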
Evaluation
RQ1: Can MoonShine discover new vulnerabilities?
RQ2: Can MoonShine improve code coverage?
RQ3: How effectively can MoonShine track dependencies?
RQ4: How efficient is MoonShine?
RQ5: Is distillation useful?
Compared configurations: MoonShine (implicit + explicit), MoonShine (explicit only), RANDOM (randomly chooses system calls during distillation), and default Syzkaller.
Results:
- MoonShine found 17 new vulnerabilities that default Syzkaller could not find; 10 of them could only be found using implicit dependency distillation.
- MoonShine achieves 13% higher edge coverage than the default Syzkaller.
- MoonShine distills 3,220 traces consisting of 2.9 million calls into seeds totaling 16,442 calls that preserve 86% of trace coverage.
- MoonShine collects and distills 110 gigabytes of raw program traces in under 80 minutes.
- Running Syzkaller with undistilled seeds slows the mutation rate by 53%, whereas running it on distilled seeds keeps the mutation rate at 88.4% of what default Syzkaller achieves.
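The results imply a few derived figures, computed below. This treats the rounded "2.9 million" as exact and reads "slows the mutation rate by 53%" as running at 47% of default Syzkaller's rate; both are interpretations of the summary's wording, not numbers reported in the paper.

```python
# Figures quoted in the results (2.9 million is a rounded value).
traces = 3220
raw_calls = 2_900_000
distilled_calls = 16_442

reduction = raw_calls / distilled_calls  # raw calls per kept call
avg_trace_len = raw_calls / traces       # average raw calls per trace

# Mutation rates relative to default Syzkaller (= 1.0).
undistilled_rate = 1.0 - 0.53  # assumed reading of "slows ... by 53%"
distilled_rate = 0.884         # "88.4% of what default Syzkaller achieves"
speedup = distilled_rate / undistilled_rate

print(f"{reduction:.0f}x fewer calls, {avg_trace_len:.0f} calls/trace, "
      f"{speedup:.2f}x faster mutation than undistilled seeds")
# prints: 176x fewer calls, 901 calls/trace, 1.88x faster mutation than undistilled seeds
```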
Future work
- Dependency inference (still an important topic).
- Thread-related dependency inference: some dependencies arise between different threads, and traditional methods struggle to infer them.