[Review] PyRTFuzz: Detecting Bugs in Python Runtimes via Two-Level Collaborative Fuzzing
The paper proposes a new approach to Python fuzzing, called PyRTFuzz.
PyRTFuzz divides the fuzzing process into two levels:
- the generation-based level: generate Python applications that exercise the runtime.
 - the mutation-based level: apply mutation-based fuzzing to the inputs of the generated Python applications.
 
Background:
Three existing problems for Python fuzzing:
- testing the Python runtime requires testing both the interpreter core and the language’s runtime libraries.
 - diverse and valid (syntactically and semantically correct) Python applications are needed.
 - Python is dynamically typed, so runtime APIs do not declare the types of their parameters, which makes type-aware input generation difficult.
 
Implementation:

Runtime API Description Extraction
The goal is to extract a description of each runtime API from Python’s official documentation.
Static Extraction: use Python's standard AST parser to extract untyped API descriptions.

Dynamic Refinement: given the untyped description of a runtime API, run the unit tests and use them to refine it into a typed API description.
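The two extraction steps can be illustrated with a minimal sketch. This is not the paper's implementation: the `scale` function, the `extract_untyped` and `refine_with_calls` helpers, and the use of hand-written sample calls in place of real unit tests are all assumptions made for illustration.

```python
import ast

# Hypothetical library source; stands in for a runtime module.
SOURCE = '''
def scale(values, factor=2):
    return [v * factor for v in values]
'''

def extract_untyped(source):
    """Static step: collect function and parameter names via the AST."""
    tree = ast.parse(source)
    descriptions = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # Types are unknown at this point, so each parameter maps to None.
            descriptions[node.name] = {arg.arg: None for arg in node.args.args}
    return descriptions

def refine_with_calls(description, func, sample_calls):
    """Dynamic step: run sample calls (standing in for unit tests)
    and record the argument types observed in calls that succeed."""
    typed = dict(description)
    params = list(description)
    for args in sample_calls:
        func(*args)  # an exception here means the call is not a valid usage
        for name, value in zip(params, args):
            typed[name] = type(value).__name__
    return typed

namespace = {}
exec(SOURCE, namespace)
untyped = extract_untyped(SOURCE)
typed = refine_with_calls(untyped["scale"], namespace["scale"], [([1, 2, 3], 3)])
print(untyped)
print(typed)
```

The untyped description only knows parameter names; the typed one adds the types seen at run time, which is the information the later fuzzing levels rely on.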
Level-1 Fuzzing
- generation-based fuzzing
 - for a single API, generate a Python application that calls it.
 - repeat application generation to produce more diverse applications targeting the same API.
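
As a rough illustration of Level-1, the sketch below renders a tiny Python application around one API from a typed description. The template, the `generate_app` helper, and the byte-decoding rules are assumptions for this example only; the paper's generator produces more varied application structures than this single template.

```python
def generate_app(module, api, params):
    """Render a small app that decodes raw fuzz bytes into typed
    arguments and calls a single API (hypothetical template)."""
    # How each declared type is derived from the raw input bytes.
    decoders = {
        "int": "int.from_bytes(data[:4], 'little')",
        "str": "data.decode('utf-8', errors='ignore')",
        "bytes": "data",
    }
    lines = [f"import {module}", "", "def app(data):"]
    for name, type_name in params.items():
        lines.append(f"    {name} = {decoders[type_name]}")
    call_args = ", ".join(params)
    lines.append(f"    return {module}.{api}({call_args})")
    return "\n".join(lines) + "\n"

# Typed description of one real standard-library API: base64.b64encode(s: bytes).
app_source = generate_app("base64", "b64encode", {"s": "bytes"})
print(app_source)

# Load the generated app and run it once on a seed input.
namespace = {}
exec(app_source, namespace)
result = namespace["app"](b"hi")
print(result)
```

The generated `app(data)` function is the unit that the mutation-based level would then drive with many different inputs.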
 
Level-2 Fuzzing
- given an application generated in Level-1, perform mutation-based fuzzing to test it.
 - mutate the input data according to its data type.
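
Type-aware mutation can be sketched as a dispatcher that picks a mutation strategy from the value's type. The specific strategies below (boundary offsets for integers, character insertion for strings, bit flips for bytes) are illustrative assumptions, not the mutators used in the paper.

```python
import random

def mutate(value, rng):
    """Return a mutated copy of value, choosing the strategy by type."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return not value
    if isinstance(value, int):
        # Small steps and 32-bit boundary jumps.
        return value + rng.choice([-1, 1, 2**31, -2**31])
    if isinstance(value, str):
        # Insert one printable ASCII character at a random position.
        pos = rng.randrange(len(value) + 1)
        return value[:pos] + chr(rng.randrange(32, 127)) + value[pos:]
    if isinstance(value, bytes):
        if not value:
            return bytes([rng.randrange(256)])
        # Flip a single bit in one byte.
        pos = rng.randrange(len(value))
        flipped = value[pos] ^ (1 << rng.randrange(8))
        return value[:pos] + bytes([flipped]) + value[pos + 1:]
    if isinstance(value, list):
        return [mutate(item, rng) for item in value]
    return value  # unknown types are passed through unchanged

rng = random.Random(0)  # fixed seed so runs are reproducible
mutated_int = mutate(7, rng)
mutated_str = mutate("abc", rng)
mutated_bytes = mutate(b"\x00\x01", rng)
print(mutated_int, repr(mutated_str), mutated_bytes)
```

Because each mutation preserves the type, the mutated value still passes the API's basic type expectations and is more likely to reach deeper code than a random byte string would.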
 
Evaluation:
Limitation: PyRTFuzz only generates Python APPs that each use a single API, without considering potential dependencies among APIs.
RQ1: How effective is PyRTFuzz on fuzzing Python runtime?
RQ2: How scalable is Python APP generation in PyRTFuzz?
RQ3: What are the factors affecting PyRTFuzz’s effectiveness?
Benchmarks: Python 3.9.15, Python 3.8.15, and Python 3.7.15.
For RQ1:
- report the code coverage achieved.
 - show the bug-triggering ability.
 
For RQ2:
- show how the APP specification size affects the time cost of generation.
 - increasing the APP specification size generally helps generate more complex Python APPs.
 
For RQ3:
evaluate how the following three factors influence effectiveness:
- APP Specification Size.
 - Level-2 Time Budget.
 - Typed versus Untyped API Descriptions.
 
Two case studies illustrate the bugs triggered.
Future work:
- apply combined fuzzing (both generation-based and mutation-based) to other areas.
 - introduce a new method that can test multiple APIs together.
 