[Review] PyRTFuzz: Detecting Bugs in Python Runtimes via Two-Level Collaborative Fuzzing


The paper proposes PyRTFuzz, a new approach to fuzzing Python runtimes, covering both the interpreter core and the runtime libraries.

PyRTFuzz divides the fuzzing process into two levels:

  1. the generation-based level: generate Python applications that exercise the runtime APIs.
  2. the mutation-based level: apply mutation-based fuzzing to the inputs of the generated Python applications.
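
The two levels can be pictured as nested loops. The sketch below is only an illustration under assumptions, not PyRTFuzz's implementation: `generate_app`, `mutate`, and `two_level_fuzz` are hypothetical names, the "application" is a plain Python function wrapping one runtime API (`json.loads`), and mutation is random byte overwriting rather than coverage-guided.

```python
import json
import random


def generate_app(api):
    """Level 1 (generation): build a tiny 'application' around one runtime API."""
    def app(data: bytes):
        # Decode leniently so every byte string reaches the API under test.
        return api(data.decode("utf-8", errors="replace"))
    return app


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Level 2 (mutation): overwrite one random byte of the input."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def two_level_fuzz(apis, seed: bytes, rounds: int = 200, rng_seed: int = 0):
    rng = random.Random(rng_seed)
    unexpected = []
    for api in apis:                      # level 1: one app per API
        app = generate_app(api)
        data = seed
        for _ in range(rounds):           # level 2: mutate the app's input
            data = mutate(data, rng)
            try:
                app(data)
            except ValueError:
                pass                      # documented failure mode for bad input
            except Exception as exc:      # anything else deserves a closer look
                unexpected.append((api.__name__, data, repr(exc)))
    return unexpected


findings = two_level_fuzz([json.loads], seed=b'{"key": [1, 2, 3]}')
print(len(findings), "unexpected exceptions")
```

Treating `ValueError` as expected is a simplification: a real harness would need a per-API notion of which exceptions are documented behavior and which indicate a runtime bug.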

Background:

Three challenges in fuzzing Python runtimes:

  1. Testing the Python runtime requires testing both the interpreter core and the language’s runtime libraries.
  2. Diverse and valid (syntactically and semantically correct) Python applications are needed.
  3. Python is dynamically typed, so the parameter types of runtime APIs are not declared explicitly, which makes type-aware input generation difficult.

Implementation:

  1. Runtime API Description Extraction

    • Extract API descriptions from Python’s official documentation.

    • Static Extraction: use Python’s standard AST parser to extract API descriptions.

    • Dynamic Refinement: given the untyped description of a runtime API, run the unit tests to refine it into a typed API description.

  2. Level-1 Fuzzing

    • Generation-based fuzzing.
    • For a single API, generate a Python application that tests it.
    • Repeat application generation to produce more diverse applications targeting this API.

  3. Level-2 Fuzzing

    • Given an application generated in Level 1, perform mutation-based fuzzing on it.
    • Mutate the input data according to its data type.
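
The static extraction, dynamic refinement, and type-aware mutation steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `extract_untyped`, `refine_types`, `mutate_typed`, and the inline `SOURCE` module are all hypothetical, and a single hand-written call stands in for running the runtime's unit tests.

```python
import ast
import random

# Hypothetical stand-in for the source code of a runtime library module.
SOURCE = '''
def repeat(text, times):
    return text * times
'''


def extract_untyped(source: str) -> dict:
    """Static extraction: collect function and parameter names via the AST."""
    apis = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            apis[node.name] = {arg.arg: None for arg in node.args.args}
    return apis


def refine_types(description: dict, observed_calls: dict) -> dict:
    """Dynamic refinement: fill in parameter types seen in concrete calls."""
    typed = {}
    for name, params in description.items():
        args = observed_calls.get(name, ())
        typed[name] = {
            param: type(value).__name__ for param, value in zip(params, args)
        }
        # Parameters that were never observed stay untyped.
        for param in params:
            typed[name].setdefault(param, None)
    return typed


def mutate_typed(value, rng: random.Random):
    """Type-aware mutation: change the value while keeping its type."""
    if isinstance(value, int):
        return value + rng.choice([-1, 1, 2**31])
    if isinstance(value, str):
        return value + chr(rng.randrange(32, 127))
    return value  # unknown types are left unchanged in this sketch


untyped = extract_untyped(SOURCE)
typed = refine_types(untyped, {"repeat": ("ab", 3)})
print(untyped)  # {'repeat': {'text': None, 'times': None}}
print(typed)    # {'repeat': {'text': 'str', 'times': 'int'}}
```

With a typed description, the fuzzer can keep `times` an integer and `text` a string while mutating, so more inputs get past argument checks and reach deeper runtime code.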

Evaluation:

Limitation: each Python APP generated by PyRTFuzz uses only a single API, so potential dependencies among APIs are not considered.

RQ1: How effective is PyRTFuzz on fuzzing Python runtime?

RQ2: How scalable is Python APP generation in PyRTFuzz?

RQ3: What are the factors affecting PyRTFuzz’s effectiveness?

Benchmarks: Python 3.9.15, Python 3.8.15, and Python 3.7.15.

For RQ1:

  • Report the code coverage achieved.
  • Show the bug-triggering ability.

For RQ2:

  • Show the impact of APP specification size on time cost.
  • Increasing the APP specification size generally helps generate more complex Python APPs.

For RQ3:

  • Evaluate how the following three dimensions influence effectiveness:

    • APP Specification Size.
    • Level-2 Time Budget.
    • Typed versus Untyped API Descriptions.

Two case studies present the bugs that were triggered.

Future work:

  • Apply combined fuzzing (both generation-based and mutation-based) to other areas.
  • Introduce a new method that can test multiple APIs together.




https://gax-c.github.io/blog/2023/12/04/28_paper_review_18/

Author: Gax

Posted on: 2023-12-04

Updated on: 2023-12-04
