[LeetCode] 4. Median of Two Sorted Arrays

Link here

An excellent binary search problem. It’s easy to solve with brute force or merging, but the problem specifies a time complexity of $O(\log(m+n))$.

The “log” in the required complexity naturally suggests binary search, but constructing such an algorithm is still the hard part.

Here are three ways to solve the problem:
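For reference, one of the standard solutions is a partition-based binary search over the shorter array, which runs in $O(\log(\min(m,n)))$; a minimal sketch (not necessarily the post's exact code) looks like:

```python
def find_median_sorted_arrays(a, b):
    # Binary-search the partition of the shorter array so that the left
    # halves of both arrays together hold (m + n + 1) // 2 elements.
    if len(a) > len(b):
        a, b = b, a
    m, n = len(a), len(b)
    lo, hi = 0, m
    half = (m + n + 1) // 2
    while lo <= hi:
        i = (lo + hi) // 2          # elements taken from a's left side
        j = half - i                # elements taken from b's left side
        a_left = a[i - 1] if i > 0 else float("-inf")
        a_right = a[i] if i < m else float("inf")
        b_left = b[j - 1] if j > 0 else float("-inf")
        b_right = b[j] if j < n else float("inf")
        if a_left <= b_right and b_left <= a_right:
            # Valid partition: everything on the left is <= everything
            # on the right, so the median sits at the boundary.
            if (m + n) % 2:
                return float(max(a_left, b_left))
            return (max(a_left, b_left) + min(a_right, b_right)) / 2
        if a_left > b_right:
            hi = i - 1              # took too many elements from a
        else:
            lo = i + 1              # took too few elements from a
    raise ValueError("inputs must be sorted")
```

Each iteration halves the search range over `a`, which is what delivers the logarithmic bound the problem asks for.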

Read more
[Review] Python Crypto Misuses in the Wild

Link here

The paper conducts a study of Python crypto API misuses and implements a tool called LICMA that detects crypto API misuses in Python.

Several conclusions:

  1. 52.26% of the Python projects using crypto APIs contain at least one potential misuse.
  2. Only 14.81% of the projects contain a misuse directly; the rest are introduced through third-party code.
  3. Python applications are generally more secure than their C or Java counterparts, and the distribution across the concrete types of misuses differs considerably.
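The excerpt does not list LICMA's rules, but a hedged illustration of the kind of misuse such tools flag (the function names here are my own, not LICMA's) might look like:

```python
import hashlib

# A typical Python crypto misuse: a broken hash (MD5) used for a
# security-sensitive purpose such as password hashing.
def insecure_password_hash(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()  # misuse: MD5 is broken

# A safer alternative: a slow, salted key-derivation function.
def safer_password_hash(password: str, salt: bytes) -> str:
    return hashlib.pbkdf2_hmac("sha256", password.encode(),
                               salt, 100_000).hex()
```

Note that per conclusion 2 above, such a call would often reach a project indirectly through a third-party dependency rather than appearing in its own code.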
Read more
[Review] Evaluation of Static Vulnerability Detection Tools with Java Cryptographic API Benchmarks

Link here

The paper assesses the performance of current static vulnerability detection tools in the area of Java cryptographic API misuse.

Main contributions:

  • provides two benchmarks: CryptoAPI-Bench and ApacheCryptoAPI-Bench.
    • CryptoAPI-Bench consists of 181 test cases covering 16 types of cryptographic and SSL/TLS API misuse vulnerabilities, at both a basic and an advanced level.
    • ApacheCryptoAPI-Bench documents the API misuse vulnerabilities from 10 real-world Apache projects. This benchmark checks a detection tool’s scalability (its ability to analyze large codebases with low computational overhead).
  • evaluates four static analysis tools on the two proposed benchmarks: specialized tools (CryptoGuard, CrySL) and general-purpose tools (SpotBugs, Coverity).
Read more
[Review] CryptoGuard: High Precision Detection of Cryptographic Vulnerabilities in Massive-sized Java Projects

Link here

The paper designs a new system called CryptoGuard to detect cryptographic API misuses.

It uses 16 rules to identify misuses and 5 refinement methods to avoid false positives, resulting in a precision of 98.61%.

It also creates a benchmark named CryptoApi-Bench with 112 unit test cases. CryptoApi-Bench contains basic intraprocedural instances, interprocedural cases, field-sensitive cases, false-positive tests, and correct API uses.
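CryptoApi-Bench itself is written in Java; the following Python sketch (with invented names and a toy stand-in cipher) only illustrates the difference between an intraprocedural and an interprocedural test case:

```python
# Illustration only: the benchmark's real cases are Java programs.

KEY = b"hardcoded-key-16"           # the insecure constant under test

def encrypt(msg: bytes, key: bytes) -> bytes:
    # Toy stand-in "cipher": XOR with the key (illustration only).
    return bytes(m ^ k for m, k in zip(msg, key))

def intraprocedural_case() -> bytes:
    # Misuse visible inside a single function: the hardcoded key is
    # written and used in the same scope, so a local analysis finds it.
    return encrypt(b"msg", b"hardcoded-key-16")

def helper() -> bytes:
    return KEY                      # the constant flows through a call

def interprocedural_case() -> bytes:
    # Detecting this requires tracking the key across function
    # boundaries (helper -> interprocedural_case -> encrypt).
    return encrypt(b"msg", helper())
```

The two cases compute the same thing; what differs is how far a static analysis must track the tainted constant, which is exactly what separates the benchmark's basic and advanced levels.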

Introduction:

For cryptographic API misuse detection, both static and dynamic analyses have their respective pros and cons.

Static methods do not require the execution of programs. They scale up to a large number of programs, cover a wide range of security rules, and are unlikely to have false negatives.

Dynamic methods require one to trigger and detect specific misuse symptoms at runtime. They tend to produce fewer false positives than static analysis.

API misuse mainly involves the following problems:

Read more
[Review] Automatic Detection of Java Cryptographic API Misuses: Are We There Yet?

Link here

A large study of Java cryptographic API misuse.

Two main contributions are made:

  1. evaluates the effectiveness of existing cryptographic API misuse detection tools.
  2. conducts a study with developers, measuring the real-world performance of the detectors.

Introduction:

JCA (Java Cryptography Architecture), JSSE (Java Secure Socket Extension).

Java cryptographic API misuses are common and may cause extensive security problems.

13 Java types are frequently mentioned in the API-misuse patterns.

Read more
[Review] PyRTFuzz: Detecting Bugs in Python Runtimes via Two-Level Collaborative Fuzzing

Link here

The paper proposes a new approach to Python fuzzing, called PyRTFuzz.

PyRTFuzz divides the fuzzing process into two levels:

  1. the generation-based level: generates Python applications.
  2. the mutation-based level: applies mutation-based fuzzing to test the generated Python applications.
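The two levels can be sketched as follows. This is a minimal illustration of the structure, not PyRTFuzz's actual design; `base64.b64encode` is just a stand-in target API:

```python
import random

def generate_app(api_name: str) -> str:
    # Level 1 (generation-based): synthesize a small Python application
    # that feeds external input into a runtime-library API.
    mod, fn = api_name.split(".")
    return (
        "import {mod}\n"
        "def app(data: bytes):\n"
        "    return {mod}.{fn}(data)\n"
    ).format(mod=mod, fn=fn)

def mutate(data: bytes) -> bytes:
    # Level 2 (mutation-based): flip a random byte of the input.
    if not data:
        return b"\x00"
    i = random.randrange(len(data))
    return data[:i] + bytes([data[i] ^ 0xFF]) + data[i + 1:]

# Drive one generated app with mutated inputs, recording crash candidates.
source = generate_app("base64.b64encode")
ns = {}
exec(source, ns)
seed = b"seed"
crashes = []
for _ in range(100):
    seed = mutate(seed)
    try:
        ns["app"](seed)
    except Exception as exc:        # a crash candidate in the runtime under test
        crashes.append((seed, exc))
```

The generation level supplies the syntactically and semantically valid applications that problem 2 above asks for, while the mutation level cheaply explores inputs to each one.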

Background:

Three existing problems for Python fuzzing:

  1. testing the Python runtime requires testing both the interpreter core and the language’s runtime libraries.
  2. diverse and valid (syntactically and semantically correct) Python applications are needed.
  3. static type information is not available in Python, so type-aware input generation is difficult.
Read more
[Review] DynSQL: Stateful Fuzzing for Database Management Systems with Complex and Valid SQL Query Generation

Link here

The paper designs a stateful DBMS fuzzer called DynSQL. DynSQL adopts two new methods: Dynamic Query Interaction and Error Feedback.

Instead of generating all of the SQL queries before executing them, Dynamic Query Interaction lets the fuzzer test DBMSs “step by step”, that is, dynamically determine the next statement after executing each prior statement.

Also, Error Feedback helps the seed generation produce more valid SQL statements and queries.
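A minimal sketch of the step-by-step idea, using SQLite in place of a server DBMS (the statement grammar here is invented and far simpler than DynSQL's):

```python
import random
import sqlite3

random.seed(0)
conn = sqlite3.connect(":memory:")
errors = 0

def next_statement(cur) -> str:
    # Inspect the DBMS's current state: which tables exist right now?
    tables = [r[0] for r in cur.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    if not tables:
        return "CREATE TABLE t0 (a INTEGER, b TEXT)"
    t = random.choice(tables)
    return random.choice([
        f"INSERT INTO {t} VALUES ({random.randint(0, 9)}, 'x')",
        f"SELECT a FROM {t} WHERE a < {random.randint(0, 9)}",
        f"CREATE TABLE t{len(tables)} (a INTEGER, b TEXT)",
    ])

cur = conn.cursor()
for _ in range(20):
    stmt = next_statement(cur)      # decided from the state after the last step
    try:
        cur.execute(stmt)
    except sqlite3.Error:
        errors += 1                 # error feedback would steer later generation
```

Because each statement is chosen only after the previous one runs, the generated statements always reference tables that actually exist, which is what keeps the queries valid and lets them grow complex.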

Background:

Former DBMS testing tools: SQLsmith, SQUIRREL, SQLancer.

Existing DBMS fuzzers are still limited in generating complex and valid queries to find deep bugs in DBMSs.

SQLsmith generates only one statement per query, while SQUIRREL produces over 50% invalid queries and tends to generate simple statements.

SQLancer aims to find logic bugs in DBMSs rather than general bugs.

Read more
[Review] Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting

Link here

The paper explores the ability of ChatGPT (specifically ChatGPT, not LLMs in general) to find failure-inducing tests, and proposes a new method called Differential Prompting for the task. It achieves a success rate of 75% on programs from QuixBugs and 66.7% on programs from Codeforces.

This approach may only be useful for small-scale programs (fewer than 100 LOC).

Background:

Failure-inducing tests are test cases that trigger bugs in a given program. Finding such tests is a main objective in software engineering, but is challenging in practice.

Recently, applying LLMs (e.g., ChatGPT) to software engineering has become popular, but directly applying ChatGPT to this task performs poorly, because ChatGPT is insensitive to nuances (i.e., subtle differences between two similar sequences of tokens). It is therefore hard for ChatGPT to identify bugs, because a bug is essentially a nuance between a buggy program and its fixed version.
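Differential Prompting works around this by prompting the model for the program's intent, synthesizing a reference version, and then differentially testing the two. Only the final differential-testing step is sketched below; the LLM-generated reference is mocked by a hand-written one:

```python
import random

def buggy_max(xs):
    m = 0                           # bug: wrong for all-negative lists
    for x in xs:
        if x > m:
            m = x
    return m

def reference_max(xs):              # stands in for the LLM-generated version
    return max(xs)

def find_failure_inducing_test(put, ref, trials=1000):
    # Run the program under test and the reference on random inputs
    # and report the first input where their outputs diverge.
    random.seed(1)
    for _ in range(trials):
        xs = [random.randint(-10, 10) for _ in range(3)]
        if put(xs) != ref(xs):      # the "nuance" surfaces as a divergence
            return xs
    return None

failing = find_failure_inducing_test(buggy_max, reference_max)
```

The point of the construction is that the model never has to spot the nuance directly: the divergent outputs expose it mechanically.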

Read more
[Review] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models

Link here

This paper demonstrates a new approach to replaying Android bugs. More specifically, it creates a new tool called AdbGPT that automatically converts bug reports into reproductions. In the evaluation, AdbGPT is able to reproduce 81.3% of bug reports in 253.6 seconds, outperforming the state-of-the-art baselines and the ablation studies.

Background:

Bug reports often contain the steps to reproduce (S2Rs) the bugs, which help developers replicate and rectify them, albeit with considerable engineering effort.
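AdbGPT extracts S2Rs by prompting an LLM; the regex sketch below only illustrates the extraction target (free-text report in, action list out), with an invented step grammar:

```python
import re

# Hypothetical step grammar for illustration; AdbGPT's real extraction
# is LLM-based and handles far messier report text.
STEP_PATTERN = re.compile(
    r"(?P<action>tap|click|input|scroll)\s+(?P<target>.+)", re.IGNORECASE)

def extract_s2rs(report: str):
    # Turn each step-like line into an (action, target) pair.
    steps = []
    for line in report.splitlines():
        m = STEP_PATTERN.search(line)
        if m:
            steps.append((m.group("action").lower(),
                          m.group("target").strip()))
    return steps

report = """Steps to reproduce:
1. Tap the settings icon
2. Input "hello" into the search box
3. App crashes"""
steps = extract_s2rs(report)
```

Pairs like these are what a replayer would then map onto concrete device actions (e.g., via adb).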

Read more
[Review] Examining Zero-Shot Vulnerability Repair with Large Language Models

Link here

The paper tests the performance of LLMs on program repair, the same topic as Automated Program Repair in the Era of Large Pre-trained Language Models. Unlike that work, this paper focuses more on the details, and its program repair setting is much more complicated.

Some conclusions were drawn:

  • LLMs can generate fixes for bugs.
  • But in real-world settings, their performance is not yet sufficient.

Background:

  • Security bugs are significant.
  • LLMs are popular and have outstanding performance.

Implementation:

RQ1: Can off-the-shelf LLMs generate safe and functional code to fix security vulnerabilities?

RQ2: Does varying the amount of context in the comments of a prompt affect the LLM’s ability to suggest fixes?

RQ3: What are the challenges when using LLMs to fix vulnerabilities in the real world?

RQ4: How reliable are LLMs at generating repairs?
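As an illustration of the setup behind RQ2, varying the amount of comment context in a prompt might look like the sketch below. The vulnerable snippet and the context levels are invented for illustration, not the paper's actual templates:

```python
# Hypothetical vulnerable line to be repaired by the LLM.
VULNERABLE = "strcpy(buf, user_input);  // potential buffer overflow"

def build_prompt(context_level: str) -> str:
    # Three invented context levels: no hint, a bare bug comment,
    # and a rich comment describing the weakness and the buffer size.
    contexts = {
        "none": "",
        "comment": "// BUG: fix the buffer overflow below\n",
        "rich": ("// CWE-787: out-of-bounds write\n"
                 "// buf is 64 bytes; user_input is attacker-controlled\n"),
    }
    return contexts[context_level] + VULNERABLE + "\n// fixed version:\n"

prompts = {level: build_prompt(level) for level in ("none", "comment", "rich")}
```

RQ2 then amounts to measuring whether the model's suggested fixes get safer or more functional as the prompt moves from the first variant to the last.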

Read more