How to Debug Concurrency Bugs using Thread Fuzzing

Feb 18, 2020

Wikipedia defines fuzzing or fuzz testing as “an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program.”

In Live Recorder 5.0, we introduced a thread fuzzing feature. Thread fuzzing allows Live Recorder to interfere with the regular scheduling of threads in order to expose more concurrency bugs in your software…before you deploy them to customers.

Thread fuzzing in the wild…

Several Live Recorder 5.0 users have reported enabling thread fuzzing in unit tests to see what showed up, and all of them reported that thread fuzzing revealed previously undiagnosed race conditions. Because the users got recordings of the failures, the bugs were comparatively easy to fix.

This article explains thread fuzzing usage in Live Recorder to help you understand whether it can improve your software quality and how to use it.

What are concurrency bugs?

Concurrency bugs are non-deterministic defects that arise when the execution of a thread disrupts the behavior of other threads running at the same time.

Non-deterministic factors such as thread-switching or external data can affect the order or timing of thread execution, resulting in concurrency issues that cause unpredictable application behavior such as miscalculations, crashes, or hangs.

Here are some examples of common concurrency defects:

Atomicity violation

If the execution of thread 1 below is interrupted by thread 2 immediately after passing the if test, then thread 1 will crash with a memory access violation.
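
The listing this paragraph refers to is not reproduced here. As a minimal sketch of the pattern, assume a shared pointer that thread 1 checks and then uses, while thread 2 may free and clear it. All names (shared_msg, thread1_unsafe, thread1_safe, thread2_clear, length_after_clear) are hypothetical and not taken from the original listing:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t msg_lock = PTHREAD_MUTEX_INITIALIZER;
static char *shared_msg = NULL; /* hypothetical shared state */

/* Thread 1, buggy: the check and the use are two separate steps. */
size_t thread1_unsafe(void) {
    if (shared_msg != NULL) {
        /* If thread 2 runs here, shared_msg is freed and set to NULL
         * before strlen() reads it. */
        return strlen(shared_msg);
    }
    return 0;
}

/* Thread 1, fixed: the check and the use happen under one lock. */
size_t thread1_safe(void) {
    size_t len = 0;
    pthread_mutex_lock(&msg_lock);
    if (shared_msg != NULL)
        len = strlen(shared_msg);
    pthread_mutex_unlock(&msg_lock);
    return len;
}

/* Thread 2: frees and clears the shared pointer under the same lock. */
void *thread2_clear(void *arg) {
    (void)arg;
    pthread_mutex_lock(&msg_lock);
    free(shared_msg);
    shared_msg = NULL;
    pthread_mutex_unlock(&msg_lock);
    return NULL;
}

/* Runs thread 2 to completion, then reports what the safe reader sees. */
size_t length_after_clear(void) {
    pthread_t t2;
    int rc = pthread_create(&t2, NULL, thread2_clear, NULL);
    assert(rc == 0);
    pthread_join(t2, NULL);
    return thread1_safe();
}
```

The unsafe variant only misbehaves when the thread switch lands between the if test and the use, which is why such a bug can stay hidden for a long time under normal scheduling.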

Deadlock

It is possible for Thread 1 to wait indefinitely for Thread 2 to unlock L2, while Thread 2 is waiting for Thread 1 to release L1. Pared back to its simplest case:

Thread 1:
pthread_mutex_lock(L1);
pthread_mutex_lock(L2);

Thread 2:
pthread_mutex_lock(L2);
pthread_mutex_lock(L1);
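
For contrast, a common way to avoid this situation is to have every thread take the locks in the same order. The following is a minimal sketch under that assumption, not something Live Recorder prescribes. The names shared_total, ordered_worker and run_ordered are hypothetical:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;
static int shared_total; /* hypothetical data guarded by both locks */

/* Every thread takes L1 first, then L2, so no thread can hold L2
 * while waiting for a thread that holds L1. */
void *ordered_worker(void *arg) {
    int amount = *(int *)arg;
    pthread_mutex_lock(&L1);
    pthread_mutex_lock(&L2);
    shared_total += amount;
    pthread_mutex_unlock(&L2);
    pthread_mutex_unlock(&L1);
    return NULL;
}

/* Starts two workers, waits for both, and returns the resulting total. */
int run_ordered(void) {
    pthread_t t1, t2;
    int one = 1, two = 2;
    int rc = pthread_create(&t1, NULL, ordered_worker, &one);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, ordered_worker, &two);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_total;
}
```

With the inverted order from the example above, the same program could hang whenever each thread acquires its first lock before the other acquires its second.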

Race condition

A race condition is a type of software defect that occurs when separate threads interact in an unforeseen way and disrupt the expected timing and ordering of operations.

For example, two threads may try to change shared data at the same time, leading to unpredictable system behavior. That is, multiple threads are in “a race” and different threads might win the race depending on non-deterministic events.
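
A minimal sketch of such a race, assuming two threads that each increment a shared counter, is shown next. The names (racy_counter, locked_counter, racy_worker, locked_worker, run_pair) and the INCREMENTS value are hypothetical. The racy version uses separate atomic load and store calls so that each access is well defined, but the read-modify-write as a whole is not atomic:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define INCREMENTS 100000

static atomic_int racy_counter;
static int locked_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Racy: both threads can load the same value, and one of the two
 * increments is then lost when they store it back. */
void *racy_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        int v = atomic_load(&racy_counter);
        atomic_store(&racy_counter, v + 1);
    }
    return NULL;
}

/* Correct: the whole read-modify-write happens under the mutex. */
void *locked_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        locked_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two copies of the given worker and waits for both to finish. */
void run_pair(void *(*worker)(void *)) {
    pthread_t a, b;
    int rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
}
```

After run_pair(locked_worker), locked_counter is always 2 * INCREMENTS. After run_pair(racy_worker), racy_counter can be anything up to that value, and how many increments are lost depends on how the threads happen to be scheduled.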

Issues like these are difficult and time-consuming to recreate and investigate. Thread fuzzing is a technique to capture them before they go into production.

What is thread fuzzing?

As mentioned above, thread fuzzing allows Live Recorder to interfere with the regular scheduling of threads, in order to expose concurrency bugs more easily.

As thread fuzzing changes the scheduling of threads, some concurrency bugs which are very rare in normal conditions become statistically more common.

You can configure Live Recorder 5.0 to use one or more thread fuzzing techniques:

Thread starvation
Random thread slices
Switching at locking/syncing instructions

Thread fuzzing runs within Live Recorder. If thread fuzzing provokes a defect, you instantly have a recording of the defect occurring, which you can interactively debug. You’ll never need to reproduce the failure again.

More detail on each technique is described below.

Thread starvation

(UNDO_tf=starve)

A common type of concurrency bug is caused by ordering problems, for instance when a fast thread generates data and a second, slower thread consumes it. Because the consumer is slower, it almost always has data to consume, so the bug rarely shows up. However, if the consumer thread overtakes the generator thread, for instance because the generator is held up by slow I/O, an error might occur.

#include <stdio.h>
#include <string.h>

char *array[100] = {0};

void generator_thread() {
    for (int i = 0; i < 100; i++) {
        // Fill each slot in order.
        array[i] = strdup("Hello world\n");
    }
}

void consumer_thread() {
    for (int i = 0; i < 100; i++) {
        // Error: the consumer can overtake the generator
        // and call puts() on NULL!
        puts(array[i]);
    }
}

Thread fuzzing’s starve mode encourages race conditions by randomly picking some threads and preventing them from making progress for a short period of time.

Randomizing thread slices

(UNDO_tf=random)

With the randomization component active, Live Recorder randomly switches thread execution, thus increasing the likelihood of threads interrupting each other.

Switches around locking/syncing instructions

(UNDO_tf=sync-instr)

With this setting, Live Recorder introduces extra thread switching around basic locking functionality and atomic operations, for instance gcc’s __sync_* functions or pthread mutexes.

By performing extra thread switches around these instructions, we can make it more likely that another thread, where locking is not done correctly, will be run at this point, exposing a concurrency bug.

Configuring thread fuzzing in Live Recorder

To enable thread fuzzing in Live Recorder, use:

live-record --thread-fuzzing

By default, Live Recorder will apply all thread fuzzing components to your software. To enable only selected components, use a comma-separated list of components in the UNDO_tf environment variable:

UNDO_tf=starve,random,in-bb

To enable thread fuzzing via a Live Recorder API session, include undolr_thread_fuzzing.h and call undolr_thread_mode_set() with a bitmask of the desired components to enable.

Run thread fuzzing in your test pipeline

To eliminate concurrency issues before they get deployed, run your software under test with thread fuzzing enabled. If the program misbehaves, the defect is captured in a Live Recorder recording, which you can replay and debug to resolve it as quickly as possible.

To get the most from thread fuzzing, run tests that may expose concurrency bugs many times, until a failure happens. The recording of the failing run can then be analyzed to discover the root cause.
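
The "run many times until a failure happens" step could be sketched as a small harness like the one below. This is an illustration only, not part of Live Recorder. The helper names run_once and run_until_failure are hypothetical, and the harness assumes the test signals failure through a non-zero exit status or a crash:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv once as a child process.
 * Returns 0 only if the child exited normally with status 0. */
int run_once(char *const argv[]) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed: treat as a failure */
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}

/* Repeats the command up to max_runs times. Returns the 1-based number
 * of the first failing run, or 0 if every run passed. */
int run_until_failure(char *const argv[], int max_runs) {
    for (int run = 1; run <= max_runs; run++) {
        if (run_once(argv) != 0)
            return run;
    }
    return 0;
}
```

In a pipeline, argv would hold the recorder invocation followed by the test binary, for example live-record --thread-fuzzing and a hypothetical ./my_test. The exact command line for your setup is an assumption here and should be checked against the Live Recorder documentation.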

Here’s a video explaining how to enrich your test automation pipeline with Live Recorder.

Originally published at https://undo.io.