Reducing Failure-Inducing Inputs

A standard problem in debugging is this: Your program fails after processing some large input. Only a part of this input, however, is responsible for the failure. Reducing the input to a failure-inducing minimum not only eases debugging – it also helps in understanding why and when the program fails. In this chapter, we present techniques that automatically reduce and simplify failure-inducing inputs to a minimum, notably the popular Delta Debugging technique.

Prerequisites

  • Using the "delta debugging" technique for reduction has no specific prerequisites.
  • To understand the DeltaDebugger implementation, reading the chapter on tracing is recommended.

This chapter is adapted from a similar chapter in "The Fuzzing Book". The material has been adapted to be independent of the fuzzingbook infrastructure, to build on general delta debugging (dd), and to provide a simpler invocation interface.

Why Reducing?

A common problem in debugging is that, given a failing input, only a small part of that input may be responsible for the failure. A central part of debugging is to identify these parts, and to simplify (or reduce) the input to a minimal form that reproduces the failure but contains as little else as possible.

Here's an example of such a situation. We have a mystery() function that – given its code – can occasionally fail. But under which circumstances does this actually happen? We have deliberately obscured the exact condition in order to make this non-obvious.

In [5]:
def mystery(inp: str) -> None:
    x = inp.find(chr(0o17 + 0o31))
    y = inp.find(chr(0o27 + 0o22))
    if x >= 0 and y >= 0 and x < y:
        raise ValueError("Invalid input")
    else:
        pass

To find an input that causes the function to fail, let us fuzz it – that is, feed it with random inputs – until we find a failing input. There are entire books about fuzzing; but here, a very simple fuzz() function for this purpose will already suffice.

To build a fuzzer, we need random inputs – and thus a source for randomness. The function random.randrange(a, b) returns a random integer in the half-open range [a, b), that is, at least a and less than b.

In [7]:
random.randrange(32, 128)
Out[7]:
107
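random.randrange(a, b) follows the same half-open convention as range(a, b): a is a possible result, b is not. The following quick check illustrates this; the sample size of 10,000 is an arbitrary choice made for this sketch.

```python
import random

# Draw many samples; the sample size is an arbitrary choice for illustration.
samples = [random.randrange(32, 128) for _ in range(10_000)]

# The lower bound is inclusive, the upper bound is exclusive.
assert all(32 <= value < 128 for value in samples)
print(min(samples), max(samples))
```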

We can use random.randrange() to compose random (printable) characters:

In [8]:
def fuzz() -> str:
    length = random.randrange(10, 70)
    fuzz = ""
    for i in range(length):
        fuzz += chr(random.randrange(32, 127))
    return fuzz

Here are some random strings produced by our fuzz() function:

In [9]:
for i in range(6):
    print(repr(fuzz()))
'N&+slk%hyp5'
"'@[3(rW*M5W]tMFPU4\\P@tz%[X?uo\\1?b4T;1bDeYtHx #UJ5"
'w}pMmPodJM,_%%BC~dYN6*g|Y*Ou9I<P94}7,99ivb(9`=%jJj*Y*d~OLXk!;J'
"!iOU8]hqg00?u(c);>:\\=V<ZV1=*g#UJA'No5QZ)~--[})Sdv#m*L"
'0iHh[-MzS.U.X}fG7aA:G<bEI\'Ofn[",Mx{@jfto}i3D?7%V7XdtO6BjYEa#Il)~]'
"E`h7h)ChX0G*m,|sosJ.mu/\\c'EpaPi0(n{"

Let us now use fuzz() to find an input where mystery() fails:

In [10]:
while True:
    fuzz_input = fuzz()
    try:
        mystery(fuzz_input)
    except ValueError:
        break

This is an input that causes mystery() to fail:

In [11]:
failing_input = fuzz_input
failing_input
Out[11]:
'V"/+!aF-(V4EOz*+s/Q,7)2@0_'
In [12]:
len(failing_input)
Out[12]:
26
In [14]:
with ExpectError(ValueError):
    mystery(failing_input)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/3137704634.py", line 2, in <cell line: 1>
    mystery(failing_input)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/4141878445.py", line 5, in mystery
    raise ValueError("Invalid input")
ValueError: Invalid input (expected)

Something in this input causes mystery() to fail. But what is it?

Manual Input Reduction

One important step in the debugging process is reduction – that is, to identify those circumstances of a failure that are relevant for the failure to occur, and to omit (if possible) those parts that are not. As Kernighan and Pike \cite{Kernighan1999} put it:

For every circumstance of the problem, check whether it is relevant for the problem to occur. If it is not, remove it from the problem report or the test case in question.

Specifically for inputs, they suggest a divide and conquer process:

Proceed by binary search. Throw away half the input and see if the output is still wrong; if not, go back to the previous state and discard the other half of the input.

This is something we can easily try out, using our last generated input:

In [15]:
failing_input
Out[15]:
'V"/+!aF-(V4EOz*+s/Q,7)2@0_'

For instance, we can see whether the error still occurs if we only feed in the first half:

In [16]:
half_length = len(failing_input) // 2   # // is integer division
first_half = failing_input[:half_length]
first_half
Out[16]:
'V"/+!aF-(V4EO'
In [17]:
with ExpectError(ValueError):
    mystery(first_half)

Nope – the first half alone does not suffice. Maybe the second half?

In [18]:
second_half = failing_input[half_length:]
assert first_half + second_half == failing_input
second_half
Out[18]:
'z*+s/Q,7)2@0_'
In [19]:
with ExpectError(ValueError):
    mystery(second_half)

This did not go so well either. We may still proceed by cutting away smaller chunks – say, one character after another. If our test is deterministic and easily repeated, it is clear that this process eventually will yield a reduced input. But still, it is a rather inefficient process, especially for long inputs. What we need is a strategy that effectively minimizes a failure-inducing input – a strategy that can be automated.
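The character-by-character process mentioned above can be sketched as follows. This is a naive illustration, not the delta debugging algorithm; the helper names fails() and reduce_one_by_one() are made up for this sketch, and mystery() is repeated so that the block runs on its own.

```python
from typing import Callable, Tuple


def mystery(inp: str) -> None:
    """Same function as defined earlier in this chapter."""
    x = inp.find(chr(0o17 + 0o31))
    y = inp.find(chr(0o27 + 0o22))
    if x >= 0 and y >= 0 and x < y:
        raise ValueError("Invalid input")


def fails(inp: str) -> bool:
    """Return True if mystery() raises a ValueError on inp."""
    try:
        mystery(inp)
    except ValueError:
        return True
    return False


def reduce_one_by_one(inp: str,
                      fails: Callable[[str], bool]) -> Tuple[str, int]:
    """Hypothetical helper: drop single characters from left to right
    as long as the failure persists. Returns the result and the test count."""
    tests = 0
    i = 0
    while i < len(inp):
        candidate = inp[:i] + inp[i + 1:]
        tests += 1
        if fails(candidate):
            inp = candidate   # character i was irrelevant; stay at index i
        else:
            i += 1            # character i is needed; keep it and move on
    return inp, tests


failing_input = 'V"/+!aF-(V4EOz*+s/Q,7)2@0_'
reduced, tests = reduce_one_by_one(failing_input, fails)
print(repr(reduced), tests)
```

Each character is tested exactly once, so the number of tests grows linearly with the input length. Note also that a single left-to-right pass is not guaranteed to yield a minimal result in general, since removing a later character may make an earlier one removable.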

Delta Debugging

One strategy to effectively reduce failure-inducing inputs is delta debugging \cite{Zeller2002}. Delta Debugging implements the "binary search" strategy, as listed above, but with a twist: If neither half fails (also as above), it keeps on cutting away smaller and smaller chunks from the input, until it eliminates individual characters. Thus, after cutting away the first half, we cut away the first quarter, the second quarter, and so on.

Let us illustrate this on our example, and see what happens if we cut away the first quarter.

In [20]:
quarter_length = len(failing_input) // 4
input_without_first_quarter = failing_input[quarter_length:]
input_without_first_quarter
Out[20]:
'F-(V4EOz*+s/Q,7)2@0_'
In [21]:
with ExpectError(ValueError):
    mystery(input_without_first_quarter)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2963114098.py", line 2, in <cell line: 1>
    mystery(input_without_first_quarter)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/4141878445.py", line 5, in mystery
    raise ValueError("Invalid input")
ValueError: Invalid input (expected)

Ah! This has failed, and reduced our failing input by 25%. Let's remove another quarter.

In [22]:
input_without_first_and_second_quarter = failing_input[quarter_length * 2:]
input_without_first_and_second_quarter
Out[22]:
'Oz*+s/Q,7)2@0_'
In [23]:
with ExpectError(ValueError):
    mystery(input_without_first_and_second_quarter)

This is not too surprising, as we had tested a nearly identical input before:

In [24]:
second_half
Out[24]:
'z*+s/Q,7)2@0_'
In [25]:
input_without_first_and_second_quarter
Out[25]:
'Oz*+s/Q,7)2@0_'

How about removing the third quarter, then?

In [26]:
input_without_first_and_third_quarter = failing_input[quarter_length:
                                                      quarter_length * 2] + failing_input[quarter_length * 3:]
input_without_first_and_third_quarter
Out[26]:
'F-(V4EQ,7)2@0_'
In [27]:
with ExpectError(ValueError):
    mystery(input_without_first_and_third_quarter)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/4218135276.py", line 2, in <cell line: 1>
    mystery(input_without_first_and_third_quarter)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/4141878445.py", line 5, in mystery
    raise ValueError("Invalid input")
ValueError: Invalid input (expected)

Yes! This has succeeded: the failure is still reproduced, and our input is now roughly half its original size.

We have now tried to remove pieces that make up $\frac{1}{2}$ and $\frac{1}{4}$ of the original failing string. In the next iteration, we would go and remove even smaller pieces – $\frac{1}{8}$, $\frac{1}{16}$ and so on. We continue until we are down to $\frac{1}{26}$ – that is, individual characters.

However, this is something we happily let a computer do for us – and this is what the Delta Debugging algorithm does. Delta Debugging implements the strategy sketched above: It first removes larger chunks of size $\frac{1}{2}$; if this does not fail, then we proceed to chunks of size $\frac{1}{4}$, then $\frac{1}{8}$ and so on.

Our ddmin() implementation uses the exact same Python code as Zeller in \cite{Zeller2002}; the only difference is that it has been adapted to work on Python 3. The variable n (initially 2) indicates the granularity – in each step, chunks of size $\frac{1}{n}$ are cut away. If none of the tests fails (some_complement_is_failing is False), then n is doubled – until it reaches the length of the input.

In [28]:
PASS = 'PASS'
FAIL = 'FAIL'
UNRESOLVED = 'UNRESOLVED'
In [29]:
# ignore
from typing import Sequence, Any, Callable, Optional, Type, Tuple
from typing import Dict, Union, Set, List, FrozenSet, cast
In [30]:
def ddmin(test: Callable, inp: Sequence, *test_args: Any) -> Sequence:
    """Reduce the input inp, using the outcome of test(inp, *test_args)."""
    assert test(inp, *test_args) != PASS

    n = 2     # Initial granularity
    while len(inp) >= 2:
        start = 0
        subset_length = int(len(inp) / n)
        some_complement_is_failing = False

        while start < len(inp):
            complement = (inp[:int(start)] + inp[int(start + subset_length):])  # type: ignore

            if test(complement, *test_args) == FAIL:
                inp = complement
                n = max(n - 1, 2)
                some_complement_is_failing = True
                break

            start += subset_length

        if not some_complement_is_failing:
            if n == len(inp):
                break
            n = min(n * 2, len(inp))

    return inp
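To make the role of the granularity n concrete, the following sketch lists the complements that the inner loop of ddmin() would test for a given input and a given n. The helper complements() is hypothetical and exists only for this illustration.

```python
from typing import List


def complements(inp: str, n: int) -> List[str]:
    """Hypothetical helper: the candidates ddmin() tests at granularity n,
    each obtained by cutting one chunk of length len(inp) // n out of inp."""
    assert 1 <= n <= len(inp)
    subset_length = len(inp) // n
    result = []
    start = 0
    while start < len(inp):
        result.append(inp[:start] + inp[start + subset_length:])
        start += subset_length
    return result


for n in (2, 4, 8):
    print(n, complements("abcdefgh", n))
```

With n equal to the length of the input, every candidate misses exactly one character; this is the final stage of the algorithm.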

To see how ddmin() works, let us run it on our failing input. We need to define a test function that returns PASS or FAIL, depending on the test outcome. This generic_test() assumes that the function fails if it raises an exception (such as an AssertionError), and passes otherwise. The optional argument expected_exc specifies the exception to be checked for, given as an exception instance whose type and message must both match; this ensures we reduce only for the kind of error raised in the original failure.

In [31]:
def generic_test(inp: Sequence, fun: Callable,
                 expected_exc: Optional[Type] = None) -> str:
    result = None
    detail = ""
    try:
        result = fun(inp)
        outcome = PASS
    except Exception as exc:
        detail = f" ({type(exc).__name__}: {str(exc)})"
        if expected_exc is None:
            outcome = FAIL
        elif type(exc) == type(expected_exc) and str(exc) == str(expected_exc):
            outcome = FAIL
        else:
            outcome = UNRESOLVED

    print(f"{fun.__name__}({repr(inp)}): {outcome}{detail}")
    return outcome

We can now invoke ddmin() in our setting. With each step, we see how the remaining input gets smaller and smaller, until only two characters remain:

In [32]:
ddmin(generic_test, failing_input, mystery, ValueError('Invalid input'))
mystery('V"/+!aF-(V4EOz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
mystery('z*+s/Q,7)2@0_'): PASS
mystery('V"/+!aF-(V4EO'): PASS
mystery('F-(V4EOz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
mystery('Oz*+s/Q,7)2@0_'): PASS
mystery('F-(V4EQ,7)2@0_'): FAIL (ValueError: Invalid input)
mystery(',7)2@0_'): PASS
mystery('F-(V4EQ'): PASS
mystery('V4EQ,7)2@0_'): PASS
mystery('F-(Q,7)2@0_'): FAIL (ValueError: Invalid input)
mystery('Q,7)2@0_'): PASS
mystery('F-()2@0_'): FAIL (ValueError: Invalid input)
mystery('2@0_'): PASS
mystery('F-()'): FAIL (ValueError: Invalid input)
mystery('()'): FAIL (ValueError: Invalid input)
mystery(')'): PASS
mystery('('): PASS
Out[32]:
'()'

Now we know why mystery() fails – it suffices that the input contains two matching parentheses. Delta Debugging determines this in 17 tests, as the log above shows. Its result is 1-minimal, meaning that every character contained is required to produce the error; removing any (as seen in the last two tests, above) no longer causes the test to fail. This property is guaranteed by the delta debugging algorithm, which in its last stage always tries to delete characters one by one.
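With the reduced input in hand, the obscured condition in mystery() is easy to decode. The two chr() expressions use octal literals; evaluating them confirms which characters the function searches for.

```python
# Octal literals: 0o17 == 15, 0o31 == 25, 0o27 == 23, 0o22 == 18
opening = chr(0o17 + 0o31)   # chr(40)
closing = chr(0o27 + 0o22)   # chr(41)
print(repr(opening), repr(closing))

assert (opening, closing) == ('(', ')')
```

Since str.find() returns the index of the first occurrence, mystery() fails whenever the first opening parenthesis comes before the first closing one.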

A reduced test case such as the one above has many advantages:

  • A reduced test case reduces the cognitive load of the programmer. The test case is shorter and focused, and thus does not burden the programmer with irrelevant details. A reduced input typically leads to shorter executions and smaller program states, both of which reduce the search space when it comes to understanding the bug. In our case, we have eliminated lots of irrelevant input – only the two characters the reduced input contains are relevant.

  • A reduced test case is easier to communicate. All one needs here is the summary: mystery() fails on "()", which is much better than mystery() fails on a 4100-character input (attached).

  • A reduced test case helps in identifying duplicates. If similar bugs have been reported already, and all of them have been reduced to the same cause (namely that the input contains matching parentheses), then it becomes obvious that all these bugs are different symptoms of the same underlying cause – and would all be resolved at once with one code fix.

How effective is delta debugging? In the best case (when the left half or the right half fails), the number of tests is logarithmic in the length $n$ of an input (i.e., $O(\log_2 n)$); this is the same complexity as binary search. In the worst case, though, delta debugging can require a number of tests proportional to $n^2$ (i.e., $O(n^2)$) – this happens when we are down to character granularity and have to repeatedly try to delete all characters, only to find that deleting the last character results in a failure \cite{Zeller2002}. (This is a pretty pathological situation, though.)
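The best case can be observed with a small experiment. The sketch below repeats ddmin() in a version restricted to strings and without extra test arguments, and counts test invocations for inputs in which a single character 'X' at the start causes the failure. The helper count_tests() and the inputs are assumptions made for this illustration.

```python
from typing import Callable, Tuple

PASS, FAIL = 'PASS', 'FAIL'


def ddmin(test: Callable[[str], str], inp: str) -> str:
    """Same algorithm as ddmin() in this chapter, restricted to strings."""
    assert test(inp) != PASS
    n = 2
    while len(inp) >= 2:
        start = 0
        subset_length = len(inp) // n
        some_complement_is_failing = False
        while start < len(inp):
            complement = inp[:start] + inp[start + subset_length:]
            if test(complement) == FAIL:
                inp = complement
                n = max(n - 1, 2)
                some_complement_is_failing = True
                break
            start += subset_length
        if not some_complement_is_failing:
            if n == len(inp):
                break
            n = min(n * 2, len(inp))
    return inp


def count_tests(inp: str, is_failing: Callable[[str], bool]) -> Tuple[str, int]:
    """Hypothetical helper: run ddmin() and count how often the test is called."""
    calls = 0

    def test(candidate: str) -> str:
        nonlocal calls
        calls += 1
        return FAIL if is_failing(candidate) else PASS

    reduced = ddmin(test, inp)
    return reduced, calls


for length in (16, 256, 4096):
    inp = 'X' + 'a' * (length - 1)
    reduced, calls = count_tests(inp, lambda s: 'X' in s)
    print(length, repr(reduced), calls)
```

In this setting, every round halves the input, so the number of tests grows with the logarithm of the input length rather than with the length itself.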

Overall, delta debugging is a robust algorithm that is easy to implement, easy to deploy, and easy to use – provided that the underlying test case is deterministic and runs quickly enough to warrant a number of experiments. Any debugging task should start with simplifying the test case as much as possible – and this is where delta debugging can help.

A Simple DeltaDebugger Interface

As defined above, using ddmin() still requires the developer to set up a special testing function – and writing or using even a generic tester (like generic_test()) takes some effort. We want to simplify the setup such that only two lines of Python are required.

Our aim is to have a DeltaDebugger class that we can use in conjunction with a failing (i.e., exception raising) function call:

with DeltaDebugger() as dd:
    mystery(failing_input)
dd

Here, at the end of the with statement, printing out dd shows us the minimal input that causes the failure.

To see how the DeltaDebugger works, let us run it on our failing input. The expected usage is as introduced earlier – we wrap the failing function in a with block, and then print out the debugger to see the reduced arguments. We see that DeltaDebugger easily reduces the arguments to the minimal failure-inducing input:

In [90]:
with DeltaDebugger() as dd:
    mystery(failing_input)
dd
Out[90]:
mystery(inp='()')

We can turn on logging for DeltaDebugger to see how it proceeds. With each step, we see how the remaining input gets smaller and smaller, until only two characters remain:

In [91]:
with DeltaDebugger(log=True) as dd:
    mystery(failing_input)
dd
Observed mystery(inp='V"/+!aF-(V4EOz*+s/Q,7)2@0_') raising ValueError: Invalid input
Test #1 mystery(inp='V"/+!aF-(V4EOz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Processing inp...
Test #2 mystery(inp=''): PASS
Test #3 mystery(inp='V"/+!aF-(V4EO'): PASS
Test #4 mystery(inp='z*+s/Q,7)2@0_'): PASS
Test #5 mystery(inp='-(V4EOz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Test #6 mystery(inp='*+s/Q,7)2@0_'): PASS
Test #7 mystery(inp='-(V4EOz7)2@0_'): FAIL (ValueError: Invalid input)
Test #8 mystery(inp='7)2@0_'): PASS
Test #9 mystery(inp='-(V4EOz'): PASS
Test #10 mystery(inp='-(V47)2@0_'): FAIL (ValueError: Invalid input)
Test #11 mystery(inp='-(V4@0_'): PASS
Test #12 mystery(inp='-(V47)2'): FAIL (ValueError: Invalid input)
Test #13 mystery(inp='-7)2'): PASS
Test #14 mystery(inp='(V4'): PASS
Test #15 mystery(inp='-47)2'): PASS
Test #16 mystery(inp='-(V7)2'): FAIL (ValueError: Invalid input)
Test #17 mystery(inp='-(V2'): PASS
Test #18 mystery(inp='(V7)'): FAIL (ValueError: Invalid input)
Test #19 mystery(inp='7)'): PASS
Test #20 mystery(inp='(V'): PASS
Test #21 mystery(inp='(7)'): FAIL (ValueError: Invalid input)
Test #22 mystery(inp='()'): FAIL (ValueError: Invalid input)
Test #23 mystery(inp=')'): PASS
Test #24 mystery(inp='('): PASS
Minimized inp to '()'
Minimized failing call to mystery(inp='()')
Out[91]:
mystery(inp='()')

It is also possible to access the debugger programmatically:

In [92]:
with DeltaDebugger() as dd:
    mystery(failing_input)
In [93]:
dd.args()
Out[93]:
{'inp': 'V"/+!aF-(V4EOz*+s/Q,7)2@0_'}
In [94]:
dd.min_args()
Out[94]:
{'inp': '()'}
In [95]:
quiz("What happens if the function under test does not raise an exception?",
    [
        "Delta debugging searches for the minimal input"
        " that produces the same result",
        "Delta debugging starts a fuzzer to find an exception",
        "Delta debugging raises an exception",
        "Delta debugging runs forever in a loop",
    ], '0 ** 0 + 1 ** 0 + 0 ** 1 + 1 ** 1')

Indeed, DeltaDebugger checks if an exception occurs. If not, you obtain a NotFailingError.

In [96]:
with ExpectError(NotFailingError):
    with DeltaDebugger() as dd:
        mystery("An input that does not fail")
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/3784387889.py", line 2, in <cell line: 1>
    with DeltaDebugger() as dd:
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/4114738534.py", line 24, in __exit__
    self.after_collection()
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/1934034330.py", line 7, in after_collection
    raise NotFailingError(
NotFailingError: mystery(inp='An input that does not fail') did not raise an exception (expected)

Delta Debugging also assumes that the function under test is deterministic. If it occasionally fails and occasionally passes, you will get random results.
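Before reducing, it can be worth checking that a failure is repeatable. The following sketch runs a call several times and compares the outcomes; the helpers outcome_of() and is_repeatable() are hypothetical and not part of DeltaDebugger, and flaky() is an artificial function that fails on every other call.

```python
import itertools
from typing import Any, Callable, Optional


def outcome_of(fun: Callable, *args: Any) -> Optional[str]:
    """Return the name of the raised exception type, or None if fun passes."""
    try:
        fun(*args)
    except Exception as exc:
        return type(exc).__name__
    return None


def is_repeatable(fun: Callable, *args: Any, runs: int = 10) -> bool:
    """Hypothetical pre-check: does fun(*args) behave the same in every run?"""
    return len({outcome_of(fun, *args) for _ in range(runs)}) == 1


_call_counter = itertools.count()


def flaky(inp: str) -> None:
    """Artificial example: fails on every other call, regardless of inp."""
    if next(_call_counter) % 2 == 1:
        raise ValueError("Invalid input")


def stable(inp: str) -> None:
    """Artificial example: fails whenever inp contains '('."""
    if '(' in inp:
        raise ValueError("Invalid input")


print(is_repeatable(stable, "(x)"), is_repeatable(flaky, "(x)"))
```

Identical outcomes over a few runs do not prove that a function is deterministic, but differing outcomes are a clear sign that reduction results cannot be trusted.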

Usage Examples

Let us apply DeltaDebugger on a number of examples.

Reducing remove_html_markup()

For our ongoing remove_html_markup() example, we can reduce the failure-inducing input to a minimum, too:

In [98]:
with DeltaDebugger(log=True) as dd:
    remove_html_markup('"x > y"')
dd.min_args()
Observed remove_html_markup(s='"x > y"') raising AssertionError
Test #1 remove_html_markup(s='"x > y"'): FAIL (AssertionError)
Processing s...
Test #2 remove_html_markup(s=''): PASS
Test #3 remove_html_markup(s='"x >'): FAIL (AssertionError)
Test #4 remove_html_markup(s='"x'): PASS
Test #5 remove_html_markup(s=' >'): PASS
Test #6 remove_html_markup(s='x >'): PASS
Test #7 remove_html_markup(s='" >'): FAIL (AssertionError)
Test #8 remove_html_markup(s='">'): FAIL (AssertionError)
Test #9 remove_html_markup(s='>'): PASS
Test #10 remove_html_markup(s='"'): PASS
Minimized s to '">'
Minimized failing call to remove_html_markup(s='">')
Out[98]:
{'s': '">'}

Reducing Multiple Arguments

If a function has multiple reducible variables, they get reduced in turns. This string_error() function fails whenever s1 is a substring of s2:

In [99]:
def string_error(s1: str, s2: str) -> None:
    assert s1 not in s2, "no substrings"

Running DeltaDebugger on string_error shows how first s1 is reduced, then s2.

In [100]:
with DeltaDebugger(log=True) as dd:
    string_error("foo", "foobar")

string_error_args = dd.min_args()
string_error_args
Observed string_error(s1='foo', s2='foobar') raising AssertionError: no substrings
Test #1 string_error(s1='foo', s2='foobar'): FAIL (AssertionError: no substrings)
Processing s1...
Test #2 string_error(s1='', s2='foobar'): FAIL (AssertionError: no substrings)
Minimized s1 to ''
Processing s2...
Test #3 string_error(s1='', s2=''): FAIL (AssertionError: no substrings)
Minimized s2 to ''
Minimized failing call to string_error(s1='', s2='')
Out[100]:
{'s1': '', 's2': ''}

We see that the failure also occurs if both strings are empty:

In [101]:
with ExpectError(AssertionError):
    string_error(string_error_args['s1'], string_error_args['s2'])
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/3257360882.py", line 2, in <cell line: 1>
    string_error(string_error_args['s1'], string_error_args['s2'])
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/773514021.py", line 2, in string_error
    assert s1 not in s2, "no substrings"
AssertionError: no substrings (expected)
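The idea of reducing several arguments in turns can be sketched without any tracing machinery. The helper reduce_args_in_turns() below is hypothetical and is not how DeltaDebugger is implemented; for brevity, it removes single elements greedily instead of running ddmin, and it only handles strings, lists, and tuples.

```python
from typing import Any, Callable, Dict, Type


def string_error(s1: str, s2: str) -> None:
    assert s1 not in s2, "no substrings"


def still_fails(fun: Callable, kwargs: Dict[str, Any],
                exc_type: Type[BaseException]) -> bool:
    """Return True if fun(**kwargs) raises an exception of type exc_type."""
    try:
        fun(**kwargs)
    except exc_type:
        return True
    except Exception:
        return False
    return False


def reduce_args_in_turns(fun: Callable, kwargs: Dict[str, Any],
                         exc_type: Type[BaseException]) -> Dict[str, Any]:
    """Hypothetical helper: shrink one argument after the other,
    repeating the rounds until no argument can be shrunk any further."""
    kwargs = dict(kwargs)
    changed = True
    while changed:
        changed = False
        for name in list(kwargs):
            value = kwargs[name]
            if not isinstance(value, (str, list, tuple)):
                continue  # not reducible in this sketch
            i = 0
            while i < len(value):
                candidate = value[:i] + value[i + 1:]
                if still_fails(fun, {**kwargs, name: candidate}, exc_type):
                    value = candidate
                    changed = True
                else:
                    i += 1
            kwargs[name] = value
    return kwargs


print(reduce_args_in_turns(string_error,
                           {'s1': 'foo', 's2': 'foobar'}, AssertionError))
```

The outer loop is what makes this a reduction "in turns": after one argument has shrunk, the other arguments are revisited, since they may now be reducible further.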

Invoking an Interactive Debugger

The results from delta debugging can be immediately used to invoke an interactive debugger on the minimized input. To this end, we need to turn the dictionary returned by min_args() into arguments of the (failing) function call.

Python provides a simple way to turn dictionaries into function calls. The construct

fun(**args)

invokes the function fun, with all parameters assigned from the respective values in the dictionary.
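As a small illustration of this construct, the following sketch calls a function once with an unpacked dictionary and once with explicit keyword arguments. The function fun() and its return value are made up for this example.

```python
def fun(s1: str, s2: str) -> str:
    """Illustrative function that reports the arguments it received."""
    return f"s1={s1!r}, s2={s2!r}"


args = {'s1': '', 's2': ''}

# Unpacking the dictionary is equivalent to spelling out the keywords.
assert fun(**args) == fun(s1='', s2='')
print(fun(**args))
```

The dictionary keys must match the parameter names of the function; an unknown key raises a TypeError.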

With this, we can immediately invoke a Debugger on the failing run with minimized arguments:

In [103]:
# ignore
from bookutils import next_inputs
In [104]:
# ignore
next_inputs(['print', 'quit'])
Out[104]:
['print', 'quit']
In [105]:
with ExpectError(AssertionError):
    with Debugger():
        string_error(**string_error_args)
Calling string_error(s1 = '', s2 = '')
(debugger) print
s1 = ''
s2 = ''
(debugger) quit
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2680174709.py", line 3, in <cell line: 1>
    string_error(**string_error_args)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/773514021.py", line 2, in string_error
    assert s1 not in s2, "no substrings"
AssertionError: no substrings (expected)

Reducing Other Collections

Our DeltaDebugger is not limited to strings. It can reduce any argument x for which a len(x) operation and an indexing operation x[i] are defined – notably lists. Here is how to apply DeltaDebugger on a list:

In [106]:
def list_error(l1: List, l2: List, maxlen: int) -> None:
    assert len(l1) < len(l2) < maxlen, "invalid string length"
In [107]:
with DeltaDebugger() as dd:
    list_error(l1=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], l2=[1, 2, 3], maxlen=5)
dd
Out[107]:
list_error(l1=[], l2=[], maxlen=5)
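The ddmin() function shown earlier in this chapter only relies on len(), slicing, and concatenation, which is why the same reduction step applies to several sequence types. The helper without_chunk() below is hypothetical and only illustrates this single step.

```python
def without_chunk(seq, start: int, length: int):
    """Hypothetical helper: return seq with seq[start:start + length] cut out.
    Works for any type that supports slicing and concatenation."""
    return seq[:start] + seq[start + length:]


print(without_chunk("abcdef", 2, 2))
print(without_chunk([1, 2, 3, 4, 5, 6], 2, 2))
print(without_chunk((1, 2, 3), 0, 1))
```

In each case, the result has the same type as the input, so it can be passed back to the function under test unchanged.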

Debugging Inputs

Sometimes, it may be useful to not minimize the input, but rather maximize it – that is, to find the maximum input that does not fail. For instance, you may have an input of which you want to preserve as much as possible – to repair it, or to establish a context that is as close as possible to the real input.

This is possible by using the max_args() method. It implements the ddmax variant of the general Delta Debugging algorithm \cite{Kirschner2020}. With each step, it tries to add more and more characters to the passing input until it is 1-maximal – that is, any additional character that would be added from the failing input also would cause the function to fail.

In [108]:
with DeltaDebugger(log=True) as dd:
    mystery(failing_input)
max_passing_input = dd.max_args()['inp']
max_passing_input
Observed mystery(inp='V"/+!aF-(V4EOz*+s/Q,7)2@0_') raising ValueError: Invalid input
Test #1 mystery(inp='V"/+!aF-(V4EOz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Processing inp...
Test #2 mystery(inp=''): PASS
Test #3 mystery(inp='z*+s/Q,7)2@0_'): PASS
Test #4 mystery(inp='-(V4EOz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Test #5 mystery(inp='V"/+!aFz*+s/Q,7)2@0_'): PASS
Test #6 mystery(inp='V"/+!aF4EOz*+s/Q,7)2@0_'): PASS
Test #7 mystery(inp='V"/+!aF-4EOz*+s/Q,7)2@0_'): PASS
Test #8 mystery(inp='V"/+!aF-V4EOz*+s/Q,7)2@0_'): PASS
Maximized inp to 'V"/+!aF-V4EOz*+s/Q,7)2@0_'
Maximized passing call to mystery(inp='V"/+!aF-V4EOz*+s/Q,7)2@0_')
Out[108]:
'V"/+!aF-V4EOz*+s/Q,7)2@0_'

Note that this is precisely the failure-inducing input except for the first parenthesis. Adding this single character back would make the input fail again.
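The notion of a 1-maximal passing input can be illustrated with a naive greedy sketch. This is not the ddmax algorithm; the helper grow_passing_input() is hypothetical, assumes that the empty input passes, and simply adds the characters of the failing input one by one, keeping each addition that still passes.

```python
def mystery(inp: str) -> None:
    """Same function as defined earlier in this chapter."""
    x = inp.find(chr(0o17 + 0o31))
    y = inp.find(chr(0o27 + 0o22))
    if x >= 0 and y >= 0 and x < y:
        raise ValueError("Invalid input")


def passes(inp: str) -> bool:
    """Return True if mystery() does not raise a ValueError on inp."""
    try:
        mystery(inp)
    except ValueError:
        return False
    return True


def grow_passing_input(failing_input: str) -> str:
    """Hypothetical helper (not ddmax): starting from the empty input,
    add characters of failing_input in order while the result passes."""
    kept = []  # indices of the characters kept so far, in increasing order
    for i in range(len(failing_input)):
        candidate = "".join(failing_input[j] for j in kept + [i])
        if passes(candidate):
            kept.append(i)
    return "".join(failing_input[j] for j in kept)


failing_input = 'V"/+!aF-(V4EOz*+s/Q,7)2@0_'
max_passing = grow_passing_input(failing_input)
print(repr(max_passing))
```

On this input, the greedy pass keeps the opening parenthesis and leaves out the closing one instead, which shows that 1-maximal passing inputs are not unique.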

Failure-Inducing Differences

If one wants to look for differences that distinguish passing from failing runs, Delta Debugging also has a direct method for this – by both maximizing the passing input and minimizing the failing input until they meet somewhere in the middle. The remaining difference is what makes the difference between passing and failing.

To compute the failure-inducing differences for mystery(), use the min_arg_diff() method:

In [109]:
with DeltaDebugger(log=True) as dd:
    mystery(failing_input)
max_passing_args, min_failing_args, diff = dd.min_arg_diff()
max_passing_args['inp'], min_failing_args['inp'], diff['inp']
Observed mystery(inp='V"/+!aF-(V4EOz*+s/Q,7)2@0_') raising ValueError: Invalid input
Test #1 mystery(inp='V"/+!aF-(V4EOz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Processing inp...
Test #2 mystery(inp=''): PASS
Test #3 mystery(inp='V"/+!aF-(V4EO'): PASS
Test #4 mystery(inp='z*+s/Q,7)2@0_'): PASS
Test #5 mystery(inp='V"/+!aFz*+s/Q,7)2@0_'): PASS
Test #6 mystery(inp='-(V4EOz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Test #7 mystery(inp='-(Vz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Test #8 mystery(inp='(Vz*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Test #9 mystery(inp='(z*+s/Q,7)2@0_'): FAIL (ValueError: Invalid input)
Maximized inp to 'z*+s/Q,7)2@0_'
Minimized inp to '(z*+s/Q,7)2@0_'
Maximized passing call to mystery(inp='z*+s/Q,7)2@0_')
Minimized failing call to mystery(inp='(z*+s/Q,7)2@0_')
Out[109]:
('z*+s/Q,7)2@0_', '(z*+s/Q,7)2@0_', '(')

Minimizing failure-inducing differences is especially efficient on large inputs, since the number of differences between a passing and a failing input is much smaller than the inputs themselves. Here is the failure-inducing difference as determined by Delta Debugging:

In [110]:
diff['inp']
Out[110]:
'('
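The same difference can be cross-checked independently of DeltaDebugger. The sketch below uses difflib from the Python standard library, which is a choice made for this illustration and not what min_arg_diff() uses internally, to extract the characters present in the minimized failing input but not in the maximized passing input.

```python
import difflib

max_passing = 'z*+s/Q,7)2@0_'
min_failing = '(z*+s/Q,7)2@0_'

matcher = difflib.SequenceMatcher(None, max_passing, min_failing,
                                  autojunk=False)

# Collect everything that has to be inserted (or replaced) to get
# from the passing input to the failing input.
added = "".join(min_failing[j1:j2]
                for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                if tag in ('insert', 'replace'))
print(repr(added))
```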

Reducing Program Code

One particularly fun application of reducers is on program code. Technically speaking, program code is just another input to a computation; and we can actually automatically determine which minimum of program code is required to produce a failure, using Delta Debugging. Such minimization of code is typically used when it comes to debugging programs that accept code as their input, such as compilers and interpreters. However, it can also pinpoint failure causes in the (input) code itself.

As an example, let us apply Delta Debugging on the code from the chapter on assertions. You do not need to have read the chapter; the important part is that this chapter provides an implementation of remove_html_markup() that we want to use.

In [111]:
# ignore
try:
    del remove_html_markup
except NameError:
    pass

Here is the source code of the entire chapter; it is several hundred lines long.

In [114]:
assertions_source_lines, _ = inspect.getsourcelines(Assertions)
# print_content("".join(assertions_source_lines), ".py")
assertions_source_lines[:10]
Out[114]:
['from bookutils import YouTubeVideo\n',
 'YouTubeVideo("9mI9sbKFkwU")\n',
 '\n',
 'import bookutils\n',
 '\n',
 'from bookutils import quiz\n',
 '\n',
 'import Tracer\n',
 '\n',
 'from ExpectError import ExpectError\n']
In [115]:
len(assertions_source_lines)
Out[115]:
552

We can take this code and execute it. Nothing particular should happen here, as our imports only import definitions of functions, classes, and global variables.

In [116]:
def compile_and_run(lines: List[str]) -> None:
    # To execute 'Assertions' in place, we need to define __name__ and __package__
    exec("".join(lines), {'__name__': '<string>',
                          '__package__': 'debuggingbook',
                          'Any': Any,
                         'Type': Type,
                         'TracebackType': TracebackType,
                         'Optional': Optional},
         {})
In [117]:
compile_and_run(assertions_source_lines)

Let us add some code to it – a "My Test" assertion that tests that remove_html_markup(), applied on a string with double quotes, should keep these in place:

In [119]:
def compile_and_test_html_markup_simple(lines: List[str]) -> None:
    compile_and_run(lines + 
        [
            '''''',
            '''assert remove_html_markup('"foo"') == '"foo"', "My Test"\n'''
        ])

This assertion fails. (As always, remove_html_markup() is buggy.)

In [120]:
with ExpectError(AssertionError):
    compile_and_test_html_markup_simple(assertions_source_lines)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2334363057.py", line 2, in <cell line: 1>
    compile_and_test_html_markup_simple(assertions_source_lines)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/1838190689.py", line 2, in compile_and_test_html_markup_simple
    compile_and_run(lines +
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2075409738.py", line 3, in compile_and_run
    exec("".join(lines), {'__name__': '<string>',
  File "<string>", line 553, in <module>
AssertionError: My Test (expected)

The question we want to address in this section is: Given this assertion, can we automatically determine which part of the Assertions code lines in assertions_source_lines is relevant for producing the failure?

Reducing Code Lines

Since our Assertions source code comes as a list of lines, we can apply our DeltaDebugger on it. The result will be the list of source lines that are necessary to make the assertion fail.

In [121]:
quiz("What will the reduced set of lines contain?",
     [
         "All of the source code in the assertions chapter.",
         "Only the source code of `remove_html_markup()`",
         "Only a subset of `remove_html_markup()`",
         "No lines at all."
     ], '[x for x in range((1 + 1) ** (1 + 1)) if x % (1 + 1) == 1][1]')

Let us see what the DeltaDebugger produces.

In [122]:
with DeltaDebugger(log=False) as dd:
    compile_and_test_html_markup_simple(assertions_source_lines)

We get exactly two lines of code:

In [123]:
reduced_lines = dd.min_args()['lines']
len(reduced_lines)
Out[123]:
2

And these are:

In [125]:
print_content("".join(reduced_lines), ".py")
def remove_html_markup(s):  # type: ignore
    tag = False

On these lines, our test actually still fails:

In [126]:
with ExpectError(AssertionError):
    compile_and_test_html_markup_simple(reduced_lines)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2457905094.py", line 2, in <cell line: 1>
    compile_and_test_html_markup_simple(reduced_lines)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/1838190689.py", line 2, in compile_and_test_html_markup_simple
    compile_and_run(lines +
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2075409738.py", line 3, in compile_and_run
    exec("".join(lines), {'__name__': '<string>',
  File "<string>", line 3, in <module>
AssertionError: My Test (expected)

This failure may come as a surprise – remove_html_markup() is reduced to a function which does not even return a value. However, this is how it causes our "My Test" assertion to fail: In Python, a function without an explicit return statement returns None. This value is definitely not the string the "My Test" assertion expects, so it fails.
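This behavior is easy to reproduce in isolation. The sketch below executes the two reduced lines in a fresh namespace (a simplification of compile_and_run() chosen for this illustration) and calls the resulting function directly.

```python
reduced_lines = [
    'def remove_html_markup(s):  # type: ignore\n',
    '    tag = False\n',
]

# Execute the reduced code in a fresh namespace and fetch the function.
namespace: dict = {}
exec("".join(reduced_lines), namespace)
result = namespace['remove_html_markup']('"foo"')
print(result)

# The function has no return statement, so it returns None ...
assert result is None
# ... which differs from the string the "My Test" assertion expects.
assert result != '"foo"'
```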

Note, however, how the set of two lines is actually 1-minimal: removing either line would result in a syntax error.

To ensure we do not remove code that actually would be necessary for normal behavior, let us add another check – one that checks for the normal functionality of remove_html_markup(). If this one fails (say, after the code has been tampered with too much), it raises an exception – but a different one from the original failure:

In [127]:
def compile_and_test_html_markup(lines: List[str]) -> None:
    compile_and_run(lines +
        [
            '',
            '''if remove_html_markup('<foo>bar</foo>') != 'bar':\n''',
            '''    raise RuntimeError("Missing functionality")\n''',
            '''assert remove_html_markup('"foo"') == '"foo"', "My Test"\n'''
        ])

On our "reduced" code, we now obtain a different exception.

In [128]:
with ExpectError():
    compile_and_test_html_markup(reduced_lines)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/810405118.py", line 2, in <cell line: 1>
    compile_and_test_html_markup(reduced_lines)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/4067432722.py", line 2, in compile_and_test_html_markup
    compile_and_run(lines +
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2075409738.py", line 3, in compile_and_run
    exec("".join(lines), {'__name__': '<string>',
  File "<string>", line 4, in <module>
RuntimeError: Missing functionality (expected)

Such an outcome, which differs from the original failure, causes our DeltaDebugger to treat this not as a failure, but rather as an UNRESOLVED outcome, indicating that the test cannot determine whether it passed or failed. The ddmin algorithm treats such unresolved outcomes as if they were passing; hence, the algorithm treats its minimization attempt as unsuccessful.
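
The three-way distinction can be sketched as follows. This is an illustration only: the helper classify() and the toy test are hypothetical, and we assume here that a failure is identified by its exception type alone; the actual DeltaDebugger may apply stricter criteria.

```python
from typing import Any, Callable, Type

PASS, FAIL, UNRESOLVED = 'PASS', 'FAIL', 'UNRESOLVED'

def classify(fun: Callable[..., Any], expected: Type[BaseException],
             *args: Any) -> str:
    """Hypothetical helper: run fun(*args) and classify the outcome."""
    try:
        fun(*args)
    except expected:
        return FAIL          # same kind of failure as the original
    except Exception:
        return UNRESOLVED    # some other failure: we cannot tell
    return PASS

def toy_test(inp: str) -> None:
    """Toy test with a functionality check and the original assertion."""
    if len(inp) < 3:
        raise RuntimeError("Missing functionality")
    assert 'x' not in inp, "My Test"

print(classify(toy_test, AssertionError, "abxcd"))  # original failure
print(classify(toy_test, AssertionError, "x"))      # reduced too far
print(classify(toy_test, AssertionError, "abcd"))   # no failure
```

Only the first call reproduces the original failure; the second one raises a different exception and thus would not count as a successful reduction.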

How does this change things? When we reduce the Assertions source code with the extended assertions, we now get a different result:

In [129]:
with DeltaDebugger(log=False) as dd:
    compile_and_test_html_markup(assertions_source_lines)
reduced_assertions_source_lines = dd.min_args()['lines']

Our result actually is the source code of remove_html_markup() – and only the source code. This is a success, as Delta Debugging has eliminated all the other parts of the Assertions source code; these neither contribute to the correct functioning of remove_html_markup(), nor to the failure at hand.

In [130]:
print_content(''.join(reduced_assertions_source_lines), '.py')
def remove_html_markup(s):  # type: ignore
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out

All in all, we have reduced the number of relevant lines in Assertions to about 3% of the original source code.

In [131]:
len(reduced_assertions_source_lines) / len(assertions_source_lines)
Out[131]:
0.025362318840579712

The astute reader may notice that remove_html_markup(), as shown above, is slightly different from the original version in the chapter on assertions. Here's the original version for comparison:

In [132]:
remove_html_markup_source_lines, _ = inspect.getsourcelines(Assertions.remove_html_markup)
print_content(''.join(remove_html_markup_source_lines), '.py')
def remove_html_markup(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c

    # postcondition
    assert '<' not in out and '>' not in out

    return out
In [133]:
quiz("In the reduced version, what has changed?",
    [
        "Comments are deleted",
        "Blank lines are deleted",
        "Initializations are deleted",
        "The assertion is deleted",
    ], '[(1 ** 0 - -1 ** 0) ** n for n in range(0, 3)]')

Indeed, Delta Debugging has determined the comment, the blank lines, and the assertion as being irrelevant for reproducing the failure, and consequently has deleted them. The initializations, in contrast, are all still in place.

Reducing Code Characters¶

We can reduce the code further by removing individual characters rather than lines. To this end, we convert our (already reduced) remove_html_markup() code into a list of characters.

In [134]:
reduced_assertions_source_characters = list("".join(reduced_assertions_source_lines))
print(reduced_assertions_source_characters[:30])
['d', 'e', 'f', ' ', 'r', 'e', 'm', 'o', 'v', 'e', '_', 'h', 't', 'm', 'l', '_', 'm', 'a', 'r', 'k', 'u', 'p', '(', 's', ')', ':', ' ', ' ', '#', ' ']

Our compile_and_test_html_markup() works (and fails) as before: It still joins the given strings into one and executes them. (Remember that in Python, "characters" are simply strings of length one.)

In [135]:
with ExpectError(AssertionError):
    compile_and_test_html_markup(reduced_assertions_source_characters)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/909985861.py", line 2, in <cell line: 1>
    compile_and_test_html_markup(reduced_assertions_source_characters)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/4067432722.py", line 2, in compile_and_test_html_markup
    compile_and_run(lines +
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2075409738.py", line 3, in compile_and_run
    exec("".join(lines), {'__name__': '<string>',
  File "<string>", line 17, in <module>
AssertionError: My Test (expected)

Let's see what Delta Debugging makes of that – and also, how long it takes. The Timer class gives us a simple means to measure time.

In [137]:
with DeltaDebugger(log=False) as dd:
    compile_and_test_html_markup(reduced_assertions_source_characters)

Here's the reduced result:

In [138]:
with Timer() as t:
    further_reduced_assertions_source_characters = dd.min_args()['lines']
print_content("".join(further_reduced_assertions_source_characters), ".py")
def remove_html_markup(s):
    tag=False
    quote=False
    out=""
    for c in s:
        if c=='<'and not quote:tag=True
        if c=='>'and not quote:tag=False
        elif c=='"'or c==""and g:not quote
        elif not tag:out=out+c
    return out

There are a number of observations we can make about this code.

  • All superfluous blanks and even newlines have been removed.
  • As a curiosity, the check for single quotes has degenerated into c==""and g. Since a character c is never the empty string, the (undefined) name g is never evaluated.
  • The semantics and effect of < and > characters is preserved, as mandated by our RuntimeError check.
  • Double quotes still have the effect of not being included in the returned value: the remaining quote has no effect.

Semantics-wise, this reduced variant still yields the "original" failure; the biggest semantic differences, though, are in the condition and code associated with double quotes, which is also the location of the defect to be fixed. This is how reducing code can point not only to necessary locations, but also to defective locations.

Mind you that reducing code is not cheap, especially if you remove individual characters. It has taken DeltaDebugger more than a thousand tests to obtain the result above:

In [139]:
dd.tests
Out[139]:
1278

And to do so, it required about a third of a second. This may be little for a human, but from a CPU standpoint, this is an enormous effort.

In [140]:
t.elapsed_time()
Out[140]:
0.3180582500062883
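
For readers without access to the Timer class, elapsed time can be measured with a few lines of standard Python. The following is a minimal sketch; the class name SimpleTimer is an assumption made for this example and is not the Timer class used above.

```python
import time
from typing import Any, Optional

class SimpleTimer:
    """Hypothetical stand-in for a timer used as a context manager."""

    def __enter__(self) -> 'SimpleTimer':
        self.start = time.perf_counter()
        self.end: Optional[float] = None
        return self

    def __exit__(self, *exc: Any) -> None:
        self.end = time.perf_counter()

    def elapsed_time(self) -> float:
        """Return seconds elapsed (so far, if the block is still running)."""
        end = self.end if self.end is not None else time.perf_counter()
        return end - self.start

with SimpleTimer() as t:
    total = sum(range(100_000))  # some work to be measured

print(t.elapsed_time())
```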

Reducing Syntax Trees¶

When reducing code (or generally speaking, recursive structures), using a syntactic approach can be a much better alternative to the line-by-line or character-by-character approaches discussed above. The idea is that one represents the input as a tree (rather than a sequence of strings), in which a reducer would work on entire subtrees, deleting or reducing parts of the tree.

We illustrate this concept on syntax trees representing Python code. Python provides us with simple means to interactively convert code into syntax trees (and back again). So, in order to reduce code, we can

  1. parse the program code into a syntax tree (called abstract syntax tree or AST);
  2. reduce the syntax tree to a minimum, executing it to test reductions; and
  3. unparse the tree to obtain textual code again.

Since transformations on the AST are much less likely to produce syntax errors, reducing ASTs is much more efficient than reducing program code as text.

In the chapter on slicing, we already have seen several examples on how to work with ASTs. In our context, an AST also offers additional possibilities for reducing. Notably, instead of just deleting code fragments, we can also replace them with simpler fragments. For instance, we can replace arithmetic expressions with constants, or conditional statements if cond: body with the associated body body.
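
To make the replacement of if cond: body by its body concrete, here is a minimal sketch using ast.NodeTransformer. The class name IfToBody is made up for this illustration and is not part of the chapter's code; ast.unparse() requires Python 3.9 or later.

```python
import ast

class IfToBody(ast.NodeTransformer):
    """Replace every `if cond: body` statement by its body (illustration only)."""

    def visit_If(self, node: ast.If) -> list:
        self.generic_visit(node)  # simplify nested `if` statements first
        return node.body          # a list of statements replaces the node

source = "if x > 0:\n    y = 1\n    z = 2\n"
tree = ast.parse(source)
new_tree = ast.fix_missing_locations(IfToBody().visit(tree))
simplified = ast.unparse(new_tree)
print(simplified)
```

The condition x > 0 disappears entirely; only the two assignments remain. A reducer can then test whether the failure still occurs with the simplified code.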

Let us illustrate how this works, again choosing remove_html_markup() as our ongoing example. One more time, we create a function with associated test.

In [141]:
fun_source = inspect.getsource(remove_html_markup)
In [142]:
print_content(fun_source, '.py')
def remove_html_markup(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c

    # postcondition
    assert '<' not in out and '>' not in out

    return out

From Code to Syntax Trees¶

Let us parse this piece of code into an AST. This is done by the ast.parse() function.

In [144]:
fun_tree: ast.Module = ast.parse(fun_source)

The parsed tree contains the function definition:

In [146]:
show_ast(fun_tree)
[Syntax tree diagram: a FunctionDef node "remove_html_markup" whose children are arguments, three Assign nodes (tag, quote, out), a For loop, an Assert, and a Return.]

Let us add some tests to this, using the same scheme:

In [147]:
test_source = (
    '''if remove_html_markup('<foo>bar</foo>') != 'bar':\n''' +
    '''    raise RuntimeError("Missing functionality")\n''' +
    '''assert remove_html_markup('"foo"') == '"foo"', "My Test"'''
)
In [148]:
test_tree: ast.Module = ast.parse(test_source)
In [149]:
print_content(ast.unparse(test_tree), '.py')
if remove_html_markup('<foo>bar</foo>') != 'bar':
    raise RuntimeError('Missing functionality')
assert remove_html_markup('"foo"') == '"foo"', 'My Test'

We can merge the function definition tree and the test tree into a single one:

In [151]:
fun_test_tree = copy.deepcopy(fun_tree)
fun_test_tree.body += test_tree.body

Such a tree can be compiled into a code object, using Python's compile() function:

In [152]:
fun_test_code = compile(fun_test_tree, '<string>', 'exec')

and the resulting code object can be executed directly, using the Python exec() function. We see that our test fails as expected.

In [153]:
with ExpectError(AssertionError):
    exec(fun_test_code, {}, {})
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/1290587190.py", line 2, in <cell line: 1>
    exec(fun_test_code, {}, {})
  File "<string>", line 3, in <module>
AssertionError: My Test (expected)

Traversing Syntax Trees¶

Our goal is now to reduce this tree (or at least the subtree with the function definition) to a minimum. To this end, we manipulate the AST through the ast Python module. The official Python ast reference is complete, but a bit brief; the documentation "Green Tree Snakes - the missing Python AST docs" provides an excellent introduction.

The two means for exploring and changing ASTs are the classes NodeVisitor and NodeTransformer, respectively. We start with creating a list of all nodes in the tree, using a NodeVisitor subclass.

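The collector overrides generic_visit(), which NodeVisitor invokes for every node that has no dedicated visit_<node class>() method; since we define none, this covers every node in the tree. The method saves each visited node in the _all_nodes attribute and then calls the inherited generic_visit() to continue with the node's children.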

In [155]:
class NodeCollector(NodeVisitor):
    """Collect all nodes in an AST."""

    def __init__(self) -> None:
        super().__init__()
        self._all_nodes: List[AST] = []

    def generic_visit(self, node: AST) -> None:
        self._all_nodes.append(node)
        return super().generic_visit(node)

    def collect(self, tree: AST) -> List[AST]:
        """Return a list of all nodes in tree."""
        self._all_nodes = []
        self.visit(tree)
        return self._all_nodes

This is how our NodeCollector() class produces a list of all nodes:

In [156]:
fun_nodes = NodeCollector().collect(fun_tree)
len(fun_nodes)
Out[156]:
107
In [157]:
fun_nodes[:30]
Out[157]:
[<ast.Module at 0x108a0f460>,
 <ast.FunctionDef at 0x108a0f820>,
 <ast.arguments at 0x108a0f100>,
 <ast.arg at 0x108a0d2a0>,
 <ast.Assign at 0x108a0feb0>,
 <ast.Name at 0x108a0e440>,
 <ast.Store at 0x1032a24a0>,
 <ast.Constant at 0x108a0d7b0>,
 <ast.Assign at 0x108a0cf10>,
 <ast.Name at 0x108a0ce20>,
 <ast.Store at 0x1032a24a0>,
 <ast.Constant at 0x108a0f850>,
 <ast.Assign at 0x108a0d510>,
 <ast.Name at 0x108a0dc60>,
 <ast.Store at 0x1032a24a0>,
 <ast.Constant at 0x108a0ff40>,
 <ast.For at 0x108a0cca0>,
 <ast.Name at 0x108a0de10>,
 <ast.Store at 0x1032a24a0>,
 <ast.Name at 0x108a0f7c0>,
 <ast.Load at 0x1032a2440>,
 <ast.If at 0x108a0ed40>,
 <ast.BoolOp at 0x108a0c130>,
 <ast.And at 0x1032a2590>,
 <ast.Compare at 0x108a0d8a0>,
 <ast.Name at 0x108a0d7e0>,
 <ast.Load at 0x1032a2440>,
 <ast.Eq at 0x1032a2d70>,
 <ast.Constant at 0x108a0e0e0>,
 <ast.UnaryOp at 0x108a0d4e0>]
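
As a cross-check, the standard library function ast.walk() also enumerates all nodes of a tree, although in no specified order. The sketch below uses a small, self-contained collector (named Collector here, an assumption for this example) and compares both approaches on a tiny expression.

```python
import ast
from typing import List

class Collector(ast.NodeVisitor):
    """Minimal collector in the style of NodeCollector (illustration only)."""

    def __init__(self) -> None:
        self.nodes: List[ast.AST] = []

    def generic_visit(self, node: ast.AST) -> None:
        self.nodes.append(node)        # record this node
        super().generic_visit(node)    # continue with its children

tree = ast.parse("x = 1 + 2")
collector = Collector()
collector.visit(tree)

walked = list(ast.walk(tree))

# Both approaches reach the same node objects
same_nodes = {id(n) for n in walked} == {id(n) for n in collector.nodes}
print(same_nodes)
```

The visitor-based collector has the advantage of a predictable depth-first order, which keeps neighboring nodes of the source close together in the list.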

Such a list of nodes is what we can feed into Delta Debugging in order to reduce it. The idea is that with every test, we take the tree and for each node in the tree, we check whether it is still in the list – if not, we remove it. Thus, by reducing the list of nodes, we simultaneously reduce the tree as well.

Deleting Nodes¶

In our next step, we write some code that, given such a list of nodes, prunes the tree such that only elements in the list are still contained. To this end, we proceed in four steps:

  1. We traverse the original AST, marking all nodes as "to be deleted".
  2. We traverse the given list of nodes, clearing their markers.
  3. We copy the original tree (including the markers) into a new tree – the one to be reduced.
  4. We traverse the new tree, now deleting all marked nodes.

Why do we go through such an extra effort? The reason is that our list of nodes contains references into the original tree – a tree that needs to stay unchanged such that we can reuse it for later. The new tree (the copy) has the same nodes, but at different addresses, so our original references cannot be used anymore. Markers, however, just like any other attributes, are safely copied from the original into the new tree.
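
The following sketch demonstrates this behavior on a tiny tree: after copy.deepcopy(), the copied node is a different object, but a custom marked attribute set on the original is carried over.

```python
import ast
import copy

tree = ast.parse("x = 1\ny = 2")
first = tree.body[0]
first.marked = True  # custom attribute, as set by a marker pass

new_tree = copy.deepcopy(tree)
copied_first = new_tree.body[0]

print(copied_first is first)   # False: a different object
print(copied_first.marked)     # True: the marker was copied along

# Changing the copy leaves the original tree untouched
new_tree.body.remove(copied_first)
print(len(tree.body), len(new_tree.body))
```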

The NodeMarker() visitor marks all nodes in a tree:

In [158]:
class NodeMarker(NodeVisitor):
    def visit(self, node: AST) -> AST:
        node.marked = True  # type: ignore
        return super().generic_visit(node)

The NodeReducer() transformer reduces all marked nodes. If a method visit_<node class>() is defined, it will be invoked; otherwise, visit_Node() is invoked, which deletes the node (and its subtree) by returning None.

In [159]:
class NodeReducer(NodeTransformer):
    def visit(self, node: AST) -> Any:
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.visit_Node)
        return visitor(node)

    def visit_Module(self, node: AST) -> Any:
        # Can't remove modules
        return super().generic_visit(node)

    def visit_Node(self, node: AST) -> Any:
        """Default visitor for all nodes"""
        if node.marked:  # type: ignore
            return None  # delete it
        return super().generic_visit(node)

Our function copy_and_reduce() puts these pieces together:

In [160]:
def copy_and_reduce(tree: AST, keep_list: List[AST]) -> AST:
    """Copy tree, reducing all nodes that are not in keep_list."""

    # Mark all nodes except those in keep_list
    NodeMarker().visit(tree)
    for node in keep_list:
        # print("Clearing", node)
        node.marked = False  # type: ignore

    # Copy tree and delete marked nodes
    new_tree = copy.deepcopy(tree)
    NodeReducer().visit(new_tree)
    return new_tree

Let us apply this in practice. We take the first assignment in our tree...

In [161]:
fun_nodes[4]
Out[161]:
<ast.Assign at 0x108a0feb0>

... whose subtree happens to be the assignment to tag:

In [162]:
ast.unparse(fun_nodes[4])
Out[162]:
'tag = False'

We keep all nodes except for this one.

In [163]:
keep_list = fun_nodes.copy()
del keep_list[4]

Let us now create a copy of the tree in which the assignment is missing:

In [164]:
new_fun_tree = cast(ast.Module, copy_and_reduce(fun_tree, keep_list))
show_ast(new_fun_tree)
[Syntax tree diagram: a FunctionDef node "remove_html_markup" whose children are arguments, two Assign nodes (quote, out), a For loop, an Assert, and a Return.]

The new tree no longer contains the initial assignment to tag:

In [165]:
print_content(ast.unparse(new_fun_tree), '.py')
def remove_html_markup(s):
    quote = False
    out = ''
    for c in s:
        if c == '<' and (not quote):
            tag = True
        elif c == '>' and (not quote):
            tag = False
        elif c == '"' or (c == "'" and tag):
            quote = not quote
        elif not tag:
            out = out + c
    assert '<' not in out and '>' not in out
    return out

If we add our tests and then execute this code, we get an error, as tag is now no longer initialized:

In [166]:
new_fun_tree.body += test_tree.body
In [167]:
fun_code = compile(new_fun_tree, "<string>", 'exec')
In [168]:
with ExpectError(UnboundLocalError):
    exec(fun_code, {}, {})
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2411822553.py", line 2, in <cell line: 1>
    exec(fun_code, {}, {})
  File "<string>", line 3, in <module>
  File "<string>", line 13, in remove_html_markup
UnboundLocalError: local variable 'tag' referenced before assignment (expected)

If we have no node in the keep list, the whole tree gets deleted:

In [169]:
empty_tree = copy_and_reduce(fun_tree, [])
In [170]:
ast.unparse(empty_tree)
Out[170]:
''

Reducing Trees¶

We can put all these steps together in a single function. compile_and_test_ast() takes a tree and a list of nodes, reduces the tree to those nodes in the list, and then compiles and runs the reduced AST.

In [171]:
def compile_and_test_ast(tree: ast.Module, keep_list: List[AST], 
                         test_tree: Optional[ast.Module] = None) -> None:
    new_tree = cast(ast.Module, copy_and_reduce(tree, keep_list))
    # print(ast.unparse(new_tree))

    if test_tree is not None:
        new_tree.body += test_tree.body

    try:
        code_object = compile(new_tree, '<string>', 'exec')
    except Exception:
        raise SyntaxError("Cannot compile")

    exec(code_object, {}, {})
In [172]:
with ExpectError(AssertionError):
    compile_and_test_ast(fun_tree, fun_nodes, test_tree)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2588008834.py", line 2, in <cell line: 1>
    compile_and_test_ast(fun_tree, fun_nodes, test_tree)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/1107678207.py", line 14, in compile_and_test_ast
    exec(code_object, {}, {})
  File "<string>", line 3, in <module>
AssertionError: My Test (expected)

When we run our delta debugger on the AST, this is the list of remaining nodes we obtain:

In [173]:
with DeltaDebugger() as dd:
    compile_and_test_ast(fun_tree, fun_nodes, test_tree)
In [174]:
reduced_nodes = dd.min_args()['keep_list']
len(reduced_nodes)
Out[174]:
57

This is the associated tree:

In [175]:
reduced_fun_tree = copy_and_reduce(fun_tree, reduced_nodes)
show_ast(reduced_fun_tree)
[Syntax tree diagram: a FunctionDef node "remove_html_markup" whose children are arguments, three Assign nodes (tag, quote, out), a For loop, and a Return; the Assert node is gone.]

And this is its textual representation:

In [176]:
print_content(ast.unparse(reduced_fun_tree), '.py')
def remove_html_markup(s):
    tag = False
    quote = False
    out = ''
    for c in s:
        if c == '<' and (not quote):
            tag = True
        elif c == '>' and (not quote):
            tag = False
        elif c == '"' or (c == "'" and tag):
            quote = not quote
        elif not tag:
            out = out + c
    return out
In [177]:
dd.tests
Out[177]:
310

We see that some code was deleted – notably the assertion at the end – but otherwise, our deletion strategy was not particularly effective. This is because in Python, one cannot simply delete the single statement in a controlled body, as the resulting code no longer compiles. One would have to replace it with pass (or some other statement with no effect) to stay syntactically valid. Still, the syntax-based reduction would single out remove_html_markup() from the Assertions source code, and do so even faster, as it would proceed one definition (rather than one line) at a time.
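
The effect can be reproduced on a small tree. In this sketch, we empty the body of a for loop and try to compile the result; we then replace the deleted statement with pass instead. The exact exception raised for the empty body is an implementation detail, so the sketch catches Exception, as compile_and_test_ast() does.

```python
import ast

tree = ast.parse("for c in 'ab':\n    out = c\n")
loop = tree.body[0]

# Attempt 1: delete the only statement in the loop body
loop.body = []
try:
    compile(tree, '<string>', 'exec')
    deleted_compiles = True
except Exception:
    deleted_compiles = False   # the tree is rejected as invalid

# Attempt 2: replace the statement with `pass` instead
loop.body = [ast.Pass()]
ast.fix_missing_locations(tree)  # new node needs line information
compile(tree, '<string>', 'exec')
replaced_compiles = True

print(deleted_compiles, replaced_compiles)
```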

Transforming Nodes¶

To further boost our syntactic reduction strategy, we implement a set of additional reduction operators. First, as already discussed, we do not simply delete an assignment, but we replace it with a pass statement. To obtain the tree for pass, we simply parse it and access the subtree.

In [178]:
class NodeReducer(NodeReducer):
    PASS_TREE = ast.parse("pass").body[0]

    def visit_Assign(self, node: ast.Assign) -> AST:
        if node.marked:  # type: ignore
            # Replace by pass
            return self.PASS_TREE
        return super().generic_visit(node)

In a similar vein, we can replace comparisons with False:

In [179]:
class NodeReducer(NodeReducer):
    FALSE_TREE = ast.parse("False").body[0].value  # type: ignore

    def visit_Compare(self, node: ast.Compare) -> AST:
        if node.marked:  # type: ignore
            # Replace by False
            return self.FALSE_TREE
        return super().generic_visit(node)

If we have a Boolean operator, we attempt to replace it with its left operand:

In [180]:
class NodeReducer(NodeReducer):
    def visit_BoolOp(self, node: ast.BoolOp) -> AST:
        if node.marked:  # type: ignore
            # Replace by left operand
            return node.values[0]
        return super().generic_visit(node)

And if we find an If clause, we attempt to replace it by its body:

In [181]:
class NodeReducer(NodeReducer):
    def visit_If(self, node: ast.If) -> Union[AST, List[ast.stmt]]:
        if node.marked:  # type: ignore
            # Replace by body
            return node.body
        return super().generic_visit(node)

Let us try to reduce our code with these additional reducers enabled:

In [182]:
with DeltaDebugger() as dd:
    compile_and_test_ast(fun_tree, fun_nodes, test_tree)

This is the reduced code we get. We see that all references to quote have gone, as has the handling of single quotes – none of this is relevant for the failure:

In [183]:
reduced_nodes = dd.min_args()['keep_list']
reduced_fun_tree = copy_and_reduce(fun_tree, reduced_nodes)
print_content(ast.unparse(reduced_fun_tree), '.py')
def remove_html_markup(s):
    tag = False
    pass
    out = ''
    for c in s:
        if c == '<':
            tag = True
        elif c == '>':
            tag = False
        elif c == '"':
            pass
        elif not tag:
            out = out + c
    return out

Again, the best insights come from comparing this reduced version to the original implementation – and we learn that the problem is not related to the quote variable, or to the handling of single quotes; the problem is simply that when the input contains double quotes, these are not added to the final string.

With our reduction code, however, we only scratch the surface of what could actually be possible. So far, we implement exactly one reduction per node – but of course, there are many alternatives an expression or statement could be reduced to. We will explore some of these in the exercises, below; also be sure to check out the background on code reduction.

Synopsis¶

A reducer takes a failure-inducing input and reduces it to the minimum that still reproduces the failure. This chapter provides a DeltaDebugger class that implements such a reducer.

Here is a simple example: An arithmetic expression causes an error in the Python interpreter:

In [184]:
def myeval(inp: str) -> Any:
    return eval(inp)
In [185]:
with ExpectError(ZeroDivisionError):
    myeval('1 + 2 * 3 / 0')
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/4002351332.py", line 2, in <cell line: 1>
    myeval('1 + 2 * 3 / 0')
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_1782/2200911420.py", line 2, in myeval
    return eval(inp)
  File "<string>", line 1, in <module>
ZeroDivisionError: division by zero (expected)

Can we reduce this input to a minimum? Delta Debugging is a simple and robust reduction algorithm. We provide a DeltaDebugger class that is used in conjunction with a (failing) function call:

with DeltaDebugger() as dd:
    fun(args...)
dd

The class automatically determines minimal arguments that cause the function to fail with the same exception as the original. Printing out the class object reveals the minimized call.

In [186]:
with DeltaDebugger() as dd:
    myeval('1 + 2 * 3 / 0')
dd
Out[186]:
myeval(inp='3/0')

The input is reduced to the minimum: We get the essence of the division by zero.
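
To give an intuition of how such a reduction can proceed, here is a minimal standalone sketch of a delta-debugging-style reduction loop on strings. It is not the DeltaDebugger implementation from this chapter; the names fails() and ddmin_sketch() are hypothetical and introduced for illustration only. The loop repeatedly removes chunks of the input and keeps a shortened input whenever the same exception still occurs.

```python
from typing import Callable


def fails(inp: str) -> bool:
    """Return True iff evaluating `inp` raises ZeroDivisionError."""
    try:
        eval(inp)
    except ZeroDivisionError:
        return True
    except Exception:
        # Any other error (say, a SyntaxError) counts as a different failure
        return False
    return False


def ddmin_sketch(inp: str, test: Callable[[str], bool]) -> str:
    """Reduce `inp` until no single character can be removed
    without making `test` return False (hypothetical helper)."""
    assert test(inp), "the original input must fail"

    n = 2  # number of chunks the input is split into
    while len(inp) >= 2:
        chunk = len(inp) // n  # always >= 1, as n never exceeds len(inp)
        reduced = False

        # Try removing one chunk at a time
        for start in range(0, len(inp), chunk):
            complement = inp[:start] + inp[start + chunk:]
            if test(complement):
                inp = complement       # keep the shorter failing input
                n = max(n - 1, 2)      # and continue with coarser chunks
                reduced = True
                break

        if not reduced:
            if n == len(inp):
                break                  # single characters tried: done
            n = min(n * 2, len(inp))   # otherwise, use finer chunks

    return inp


reduced_input = ddmin_sketch('1 + 2 * 3 / 0', fails)
```

The result is 1-minimal: removing any single character from reduced_input makes the ZeroDivisionError disappear. Treating all other exceptions as "not failing" is a design choice in this sketch; it ensures that the reduction preserves the original failure rather than, say, a syntax error.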

There also is an interface to access the reduced input(s) programmatically. The method min_args() returns a dictionary in which all function arguments are minimized:

In [187]:
dd.min_args()
Out[187]:
{'inp': '3/0'}

In contrast, max_args() returns a dictionary in which all function arguments are maximized, but still pass:

In [188]:
dd.max_args()
Out[188]:
{'inp': '1 + 2 * 3  '}

The method min_arg_diff() returns a triple of

  • passing input,
  • failing input, and
  • their minimal failure-inducing difference:
In [189]:
dd.min_arg_diff()
Out[189]:
({'inp': ' 3 '}, {'inp': ' 3 /0'}, {'inp': '/0'})
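
The triple can be checked by hand with plain eval(), without any of the chapter's infrastructure. The sketch below copies the three strings from the output above; the helper raises_zero_division() is a hypothetical name used only here.

```python
# Values copied from the min_arg_diff() output above
passing_inp, failing_inp, diff = ' 3 ', ' 3 /0', '/0'


def raises_zero_division(inp: str) -> bool:
    """Return True iff evaluating `inp` raises ZeroDivisionError."""
    try:
        eval(inp)
    except ZeroDivisionError:
        return True
    return False


passing_ok = not raises_zero_division(passing_inp)   # ' 3 ' evaluates fine
failing_ok = raises_zero_division(failing_inp)       # ' 3 /0' divides by zero

# In this particular case, the difference happens to be a suffix:
# appending it to the passing input yields the failing input.
rebuilt = passing_inp + diff
```

Here, the two characters '/0' are all that separates a passing run from a failing one.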

And you can also access the function itself, as well as its original arguments.

In [190]:
dd.function().__name__, dd.args()
Out[190]:
('myeval', {'inp': '1 + 2 * 3 / 0'})

DeltaDebugger processes (i.e., minimizes or maximizes) all arguments that support a len() operation and that can be indexed – notably strings and lists. If a function has multiple arguments, all arguments that can be processed will be processed.
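
The criterion "supports len() and can be indexed" can be sketched as a small predicate. This is an assumption about how such a check might look, not the check DeltaDebugger actually performs; is_reducible_sketch() is a hypothetical name.

```python
from typing import Any


def is_reducible_sketch(arg: Any) -> bool:
    """Return True if `arg` supports len() and slicing
    (hypothetical helper, for illustration only)."""
    try:
        len(arg)
        arg[0:0]  # an empty slice works even for empty sequences
    except (TypeError, KeyError):
        # TypeError: no len() or no indexing; KeyError: mappings such as dict
        return False
    return True


candidates = ['1 + 2 * 3 / 0', [1, 2, 3], 42, {1, 2, 3}]
reducible = [is_reducible_sketch(c) for c in candidates]
```

Under this sketch, the string and the list qualify for reduction, whereas the integer (no length) and the set (no indexing) would be left untouched.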

This chapter also provides a number of superclasses of DeltaDebugger, notably CallCollector, which obtains the first function call for DeltaDebugger. CallReducer classes allow for implementing alternative call reduction strategies.

In [191]:
# ignore
from ClassDiagram import display_class_hierarchy
In [192]:
# ignore
display_class_hierarchy([DeltaDebugger],
                        public_methods=[
                            StackInspector.caller_frame,
                            StackInspector.caller_function,
                            StackInspector.caller_globals,
                            StackInspector.caller_locals,
                            StackInspector.caller_location,
                            StackInspector.search_frame,
                            StackInspector.search_func,
                            StackInspector.is_internal_error,
                            StackInspector.our_frame,
                            CallCollector.__init__