Generalizing Failure Circumstances

One central question in debugging is: Does this bug occur in other situations, too? In this chapter, we present a technique that is set to generalize the circumstances under which a failure occurs. The DDSET algorithm takes a failure-inducing input and breaks it into individual elements. For each element, it tries to find whether it can be replaced by others in the same category, and if so, it generalizes the concrete element to that category. The result is a pattern that characterizes the failure condition: "The failure occurs for all inputs of the form (<expr> * <expr>)".

To use the code provided in this chapter, write

>>> from debuggingbook.DDSetDebugger import <identifier>

and then make use of the following features.

This chapter provides a class DDSetDebugger, implementing the DDSET algorithm for generalizing failure-inducing inputs. The DDSetDebugger is used as follows:

with DDSetDebugger(grammar) as dd:
    function(args...)
dd

Here, function(args...) is a failing function call (= raises an exception) that takes at least one string argument; grammar is an input grammar in fuzzingbook format that matches the format of this argument.

The result is a call of function() with an abstract failure-inducing input – a variant of the concrete input in which parts are replaced by placeholders in the form <name>, where <name> is a nonterminal in the grammar. The failure has been verified to occur for a number of instantiations of <name>.

Here is an example of how DDSetDebugger works. The concrete failing input <foo>"bar</foo> is generalized to an abstract failure-inducing input:

>>> with DDSetDebugger(SIMPLE_HTML_GRAMMAR) as dd:
>>>     remove_html_markup('<foo>"bar</foo>')
>>> dd

The abstract input tells us that the failure occurs for any opening and closing HTML tags, as long as there is a double quote between them.

A programmatic interface is available as well. generalize() returns a mapping of argument names to (generalized) values:

>>> dd.generalize()
{'s': '<lt><id><gt>"<plain-text><closing-tag>'}

Using fuzz(), the abstract input can be instantiated to further concrete inputs, all set to produce the failure again:

>>> for i in range(10):
>>>     print(dd.fuzz())

DDSetDebugger can be customized by passing a subclass of TreeGeneralizer, which does the bulk of the work; for details, see its constructor. The full class hierarchy is shown below.

(Class diagram: DDSetDebugger, with public methods __init__(), __repr__(), fuzz(), fuzz_args(), and generalize(), uses CallCollector, which in turn builds on StackInspector. TreeGeneralizer, with methods __init__(), can_generalize(), find_paths(), fuzz_tree(), generalizable_paths(), generalize(), generalize_path(), and test_tree(), extends TreeMutator, with methods __init__(), get_subtree(), mutate(), and new_tree().)

A Failing Program

As with previous chapters, we use remove_html_markup() as our ongoing example. This function is set to remove HTML markup tags (like <em>) from a given string s, returning the plain text only. We use the version from the chapter on assertions, using an assertion as postcondition checker.

def remove_html_markup(s):
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c

    # postcondition
    assert '<' not in out and '>' not in out

    return out

For most inputs, remove_html_markup() works just as expected:

remove_html_markup("Be <em>quiet</em>, he said")
'Be quiet, he said'

There are inputs, however, for which it fails:

BAD_INPUT = '<foo>"bar</foo>'
from ExpectError import ExpectError
with ExpectError(AssertionError):
    remove_html_markup(BAD_INPUT)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_47258/", line 2, in <cell line: 1>
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_47258/", line 17, in remove_html_markup
    assert '<' not in out and '>' not in out
AssertionError (expected)

In contrast to the other chapters, our aim now is not to immediately go and debug remove_html_markup(). Instead, we focus on another important question:

Under which conditions precisely does remove_html_markup() fail?

This question can be generalized to

What is the set of inputs for which remove_html_markup() fails?

Our plan is to generalize concrete inputs (such as BAD_INPUT) into abstract failure-inducing inputs. These are patterns formed from a concrete input, in which specific placeholders indicate sets of inputs that are permitted. In the abstract failure-inducing input


<opening-tag>"bar<closing-tag>

for instance, <opening-tag> and <closing-tag> are placeholders for opening and closing HTML tags, respectively. The pattern indicates that any opening HTML tag and any closing HTML tag can be present in the input, as long as the enclosed text reads "bar.

Given a concrete failure-inducing input, our aim is to generalize it as much as possible to such an abstract failure-inducing input. The resulting pattern should then

  • capture the circumstances under which the program fails;
  • allow for test generation by instantiating the placeholders;
  • help ensure our fix is as general as possible.
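To see how such a pattern could be checked mechanically, here is a rough illustration (my own sketch, not part of DDSET): approximate the pattern by a regular expression in which every placeholder matches an arbitrary nonempty string. This is only an approximation, since a real placeholder stands for the grammar language of its nonterminal, not for arbitrary strings.

```python
import re

def pattern_to_regex(pattern: str) -> str:
    """Approximate an abstract failure-inducing input by a regular expression,
    treating each <placeholder> as 'any nonempty string' (illustration only)."""
    parts = re.split(r'(<[^<> ]+>)', pattern)
    return ''.join('.+' if part.startswith('<') else re.escape(part)
                   for part in parts if part)

rx = pattern_to_regex('<opening-tag>"bar<closing-tag>')
assert re.fullmatch(rx, '<foo>"bar</foo>')   # the original failing input matches
assert not re.fullmatch(rx, 'no quote here')  # plain text without the quote does not
```

Note how the concrete failing input is just one member of the (infinite) set of strings the pattern denotes.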


If s = '<foo>"bar</foo>' (i.e., BAD_INPUT), what is the value of out such that the assertion fails?


To determine abstract failure-inducing inputs, we need means to determine and characterize sets of inputs – known in computer science as languages. To formally describe languages, the field of formal languages has devised a number of formalisms for specifying them. Regular expressions are the simplest of these formalisms for denoting sets of strings: The regular expression [a-z]*, for instance, denotes a (possibly empty) sequence of lowercase letters. Automata theory connects these languages to automata that accept these inputs; finite state machines, for instance, accept exactly the languages denoted by regular expressions.
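Python's re module readily demonstrates what it means for a regular expression to denote a set of strings:

```python
import re

# [a-z]* denotes the set of all (possibly empty) sequences of lowercase letters
assert re.fullmatch(r'[a-z]*', 'hello') is not None
assert re.fullmatch(r'[a-z]*', '') is not None    # the empty sequence is a member
assert re.fullmatch(r'[a-z]*', 'Hello') is None   # uppercase 'H' is not in [a-z]
```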

Regular expressions are great for not-too-complex input formats, and the associated finite state machines have many properties that make them great for reasoning. To specify more complex inputs, though, they quickly encounter limitations. At the other end of the language spectrum, we have universal grammars that denote the language accepted by Turing machines. A Turing machine can compute anything that can be computed; and with Python being Turing-complete, this means that we can also use a Python program $p$ to specify or even enumerate legal inputs. But then, computer science theory also tells us that each such program has to be written specifically for the input to be considered, which is not the level of automation we want.

The middle ground between regular expressions and Turing machines is covered by grammars. Grammars are among the most popular (and best understood) formalisms to formally specify input languages. Using a grammar, one can express a wide range of the properties of an input language. Grammars are particularly great for expressing the syntactical structure of an input, and are the formalism of choice to express nested or recursive inputs. The grammars we use are so-called context-free grammars, one of the easiest and most popular grammar formalisms.

A grammar is defined as a mapping of nonterminal symbols (denoted in <angle brackets>) to lists of alternative expansions, which are strings containing terminal symbols and possibly more nonterminal symbols. To make the writing of grammars as simple as possible, we adopt the fuzzingbook format that is based on strings and lists.

import fuzzingbook

Fuzzingbook grammars take the format of a mapping between symbol names and expansions, where expansions are lists of alternatives.

Grammar = Dict[str,  # A grammar maps strings...
               List[  # ...to lists of expansions,
                   Union[str,  # which are strings...
                         Tuple[str, Dict[str, Any]]]]]  # ...or pairs of strings and attributes.

A one-rule grammar for digits thus takes the form

DIGIT_GRAMMAR: Grammar = {
    "<start>":
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
}

which means that the <start> symbol can be expanded into any of the digits listed.

A full grammar for arithmetic expressions looks like this:

EXPR_GRAMMAR: Grammar = {
    "<start>":
        ["<expr>"],

    "<expr>":
        ["<term> + <expr>", "<term> - <expr>", "<term>"],

    "<term>":
        ["<factor> * <term>", "<factor> / <term>", "<factor>"],

    "<factor>":
        ["+<factor>", "-<factor>", "(<expr>)", "<integer>.<integer>", "<integer>"],

    "<integer>":
        ["<digit><integer>", "<digit>"],

    "<digit>":
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
}

From such a grammar, one can easily generate inputs that conform to the grammar.

from fuzzingbook.GrammarFuzzer import GrammarFuzzer
simple_expr_fuzzer = GrammarFuzzer(EXPR_GRAMMAR)
for i in range(10):
    fuzz_expr = simple_expr_fuzzer.fuzz()
    print(fuzz_expr)
3.8 + --62.912 - ++4 - +5 * 3.0 * 4
7 * (75.5 - -6 + 5 - 4) + -(8 - 1) / 5 * 2
(-(9) * +6 + 9 / 3 * 8 - 9 * 8 / 7) / -+-65
(9 + 8) * 2 * (6 + 6 + 9) * 0 * 1.9 * 0
(1 * 7 - 9 + 5) * 5 / 0 * 5 + 7 * 5 * 7
-(6 / 9 - 5 - 3 - 1) - -1 / +1 + (9) / (8) * 6
(+-(0 - (1) * 7 / 3)) / ((1 * 3 + 8) + 9 - +1 / --0) - 5 * (-+939.491)
+2.9 * 0 / 501.19814 / --+--(6.05002)
+-8.8 / (1) * -+1 + -8 + 9 - 3 / 8 * 6 + 4 * 3 * 5
(+(8 / 9 - 1 - 7)) + ---06.30 / +4.39

Nonterminals as found in the grammar make natural placeholders in abstract failure-inducing inputs. If we know, for instance, that it is not just the concrete failure-inducing input

(2 * 3)

but the abstract failure-inducing input

(<expr> * <expr>)

that causes the failure, we immediately see that the error is due to the multiplication operator rather than its operands.
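To see the benefit, we can instantiate the placeholders by hand. The sketch below uses plain string replacement for illustration; the actual DDSET implementation draws a fresh grammar expansion for each placeholder instead.

```python
abstract_input = "(<expr> * <expr>)"

# Substitute concrete expressions for the <expr> placeholder (illustration only;
# str.replace() fills both occurrences with the same expression)
for concrete in ["2", "3 + 4", "-(5.0)"]:
    print(abstract_input.replace("<expr>", concrete))
```

Every such instantiation – (2 * 2), (3 + 4 * 3 + 4), and so on – should reproduce the failure if the pattern is accurate.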

Coming back to our remove_html_markup() example, let us create a simple grammar for HTML expressions. A <html> element is either plain text or tagged text.

SIMPLE_HTML_GRAMMAR: Grammar = {
    "<start>":
        ["<html>"],

    "<html>":
        ["<plain-text>", "<tagged-text>"],
}

Plain text is a simple (possibly empty) sequence of letters, digits, punctuation, and whitespace. (Note how <plain-text> is either empty or some character followed by more plain text.) The characters < and > are not allowed, though.

import string

SIMPLE_HTML_GRAMMAR.update({
    "<plain-text>":
        ["", "<plain-char><plain-text>"],

    "<plain-char>":
        ["<letter>", "<digit>", "<other>", "<whitespace>"],

    "<letter>": list(string.ascii_letters),
    "<digit>": list(string.digits),
    "<other>": list(string.punctuation.replace('<', '').replace('>', '')),
    "<whitespace>": list(string.whitespace)
})

Tagged text is a bit more complicated. We have opening tags <foo>, followed by some more HTML material, and then closed by a closing tag </foo>. (We do not insist that the two tags match.) A self-closing tag has the form <br/>. For compatibility reasons, we also allow just opening tags without closing tags, as in <img>.

SIMPLE_HTML_GRAMMAR.update({
    "<tagged-text>":
        ["<opening-tag><html><closing-tag>",
         "<self-closing-tag>",
         "<opening-tag>"],
})

Since the characters < and > are already reserved for denoting nonterminal symbols, we use the special nonterminal symbols <lt> and <gt> that expand into < and >, respectively,


SIMPLE_HTML_GRAMMAR.update({
    "<opening-tag>":
        ["<lt><id><gt>",
         "<lt><id><attrs><gt>"],

    "<lt>": ["<"],
    "<gt>": [">"],

    "<id>":
        ["<letter>", "<id><letter>", "<id><digit>"],

    "<closing-tag>":
        ["<lt>/<id><gt>"],

    "<self-closing-tag>":
        ["<lt><id><attrs>/<gt>"],
})

Finally, HTML tags can have attributes, which are enclosed in quotes.

        ["<attr>", "<attr><attrs>" ],

        [" <id>='<plain-text>'",
         ' <id>="<plain-text>"'],

Again, we can generate inputs from the grammar.

simple_html_fuzzer = GrammarFuzzer(SIMPLE_HTML_GRAMMAR)
for i in range(10):
    fuzz_html = simple_html_fuzzer.fuzz()
    print(repr(fuzz_html))
'<T3 xG="">'
'<N9cd U=\'\' y=\'l1\' v0="" tb4ya="" UbD=\'\'>9</R>'
' ea\\\\'
"<c1 o2='' x9661lQo64T=''/>"
'<j CI=\'\' T98sJ="" DR4=\'\'/>'
'<FQc90 Wt=""/>'

Such inputs, of course, are great for systematic testing. Our sister book, the fuzzing book, covers these and more.

Derivation Trees

To produce inputs from a grammar, the fuzzingbook GrammarFuzzer makes use of a structure called a derivation tree (also known as syntax tree). A derivation tree encodes the individual expansion steps undertaken while producing the output.

DerivationTree = Tuple[str, Optional[List[Any]]]

Let us illustrate derivation trees by example, using the last HTML output we produced.

'<FQc90 Wt=""/>'

The GrammarFuzzer attribute derivation_tree holds the last tree used to produce this input. We can visualize the tree as follows:

(Derivation tree diagram: <start> expands into <html>, which expands into <tagged-text> and then <self-closing-tag>, which in turn expands into <lt>, <id>, <attrs>, the character /, and <gt> – finally yielding the string '<FQc90 Wt=""/>'.)

From top to bottom, we see that the input was constructed from a <start> symbol, which expanded into <html>, which expanded into <tagged-text>, and so on. Multiple children of a node stand for a concatenation of the individual symbols.

Internally, these trees come as pairs (symbol, children), where symbol is the name of a node (say, <html>), and children is a (possibly empty) list of subtrees. Here are the topmost nodes of the above tree:

import pprint
pp = pprint.PrettyPrinter(depth=7)
pp.pprint(simple_html_fuzzer.derivation_tree)
('<start>', [('<html>', [('<tagged-text>', [('<self-closing-tag>', [...])])])])

To produce abstract failure-inducing patterns, we will work on this very structure. The idea is to

  1. systematically replace subtrees by other, generated, compatible subtrees (e.g. replace one <html> subtree in the concrete input by some other generated <html> subtree);
  2. see whether these subtrees also result in failures; and
  3. if they do, use the nonterminal (<html>) as a placeholder in the pattern.

This will involve some subtree manipulation, construction, and finally testing. First, though, we need to be able to turn an existing input into a derivation tree.


The activity of creating a structure out of an unstructured input is called parsing. Generally speaking, a parser uses a grammar to create a derivation tree (also called parse tree in parsing contexts) from a string input.

Again, there's a whole body of theory (and practice!) around constructing parsers. We make our life simple by using an existing parser (again, from the fuzzing book), which does just what we want. The EarleyParser is instantiated with a grammar such as SIMPLE_HTML_GRAMMAR:

from fuzzingbook.Parser import Parser, EarleyParser  # minor dependency
simple_html_parser = EarleyParser(SIMPLE_HTML_GRAMMAR)

Its method parse() returns an iterator over multiple possible derivation trees. (There can be multiple trees because the grammar may be ambiguous.) We are only interested in the first such tree. Let us parse BAD_INPUT and inspect the resulting derivation tree (also known as parse tree or syntax tree):

bad_input_tree = list(simple_html_parser.parse(BAD_INPUT))[0]
(Derivation tree diagram: <start> expands into <html> and then <tagged-text>, which consists of an <opening-tag> yielding <foo>, an <html> element holding the <plain-text> "bar, and a <closing-tag> yielding </foo>.)

This derivation tree has the same structure as the one created from our GrammarFuzzer above. We see how the <tagged-text> is composed of three elements:

  1. an <opening-tag> (<foo>);
  2. a <html> element which becomes <plain-text> ("bar); and
  3. a <closing-tag> (</foo>).

We can easily turn the tree back into a string. The method tree_to_string() traverses the tree left-to-right and joins all terminal symbols.

from fuzzingbook.GrammarFuzzer import tree_to_string, all_terminals
assert tree_to_string(bad_input_tree) == BAD_INPUT
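The traversal itself is a short recursive function. Here is a simplified re-implementation (named my_tree_to_string() to avoid clashing with the imported original; the fuzzingbook version follows the same scheme):

```python
def my_tree_to_string(tree) -> str:
    """Recursively concatenate all terminal symbols in `tree` (simplified sketch)."""
    symbol, children = tree
    if children:
        # Expanded node: concatenate the strings of all children, left to right
        return ''.join(my_tree_to_string(c) for c in children)
    # Leaf: a terminal contributes its own text;
    # an unexpanded nonterminal (children None or []) contributes nothing
    return '' if symbol.startswith('<') and symbol.endswith('>') else symbol

tree = ('<start>', [('<digit>', [('7', [])])])
assert my_tree_to_string(tree) == '7'
assert my_tree_to_string(('<digit>', None)) == ''  # unexpanded placeholder
```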

With this, we can now

  • parse an input into a tree structure;
  • (re-)create parts of the tree structure; and
  • turn the tree back into an input string.

Mutating the Tree

We introduce a class TreeMutator that is set to mutate a tree. Its constructor takes a grammar and a tree.

from fuzzingbook.Grammars import is_valid_grammar
class TreeMutator:
    """Grammar-based mutations of derivation trees."""

    def __init__(self, grammar: Grammar, tree: DerivationTree,
                 fuzzer: Optional[GrammarFuzzer] = None, log: Union[bool, int] = False):
        """Constructor.
        `grammar` is the underlying grammar;
        `tree` is the tree to work on;
        `fuzzer` is the grammar fuzzer to use (default: `GrammarFuzzer`).
        """

        assert is_valid_grammar(grammar)
        self.grammar = grammar
        self.tree = tree
        self.log = log

        if fuzzer is None:
            fuzzer = GrammarFuzzer(grammar)

        self.fuzzer = fuzzer

Referencing Subtrees

To reference individual elements in the tree, we introduce the concept of a path. A path is a list of numbers indicating the children (starting with 0) we should follow. A path [0, 0, 0, ..., 0] stands for the leftmost child in a tree.

TreePath = List[int]
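As a minimal sketch of how such paths are followed, here is a hand-built tree for a hypothetical input "ab" and a small helper that descends along a path:

```python
# Hand-built derivation tree for the string "ab"
# (hypothetical grammar with a single <word> rule)
tree = ('<start>', [('<word>', [('a', []), ('b', [])])])

def follow_path(tree, path):
    """Follow `path` (a list of child indices) down `tree`."""
    for index in path:
        _symbol, children = tree
        tree = children[index]
    return tree

assert follow_path(tree, [])[0] == '<start>'   # empty path: the whole tree
assert follow_path(tree, [0])[0] == '<word>'   # first child
assert follow_path(tree, [0, 1])[0] == 'b'     # first child, then second child
```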

The method get_subtree() returns the subtree for a given path.

class TreeMutator(TreeMutator):
    def get_subtree(self, path: TreePath, tree: Optional[DerivationTree] = None) -> DerivationTree:
        """Access a subtree based on `path` (a list of children numbers)"""
        if tree is None:
            tree = self.tree

        symbol, children = tree

        if not path or children is None:
            return tree

        return self.get_subtree(path[1:], children[path[0]])

Here's get_subtree() in action. We instantiate a TreeMutator on the BAD_INPUT tree as shown above and return the element at the path [0, 0, 1, 0] – i.e. follow the leftmost edge twice, then the second edge, then the leftmost edge again. This gives us the <plain-text> subtree representing the string "bar:

def bad_input_tree_mutator() -> TreeMutator:
    return TreeMutator(SIMPLE_HTML_GRAMMAR, bad_input_tree, log=2)    
plain_text_subtree = bad_input_tree_mutator().get_subtree([0, 0, 1, 0])
('<plain-text>',
 [('<plain-char>', [('<other>', [('"', [])])]),
  ('<plain-text>',
   [('<plain-char>', [('<letter>', [...])]),
    ('<plain-text>', [('<plain-char>', [...]), ('<plain-text>', [...])])])])


In bad_input_tree, what is the subtree at the path [0, 0, 2, 1] as string?

Creating new Subtrees

The method new_tree() creates a new subtree for the given <start_symbol> according to the rules of the grammar. It invokes expand_tree() on the given GrammarFuzzer – a method that takes an initial (empty) tree and expands it until no more expansions are left.

class TreeMutator(TreeMutator):
    def new_tree(self, start_symbol: str) -> DerivationTree:
        """Create a new subtree for <start_symbol>."""

        if self.log >= 2:
            print(f"Creating new tree for {start_symbol}")

        tree = (start_symbol, None)
        return self.fuzzer.expand_tree(tree)

Here is an example of new_tree():

plain_text_tree = cast(TreeMutator, bad_input_tree_mutator()).new_tree('<plain-text>')
Creating new tree for <plain-text>
(Tree diagram: a <plain-text> node expanding into the empty string.)

Mutating the Tree

With us now being able to

  • access a particular path in the tree (get_subtree()) and
  • create a new subtree (new_tree()),

we can mutate the tree at a given path. This is the task of mutate().

class TreeMutator(TreeMutator):
    def mutate(self, path: TreePath, tree: Optional[DerivationTree] = None) -> DerivationTree:
        """Return a new tree mutated at `path`"""
        if tree is None:
            tree = self.tree
        assert tree is not None

        symbol, children = tree

        if not path or children is None:
            return self.new_tree(symbol)

        head = path[0]
        new_children = (children[:head] +
                        [self.mutate(path[1:], children[head])] +
                        children[head + 1:])
        return symbol, new_children

Here is an example of mutate() in action. We mutate bad_input_tree at the path [0, 0, 1, 0] – that is, <plain-text>:

mutated_tree = cast(TreeMutator, bad_input_tree_mutator()).mutate([0, 0, 1, 0])
Creating new tree for <plain-text>
(Derivation tree diagram: as before, but the <plain-text> element between the tags now expands into a single whitespace character.)

We see that the <plain-text> subtree is now different, which also becomes evident in the string representation.

'<foo> </foo>'

Generalizing Trees

Now for the main part – finding out which parts of a tree can be generalized. Our idea is to test a finite number of mutations to a subtree (say, 10). If all of these tests fail as well, then we assume we can generalize the subtree to a placeholder.

We introduce a class TreeGeneralizer for this purpose. On top of grammar and tree already used for the TreeMutator constructor, the TreeGeneralizer also takes a test function.

class TreeGeneralizer(TreeMutator):
    """Determine which parts of a derivation tree can be generalized."""

    def __init__(self, grammar: Grammar, tree: DerivationTree, test: Callable,
                 max_tries_for_generalization: int = 10, **kwargs: Any) -> None:
        """Constructor. `grammar` and `tree` are as in `TreeMutator`.
        `test` is a function taking a string that either
          * raises an exception, indicating test failure;
          * or not, indicating test success.
        `max_tries_for_generalization` is the number of times
        an instantiation has to fail before it is generalized.
        """

        super().__init__(grammar, tree, **kwargs)
        self.test = test
        self.max_tries_for_generalization = max_tries_for_generalization

The test function is used in test_tree(), returning False if the test fails (raising an exception), and True if the test passes (no exception).

class TreeGeneralizer(TreeGeneralizer):
    def test_tree(self, tree: DerivationTree) -> bool:
        """Return True if testing `tree` passes, else False"""
        s = tree_to_string(tree)
        if self.log:
            print(f"Testing {repr(s)}...", end="")
        except Exception as exc:
            if self.log:
                print(f"FAIL ({type(exc).__name__})")
            ret = False
            if self.log:
            ret = True

        return ret

Testing for Generalization

The can_generalize() method brings the above methods together. It creates a number of tree mutations at the given path, and returns True if all of them produce a failure. (Note that this is not as sophisticated as our delta debugger implementation, which also checks that the same error occurs.)

class TreeGeneralizer(TreeGeneralizer):
    def can_generalize(self, path: TreePath, tree: Optional[DerivationTree] = None) -> bool:
        """Return True if the subtree at `path` can be generalized."""
        for i in range(self.max_tries_for_generalization):
            mutated_tree = self.mutate(path, tree)
            if self.test_tree(mutated_tree):
                # Failure no longer occurs; cannot abstract
                return False

        return True

Let us put TreeGeneralizer into action. We can directly use remove_html_markup() as test function.

def bad_input_tree_generalizer(**kwargs: Any) -> TreeGeneralizer:
    return TreeGeneralizer(SIMPLE_HTML_GRAMMAR, bad_input_tree,
                           remove_html_markup, **kwargs)

On our BAD_INPUT (and its tree), can we generalize the root <html> node? In other words, does the failure occur for all possible <html> inputs?

Testing "<l35Gmq W2G571=''>"...PASS

The answer is no. The first alternative passes the test; hence no generalization.

How about the middle <plain-text> part? Can we generalize this?

bad_input_tree_generalizer(log=True).can_generalize([0, 0, 1, 0])
Testing '<foo>8</foo>'...PASS

The answer is no – just as above.

How about the closing tag? Can we generalize this one?

bad_input_tree_generalizer(log=True).can_generalize([0, 0, 2])
Testing '<foo>"bar</e7N>'...FAIL (AssertionError)
Testing '<foo>"bar</q37A>'...FAIL (AssertionError)
Testing '<foo>"bar</W>'...FAIL (AssertionError)
Testing '<foo>"bar</z93Q5>'...FAIL (AssertionError)
Testing '<foo>"bar</WR>'...FAIL (AssertionError)
Testing '<foo>"bar</m>'...FAIL (AssertionError)
Testing '<foo>"bar</Uy443wt1h7>'...FAIL (AssertionError)
Testing '<foo>"bar</fon2>'...FAIL (AssertionError)
Testing '<foo>"bar</M>'...FAIL (AssertionError)
Testing '<foo>"bar</g9>'...FAIL (AssertionError)

Yes, we can! All alternate instantiations of <closing-tag> result in a failure.


Is this also true for <opening-tag>?

Note that the above does not hold for <opening-tag>: if an attribute value contains a quote character, the quoted region extends to the end of the input. This is another error, but one not caught by our assertion; hence, the input is flagged as passing:

BAD_ATTR_INPUT = '<foo attr="\'">bar</foo>'

The effect of this is that there are instantiations of <opening-tag> which do not cause the failure to occur; hence, <opening-tag> is not a fully valid generalization.

This, however, becomes apparent only if one of our generated tests includes a quote character in the attribute value. Since quote characters are as likely (or as unlikely) to appear as other characters, this effect may not become apparent in our default 10 tests:

bad_input_tree_generalizer().can_generalize([0, 0, 0])

It will become apparent, however, as we increase the number of tests:

bad_input_tree_generalizer(max_tries_for_generalization=100, log=True).can_generalize([0, 0, 0])
Testing '<Ada np=\'7\' y7v=\'\'>"bar</foo>'...FAIL (AssertionError)
Testing '<B V=\'\'>"bar</foo>'...FAIL (AssertionError)
Testing '<K v="$" s5F="\x0b" q="" E=\'\'>"bar</foo>'...FAIL (AssertionError)
Testing '<Fcdt8 v7A4u=\'.\t\'>"bar</foo>'...FAIL (AssertionError)
Testing '<s n="">"bar</foo>'...FAIL (AssertionError)
Testing '<W>"bar</foo>'...FAIL (AssertionError)
Testing '<ap>"bar</foo>'...FAIL (AssertionError)
Testing '<B1>"bar</foo>'...FAIL (AssertionError)
Testing '<Q00wY M=\'\r \'>"bar</foo>'...FAIL (AssertionError)
Testing '<O6 d7="" H=\'\'>"bar</foo>'...FAIL (AssertionError)
Testing '<v1IH w="" ZI="" O="">"bar</foo>'...FAIL (AssertionError)
Testing '<T1 w998=\'a\' j=\'z\n7\'>"bar</foo>'...FAIL (AssertionError)
Testing '<Dnh1>"bar</foo>'...FAIL (AssertionError)
Testing '<D F9="" x4=\'\' Hup=\'7\n\'>"bar</foo>'...FAIL (AssertionError)
Testing '<l62E>"bar</foo>'...FAIL (AssertionError)
Testing '<k11 P8x5="">"bar</foo>'...FAIL (AssertionError)
Testing '<V6LBVu>"bar</foo>'...FAIL (AssertionError)
Testing '<k9S>"bar</foo>'...FAIL (AssertionError)
Testing '<tU2J913 lQ6N=\'\' f=\'*\' V=\'\' b="" l="" G=\'\'>"bar</foo>'...FAIL (AssertionError)
Testing '<X O="U~">"bar</foo>'...FAIL (AssertionError)
Testing '<q4 W=\'\' i=\'aA9\' I=\'9\'>"bar</foo>'...FAIL (AssertionError)
Testing '<HK>"bar</foo>'...FAIL (AssertionError)
Testing '<T>"bar</foo>'...FAIL (AssertionError)
Testing '<NJc j32="\x0b">"bar</foo>'...FAIL (AssertionError)
Testing '<G>"bar</foo>'...FAIL (AssertionError)
Testing '<w B="\r">"bar</foo>'...FAIL (AssertionError)
Testing '<Ac1>"bar</foo>'...FAIL (AssertionError)
Testing '<vB y2=\'7x\'\'>"bar</foo>'...PASS

We see that our approach may overgeneralize – producing a generalization that may be too lenient. In practice, this is not too much of a problem, as we would be interested in characterizing cases that trigger the failure, rather than characterizing a small subset that does not trigger the failure.

Generalizable Paths

Using can_generalize(), we can devise a method generalizable_paths() that returns all paths in the tree that can be generalized.

class TreeGeneralizer(TreeGeneralizer):
    def find_paths(self, 
                   predicate: Callable[[TreePath, DerivationTree], bool], 
                   path: Optional[TreePath] = None, 
                   tree: Optional[DerivationTree] = None) -> List[TreePath]:
        """Return a list of all paths for which `predicate` holds.
        `predicate` is a function `predicate(path, tree)`, where
        `path` denotes a subtree in `tree`. If `predicate()` returns
        True, `path` is included in the returned list.
        """

        if path is None:
            path = []
        assert path is not None

        if tree is None:
            tree = self.tree
        assert tree is not None

        symbol, children = self.get_subtree(path)

        if predicate(path, tree):
            return [path]

        paths = []
        if children is not None:
            for i, child in enumerate(children):
                child_symbol, _ = child
                if child_symbol in self.grammar:
                    paths += self.find_paths(predicate, path + [i])

        return paths

    def generalizable_paths(self) -> List[TreePath]:
        """Return a list of all paths whose subtrees can be generalized."""
        return self.find_paths(self.can_generalize)

Here is generalizable_paths() in action. We obtain all (paths to) subtrees that can be generalized:

bad_input_generalizable_paths = \
    cast(TreeGeneralizer, bad_input_tree_generalizer()).generalizable_paths()
[[0, 0, 0], [0, 0, 1, 0, 1, 0], [0, 0, 1, 0, 1, 1], [0, 0, 2]]

To convert these subtrees into abstract failure-inducing patterns, the method generalize_path() returns a copy of the tree with the subtree replaced by a nonterminal without children:

class TreeGeneralizer(TreeGeneralizer):
    def generalize_path(self, path: TreePath, 
                        tree: Optional[DerivationTree] = None) -> DerivationTree:
        """Return a copy of the tree in which the subtree at `path`
        is generalized (= replaced by a nonterminal without children)"""

        if tree is None:
            tree = self.tree
        assert tree is not None

        symbol, children = tree

        if not path or children is None:
            return symbol, None  # Nonterminal without children

        head = path[0]
        new_children = (children[:head] +
                        [self.generalize_path(path[1:], children[head])] +
                        children[head + 1:])
        return symbol, new_children

The function all_terminals() turns a tree back into a string, representing unexpanded nonterminals by their placeholder names:

all_terminals(cast(TreeGeneralizer, bad_input_tree_generalizer()).generalize_path([0, 0, 0]))
'<opening-tag>"bar</foo>'

Finally, the method generalize() obtains a tree in which all generalizable paths actually are generalized:

class TreeGeneralizer(TreeGeneralizer):
    def generalize(self) -> DerivationTree:
        """Returns a copy of the tree in which all generalizable subtrees
        are generalized (= replaced by nonterminals without children)"""
        tree = self.tree
        assert tree is not None

        for path in self.generalizable_paths():
            tree = self.generalize_path(path, tree)

        return tree
abstract_failure_inducing_input = cast(TreeGeneralizer, bad_input_tree_generalizer()).generalize()

This gives us the final generalization of BAD_INPUT. In the abstract failure-inducing input, all generalizable elements are generalized.


We see that to obtain the failure, it suffices to have an <opening-tag>, followed by a quote and (any) <plain-text> and (any) <closing-tag>. Clearly, all that it takes to produce the failure is to have a double quote in the plain text.

Also note how this diagnosis was reached through experiments only – just as with delta debugging, we could treat the program under test as a black box. In contrast to delta debugging, however, we obtain an abstraction that generalizes the circumstances under which a given failure occurs.
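For reference, the function under test throughout this chapter is the buggy `remove_html_markup()` from the introduction to debugging. Here is a sketch of it (reproduced from memory, so details may differ) that lets you replay the diagnosis:

```python
def remove_html_markup(s):
    # Sketch of the buggy remove_html_markup() (from memory; details may
    # differ). The bug: due to operator precedence, a double quote toggles
    # `quote` even outside a tag, disabling markup detection afterwards.
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:   # should be (c == '"' or c == "'") and tag
            quote = not quote
        elif not tag:
            out = out + c

    assert '<' not in out and '>' not in out
    return out

remove_html_markup('<b>foo</b>')   # 'foo' - markup removed as intended
# remove_html_markup('<foo>"bar</foo>') raises AssertionError:
# the stray quote keeps `quote` set, so '</foo>' ends up in `out`
```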

Fuzzing with Patterns

One neat side effect of abstract failure-inducing patterns is that they can be easily instantiated into further test cases, all set to reproduce the failure in question. This gives us a test suite we can later test our fix against.

The method fuzz_tree() takes a tree representing an abstract failure-inducing input and instantiates all missing subtrees.

import copy
class TreeGeneralizer(TreeGeneralizer):
    def fuzz_tree(self, tree: DerivationTree) -> DerivationTree:
        """Return an instantiated copy of `tree`."""
        tree = copy.deepcopy(tree)
        return self.fuzzer.expand_tree(tree)
bitg = cast(TreeGeneralizer, bad_input_tree_generalizer())
for i in range(10):
    print(all_terminals(bitg.fuzz_tree(abstract_failure_inducing_input)))
<h lV="'">"</x8>
<a0 l0820650g='3'>"</v5t>
<yMgT02p s="" g94e='R'>"</P2>
<D48>"	</R>
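The instantiation itself is delegated to `GrammarFuzzer.expand_tree()`; the idea can be sketched in a few lines by replacing each `(symbol, None)` placeholder with a random expansion from a grammar. The mini-grammar and helper names below are ours, for illustration only:

```python
import random

# Our own mini-grammar: each placeholder symbol maps to expansion strings
MINI_GRAMMAR = {
    "<plain-text>": ['"x', '"hello'],
    "<closing-tag>": ["</a>", "</b>"],
}

def expand_placeholders(tree, grammar):
    """Replace each (symbol, None) placeholder by a random expansion."""
    symbol, children = tree
    if children is None:
        return symbol, [(random.choice(grammar[symbol]), [])]
    return symbol, [expand_placeholders(c, grammar) for c in children]

def terminals(tree):
    """Concatenate all terminal symbols in `tree`."""
    symbol, children = tree
    if not children:
        return symbol
    return ''.join(terminals(c) for c in children)

abstract = ("<html>",
            [("<b>", []),
             ("<plain-text>", None),
             ("<closing-tag>", None)])
terminals(expand_placeholders(abstract, MINI_GRAMMAR))
# e.g. '<b>"hello</b>' - a fresh candidate input for the failure
```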

We can take these inputs and see whether they reproduce the failure in question:

successes = 0
failures = 0
trials = 1000

for i in range(trials):
    test_input = all_terminals(
        bitg.fuzz_tree(abstract_failure_inducing_input))

    try:
        remove_html_markup(test_input)
        failures += 1
    except AssertionError:
        successes += 1

successes, failures
(982, 18)

We get an overall failure rate of ~98%, which is not bad at all.

successes / trials

In our case, it is overgeneralization (as discussed above) that is responsible for not reaching a 100% rate. (In all generality, we are trying to approximate the behavior of a Turing machine with a context-free grammar, which is, well, always an approximation.) However, even a lower rate would still be useful, as any additional test case that reproduces a failure helps in ensuring the final fix is complete.

Putting it all Together

Let us now put together all this in a more convenient package that does not require the user to parse and unparse derivation trees.

Our DDSetDebugger is modeled after the DeltaDebugger from the chapter on delta debugging. It is to be used as

with DDSetDebugger(grammar) as dd:
    function(args...)

After that, evaluating dd yields a generalized abstract failure-inducing input as a string.

Since DDSetDebugger accepts only one grammar, the function to be debugged should have exactly one string argument (further non-string arguments are allowed); this string must fit the grammar.


The constructor puts together the various components. It allows for customization by subclassing.

from DeltaDebugger import CallCollector
class DDSetDebugger(CallCollector):
    """Debugger implementing the DDSET algorithm
    for abstracting failure-inducing inputs."""

    def __init__(self, grammar: Grammar, 
                 generalizer_class: Type = TreeGeneralizer,
                 parser: Optional[Parser] = None,
                 **kwargs: Any) -> None:
        """Constructor.
        `grammar` is an input grammar in fuzzingbook format.
        `generalizer_class` is the tree generalizer class to use
        (default: `TreeGeneralizer`).
        `parser` is the parser to use (default: `EarleyParser(grammar)`).
        All other keyword args are passed to the tree generalizer, notably:
        `fuzzer` - the fuzzer to use (default: `GrammarFuzzer`), and
        `log` - enables debugging output if True.
        """
        super().__init__()
        self.grammar = grammar
        assert is_valid_grammar(grammar)

        self.generalizer_class = generalizer_class

        if parser is None:
            parser = EarleyParser(grammar)
        self.parser = parser
        self.kwargs = kwargs

        # These save state for further fuzz() calls
        self.generalized_args: Dict[str, Any] = {}
        self.generalized_trees: Dict[str, DerivationTree] = {}
        self.generalizers: Dict[str, TreeGeneralizer] = {}

Generalizing Arguments

The method generalize() is the main entry point. For all string arguments collected in the first function call, it generalizes the arguments and returns an abstract failure-inducing string.

class DDSetDebugger(DDSetDebugger):
    def generalize(self) -> Dict[str, Any]:
        """Generalize arguments seen. For each function argument,
        produce an abstract failure-inducing input that characterizes
        the set of inputs for which the function fails."""
        if self.generalized_args:
            return self.generalized_args

        self.generalized_args = copy.deepcopy(self.args())
        self.generalized_trees = {}
        self.generalizers = {}

        for arg in self.args():
            def test(value: Any) -> Any:
                return self.call({arg: value})

            value = self.args()[arg]
            if isinstance(value, str):
                tree = list(self.parser.parse(value))[0]
                gen = self.generalizer_class(self.grammar, tree, test,
                                             **self.kwargs)
                generalized_tree = gen.generalize()

                self.generalizers[arg] = gen
                self.generalized_trees[arg] = generalized_tree
                self.generalized_args[arg] = all_terminals(generalized_tree)

        return self.generalized_args
class DDSetDebugger(DDSetDebugger):
    def __repr__(self) -> str:
        """Return a string representation of the generalized call."""
        return self.format_call(self.generalize())

Here is an example of how DDSetDebugger would be used on our BAD_INPUT example. Simply evaluating the debugger yields a call with a generalized input.

with DDSetDebugger(SIMPLE_HTML_GRAMMAR) as dd:
    remove_html_markup(BAD_INPUT)
dd


The fuzz() method produces instantiations of the abstract failure-inducing pattern.

class DDSetDebugger(DDSetDebugger):
    def fuzz_args(self) -> Dict[str, Any]:
        """Return arguments randomly instantiated
        from the abstract failure-inducing pattern."""
        if not self.generalized_trees:
            self.generalize()

        args = copy.deepcopy(self.generalized_args)
        for arg in args:
            if arg not in self.generalized_trees:
                continue

            tree = self.generalized_trees[arg]
            gen = self.generalizers[arg]
            instantiated_tree = gen.fuzz_tree(tree)
            args[arg] = all_terminals(instantiated_tree)

        return args

    def fuzz(self) -> str:
        """Return a call with arguments randomly instantiated
        from the abstract failure-inducing pattern."""
        return self.format_call(self.fuzz_args())
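The `format_call()` method is inherited from CallCollector; it renders a call string from a dictionary of argument values. A minimal sketch of what it produces (our reconstruction, not the actual method):

```python
# Minimal sketch of what format_call() produces: a call string with each
# argument rendered via repr(). (Our reconstruction, not the actual method.)
def format_call(function_name, args):
    return function_name + "(" + \
           ", ".join(f"{arg}={value!r}" for arg, value in args.items()) + ")"

format_call("remove_html_markup", {"s": '<b>"x</b>'})
# 'remove_html_markup(s=\'<b>"x</b>\')'
```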

Here are some examples of fuzz() in action:

with DDSetDebugger(SIMPLE_HTML_GRAMMAR) as dd:
    remove_html_markup(BAD_INPUT)
dd.fuzz()
'remove_html_markup(s=\'<Z VF=\\\'\\\' l="_-(">")</Y>\')'

These can be fed into eval(), set to produce more failing calls.

with ExpectError(AssertionError):
    eval(dd.fuzz())
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_47258/", line 2, in <cell line: 1>
  File "<string>", line 1, in <module>
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_47258/", line 17, in remove_html_markup
    assert '<' not in out and '>' not in out
AssertionError (expected)

More Examples

Let us apply DDSetDebugger on more examples.

Square Root

Our first example is the square_root() function from the chapter on assertions.

from Assertions import square_root  # minor dependency

The square_root() function fails on a value of -1:

with ExpectError(AssertionError):
    square_root(-1)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_47258/", line 2, in <cell line: 1>
  File "/Users/zeller/Projects/debuggingbook/notebooks/Assertions.ipynb", line 55, in square_root
    assert x >= 0  # precondition
AssertionError (expected)

We define a grammar for its arguments:

INT_GRAMMAR: Grammar = {
    "<start>": ["<int>"],
    "<int>": ["<positive-int>", "-<positive-int>"],
    "<positive-int>": ["<digit>", "<nonzero-digit><positive-int>"],
    "<nonzero-digit>": list("123456789"),
    "<digit>": list(string.digits)
}

The test function takes a string and converts it into an integer:

def square_root_test(s: str) -> None:
    return square_root(int(s))
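For reference, `square_root()` computes the square root via Newton-Raphson iteration, guarded by the precondition that fails here. A sketch following the Assertions chapter (ours, from memory; details may differ):

```python
def square_root(x):
    """Compute the square root of x via Newton-Raphson iteration.
    Sketch following the Assertions chapter; details may differ."""
    assert x >= 0  # precondition - this is what fails for negative inputs
    if x == 0:
        return 0.0
    approx = None
    guess = x / 2
    while approx != guess:     # iterate until the value no longer changes
        approx = guess
        guess = (approx + x / approx) / 2
    return approx

square_root(4)   # 2.0
# square_root(-1) raises AssertionError: precondition x >= 0 violated
```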

With this, we can go and see whether we can generalize a failing input:

with DDSetDebugger(INT_GRAMMAR, log=True) as dd_square_root:
    square_root_test("-1")
Testing '-8'...FAIL (AssertionError)
Testing '-316'...FAIL (AssertionError)
Testing '8'...PASS
Testing '684'...PASS
Testing '-3'...FAIL (AssertionError)
Testing '-870'...FAIL (AssertionError)
Testing '-3'...FAIL (AssertionError)
Testing '-3451'...FAIL (AssertionError)
Testing '-8'...FAIL (AssertionError)
Testing '-63213'...FAIL (AssertionError)
Testing '-26'...FAIL (AssertionError)
Testing '-4'...FAIL (AssertionError)
Testing '-6'...FAIL (AssertionError)
Testing '-8'...FAIL (AssertionError)

Success! Using DDSetDebugger, we have nicely generalized the failure-inducing input to a pattern -<positive-int> that translates into "any negative number".


Middle

The middle() function from the chapter on statistical debugging returns the middle of three numerical values x, y, and z.

from StatisticalDebugger import middle  # minor dependency

We set up a test function that evaluates a string – a tuple of three arguments – and then tests middle():

def middle_test(s: str) -> None:
    x, y, z = eval(s)
    assert middle(x, y, z) == sorted([x, y, z])[1]
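For reference, here is a sketch of the buggy `middle()` (reproduced from memory from the StatisticalDebugger chapter; the defect is one branch returning y instead of x):

```python
def middle(x, y, z):
    # Sketch of the buggy middle() (from memory; details may differ)
    if y < z:
        if x < y:
            return y
        elif x < z:
            return y   # bug: should return x
    else:
        if x > y:
            return y
        elif x > z:
            return x
    return z

middle(2, 1, 3)  # returns 1, although the middle value is 2
```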

The grammar for the three numbers simply puts three integers together:

XYZ_GRAMMAR: Grammar = {
    "<start>": ["<int>, <int>, <int>"],
    "<int>": ["<positive-int>", "-<positive-int>"],
    "<positive-int>": ["<digit>", "<nonzero-digit><positive-int>"],
    "<nonzero-digit>": list("123456789"),
    "<digit>": list(string.digits)
}

Here is an example of middle() failing:

with ExpectError(AssertionError):
    middle_test("2, 1, 3")
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_47258/", line 2, in <cell line: 1>
    middle_test("2, 1, 3")
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_47258/", line 3, in middle_test
    assert middle(x, y, z) == sorted([x, y, z])[1]
AssertionError (expected)

What happens if we debug this with DDSetDebugger? We see that there is hardly any abstraction at the syntax level that could characterize this failure; at best, the third value generalizes:

with DDSetDebugger(XYZ_GRAMMAR, log=True) as dd_middle:
    middle_test("2, 1, 3")
Testing '7, 4591, -0'...PASS
Testing '6, 1, 3'...PASS
Testing '7, 1, 3'...PASS
Testing '7, 1, 3'...PASS
Testing '2, -7, 3'...FAIL (AssertionError)
Testing '2, 0, 3'...FAIL (AssertionError)
Testing '2, -89, 3'...FAIL (AssertionError)
Testing '2, 973, 3'...PASS
Testing '2, 11, 3'...PASS
Testing '2, 8, 3'...PASS
Testing '2, 1, 9'...FAIL (AssertionError)
Testing '2, 1, -16'...PASS
Testing '2, 1, 35'...FAIL (AssertionError)
Testing '2, 1, 6'...FAIL (AssertionError)
Testing '2, 1, 53'...FAIL (AssertionError)
Testing '2, 1, 5'...FAIL (AssertionError)
Testing '2, 1, 737'...FAIL (AssertionError)
Testing '2, 1, 28'...FAIL (AssertionError)
Testing '2, 1, 3'...FAIL (AssertionError)
Testing '2, 1, 5'...FAIL (AssertionError)
Testing '2, 1, 5'...FAIL (AssertionError)
Testing '2, 1, 56'...FAIL (AssertionError)
middle_test(s='2, 1, <positive-int>')

So, while there are failures that can be nicely characterized using abstractions of input elements, middle() is not one of them. Which is good, because this means that all our other techniques such as statistical debugging and dynamic invariants still have a use case :-)

Lessons Learned

  • Generalizing failure-inducing inputs can yield important information for which inputs and under which circumstances a failure occurs.
  • Generalizing failure-inducing inputs is most useful if the input can be split into multiple elements, of which only a part are relevant for producing the error.
  • As they help in parsing and producing input, grammars can play an important role in testing and debugging.

Next Steps

Our next chapter introduces automated repair of programs, building on the fault localization and generalization mechanisms introduced so far.


Background

Our DDSetDebugger class implements the DDSET algorithm as introduced by Gopinath et al. in [Gopinath et al, 2020]. A full-fledged implementation of DDSET with plenty of details and experiments is available as a Jupyter Notebook. Our implementation follows the simplified implementation of DDSET, as described by Gopinath.

The potential for determining how input features relate to bugs is not nearly explored yet. The ALHAZEN work by Kampmann et al. [Kampmann et al, 2020] generalizes over DDSET differently, by investigating semantic features of input elements such as their numeric interpretation or length and their correlation with failures. Like DDSET, ALHAZEN also uses a feedback loop to strengthen or refute its hypotheses.

In recent work [Gopinath et al, 2021], Gopinath has extended the concept of DDSET further. His work on evocative expressions introduces a pattern language in which arbitrary DDSET-like patterns can be combined into Boolean formulas that even more precisely capture and produce failure circumstances. In particular, evocative expressions can specialize grammars towards Boolean pattern combinations, thus allowing for great flexibility in testing and debugging.


Exercise 1: Generalization and Specialization

Consider the abstract failure-inducing input for BAD_INPUT we determined:

  1. How does it change if you increase the number of test runs, using max_tries_for_generalization?
  2. What is the success rate of the new pattern?

Creative Commons License The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content is licensed under the MIT License. Last change: 2023-11-11 18:05:06+01:00