Third Project - Automated Repair

Overview

The Task

For the first two submissions we asked you to implement a Debugger as well as an Input Reducer. Both of these tools are used to help the developer to locate bugs and then manually fix them.

In this project, you will implement a technique of automatic code repair. To do so, you will extend the Repairer introduced in the Repairing Code Automatically chapter of The Debugging Book.

Your own Repairer should automatically generate repair suggestions to the faulty functions we provide later in this notebook. This can be achieved, for instance, by changing various components of the mutator, changing the debugger, or the reduction algorithm. However, you are neither reguired to make all these changes nor required to limit yourself to the changes proposed here.

The Submission

The time frame for this project is 3 weeks, and the deadline is Februrary 5th, 23:59.

The submission should be in form of a Jupyter notebook and you are expected to hand in the submission as a .zip-archive. The notebook should, apart from the code itself, also provide sufficient explanations and reasoning (in markdown cells) behind the decisions that lead to the solution provided. Projects that do not include explanations cannot get more than 15 points.

A Faulty Function and How to Repair It

Before discussing how to use and extend the Repairer, we first start by introducing a new (and highly complex) function that is supposed to return the larger of two values.

def larger(x: Any, y: Any) -> Any:
    if x < y:
        return x
    return y

Unfortunately, we introduced a bug which makes the function behave the exact opposite way as it is supposed to:

larger(1, 3)
1

To fix this issue, we could try to debug it, using the tools we have already seen. However, given the complexity of the function under test (sorry for the irony), we might want to automatically repair the function, using the Repairer introduced in The Debugging Book.

To do so, we first need to define set of test cases, which help the Repairer in fixing the function.

def larger_testcase() -> Tuple[int, int]:
    x = random.randrange(100)
    y = random.randrange(100)
    return x, y
def larger_test(x: Any, y: Any) -> None:
    m = larger(x, y)
    assert m == max(x, y), f"expected {max(x, y)}, but got {m}"
import math
import random
random.seed(42)

Let us generate a random test case for our function:

larger_input = larger_testcase()
print(larger_input)
(81, 14)

and then feed it into our larger_test():

from ExpectError import ExpectError
with ExpectError():
    larger_test(*larger_input)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57754/1363473154.py", line 2, in <cell line: 1>
    larger_test(*larger_input)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57754/2390942771.py", line 3, in larger_test
    assert m == max(x, y), f"expected {max(x, y)}, but got {m}"
AssertionError: expected 81, but got 14 (expected)

As expected, we got an error – the larger() function has adefect.

For a complete test suite, we need a set of passing and failing tests. To be sure we have both, we create functions which produce dedicated inputs:

def larger_passing_testcase() -> Tuple:
    while True:
        try:
            x, y = larger_testcase()
            larger_test(x, y)
            return x, y
        except AssertionError:
            pass
def larger_failing_testcase() -> Tuple:
    while True:
        try:
            x, y = larger_testcase()
            larger_test(x, y)
        except AssertionError:
            return x, y
passing_input = larger_passing_testcase()
print(passing_input)
(10, 10)

With passing_input, our larger() function produces a correct result, and its test function does not fail.

larger_test(*passing_input)
failing_input = larger_failing_testcase()
print(failing_input)
(93, 62)

While failing_input leads to an error:

with ExpectError():
    larger_test(*failing_input)
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57754/108970404.py", line 2, in <cell line: 1>
    larger_test(*failing_input)
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57754/2390942771.py", line 3, in larger_test
    assert m == max(x, y), f"expected {max(x, y)}, but got {m}"
AssertionError: expected 93, but got 62 (expected)

With the above defined functions, we can now start to create a number of passing and failing tests:

TESTS = 100
LARGER_PASSING_TESTCASES = [larger_passing_testcase()
                            for i in range(TESTS)]
LARGER_FAILING_TESTCASES = [larger_failing_testcase()
                            for i in range(TESTS)]
from StatisticalDebugger import OchiaiDebugger

Next, let us use statistical debugging to identify likely faulty locations. The OchiaiDebugger ranks individual code lines by how frequently they are executed in failing runs (and not in passing runs).

larger_debugger = OchiaiDebugger()
for x, y in LARGER_PASSING_TESTCASES + LARGER_FAILING_TESTCASES:
    with larger_debugger:
        larger_test(x, y)

Given the results of statistical debugging, we can now use the Repairer introduced in the book to repair our function. Here we use the default implementation which is initialized with the simple StatementMutator mutator.

from Repairer import Repairer
from Repairer import ConditionMutator, CrossoverOperator
from Repairer import DeltaDebugger
repairer = Repairer(larger_debugger, log=True)
Target code to be repaired:
def larger(x: Any, y: Any) -> Any:
    if x < y:
        return x
    return y
best_tree, fitness = repairer.repair()
Evolving population: iteration   0/100 fitness = 1.0   
Best code (fitness = 1.0):
def larger(x: Any, y: Any) -> Any:
    if x < y:
        if x < y:
            return y
    return x

Reduced code (fitness = 1.0):
def larger(x: Any, y: Any) -> Any:
    if x < y:
        return y
    return x

The Repairer successfully produced a fix.

Implementation

As stated above, the goal of this project is to implement a repairer capable of producing a fix to the functions defined in the Evaluation section, as well as secret functions.
To do this, you need to work with the Repairer class from the book. The Repairer class is very configurable, so that it is easy to update and plug in various components: the fault localization (pass a different debugger that is a subclass of DifferenceDebugger), the mutation operator (set mutator_class to a subclass of StatementMutator), the crossover operator (set crossover_class to a subclass of CrossoverOperator), and the reduction algorithm (set reducer_class to a subclass of Reducer). You may change any of these components or make other changes at will.

For us to be able to test your implementation, you will have to implement the debug_and_repair() function defined here.

from bookutils import print_content, show_ast
def debug_and_repair(f: Callable,
                     testcases: List[Tuple[Any, ...]],
                     test_function: Callable,
                     log: bool = False) -> Optional[str]:
    '''
    Debugs a function with the given testcases and the test_function
    and tries to repair it afterwards.

    Parameters
    ----------
    f : function
        The faulty function, to be debugged and repaired
    testcases : List
        A list that includes test inputs for the function under test
    test_function : function
        A function that takes the test inputs and tests whether the
        function under test produces the correct output.
    log: bool
        Turn logging on/off.
    Returns
    -------
    str
        The repaired version of f as a string.
    '''

    # TODO: implement this function

    return None

The function debug_and_repair() is the function that needs to implement everything. We will provide you with the function to be repaired, as well as testcases for this function and a test-function. Let us show you how the function can be used and should behave:

random.seed(42)
import ast
def simple_debug_and_repair(f: Callable,
                            testcases: List[Tuple[Any, ...]],
                            test_function: Callable, 
                            log: bool = False) -> str:
    '''
    Debugs a function with the given testcases and the test_function
    and tries to repair it afterwards.

    Parameters
    ----------
    f : function
        The faulty function, to be debugged and repaired
    testcases : List
        A list that includes test inputs for the function under test
    test_function : function
        A function, that takes the test inputs and tests whether the
        function under test produces the correct output.
    log: bool
        Turn logging on/off.
    Returns
    -------
    str
        The repaired version of f as a string.
    '''

    debugger = OchiaiDebugger()

    for i in testcases:
        with debugger:
            test_function(*i)  # Ensure that you use *i here.

    repairer = Repairer(debugger,
                        mutator_class=ConditionMutator,
                        crossover_class=CrossoverOperator,
                        reducer_class=DeltaDebugger,
                        log=log)

    # Ensure that you specify a sufficient number of
    # iterations to evolve.
    best_tree, fitness = repairer.repair(iterations=100)

    return ast.unparse(best_tree)

Here we again used the Ochiai statistical debugger and the Repairer described in The Debugging Book. In contrast to the initial example, now we used another type of mutator – ConditionMutator. It can successfully fix the larger function as well.

repaired = simple_debug_and_repair(larger,
                                   LARGER_PASSING_TESTCASES +
                                   LARGER_FAILING_TESTCASES,
                                   larger_test, False)
print_content(repaired, '.py')
def larger(x: Any, y: Any) -> Any:
    if not x < y:
        return x
    return y

Although simple_debug_and_repair produced a correct solution for our example, it does not generalize to other functions.
So your task is to create the debug_and_repair() function which can be applied on any faulty function.

Apart from the function debug_and_repair(), you may of course implement your own classes. Make sure, however, that you are using these classes within debug_and_repair(). Also, keep in mind to tune the iteration parameter of the Repairer so that it has sufficient number of generation to evolve. As it may take too much time to find a solution for an ill-programmed repairer (e.g., consider an infinite whileloop introduced in the fix), we impose a 10-minute timeout for each repair.

Evaluation

Having you implement debug_and_repair() allows us to easily test your implementation by calling the function with its respective arguments and testing the correctness of its output. In this section, we will provide you with some test cases as well as the testing framework for this project. This will help you to assess the quality of your work.

We define functions as well as test-case generators for these functions. The functions given here should be considered as public tests. If you pass all public tests, without hard-coding the solutions, you are guaranteed to achieve 10 points. The secret tests for the remaining 10 must-have-points have similar defects.

Factorial

The first function we implement is a factorial function. It is supposed to compute the following formula:
\begin{equation*} n! = \textit{factorial}(n) = \prod_{k=1}^{n}k, \quad \text{for $k\geq 1$} \end{equation*}

Here we define three faulty functions factorial1, factorial2, and factorial3 that are supposed to compute the factorial.

def factorial1(n):
    res = 1
    for i in range(1, n):
        res *= i
    return res

From the first sight, factorial1 looks to be correctly implemented, still it produces the wrong answer:

factorial1(5)
24

while the correct value for 5! is 120.

def factorial_testcase() -> int:
    n = random.randrange(100)
    return n
def factorial1_test(n: int) -> None:
    m = factorial1(n)
    assert m == math.factorial(n)
def factorial_passing_testcase() -> Tuple:
    while True:
        try:
            n = factorial_testcase()
            factorial1_test(n)
            return (n,)
        except AssertionError:
            pass
def factorial_failing_testcase() -> Tuple:
    while True:
        try:
            n = factorial_testcase()
            factorial1_test(n)
        except AssertionError:
            return (n,)
FACTORIAL_PASSING_TESTCASES_1 = [factorial_passing_testcase() for i in range(TESTS)]
FACTORIAL_FAILING_TESTCASES_1 = [factorial_failing_testcase() for i in range(TESTS)]

As we can see, our simple Repairer cannot produce a fix. (Or more precisely, the "fix" it produces is pretty much pointless.)

repaired = \
    simple_debug_and_repair(factorial1,
                            FACTORIAL_PASSING_TESTCASES_1 +
                            FACTORIAL_FAILING_TESTCASES_1,
                            factorial1_test, True)
Target code to be repaired:
def factorial1(n):
    res = 1
    for i in range(1, n):
        res *= i
    return res

Evolving population: iteration  99/100 fitness = 0.99   
Best code (fitness = 0.99):
def factorial1(n):
    res = 1
    pass
    return res

Reduced code (fitness = 0.99):
def factorial1(n):
    res = 1
    return res

The problem is that the current Repairer does not provide a suitable mutation to change the right part of the code.

How can we repair this? Consider extending StatementMutator operator such that it can mutate various parts of the code, such as ranges, arithmetic operations, variable names etc. (As a reference of how to do that, look at the ConditionMutator class.)

The next faulty function is factorial2():

def factorial2(n):
    i = 1
    for i in range(1, n + 1):
        i *= i
    return i

Again, it outputs the incorrect answer:

factorial2(5)
25
def factorial2_test(n: int) -> None:
    m = factorial2(n)
    assert m == math.factorial(n)
def factorial_passing_testcase() -> Tuple:
    while True:
        try:
            n = factorial_testcase()
            factorial2_test(n)
            return (n,)
        except AssertionError:
            pass
def factorial_failing_testcase() -> Tuple:
    while True:
        try:
            n = factorial_testcase()
            factorial2_test(n)
        except AssertionError:
            return (n,)
FACTORIAL_PASSING_TESTCASES_2 = [factorial_passing_testcase()
                                 for i in range(TESTS)]
FACTORIAL_FAILING_TESTCASES_2 = [factorial_failing_testcase()
                                 for i in range(TESTS)]

The third faulty function is factorial3():

def factorial3(n):
    res = 1
    for i in range(1, n + 1):
        res += i
    return res
factorial3(5)
16
def factorial3_test(n: int) -> None:
    m = factorial3(n)
    assert m == math.factorial(n)
def factorial_passing_testcase() -> Tuple:
    while True:
        try:
            n = factorial_testcase()
            factorial3_test(n)
            return (n,)
        except AssertionError:
            pass
def factorial_failing_testcase() -> Tuple:
    while True:
        try:
            n = factorial_testcase()
            factorial3_test(n)
        except AssertionError:
            return (n,)
FACTORIAL_PASSING_TESTCASES_3 = [factorial_passing_testcase()
                                 for i in range(TESTS)]
FACTORIAL_FAILING_TESTCASES_3 = [factorial_failing_testcase()
                                 for i in range(TESTS)]

Middle

The following faulty function is the already well known Middle function, though with another defect.

def middle(x, y, z):
    if x < x:
        if y < z:
            return y
        if x < z:
            return z
        return x
    if x < z:
        return x
    if y < z:
        return z
    return y

It should return the second largest number of the input, but it does not:

middle(2, 3, 1)
3
def middle_testcase() -> Tuple:
    x = random.randrange(10)
    y = random.randrange(10)
    z = random.randrange(10)
    return x, y, z
def middle_test(x: int, y: int, z: int) -> None:
    m = middle(x, y, z)
    assert m == sorted([x, y, z])[1]
def middle_passing_testcase() -> Tuple:
    while True:
        try:
            x, y, z = middle_testcase()
            middle_test(x, y, z)
            return x, y, z
        except AssertionError:
            pass
def middle_failing_testcase() -> Tuple:
    while True:
        try:
            x, y, z = middle_testcase()
            middle_test(x, y, z)
        except AssertionError:
            return x, y, z
MIDDLE_PASSING_TESTCASES = [middle_passing_testcase()
                            for i in range(TESTS)]
MIDDLE_FAILING_TESTCASES = [middle_failing_testcase()
                            for i in range(TESTS)]

Power

The power function should implement the following formular: \begin{equation*} \textit{power}(x, n) = x^n, \quad \text{for $x\geq 0$ and $n \geq 0$} \end{equation*}

def power(x, n):
    res = 1
    for i in range(0, x):
        res *= n
    return res

However, this power() function either has an uncommon interpretation of $x^y$ – or it is simply wrong:

power(2, 5)
25

We go with the simpler explanation that power() is wrong. The correct value, of course, should be $2^5 = 32$.

def power_testcase() -> Tuple:
    x = random.randrange(100)
    n = random.randrange(100)
    return x, n
def power_test(x: int, n: int) -> None:
    m = power(x, n)
    assert m == pow(x, n)
def power_passing_testcase() -> Tuple:
    while True:
        try:
            x, n = power_testcase()
            power_test(x, n)
            return x, n
        except AssertionError:
            pass
def power_failing_testcase() -> Tuple:
    while True:
        try:
            x, n = power_testcase()
            power_test(x, n)
        except AssertionError:
            return x, n
POWER_PASSING_TESTCASES = [power_passing_testcase()
                           for i in range(TESTS)]
POWER_FAILING_TESTCASES = [power_failing_testcase()
                           for i in range(TESTS)]

Tester Class

To make it convenient to test your solution we provide a testing framework:

import re
class Test:
    def __init__(self,
                 function: Callable,
                 testcases: List, 
                 test_function: Callable,
                 assert_function: Callable) -> None:
        self.function = function
        self.testcases = testcases
        self.test_function = test_function
        self.assert_function = assert_function

    def run(self, repair_function: Callable) -> None:
        repaired = repair_function(self.function,
                                   self.testcases,
                                   self.test_function)
        repaired = re.sub(self.function.__name__, 'foo', repaired)
        exec(repaired, globals())

        for test in self.testcases:
            res = foo(*test)
            assert res == self.assert_function(*test)
def middle_assert(x, y, z):
    return sorted([x, y, z])[1]
test0 = Test(factorial1, FACTORIAL_PASSING_TESTCASES_1 + FACTORIAL_FAILING_TESTCASES_1, factorial1_test, math.factorial)
test1 = Test(factorial2, FACTORIAL_PASSING_TESTCASES_2 + FACTORIAL_FAILING_TESTCASES_2, factorial2_test, math.factorial)
test2 = Test(factorial3, FACTORIAL_PASSING_TESTCASES_3 + FACTORIAL_FAILING_TESTCASES_3, factorial3_test, math.factorial)
test3 = Test(middle, MIDDLE_PASSING_TESTCASES + MIDDLE_FAILING_TESTCASES, middle_test, middle_assert)
test4 = Test(power, POWER_PASSING_TESTCASES + POWER_FAILING_TESTCASES, power_test, pow)
tests = [test0, test1, test2, test3, test4]
class Tester:
    def __init__(self, function: Callable, tests: List) -> None:
        self.function = function
        self.tests = tests
        random.seed(42)  # We use this seed for our evaluation; don't change it.

    def run_tests(self) -> None:
        for test in self.tests:
            try:
                test.run(self.function)
                print(f'Test {test.function.__name__}: OK')
            except AssertionError:
                print(f'Test {test.function.__name__}: Failed')
tester = Tester(simple_debug_and_repair, tests)  # TODO: replace simple_debug_and_repair by your debug_and_repair function
tester.run_tests()
Test factorial1: Failed
Test factorial2: Failed
Test factorial3: Failed
Test middle: Failed
Test power: Failed

By executing the Tester as shown above, you can assess the quality of your repairing approach, by testing your own debug_and_repair() function.

Grading

Your project will be graded by automated tests. The tests are executed in the same manner as shown above. In total there are 20 points + 10 bonus points to be awarded for this project. 20 points for the must-haves, 10 bonus points for may-haves.

Must-Haves (20 points)

Must haves include an implementation of the debug_and_repair function in a way that it automatically repairs faulty functions given sufficiantly large test suites. 10 points are awarded for passing the tests in this notebook. Each passing test being worth two points. 10 points are awarded for passing secret tests.

May-Haves (10 points)

May-haves will also be tested with secret tests, and award 2 points each. The may-have-features for this project are a more robust implementation, that is able to cope with a wider range of defects:

  • Infinite loops
  • Infinite recursion (RecursionError in Python)
  • Type errors (TypeError in Python)
  • Undefined identifiers (NameError in Python)

General Rules

You need to achieve at least 10 points to be awarded any points at all.
Tests must be passed without hard-coding results, otherwise no points are awarded.
Your code needs to be sufficiently documented in order to achieve points!

Creative Commons License The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content is licensed under the MIT License. Last change: 2023-11-11 18:05:06+01:00CiteImprint