# Third Project - Automated Repair¶

## Overview¶

### The Task¶

For the first two submissions we asked you to implement a *Debugger* as well as an *Input Reducer*. Both of these tools are used to help the developer to locate bugs and then manually fix them.

In this project, you will implement a technique of automatic code repair. To do so, you will extend the `Repairer`

introduced in the Repairing Code Automatically chapter of The Debugging Book.

Your own `Repairer`

should automatically generate *repair suggestions* to the faulty functions we provide later in this notebook. This can be achieved, for instance, by changing various components of the mutator, changing the debugger, or the reduction algorithm. However, you are neither reguired to make all these changes nor required to limit yourself to the changes proposed here.

### The Submission¶

The time frame for this project is **3 weeks**, and the deadline is **Februrary 5th, 23:59**.

The submission should be in form of a Jupyter notebook and you are expected to hand in the submission as a .zip-archive. The notebook should, apart from the code itself, also provide sufficient explanations and reasoning (in markdown cells) behind the decisions that lead to the solution provided. Projects that do not include explanations cannot get more than **15 points**.

```
import bookutils.setup
```

## A Faulty Function and How to Repair It¶

Before discussing how to use and extend the Repairer, we first start by introducing a new (and highly complex) function that is supposed to return the larger of two values.

```
def larger(x: Any, y: Any) -> Any:
if x < y:
return x
return y
```

Unfortunately, we introduced a bug which makes the function behave the exact opposite way as it is supposed to:

```
larger(1, 3)
```

To fix this issue, we could try to debug it, using the tools we have already seen. However, given the complexity of the function under test (sorry for the irony), we might want to automatically repair the function, using the *Repairer* introduced in *The Debugging Book*.

To do so, we first need to define set of test cases, which help the *Repairer* in fixing the function.

```
def larger_testcase() -> Tuple[int, int]:
x = random.randrange(100)
y = random.randrange(100)
return x, y
```

```
def larger_test(x: Any, y: Any) -> None:
m = larger(x, y)
assert m == max(x, y), f"expected {max(x, y)}, but got {m}"
```

```
random.seed(42)
```

Let us generate a random test case for our function:

```
larger_input = larger_testcase()
print(larger_input)
```

and then feed it into our `larger_test()`

:

```
from ExpectError import ExpectError
```

```
with ExpectError():
larger_test(*larger_input)
```

As expected, we got an error – the `larger()`

function has adefect.

For a complete test suite, we need a set of passing and failing tests. To be sure we have both, we create functions which produce dedicated inputs:

```
def larger_passing_testcase() -> Tuple:
while True:
try:
x, y = larger_testcase()
larger_test(x, y)
return x, y
except AssertionError:
pass
```

```
def larger_failing_testcase() -> Tuple:
while True:
try:
x, y = larger_testcase()
larger_test(x, y)
except AssertionError:
return x, y
```

```
passing_input = larger_passing_testcase()
print(passing_input)
```

With `passing_input`

, our `larger()`

function produces a correct result, and its test function does not fail.

```
larger_test(*passing_input)
```

```
failing_input = larger_failing_testcase()
print(failing_input)
```

While `failing_input`

leads to an error:

```
with ExpectError():
larger_test(*failing_input)
```

With the above defined functions, we can now start to create a number of passing and failing tests:

```
TESTS = 100
```

```
LARGER_PASSING_TESTCASES = [larger_passing_testcase()
for i in range(TESTS)]
```

```
LARGER_FAILING_TESTCASES = [larger_failing_testcase()
for i in range(TESTS)]
```

```
from StatisticalDebugger import OchiaiDebugger
```

Next, let us use *statistical debugging* to identify likely faulty locations. The `OchiaiDebugger`

ranks individual code lines by how frequently they are executed in failing runs (and not in passing runs).

```
larger_debugger = OchiaiDebugger()
```

```
for x, y in LARGER_PASSING_TESTCASES + LARGER_FAILING_TESTCASES:
with larger_debugger:
larger_test(x, y)
```

Given the results of statistical debugging, we can now use the *Repairer* introduced in the book to repair our function. Here we use the default implementation which is initialized with the simple *StatementMutator* mutator.

```
repairer = Repairer(larger_debugger, log=True)
```

```
best_tree, fitness = repairer.repair()
```

The *Repairer* successfully produced a fix.

## Implementation¶

As stated above, the goal of this project is to implement a repairer capable of producing a fix to the functions defined in *the Evaluation section*, as well as secret functions.

To do this, you need to work with the *Repairer* class from the book.
The *Repairer* class is very configurable, so that it is easy to update and plug in various components: the fault localization (pass a different debugger that is a subclass of DifferenceDebugger), the mutation operator (set mutator_class to a subclass of `StatementMutator`

), the crossover operator (set crossover_class to a subclass of `CrossoverOperator`

), and the reduction algorithm (set reducer_class to a subclass of `Reducer`

). You may change any of these components or make other changes at will.

For us to be able to test your implementation, you will have to implement the `debug_and_repair()`

function defined here.

```
from bookutils import print_content, show_ast
```

```
def debug_and_repair(f: Callable,
testcases: List[Tuple[Any, ...]],
test_function: Callable,
log: bool = False) -> Optional[str]:
'''
Debugs a function with the given testcases and the test_function
and tries to repair it afterwards.
Parameters
----------
f : function
The faulty function, to be debugged and repaired
testcases : List
A list that includes test inputs for the function under test
test_function : function
A function that takes the test inputs and tests whether the
function under test produces the correct output.
log: bool
Turn logging on/off.
Returns
-------
str
The repaired version of f as a string.
'''
# TODO: implement this function
return None
```

The function `debug_and_repair()`

is the function that needs to implement everything. We will provide you with the function to be repaired, as well as testcases for this function and a test-function. Let us show you how the function can be used and should behave:

```
random.seed(42)
```

```
import ast
```

```
def simple_debug_and_repair(f: Callable,
testcases: List[Tuple[Any, ...]],
test_function: Callable,
log: bool = False) -> str:
'''
Debugs a function with the given testcases and the test_function
and tries to repair it afterwards.
Parameters
----------
f : function
The faulty function, to be debugged and repaired
testcases : List
A list that includes test inputs for the function under test
test_function : function
A function, that takes the test inputs and tests whether the
function under test produces the correct output.
log: bool
Turn logging on/off.
Returns
-------
str
The repaired version of f as a string.
'''
debugger = OchiaiDebugger()
for i in testcases:
with debugger:
test_function(*i) # Ensure that you use *i here.
repairer = Repairer(debugger,
mutator_class=ConditionMutator,
crossover_class=CrossoverOperator,
reducer_class=DeltaDebugger,
log=log)
# Ensure that you specify a sufficient number of
# iterations to evolve.
best_tree, fitness = repairer.repair(iterations=100)
return ast.unparse(best_tree)
```

Here we again used the *Ochiai* statistical debugger and the *Repairer* described in The Debugging Book. In contrast to the initial example, now we used another type of mutator – `ConditionMutator`

. It can successfully fix the `larger`

function as well.

```
repaired = simple_debug_and_repair(larger,
LARGER_PASSING_TESTCASES +
LARGER_FAILING_TESTCASES,
larger_test, False)
print_content(repaired, '.py')
```

Although `simple_debug_and_repair`

produced a correct solution for our example, it does not generalize to other functions.

So your task is to create the `debug_and_repair()`

function which can be applied on any faulty function.

Apart from the function `debug_and_repair()`

, you may of course implement your own classes. Make sure, however, that you are using these classes within `debug_and_repair()`

. Also, keep in mind to tune the *iteration* parameter of the `Repairer`

so that it has sufficient number of generation to evolve. As it may take too much time to find a solution for an ill-programmed repairer (e.g., consider an infinite `while`

loop introduced in the fix), we impose a *10-minute timeout* for each repair.

## Evaluation¶

Having you implement `debug_and_repair()`

allows us to easily test your implementation by calling the function with its respective arguments and testing the correctness of its output. In this section, we will provide you with some test cases as well as the testing framework for this project. This will help you to assess the quality of your work.

We define functions as well as test-case generators for these functions. The functions given here should be considered as **public tests**. If you pass all public tests, without hard-coding the solutions, you are guaranteed to achieve **10 points**. The secret tests for the remaining 10 must-have-points have similar defects.

### Factorial¶

The first function we implement is a *factorial* function. It is supposed to compute the following formula:

\begin{equation*}
n! = \textit{factorial}(n) = \prod_{k=1}^{n}k, \quad \text{for $k\geq 1$}
\end{equation*}

Here we define three faulty functions `factorial1`

, `factorial2`

, and `factorial3`

that are supposed to compute the factorial.

```
def factorial1(n):
res = 1
for i in range(1, n):
res *= i
return res
```

From the first sight, `factorial1`

looks to be correctly implemented, still it produces the wrong answer:

```
factorial1(5)
```

while the correct value for 5! is 120.

```
def factorial_testcase() -> int:
n = random.randrange(100)
return n
```

```
def factorial1_test(n: int) -> None:
m = factorial1(n)
assert m == math.factorial(n)
```

```
def factorial_passing_testcase() -> Tuple:
while True:
try:
n = factorial_testcase()
factorial1_test(n)
return (n,)
except AssertionError:
pass
```

```
def factorial_failing_testcase() -> Tuple:
while True:
try:
n = factorial_testcase()
factorial1_test(n)
except AssertionError:
return (n,)
```

```
FACTORIAL_PASSING_TESTCASES_1 = [factorial_passing_testcase() for i in range(TESTS)]
```

```
FACTORIAL_FAILING_TESTCASES_1 = [factorial_failing_testcase() for i in range(TESTS)]
```

As we can see, our simple Repairer cannot produce a fix. (Or more precisely, the "fix" it produces is pretty much pointless.)

```
repaired = \
simple_debug_and_repair(factorial1,
FACTORIAL_PASSING_TESTCASES_1 +
FACTORIAL_FAILING_TESTCASES_1,
factorial1_test, True)
```

The problem is that the current `Repairer`

does not provide a suitable mutation to change the right part of the code.

How can we repair this? Consider extending `StatementMutator`

operator such that it can mutate various parts of the code, such as ranges, arithmetic operations, variable names etc. (As a reference of how to do that, look at the `ConditionMutator`

class.)

The next faulty function is `factorial2()`

:

```
def factorial2(n):
i = 1
for i in range(1, n + 1):
i *= i
return i
```

Again, it outputs the incorrect answer:

```
factorial2(5)
```

```
def factorial2_test(n: int) -> None:
m = factorial2(n)
assert m == math.factorial(n)
```

```
def factorial_passing_testcase() -> Tuple:
while True:
try:
n = factorial_testcase()
factorial2_test(n)
return (n,)
except AssertionError:
pass
```

```
def factorial_failing_testcase() -> Tuple:
while True:
try:
n = factorial_testcase()
factorial2_test(n)
except AssertionError:
return (n,)
```

```
FACTORIAL_PASSING_TESTCASES_2 = [factorial_passing_testcase()
for i in range(TESTS)]
```

```
FACTORIAL_FAILING_TESTCASES_2 = [factorial_failing_testcase()
for i in range(TESTS)]
```

The third faulty function is `factorial3()`

:

```
def factorial3(n):
res = 1
for i in range(1, n + 1):
res += i
return res
```

```
factorial3(5)
```

```
def factorial3_test(n: int) -> None:
m = factorial3(n)
assert m == math.factorial(n)
```

```
def factorial_passing_testcase() -> Tuple:
while True:
try:
n = factorial_testcase()
factorial3_test(n)
return (n,)
except AssertionError:
pass
```

```
def factorial_failing_testcase() -> Tuple:
while True:
try:
n = factorial_testcase()
factorial3_test(n)
except AssertionError:
return (n,)
```

```
FACTORIAL_PASSING_TESTCASES_3 = [factorial_passing_testcase()
for i in range(TESTS)]
```

```
FACTORIAL_FAILING_TESTCASES_3 = [factorial_failing_testcase()
for i in range(TESTS)]
```

### Middle¶

The following faulty function is the already well known *Middle* function, though with another defect.

```
def middle(x, y, z):
if x < x:
if y < z:
return y
if x < z:
return z
return x
if x < z:
return x
if y < z:
return z
return y
```

It should return the second largest number of the input, but it does not:

```
middle(2, 3, 1)
```

```
def middle_testcase() -> Tuple:
x = random.randrange(10)
y = random.randrange(10)
z = random.randrange(10)
return x, y, z
```

```
def middle_test(x: int, y: int, z: int) -> None:
m = middle(x, y, z)
assert m == sorted([x, y, z])[1]
```

```
def middle_passing_testcase() -> Tuple:
while True:
try:
x, y, z = middle_testcase()
middle_test(x, y, z)
return x, y, z
except AssertionError:
pass
```

```
def middle_failing_testcase() -> Tuple:
while True:
try:
x, y, z = middle_testcase()
middle_test(x, y, z)
except AssertionError:
return x, y, z
```

```
MIDDLE_PASSING_TESTCASES = [middle_passing_testcase()
for i in range(TESTS)]
```

```
MIDDLE_FAILING_TESTCASES = [middle_failing_testcase()
for i in range(TESTS)]
```

### Power¶

The power function should implement the following formular: \begin{equation*} \textit{power}(x, n) = x^n, \quad \text{for $x\geq 0$ and $n \geq 0$} \end{equation*}

```
def power(x, n):
res = 1
for i in range(0, x):
res *= n
return res
```

However, this `power()`

function either has an uncommon interpretation of $x^y$ – or it is simply wrong:

```
power(2, 5)
```

We go with the simpler explanation that `power()`

is wrong. The correct value, of course, should be $2^5 = 32$.

```
def power_testcase() -> Tuple:
x = random.randrange(100)
n = random.randrange(100)
return x, n
```

```
def power_test(x: int, n: int) -> None:
m = power(x, n)
assert m == pow(x, n)
```

```
def power_passing_testcase() -> Tuple:
while True:
try:
x, n = power_testcase()
power_test(x, n)
return x, n
except AssertionError:
pass
```

```
def power_failing_testcase() -> Tuple:
while True:
try:
x, n = power_testcase()
power_test(x, n)
except AssertionError:
return x, n
```

```
POWER_PASSING_TESTCASES = [power_passing_testcase()
for i in range(TESTS)]
```

```
POWER_FAILING_TESTCASES = [power_failing_testcase()
for i in range(TESTS)]
```

### Tester Class¶

To make it convenient to test your solution we provide a testing framework:

```
import re
```

```
class Test:
def __init__(self,
function: Callable,
testcases: List,
test_function: Callable,
assert_function: Callable) -> None:
self.function = function
self.testcases = testcases
self.test_function = test_function
self.assert_function = assert_function
def run(self, repair_function: Callable) -> None:
repaired = repair_function(self.function,
self.testcases,
self.test_function)
repaired = re.sub(self.function.__name__, 'foo', repaired)
exec(repaired, globals())
for test in self.testcases:
res = foo(*test)
assert res == self.assert_function(*test)
```

```
def middle_assert(x, y, z):
return sorted([x, y, z])[1]
```

```
test0 = Test(factorial1, FACTORIAL_PASSING_TESTCASES_1 + FACTORIAL_FAILING_TESTCASES_1, factorial1_test, math.factorial)
test1 = Test(factorial2, FACTORIAL_PASSING_TESTCASES_2 + FACTORIAL_FAILING_TESTCASES_2, factorial2_test, math.factorial)
test2 = Test(factorial3, FACTORIAL_PASSING_TESTCASES_3 + FACTORIAL_FAILING_TESTCASES_3, factorial3_test, math.factorial)
test3 = Test(middle, MIDDLE_PASSING_TESTCASES + MIDDLE_FAILING_TESTCASES, middle_test, middle_assert)
test4 = Test(power, POWER_PASSING_TESTCASES + POWER_FAILING_TESTCASES, power_test, pow)
```

```
tests = [test0, test1, test2, test3, test4]
```

```
class Tester:
def __init__(self, function: Callable, tests: List) -> None:
self.function = function
self.tests = tests
random.seed(42) # We use this seed for our evaluation; don't change it.
def run_tests(self) -> None:
for test in self.tests:
try:
test.run(self.function)
print(f'Test {test.function.__name__}: OK')
except AssertionError:
print(f'Test {test.function.__name__}: Failed')
```

```
tester = Tester(simple_debug_and_repair, tests) # TODO: replace simple_debug_and_repair by your debug_and_repair function
tester.run_tests()
```

By executing the `Tester`

as shown above, you can assess the quality of your repairing approach, by testing your own `debug_and_repair()`

function.

## Grading¶

Your project will be graded by *automated tests*. The tests are executed in the same manner as shown above.
In total there are **20 points** + **10 bonus points** to be awarded for this project. **20 points** for the must-haves, **10 bonus points** for may-haves.

### Must-Haves (20 points)¶

Must haves include an implementation of the `debug_and_repair`

function in a way that it automatically repairs faulty functions given sufficiantly large test suites.
**10 points** are awarded for passing the tests in this notebook. Each passing test being worth two points.
**10 points** are awarded for passing secret tests.

### May-Haves (10 points)¶

May-haves will also be tested with secret tests, and award **2 points** each. The may-have-features for this project are a more robust implementation, that is able to cope with a wider range of defects:

- Infinite loops
- Infinite recursion (
`RecursionError`

in Python) - Type errors (
`TypeError`

in Python) - Undefined identifiers (
`NameError`

in Python)

### General Rules¶

You need to achieve at least **10 points** to be awarded any points at all.

Tests must be passed without hard-coding results, otherwise no points are awarded.

Your code needs to be sufficiently documented in order to achieve points!

The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content is licensed under the MIT License. Last change: 2023-11-11 18:05:06+01:00 • Cite • Imprint