Second Project¶
In this project, you will implement a debugging reducer able to reduce a Python program containing a specific property down to a syntactically correct Python program that only contains said property. To do this, you will be given several Python parsers checking for specific properties as well as Python programs containing these properties.
The time frame for this project is 2 weeks, and the Deadline is January 15th 23:59.
import bookutils.setup
from bookutils import show_ast
from ExpectError import ExpectError
Description¶
Imagine we want to create a new compiler for Python programs which transforms the code into a binary executable. To this end, at first we need to implement a parser that parses a given program into a set of individual statements which can then be converted to the bytecode.
Parsing Python¶
If we want to parse a programming language like Python, it makes sense to parse the code into an abstract syntax tree (AST) and work with the AST from this point onward.
The reason we might want to do this, is to preserve the syntactic integrity of our program.
Let's start with parsing a simple Python function (foo
) into an AST using ast.parse()
and printing it.
def foo(a, b):
"""
Checks whether a and b are true
"""
if a and b:
return 1
return 0
We take a string representation of foo()
...
source = inspect.getsource(foo)
print(source)
...and convert it to the AST:
source_ast = ast.parse(source)
show_ast(source_ast)
We can see how the code is structured by inspecting the AST shown above.
To learn the structure of an AST, have a look at the official Python ast
reference for a list of AST nodes and their attributes.
This reference is complete, but a bit terse; the "Green Tree Snakes missing Python AST docs" provide a good manual on how to modify the tree.
If we want to inspect certain elements of the source code, we can now use the ast.NodeVisitor
to visit every node in the AST.
This can be done by extending this class and implementing visit-methods for each type of node in the tree we want to inspect.
So, let's implement a parser that first parses the source code into an AST and then visits each node one after another.
class SimpleParser(ast.NodeVisitor):
"""
A simple parser printing all nodes in order
"""
def parse(self, file: str) -> None:
"""
When the parse function is called, we parse the source code
into a tree and then start visiting all nodes.
"""
tree = ast.parse(source=file)
self.visit(tree)
def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
print(node)
self.generic_visit(node)
def visit_arguments(self, node: ast.arguments) -> None:
print(node)
self.generic_visit(node)
def visit_If(self, node: ast.If) -> None:
print(node)
self.generic_visit(node)
def visit_Return(self, node: ast.Return) -> None:
print(node)
self.generic_visit(node)
def visit_arg(self, node: ast.arg) -> None:
print(node)
self.generic_visit(node)
def visit_Constant(self, node: ast.Constant) -> None:
print(node)
self.generic_visit(node)
def visit_Name(self, node: ast.Name) -> None:
print(node)
self.generic_visit(node)
def visit_BoolOp(self, node: ast.BoolOp) -> None:
print(node)
self.generic_visit(node)
Using this SimpleParser
, we can visit each node and print the node object itself.
SimpleParser().parse(source)
Now imagine, instead of just parsing the input and printing each node, we start to process the information present in each node.
To notify the user if anything breaks, we introduce a ParserException
that is thrown when we encounter a parsing error.
class ParserException(Exception):
pass
For instance, let's consider our parser has troubles processing boolean operators.
We simulate this behavior by extending our parser to throw an exception whenever it encounters a BoolOp
node.
class SimpleParserPlus(SimpleParser):
"""
A simple parser printing all nodes in order until it reaches a BoolOp
"""
def visit_BoolOp(self, node: ast.BoolOp) -> None:
"""
If a BoolOp is encountered, we throw an exception
"""
raise ParserException("Something went wrong")
with ExpectError():
SimpleParserPlus().parse(source)
As we can see here, as soon as we encounter a boolean operation, an exception is thrown.
Reducing Python¶
Let's get back to our SimpleParserPlus
example. This parser parses Python code into an AST and then iterates over all nodes. When it reaches a boolean operation, it encounters an error and crashes.
Of course our example is a very small and artificial one. Furthermore, the input triggering our bug is rather small as well.
However, in reality a similar situation might present itself to you but with a far less obvious error and a far larger input that causes it.
In this case, instead of using the entire input, one might want to reduce the input down to only those parts actually causing the error.
But how do we do that?
We could, for instance, randomly remove some parts of the program.
However, this would likely produce syntactically invalid Python code and hence prevent the parser from reaching the desired code location.
So instead of just removing random parts of the code itself, we will manipulate the AST to preserve the correct syntax.
How do we manipulate an AST produced by ast.parse
? We can make use of the ast.NodeTransformer
:
class NodeReducer(ast.NodeTransformer):
def visit_If(self, node: ast.If) -> ast.AST:
"""
Instead of the `if` node return just its condition
"""
# Apply other transformations to the children first
super().generic_visit(node)
# We create an expression around the node test
new_node = ast.Expr(node.test)
return new_node
The ast.NodeTransformer
again uses the visitor pattern to walk over all nodes.
This time, however, we change the resulting AST. In our case, if we encounter an if-statement we only return the condition.
(Keep in mind, NodeReducer
changes the ast in place, so if you want to preserve the original tree, you need to copy it with copy.deepcopy(tree)
)
NodeReducer().visit(source_ast)
show_ast(source_ast)
As you can see, after running the NodeReducer
there is no longer an if-statement, but only its condition.
A more sophisticated NodeReducer is presented in the Reducing Failure-Inducing Inputs chapter in the debugging book.
To learn the structure of an AST, have a look at the official Python ast
reference for a list of AST nodes and their attributes. This reference is complete, but a bit terse; the "Green Tree Snakes missing Python AST docs" provide a good manual on how to modify the tree.
Pro tip: You can learn the expected structure of any Python fragment by invoking ast.dump()
on any (parsed) AST.
demo_ast = ast.parse('''
if a == b:
c = d
''')
print(ast.dump(demo_ast, indent=4))
The nice thing is that this not only reveals the AST structure, it also tells you how to construct it. If you take the very structure as produced by ast.dump()
and evaluate it, you obtain the same syntax tree.
from ast import Module, If, Compare, Load, Eq, Assign, Name, Store
ast_from_scratch = Module(
body=[
If(
test=Compare(
left=Name(id='a', ctx=Load()),
ops=[
Eq()],
comparators=[
Name(id='b', ctx=Load())]),
body=[
Assign(
targets=[
Name(id='c', ctx=Store())],
value=Name(id='d', ctx=Load()))],
orelse=[])],
type_ignores=[])
You need to invoke ast.fix_missing_locations()
to add line numbers and other location info that would be normally added during parsing...
ast_from_scratch = ast.fix_missing_locations(ast_from_scratch)
... and with this, you're all set:
print(ast.unparse(ast_from_scratch))
Hence, if you do not know what kind of AST fragment to produce, just
- parse its text with
ast.parse()
- study the expression produced by
ast.dump()
, and - build it from the same expression, using
ast.fix_missing_locations()
andast.unparse()
Implementing a Reducing Debugger¶
Having seen how one can access and transform AST structures using a NodeReducer
in Python, we can now use that knowledge to implement a ReducingDebugger
capable of reducing the failure inducing Python code down to only that parts of the code that actually triggers the failure.
class ReducingDebugger:
def __init__(self, parser: SimpleParser) -> None:
"""
We initialize the DebuggingReducer with a parser,
as this is needed to verify whether the failure
is still triggered after the code transformation
"""
self.parser = parser
self.node_reducer = NodeReducer()
def minimize(self, code: str) -> str:
"""
This function takes some Python code as string, reduces
it using the NodeReducer defined earlier and returns the
reduced program.
"""
# Parse the code to a tree
code_ast = ast.parse(source=code)
# Use the visit-method of the NodeReducer to reduce if nodes down to their condition
new_code_ast = self.node_reducer.visit(code_ast)
# After we've updated nodes, we need to fix the tree
ast.fix_missing_locations(new_code_ast)
# Generate code from the reduced tree
new_code = ast.unparse(new_code_ast)
# Test, whether the error is still triggered by the reduced code
try:
self.parser.parse(new_code)
# No exception is thrown. This means the new_code does not
# trigger an error anymore. Therefore, we failed in
# reduction and return the initial code.
return code
except ParserException:
# The error is still triggered. Return the reduced code
return new_code
The ReducingDebugger
implemented above takes some Python code as an input and replaces all if-statements with their conditions.
Then, it checks whether the newly generated code still triggers the error. If not, it returns the initial code. If the exception is still triggered, it returns the reduced code.
Let's apply it to the source code of our foo
function:
reducing_debugger = ReducingDebugger(SimpleParserPlus())
new_source = reducing_debugger.minimize(source)
print(new_source)
with ExpectError():
SimpleParserPlus().parse(new_source)
As we can see, our ReducingDebugger
reduces the original function to a smaller input, while still preserving the property of the code that triggers the same error as before.
However, this reducer is very simple: it applies only one transformation to all if
nodes at once.
Instead, one can consider various transformations of the same node as well as transformations of various node types and apply them one by one (or in batches) to gradually reduce the code.
Keep in mind, that the changes should be made in a way that the code is still accepted by a regular Python interpreter. Besides, the goal is to obtain a minimal program that still triggers the error. So, your debugging reducer should not simply reduce all possible nodes, but also regularly check whether the reduced code still triggers the failure.
Therefore, you are required to implement certain modifications that can be done to the AST. In general, these transformations can be one of the following:
- Delete a node,
- Substitute a node with the new node (e.g., pass node),
- Substitute a node with all of its children,
- Substitute a node with one of its children.
More detailed, one can consider, for instance:
- Replacing a
BoolOp
node byTrue
. - Replacing a
BoolOp
node byFalse
. - Replacing a
BoolOp
node by its left operand. - Replacing a
BoolOp
node by its right operand. - Replacing an
If
node by its "then" body. - Replacing an
If
node by its condition. - Replacing an
If
node by its "else" body. - Replacing all instances of a variable by a constant.
- Replacing expressions by a constant.
- etc.
Please, refer to the official Python ast
reference for a full list of node types.
Implementation¶
To succeed in this project, you need to complete the MyReducer
class.
class MyReducer(ReducingDebugger):
def minimize(self, code: str) -> str:
# TODO: implement this!
return code
To this end, you will need to extend the NodeReducer
with additional transformations and implement a reducer strategy
which collects all possible transformations and applies them to the AST until it is exhaustively minimized.
Feel free to make your own implementation or extend the DeltaDebugger
from the Reducing Failure-Inducing Inputs chapter with
the proper strategy.
Evaluation¶
We evaluate your project based on public as well as secret tests. In this section, we present five different Python parsers as well as five Python input programs, which should be minified. These parsers check for a specific property in the code and fail the execution if the property exists. The input programs and parsers in this section make up the public test cases. If you pass all of those tests without hardcoding the modifications you are guaranteed to score 15 points in this project. You can get more points for passing secret tests.
Parsers¶
class Parser(ast.NodeVisitor):
"""The base class for a parser"""
def parse_tree(self, tree: ast.AST) -> None:
self.visit(tree)
def parse(self, source: str) -> str:
tree = ast.parse(source=source)
self.parse_tree(tree)
return "The input was successfully parsed."
Parser
is the base class for a parser from which all other parsers are derived. It works the same way as the SimpleParser
introduced earlier.
Let's define a couple of parsers which model a failure during processing of a certain code feature.
class Parser1(Parser):
"""
Contains a boolean operation
"""
def visit_BoolOp(self, node: ast.BoolOp) -> None:
raise ParserException(f"Something went wrong")
If we feed this parser with an input program which contains a boolean operation, it fails:
parser1 = Parser1()
input_program = '''
a = True and False'''
with ExpectError():
parser1.parse(input_program)
In contrast to the above, an input program without a boolean operation is parsed correctly.
parser1 = Parser1()
input_program = '''
a = 1
b = True
'''
parser1.parse(input_program)
The other parsers are
class Parser2(Parser):
"""
Fails if an input program contains `if` statement
"""
def visit_If(self, node: ast.If) -> None:
raise ParserException(f"Something went wrong")
parser2 = Parser2()
input_program = '''
if a:
b = 1
else:
b = 2
'''
with ExpectError():
parser2.parse(input_program)
class Parser3(Parser):
"""
Fails if an input program contains a special unicode character
"""
def __init__(self) -> None:
self.assignment = False
self.steps = 0
def check_unicode(self, string: str) -> bool:
return string == u'\u0426'
def generic_visit(self, node: ast.AST) -> None:
self.steps += 1
super().generic_visit(node)
def visit_Assign(self, node: ast.Assign) -> None:
self.assignment = True
self.steps = 0
self.generic_visit(node)
def visit_Constant(self, node: ast.Constant) -> None:
if self.assignment and self.steps == 3:
if isinstance(node.value, str):
if self.check_unicode(node.value):
raise ParserException(f"Something went wrong")
parser3 = Parser3()
input_program = '''
a = u'\u0426'
'''
with ExpectError(ParserException):
parser3.parse(input_program)
class Parser4(Parser):
"""
Fails if an input program contains a variable which is not defined
"""
def __init__(self) -> None:
self.assignment = False
self.steps = 0
self.variables: Set[str] = set()
def generic_visit(self, node: ast.AST) -> None:
self.steps += 1
super().generic_visit(node)
def visit_Name(self, node: ast.Name) -> None:
if self.assignment and self.steps == 1:
self.variables.add(node.id)
self.assignment = False
self.generic_visit(node)
elif node.id in self.variables:
self.generic_visit(node)
else:
raise ParserException(f"Something went wrong")
def visit_Assign(self, node: ast.Assign) -> None:
self.assignment = True
self.steps = 0
self.generic_visit(node)
parser4 = Parser4()
input_program = '''
a = 1
b = c
'''
with ExpectError(ParserException):
parser4.parse(input_program)
class Parser5(Parser):
"""
Fails if an input program contains a list
"""
def visit_List(self, node: ast.List) -> None:
raise ParserException(f"Something went wrong")
parser5 = Parser5()
input_program = '''
a = {x * 2 for x in map(int, ['1', '2', '3'])}
'''
with ExpectError(ParserException):
parser5.parse(input_program)
For secret tests we may use other parsers crafted in the same way.
Input Programs¶
Now we present five public tests.
Each test case implements a get_original()
method that returns an input program which your reducer should minify.
It also has a parser
field which stores a parser used for a respective input program.
Besides, get_minimized()
method provides a reference solution.
class Test1:
parser = Parser1()
def get_original(self) -> str:
return '''
def original():
a = True
b = not False
c = 30
for i in range(c):
if i == 15:
if a and b:
return 1
return 0
'''
def get_minimized(self) -> str:
return '''
True and True'''
class Test2:
parser = Parser2()
def get_original(self) -> str:
return '''
def original():
a = True
b = not False
c = 30
for i in range(c):
if i == 15:
if a and b:
return 1
return 0
'''
def get_minimized(self) -> str:
return '''
if True:
return
'''
class Test3:
parser = Parser3()
def get_original(self) -> str:
return '''
def original():
a = 1
b = a
c = a - b
if c < a:
d = ''
while a == b:
d = u'\u0426'
a += 1
return d
return ''
'''
def get_minimized(self) -> str:
return '''
d = u'\u0426'
'''
class Test4:
parser = Parser4()
def get_original(self) -> str:
return '''
def original():
a = 1
b = a
c = a - b
if c < a:
while a == b:
a += 1
return d
return ''
'''
def get_minimized(self) -> str:
return '''
d
'''
class Test5:
parser = Parser5()
def get_original(self) -> str:
return '''
def original():
a = 1
b = 0
while True:
if a < b:
return [1, 2, 3]
else:
return []
'''
def get_minimized(self) -> str:
return '''
[]
'''
For instance, let's take the input program of Test1
:
test = Test1()
source = test.get_original()
print(source)
And parse it with the provided parser:
with ExpectError():
test.parser.parse(source)
The parser failed.
Testing Infrastructure¶
The following section introduces the testing infrastructure, which is used to assess your performance in the project. Passing all tests will be enough to complete the project successfully.
We are not going to directly compare a minimized program with the reference solution (as two strings). Instead, we again transform the code into AST and count the number of nodes in both trees.
If the difference between them is lower than a THRESHOLD
(and a parser still produces an error), the test is passed.
To count the number of nodes in the AST, we need a helper class NodeCounter
.
class NodeCounter(ast.NodeVisitor):
"""
This node counter is used to assess
the amount of reductions performed by your reducer.
It counts the number of nodes in the AST.
"""
def __init__(self) -> None:
self.num_nodes = 0
def visit(self, node: ast.AST) -> None:
self.num_nodes += 1
self.generic_visit(node)
def count(self, source: str) -> int:
tree = ast.parse(source=source)
self.visit(tree)
return self.num_nodes
The AST of the input program is as follows:
show_ast(ast.parse(source))
print(f"The number of nodes is {NodeCounter().count(source)}")
As you can see, the input program of Test1
has 42 nodes.
The testing framework takes a reducer and allows executing a single test or all tests at once.
class TestingFramework:
THRESHOLD = 3
test_cases = {
'test1': Test1(),
'test2': Test2(),
'test3': Test3(),
'test4': Test4(),
'test5': Test5()
}
def __init__(self, reducer: Any) -> None:
self.reducer = reducer
def count_nodes(self, source: Optional[str]) -> int:
if source is None:
return 100000
return NodeCounter().count(source)
def run_test(self, test: Any) -> bool:
"""
run a single test
"""
print(f'Running test {test.__class__.__name__}')
reducer = self.reducer(test.parser)
reduced_code = reducer.minimize(test.get_original())
return (self.has_property(reduced_code, test.parser) and
self.is_minimized(reduced_code, test.get_minimized()))
def run_tests(self) -> None:
"""
run all public tests
"""
passed_tests = 0
for test in self.test_cases.values():
success = self.run_test(test)
if success:
passed_tests += 1
print(f"In total {passed_tests} tests passed")
def has_property(self, source: str, parser: Any) -> bool:
"""returns True if the parser fails to parse the source"""
try:
parser.parse(source)
print(f'HAS PROPERTY: FAIL')
return False
except ParserException:
print(f'HAS PROPERTY: OK')
return True
except Exception as e:
print(f'HAS PROPERTY: FAIL {e}')
return False
def is_minimized(self, reduced: str, reference: str) -> bool:
"""
Returns True if the AST of the reduced code contains
no more then the number of nodes in the reference + a THRESHOLD
"""
count_minimized = self.count_nodes(reduced)
count_reference = self.count_nodes(reference)
if count_minimized <= count_reference + self.THRESHOLD:
print(f'IS MINIMIZED: OK')
return True
else:
print(f'IS MINIMIZED: FAIL')
return False
Let's test our ReducingDebugger
with the following test:
class Test0:
parser = Parser1()
def get_original(self) -> str:
return '''
if a and b:
c = 1
else:
c = 2
'''
def get_minimized(self) -> str:
return '''
a and b'''
tf = TestingFramework(ReducingDebugger)
if tf.run_test(Test0()):
print("Success!")
The input program was successfully minimized.
What if we run all public tests?
tf.run_tests()
Unfortunately, our parser failed to minimize all input programs from the public test suite.
Grading¶
For this project, you can get a total of 30 Points:
- 15 Points will be awarded for passing the public test (without hardcoding the minimized solution or the sequence of transformations) presented in this notebook. These 15 points mean that you successfully implemented the must-have implementation.
- 5 Points will be awarded for passing secret tests.
- 10 Points will be awarded for efficient implementation (see may-have implementation).
Must-have Features¶
Implement a reducer which minifies a given program so that its parser still produces an error. To this end, collect all possible transformations over the nodes of an AST tree and then apply one change at a time. These modifications should be repeated until no further updates can be made without triggering a parser exception. If your reducer passes secret test you are awarded additional 5 points.
Note: Implementing the given modifications should be sufficient to successfully complete this project.
May-have Features¶
An implementation only of the must-have features aims for correctness but can be inefficient (if transformations are applied one at a time). So, it can be optimized, for instance, with help of the Delta Debugging approach.
Implement an AST Delta Debugger which efficiently prunes the nodes. If your reducer can beat the runtime of our reference implementation (which simply collects all possible transformations and applies them randomly one at a time) you can get up to 10 points (depending on the margin).
Hint: The Reducing Failure-Inducing Inputs chapter already tries to optimize the reduction of the AST with help of Delta Debugging.
Hint: For instance, you can find useful Hierarchical Delta Debugging paper from the Background section of Reducing Failure-Inducing Inputs Chapter.
The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content is licensed under the MIT License. Last change: 2023-11-11 18:05:06+01:00 • Cite • Imprint