furlat / openbugger
Code to create bugged Python scripts for OpenAssistant Training, maintained by https://twitter.com/Cyndesama
License: Apache License 2.0
Here is a taxonomy of common bugs you might encounter when writing code, ordered from most to least common:
1. Syntax errors: mistakes in the structure of your code, such as missing brackets or semicolons. Syntax errors are usually the most common type of bug and are usually easy to fix.
2. Logic errors: mistakes in the logic of your code, such as using the wrong operator or comparing the wrong values. Logic errors are usually more difficult to find and fix because they don't produce any error messages.
3. Semantic errors: mistakes in the meaning of your code, such as using the wrong variable or function. Semantic errors can be difficult to spot because the code may still compile and run while producing the wrong results.
4. Runtime errors: errors that occur while the code is running, such as accessing an array index that is out of bounds or dividing by zero. Runtime errors can be difficult to debug because they may occur only under certain conditions or at certain points in the code.
5. Resource leaks: bugs that occur when the code fails to release resources it has acquired, such as memory or file handles. Resource leaks can cause performance issues and may not be immediately noticeable.
6. Security vulnerabilities: bugs that could be exploited by malicious actors to gain unauthorized access to systems or data. Security vulnerabilities are generally the most serious type of bug and should be given the highest priority when fixing.
Concrete syntax trees are lossless, so we can guarantee that script == bugger.debug(bugger.bug(script)).
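For contrast, a round trip through the standard-library ast module is not lossless: comments and the original layout are discarded. The following is only a sketch of that difference, assuming Python 3.9+ for ast.unparse; the source string is illustrative and does not come from OpenBugger.

```python
import ast

# Illustrative input with a comment and non-standard line breaks
source = (
    "x = 1  # keep this comment\n"
    "y = ( x +\n"
    "      2 )\n"
)

# Parse to an abstract syntax tree and convert straight back to source
round_tripped = ast.unparse(ast.parse(source))

print(round_tripped)
# The comment and the original line breaks are gone, so the text differs
print(round_tripped == source)
```

The program still means the same thing after the round trip, but the text cannot be recovered, which is why an exact script == debug(bug(script)) guarantee needs a tree that keeps every character.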
The docstring extraction notebook at https://github.com/furlat/OpenBugger/blob/main/notebooks/doc_string_extraction.ipynb does not work with the following function:
def multiline_str():
    """ Returns a multiline string """
    s = """This is
a
multiline
string!
"""
    a = """Another
multiline
string!"
"""
    return s + a
A bug type I've come across myself is forgetting to use the global keyword when assigning to global variables. Here's an example:
# A globally accessible list
current_labels = []

def reset_current_labels():
    """ Clears the label list """
    current_labels = []
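The bug is that calling reset_current_labels() will not modify current_labels, because the assignment inside the function creates a new local variable instead of rebinding the global one: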
>>> current_labels.append('delete me')
>>> current_labels
['delete me']
>>> reset_current_labels()
>>> current_labels
['delete me']
The correct code would be:
# A globally accessible list
current_labels = []

def reset_current_labels():
    """ Clears the label list """
    global current_labels
    current_labels = []
So to bug the code, you would remove one or more global statements.
Python comes with a built-in module called ast for parsing Python code into an abstract syntax tree (AST).
We should use ast instead of regex for creating logic bugs. First, we parse the code into an AST object. Then, we modify the AST to reflect the logic bug we wish to create. Finally, we use a module like astor (or ast.unparse on Python 3.9+) to convert the AST object back into Python source code.
Here is an example that can help solve #5 by accurately locating and selectively removing individual variables from global statements:
import ast
import random

import astor  # third-party: pip install astor


def remove_random_global(tree):
    """ Removes one randomly chosen name from a global statement, in place """
    # Find all global statements and their parents in the AST.
    # Some nodes (e.g. lambdas) have a body that is not a list, so skip those.
    globals_and_parents = [
        (node, parent)
        for parent in ast.walk(tree)
        if isinstance(getattr(parent, 'body', None), list)
        for node in parent.body
        if isinstance(node, ast.Global)
    ]

    # If there are no global statements, leave the tree unchanged
    if not globals_and_parents:
        return

    random_global, parent = random.choice(globals_and_parents)

    if len(random_global.names) > 1:
        # Remove a single variable from the declaration
        random_var = random.choice(random_global.names)
        random_global.names.remove(random_var)
    else:
        # Remove the entire global statement
        parent.body.remove(random_global)
        # Keep the block syntactically valid if it is now empty
        if not parent.body:
            parent.body.append(ast.Pass())


code = '''
a = 0
b = 1
result = 0

def fib_next():
    """ Computes the next Fibonacci number """
    global a, b
    global result
    a_temp = b
    b += a
    a = a_temp
    result = a
'''

tree = ast.parse(code)
remove_random_global(tree)
print(astor.to_source(tree))
Sample result:
>>> print(astor.to_source(tree))
a = 0
b = 1
result = 0
def fib_next():
    """ Computes the next Fibonacci number """
    global a
    global result
    a_temp = b
    b += a
    a = a_temp
    result = a
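Notice that global a, b has changed to global a. Running fib_next() will now raise an UnboundLocalError (local variable 'b' referenced before assignment; the exact wording varies between Python versions), because b += a makes b a local variable inside the function.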
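To check this behaviour, the bugged script from the sample result can be executed in an isolated namespace. This is a minimal sketch using only the standard library; the names bugged_code, namespace and error are illustrative.

```python
# The bugged script from the sample result
bugged_code = '''
a = 0
b = 1
result = 0

def fib_next():
    """ Computes the next Fibonacci number """
    global a
    global result
    a_temp = b
    b += a
    a = a_temp
    result = a
'''

namespace = {}
exec(bugged_code, namespace)  # defining the function succeeds

try:
    namespace['fib_next']()   # the bug only shows up when it is called
    error = None
except UnboundLocalError as exc:
    error = exc

print(type(error).__name__, error)
```

Note that the script still parses and loads without complaint; the error appears only at call time, which makes this a logic bug rather than a syntax bug.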
We can use similar techniques to introduce other types of logic bugs into Python scripts.
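As one possible illustration, the sketch below introduces an off-by-one style bug by turning a single < into <=. It assumes Python 3.9+ for ast.unparse (astor.to_source should work on older versions), and both FlipOneComparison and count_below are hypothetical names made up for this example, not part of OpenBugger.

```python
import ast


class FlipOneComparison(ast.NodeTransformer):
    """ Hypothetical bugger: turns one `<` into `<=` """

    def __init__(self):
        self.done = False

    def visit_Compare(self, node):
        self.generic_visit(node)  # handle nested comparisons first
        if not self.done:
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Lt):
                    node.ops[i] = ast.LtE()
                    self.done = True
                    break
        return node


code = '''
def count_below(values, limit):
    """ Counts the values strictly below limit """
    return sum(1 for v in values if v < limit)
'''

tree = FlipOneComparison().visit(ast.parse(code))
bugged = ast.unparse(tree)
print(bugged)
```

With the original code, count_below([1, 2, 3], 3) gives 2; with the bugged version it gives 3, and no error is raised in either case.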
Furthermore, ast can also tell us whether a Python program is syntactically valid. If it is not, ast.parse raises a SyntaxError that describes the precise parsing error:
>>> ast.parse("5 = 5")
File "<unknown>", line 1
SyntaxError: cannot assign to literal
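A bugger could use this to confirm that a script still parses after a logic bug has been injected. The helper below is a small sketch of that check; syntax_error_of is a hypothetical name, not an existing OpenBugger function.

```python
import ast


def syntax_error_of(source):
    """ Returns None if source parses, else the SyntaxError from ast.parse """
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return exc
    return None


print(syntax_error_of("x = 5"))  # prints None: the code is valid

err = syntax_error_of("5 = 5")
print(err.lineno, err.msg)       # line number and parser message
```

The exact message text depends on the Python version, so it is safer to rely on the exception type and lineno than on the wording.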
Pile V2 (in progress; talking with the CarperAI team)
The Pile V2 is a filtered subset of the Stack.