Comments (7)
Disclaimer: I may be saying things here that are known by the contributors to pyt. I'm sorry if this is just too obvious at this point.
I think this is an example of flow/path insensitivity, right? The first example says that pyt is flow insensitive, and the second one that pyt is path insensitive. Most commercial analyzers are flow sensitive, but not path sensitive due to the exponential explosion of path constraints involved.
I just started looking into pyt
so I'm not quite sure yet how it performs taint analysis, but it looks like it assigns constraints to the nodes / qualifiers to the variables and uses reaching definitions + fixed point to solve them to get a result? I would have to look more into reaching definitions to see if the following applies. If it does, we're on a good place to solve at least the first issue described here.
In the example:
def maybe_return_the_arg(foo):
whatup = foo
whatup = 'no vuln'
return whatup
we would need to assign each assignment to whatup
a different qualifier, whatup1
, and whatup2
. Then the value that reaches the return
is whatup2
which is not tainted. This will result in a correct analysis. This is equivalent to rewriting the program in Static Single Assignment (SSA).
Although I agree with @KevinHock that code like the above is rare, this issue can materialize in different ways that are more plausible.
value = request.form.get("value")
while(some_criteria(value)):
r = sink(value)
value = r
name = request.form.get('name')
name = sanitize(name)
sink(name)
I'd argue those examples, specially the second one warrant trying to address this issue.
On the other hand, in the example:
def maybe_return_the_arg(foo):
if foo.startswith('h'): # Ignore the actual condition, we can statically SMT solve a year from now.
return 'huh' + foo
else:
return 'apple'
we would need to add a path condition to each constraint. That will allow pyt to solve for those constraints that have compatible path conditions (e.g. foo.startswith('h')
). In this particular case, I don't think pyt can solve foo.startswith('h')
if foo
is a source, so it should probably report an issue with a lower confidence level. In certain circumstances it may be possible to tell whether the path condition is true and compatible with other paths along the code. There's a video from University of Maryland that covers this very concisely: https://www.youtube.com/watch?v=UhY8_Wb40B0
Again sorry if I'm being captain obvious or too naive. I'm just trying to understand how pyt works and finding areas I can contribute to and I got to this open issue.
from pyt.
You're definitely right about everything you just said @adrianbn! Thank you for contributing to and being interested in PyT :)
It turns out that PyT currently is flow-sensitive, and I didn't update this GitHub issue, my bad! 🤐 Here is where the vulnerability finding happens, I initially thought I fixed this when I added def-use chains, in this PR, but I put a print statement after if tainted_node_in_sink_arg:
and it never got executed, so it must have been fixed before that. I might have fixed some bug in the reaching definitions code a long time ago and forgot exactly what.
Repro: I re-read what I had written in this issue and thought, "That seems weird, we should kill the previous definition? Maybe it is specific to return values of functions." I went to that part of the code, and figured I'd run it on an example to sanity check.
Before changing, we run:
(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py
1 vulnerability found:
Vulnerability 1:
File: examples/vulnerable_code/command_injection.py
> User input at line 15, source "form[":
param = request.form['suggestion']
Reassigned in:
File: examples/vulnerable_code/command_injection.py
> Line 16: command = 'echo ' + param + ' >> ' + 'menu.txt'
File: examples/vulnerable_code/command_injection.py
> reaches line 18, sink "subprocess.call(":
~call_1 = ret_subprocess.call(command, shell=True)
After changing, let's run:
(code) Kevins-MacBook-Pro:pyt kevin$ git diff
diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..8025f84 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -13,6 +13,7 @@ def index():
@app.route('/menu', methods=['POST'])
def menu():
param = request.form['suggestion']
+ param = 'kill that def!'
command = 'echo ' + param + ' >> ' + 'menu.txt'
subprocess.call(command, shell=True)
(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py
No vulnerabilities found.
So that's good, maybe this is specific to return values, let me try my example.
(code) Kevins-MacBook-Pro:pyt kevin$ git diff
diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..7745b0a 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -10,10 +10,16 @@ def index():
return render_template('command_injection.html', menu=menu)
+def is_it_just_return_values(tainted_param):
+ ret_me = tainted_param
+ ret_me = 'no vuln'
+ return ret_me
+
@app.route('/menu', methods=['POST'])
def menu():
param = request.form['suggestion']
- command = 'echo ' + param + ' >> ' + 'menu.txt'
+ hey = is_it_just_return_values(param)
+ command = 'echo ' + hey + ' >> ' + 'menu.txt'
subprocess.call(command, shell=True)
(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py
No vulnerabilities found.
Okay, so that's good, I should have updated this Github issue.
Comment out the overwriting assignment:
(code) Kevins-MacBook-Pro:pyt kevin$ clear;git diff
diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..02d8ba8 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -10,10 +10,16 @@ def index():
return render_template('command_injection.html', menu=menu)
+def is_it_just_return_values(tainted_param):
+ ret_me = tainted_param
+ # ret_me = 'no vuln'
+ return ret_me
+
@app.route('/menu', methods=['POST'])
def menu():
param = request.form['suggestion']
- command = 'echo ' + param + ' >> ' + 'menu.txt'
+ hey = is_it_just_return_values(param)
+ command = 'echo ' + hey + ' >> ' + 'menu.txt'
subprocess.call(command, shell=True)
and we get
(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py
1 vulnerability found:
Vulnerability 1:
File: examples/vulnerable_code/command_injection.py
> User input at line 20, source "form[":
param = request.form['suggestion']
Reassigned in:
File: examples/vulnerable_code/command_injection.py
> Line 13: save_1_param = param
File: examples/vulnerable_code/command_injection.py
> Line 21: temp_1_tainted_param = param
File: examples/vulnerable_code/command_injection.py
> Line 13: tainted_param = temp_1_tainted_param
File: examples/vulnerable_code/command_injection.py
> Line 14: ret_me = tainted_param
File: examples/vulnerable_code/command_injection.py
> Line 16: ret_is_it_just_return_values = ret_me
File: examples/vulnerable_code/command_injection.py
> Line 21: ~call_1 = ret_is_it_just_return_values
File: examples/vulnerable_code/command_injection.py
> Line 21: hey = ~call_1
File: examples/vulnerable_code/command_injection.py
> Line 22: command = 'echo ' + hey + ' >> ' + 'menu.txt'
File: examples/vulnerable_code/command_injection.py
> reaches line 24, sink "subprocess.call(":
~call_2 = ret_subprocess.call(command, shell=True)
💥
from pyt.
We could try to turn Python code into SSA, and prune it via liveness, and add predicates so it's PSSA, and try to solve those predicates with e.g. z3str3
etc. or something similar. But handling all node types, being performant, and being battle tested (i.e. no recursion bugs or crashes), is higher priority to me, b/c I want it to be usable and run-able on tons of codebases.
from pyt.
Okay, so, I was going to suggest we do something to sort of OR all the different return values, such that if one was tainted then we would say it returned a tainted value. (Maybe this could be accomplished with e.g. ret_foo = first_ret + second_ret + third_ret
.)
But, it turns out there's another issue that we would need to fix for this to even be an issue, because we currently do the ORing above even when we should not, on all variables e.g.
def maybe_return_the_arg(foo):
whatup = foo
whatup = 'no vuln'
return whatup
says the return value is tainted.
Where it should only say it's tainted when e.g.
def maybe_return_the_arg(foo):
if foo.startswith('h'): # Ignore the actual condition, we can statically SMT solve a year from now.
return 'huh' + foo
else:
return 'apple'
if we decide to fix the 1st example in this comment, then I'm pretty sure we'll introduce a false-negative in the 2nd example, because it will see ret_maybe_return_the_arg = 'apple'
and think the tainted value (assigned in ret_maybe_return_the_arg = 'huh' + foo
was overwritten.)
Overall, finding out that the 2nd example works right now, and thinking the 1st example is contrived and people don't normally write code like that, I care a lot less about any of this. But I'm glad I doc'd it ;)
If we were going to fix the first example, it would probably be in https://github.com/python-security/pyt/blob/master/pyt/reaching_definitions_taint.py but I don't plan on working on this any time soon.
from pyt.
To do the ORing ( e.g. ret_foo = first_ret + second_ret + third_ret
) you could easily do it in return_connection_handler
.
from pyt.
I'm not quite sure yet how it performs taint analysis, but it looks like it assigns constraints to the nodes / qualifiers to the variables and uses reaching definitions + fixed point to solve them to get a result?
I'm not certain what constraints/qualifiers mean, but the we mostly do reaching definitions + fixed point. I wrote some documentation in this readme, but let me know if you'd like me to clarify anything :D
In this particular case, I don't think pyt can solve
if foo.startswith('h'):
if foo is a source, so it should probably report an issue with a lower confidence level.
We sort of kind of do that with the following:
pyt/pyt/vulnerabilities/vulnerabilities.py
Lines 431 to 432 in 4b495ad
pyt/pyt/vulnerabilities/vulnerabilities.py
Lines 311 to 312 in 4b495ad
Again sorry if I'm being captain obvious or too naive. I'm just trying to understand how pyt works and finding areas I can contribute to and I got to this open issue.
No worries! I'm glad you are interested @adrianbn 😁 One area you could contribute to, although it's rather difficult, is that we currently use inlining instead of summaries, for inter-procedural analysis, which makes PyT slower than it needs to be.
Here are some videos, specifically the last one, explains function summaries well:
#57 Call Graphs
#58 Interprocedural Data Flow Analysis
#59 Procedure Summaries in Data Flow Analysis
Feel free to ping me with any questions. My current open PR tries to address certain expressions, so we can handle BoolOp
's and IfExp
's etc., I think you've motivated me to work on it / clean it up 👍 (It sort of became like a big chapter in a book I need a long plane ride to finish or something.)
from pyt.
I'm gonna go ahead and close this issue 👍 b/c it was solved before.
from pyt.
Related Issues (20)
- OSError: Input needs to be a file. Path: <path>/app.py HOT 3
- python3.7 support HOT 1
- AttributeError: 'IgnoredNode' object has no attribute 'first_statement'
- RecursionError: maximum recursion depth exceeded while calling a Python object HOT 3
- Control flow incorrect if imported functions have the same name HOT 2
- Sources and sinks should propagate
- Inappropriate ioctl using Ubuntu and pty HOT 3
- How to put it inside the proxy? HOT 1
- How to handle callbacks HOT 2
- Fails on Python 3.9.0 HOT 7
- o HOT 1
- pyt usually picks the wrong encoding to load files HOT 1
- args is empty in BBorBInode when CFG generated
- RFH: jsonpickle security detection HOT 1
- .
- Teste
- H
- real
- Vulnerable python code HOT 1
- Tic tac toe
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyt.