I'll try to work on this relatively soon, but to think out loud.. <p di

You're definitely right about everything you just said <a class="user-mention notransl

(Not an issue right now) Handle multiple returns about pyt HOT 7 CLOSED

python-security commented on May 22, 2024

(Not an issue right now) Handle multiple returns

from pyt.

Comments (7)

adrianbn commented on May 22, 2024 2

Disclaimer: I may be saying things here that are known by the contributors to pyt. I'm sorry if this is just too obvious at this point.

I think this is an example of flow/path insensitivity, right? The first example says that pyt is flow insensitive, and the second one that pyt is path insensitive. Most commercial analyzers are flow sensitive, but not path sensitive due to the exponential explosion of path constraints involved.

I just started looking into pyt so I'm not quite sure yet how it performs taint analysis, but it looks like it assigns constraints to the nodes / qualifiers to the variables and uses reaching definitions + fixed point to solve them to get a result? I would have to look more into reaching definitions to see if the following applies. If it does, we're on a good place to solve at least the first issue described here.

In the example:

def maybe_return_the_arg(foo):
    whatup = foo
    whatup = 'no vuln'
    return whatup

we would need to assign each assignment to whatup a different qualifier, whatup1, and whatup2. Then the value that reaches the return is whatup2 which is not tainted. This will result in a correct analysis. This is equivalent to rewriting the program in Static Single Assignment (SSA).

Although I agree with @KevinHock that code like the above is rare, this issue can materialize in different ways that are more plausible.

value = request.form.get("value")
while(some_criteria(value)):
    r = sink(value)
    value = r

name = request.form.get('name')
name = sanitize(name)
sink(name)

I'd argue those examples, specially the second one warrant trying to address this issue.

On the other hand, in the example:

def maybe_return_the_arg(foo):
    if foo.startswith('h'): # Ignore the actual condition, we can statically SMT solve a year from now.
        return 'huh' + foo
    else:
        return 'apple'

we would need to add a path condition to each constraint. That will allow pyt to solve for those constraints that have compatible path conditions (e.g. foo.startswith('h')). In this particular case, I don't think pyt can solve foo.startswith('h') if foo is a source, so it should probably report an issue with a lower confidence level. In certain circumstances it may be possible to tell whether the path condition is true and compatible with other paths along the code. There's a video from University of Maryland that covers this very concisely: https://www.youtube.com/watch?v=UhY8_Wb40B0

Again sorry if I'm being captain obvious or too naive. I'm just trying to understand how pyt works and finding areas I can contribute to and I got to this open issue.

from pyt.

KevinHock commented on May 22, 2024 2

You're definitely right about everything you just said @adrianbn! Thank you for contributing to and being interested in PyT :)

It turns out that PyT currently is flow-sensitive, and I didn't update this GitHub issue, my bad! 🤐 Here is where the vulnerability finding happens, I initially thought I fixed this when I added def-use chains, in this PR, but I put a print statement after if tainted_node_in_sink_arg: and it never got executed, so it must have been fixed before that. I might have fixed some bug in the reaching definitions code a long time ago and forgot exactly what.

Repro: I re-read what I had written in this issue and thought, "That seems weird, we should kill the previous definition? Maybe it is specific to return values of functions." I went to that part of the code, and figured I'd run it on an example to sanity check.

Before changing, we run:

(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py 
1 vulnerability found:
Vulnerability 1:
File: examples/vulnerable_code/command_injection.py
 > User input at line 15, source "form[":
	 param = request.form['suggestion']
Reassigned in:
	File: examples/vulnerable_code/command_injection.py
	 > Line 16: command = 'echo ' + param + ' >> ' + 'menu.txt'
File: examples/vulnerable_code/command_injection.py
 > reaches line 18, sink "subprocess.call(":
	~call_1 = ret_subprocess.call(command, shell=True)

After changing, let's run:

(code) Kevins-MacBook-Pro:pyt kevin$ git diff
diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..8025f84 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -13,6 +13,7 @@ def index():
 @app.route('/menu', methods=['POST'])
 def menu():
     param = request.form['suggestion']
+    param = 'kill that def!'
     command = 'echo ' + param + ' >> ' + 'menu.txt'
 
     subprocess.call(command, shell=True)
(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py 
No vulnerabilities found.

So that's good, maybe this is specific to return values, let me try my example.

(code) Kevins-MacBook-Pro:pyt kevin$ git diff
diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..7745b0a 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -10,10 +10,16 @@ def index():
 
     return render_template('command_injection.html', menu=menu)
 
+def is_it_just_return_values(tainted_param):
+    ret_me = tainted_param
+    ret_me = 'no vuln'
+    return ret_me
+
 @app.route('/menu', methods=['POST'])
 def menu():
     param = request.form['suggestion']
-    command = 'echo ' + param + ' >> ' + 'menu.txt'
+    hey = is_it_just_return_values(param)
+    command = 'echo ' + hey + ' >> ' + 'menu.txt'
 
     subprocess.call(command, shell=True)
 
(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py 
No vulnerabilities found.

Okay, so that's good, I should have updated this Github issue.

Comment out the overwriting assignment:

(code) Kevins-MacBook-Pro:pyt kevin$ clear;git diff

diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..02d8ba8 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -10,10 +10,16 @@ def index():
 
     return render_template('command_injection.html', menu=menu)
 
+def is_it_just_return_values(tainted_param):
+    ret_me = tainted_param
+    # ret_me = 'no vuln'
+    return ret_me
+
 @app.route('/menu', methods=['POST'])
 def menu():
     param = request.form['suggestion']
-    command = 'echo ' + param + ' >> ' + 'menu.txt'
+    hey = is_it_just_return_values(param)
+    command = 'echo ' + hey + ' >> ' + 'menu.txt'
 
     subprocess.call(command, shell=True)

and we get

(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py 
1 vulnerability found:
Vulnerability 1:
File: examples/vulnerable_code/command_injection.py
 > User input at line 20, source "form[":
	 param = request.form['suggestion']
Reassigned in:
	File: examples/vulnerable_code/command_injection.py
	 > Line 13: save_1_param = param
	File: examples/vulnerable_code/command_injection.py
	 > Line 21: temp_1_tainted_param = param
	File: examples/vulnerable_code/command_injection.py
	 > Line 13: tainted_param = temp_1_tainted_param
	File: examples/vulnerable_code/command_injection.py
	 > Line 14: ret_me = tainted_param
	File: examples/vulnerable_code/command_injection.py
	 > Line 16: ret_is_it_just_return_values = ret_me
	File: examples/vulnerable_code/command_injection.py
	 > Line 21: ~call_1 = ret_is_it_just_return_values
	File: examples/vulnerable_code/command_injection.py
	 > Line 21: hey = ~call_1
	File: examples/vulnerable_code/command_injection.py
	 > Line 22: command = 'echo ' + hey + ' >> ' + 'menu.txt'
File: examples/vulnerable_code/command_injection.py
 > reaches line 24, sink "subprocess.call(":
	~call_2 = ret_subprocess.call(command, shell=True)

💥

from pyt.

KevinHock commented on May 22, 2024 1

We could try to turn Python code into SSA, and prune it via liveness, and add predicates so it's PSSA, and try to solve those predicates with e.g. z3str3 etc. or something similar. But handling all node types, being performant, and being battle tested (i.e. no recursion bugs or crashes), is higher priority to me, b/c I want it to be usable and run-able on tons of codebases.

from pyt.

KevinHock commented on May 22, 2024

Okay, so, I was going to suggest we do something to sort of OR all the different return values, such that if one was tainted then we would say it returned a tainted value. (Maybe this could be accomplished with e.g. ret_foo = first_ret + second_ret + third_ret.)

But, it turns out there's another issue that we would need to fix for this to even be an issue, because we currently do the ORing above even when we should not, on all variables e.g.

def maybe_return_the_arg(foo):
    whatup = foo
    whatup = 'no vuln'
    return whatup

says the return value is tainted.

Where it should only say it's tainted when e.g.

def maybe_return_the_arg(foo):
    if foo.startswith('h'): # Ignore the actual condition, we can statically SMT solve a year from now.
        return 'huh' + foo
    else:
        return 'apple'

if we decide to fix the 1st example in this comment, then I'm pretty sure we'll introduce a false-negative in the 2nd example, because it will see ret_maybe_return_the_arg = 'apple' and think the tainted value (assigned in ret_maybe_return_the_arg = 'huh' + foo was overwritten.)

Overall, finding out that the 2nd example works right now, and thinking the 1st example is contrived and people don't normally write code like that, I care a lot less about any of this. But I'm glad I doc'd it ;)

If we were going to fix the first example, it would probably be in https://github.com/python-security/pyt/blob/master/pyt/reaching_definitions_taint.py but I don't plan on working on this any time soon.

from pyt.

KevinHock commented on May 22, 2024

To do the ORing ( e.g. ret_foo = first_ret + second_ret + third_ret) you could easily do it in return_connection_handler.

from pyt.

KevinHock commented on May 22, 2024

I'm not quite sure yet how it performs taint analysis, but it looks like it assigns constraints to the nodes / qualifiers to the variables and uses reaching definitions + fixed point to solve them to get a result?

I'm not certain what constraints/qualifiers mean, but the we mostly do reaching definitions + fixed point. I wrote some documentation in this readme, but let me know if you'd like me to clarify anything :D

In this particular case, I don't think pyt can solve if foo.startswith('h'): if foo is a source, so it should probably report an issue with a lower confidence level.

We sort of kind of do that with the following:

pyt/pyt/vulnerabilities/vulnerabilities.py

Lines 431 to 432 in 4b495ad

    
           elif isinstance(cfg_node, IfNode): 
        
               potential_sanitiser = cfg_node

pyt/pyt/vulnerabilities/vulnerabilities.py

Lines 311 to 312 in 4b495ad

    
               Note: potential_sanitiser is the only hack here, it is because we do not take p-use's into account yet. 
        
               e.g. we can only say potentially instead of definitely sanitised in the path_traversal_sanitised_2.py test.

pyt/tests/vulnerabilities/vulnerabilities_test.py

Line 283 in 33dedd4

This vulnerability is potentially sanitised by: Label: if '..' in image_name:

Again sorry if I'm being captain obvious or too naive. I'm just trying to understand how pyt works and finding areas I can contribute to and I got to this open issue.

No worries! I'm glad you are interested @adrianbn 😁 One area you could contribute to, although it's rather difficult, is that we currently use inlining instead of summaries, for inter-procedural analysis, which makes PyT slower than it needs to be.

Here are some videos, specifically the last one, explains function summaries well:
#57 Call Graphs
#58 Interprocedural Data Flow Analysis
#59 Procedure Summaries in Data Flow Analysis

Feel free to ping me with any questions. My current open PR tries to address certain expressions, so we can handle BoolOp's and IfExp's etc., I think you've motivated me to work on it / clean it up 👍 (It sort of became like a big chapter in a book I need a long plane ride to finish or something.)

from pyt.

KevinHock commented on May 22, 2024

I'm gonna go ahead and close this issue 👍 b/c it was solved before.

from pyt.

(Not an issue right now) Handle multiple returns about pyt HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	elif isinstance(cfg_node, IfNode):
	potential_sanitiser = cfg_node

	Note: potential_sanitiser is the only hack here, it is because we do not take p-use's into account yet.
	e.g. we can only say potentially instead of definitely sanitised in the path_traversal_sanitised_2.py test.