pixee / codemodder-python Goto Github PK
View Code? Open in Web Editor NEWPython implementation of the Codemodder framework
License: GNU Affero General Public License v3.0
Python implementation of the Codemodder framework
License: GNU Affero General Public License v3.0
We need to enhance the SQL parameterization codemod so that it can handle queries that are (improperly) formatted using the string format operator.
Setuptools now has a deprecation warning for pkg_resources
:
Use of pkg_resources is deprecated in favor of importlib.resources, importlib.metadata and their backports (importlib_resources, importlib_metadata). Some useful APIs are also provided by packaging (e.g. requirements and version parsing). Users should refrain from new usage of pkg_resources and should work to port to importlib-based solutions.
We should write a codemod that switches to the new APIs.
Using --hash
when describing dependencies in requirements.txt
would help to ensure that the right version is being used and validate against any potential supply chain attacks. In requirements.txt
it ends up looking something like this:
SomeLibrary==1.2.3 --hash=sha256:abcdef123456...
We should make sure to add the --hash
value when updating requirements.txt
files. This means that it will be necessary to encode the sha256 from PyPI in the dependency object itself. We should definitely do this for our security package and we should probably do it for defusedxml
as well.
As far as I can tell, using --hash
is not valid with our other dependency locations such as setup.py
and pyproject.toml
, but it bears a bit of further investigation.
This idea was originally proposed in pixee/python-security#10
Sonar gets pretty upset about some of our test code. Probably because many of these tests have examples that are deliberately intended to trigger Sonar rules ๐
We need to make sure that test code is ignored by Sonar. This includes
tests/samples
integration_tests
We should be able to use the jwt-decode-verify
codemod to remediate this sonar rule.
Based on this Sonar rule: https://rules.sonarsource.com/python/RSPEC-2091/
It would be useful to see whether we can generalize the existing SQL parameterization codemod to see how much of it could be reused to implement this.
See this StackOverflow issue for a description of the problem: https://stackoverflow.com/questions/53632152/why-cant-dataclasses-have-mutable-defaults-in-their-class-attributes-declaratio
It would be useful to have a codemod that could fix this proactively. Even though it results in a runtime error, if it lives on a codepath that is not thoroughly tested, it might get missed.
There are some valid use cases for shell=True
in subprocess calls. In such cases, a developer may annotate this code with # noqa: S602
in order to indicate to bandit
and other tools that this code should not be considered insecure.
We should also honor this annotation when processing this codemod and we should not make changes to lines where this annotation is present.
remove-unused-imports
so that it can be used in multiple codemodshttps://docs.astral.sh/ruff/rules/subprocess-popen-with-shell-equals-true/
It was recently reported that the combine-startswith-endswith
codemod would be more effective if it handled some additional cases, including:
if x.startswith("foo") or x.startswith("bar") or x.startswith("baz"): ...
* Cases where more than one call uses a tuple:
if x.startswith(("foo", "bar")) or x.startswith(("baz", "biz")): ...
We should add test cases and handle these cases where possible.
In some cases the use-walrus-if
codemod makes a change such that the involved variable is no longer used at all. In other words, the original conditional is the only place where the result of the assignment was used.
We can use the scope provider and parent metadata in libCST to determine which such cases occur. In these cases we should not rewrite with the walrus operator at all but simply use the RHS of the original assign expression as the conditional value without any assignment at all.
We can actually see an example where the current behavior causes our own pipelines to fail in #258.
The fix-mutable-params
codemod appears to result in an AttributeError
in certain circumstances. This was originally detected while processing this open source project: https://github.com/zulip/zulip
After checking out use the following command to reproduce:
codemodder /PATH/TO/zulip --codemod-include=fix-mutable-params
It will be necessary to come up with a unit test that minimally reproduces the problem and then fix it.
File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/matchers/_visitors.py", line 515, in on_leave
retval = CSTTransformer.on_leave(self, original_node, updated_node)
File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_visitors.py", line 71, in on_leave
updated_node = leave_func(original_node, updated_node)
File "/Users/danieldavella/codemodder-python/src/core_codemods/fix_mutable_params.py", line 180, in leave_FunctionDef
body=updated_node.body.with_changes(body=new_body),
File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/base.py", line 325, in with_changes
return replace(self, **changes)
File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/dataclasses.py", line 1453, in replace
return obj.__class__(**changes)
File "<string>", line 6, in __init__
File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/base.py", line 117, in __post_init__
self._validate()
File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/statement.py", line 390, in _validate
if small_stmt.semicolon is None:
AttributeError: 'SimpleStatementLine' object has no attribute 'semicolon'
When investigating the jinja2 autoescape enabled codemod, we discovered some confusion around jinja2 docs and code. We opened a bug report for them pallets/jinja#1890
However, we don't have to wait until that is resolved. We should still not make this change
-env = Environment(autoescape=select_autoescape())
+env = Environment(autoescape=True)
We should at the very least fix this particular case, and if possible, fix any autoescape=callable_that_returns_True()
Django users write lots of models and may forget that not defining a __str__
will make the model really hard to read in admin tools.
It would be good to change this model
from django.db import models
class User(models.Model):
name = models.CharField(max_length=100)
....
To include:
def __str__(self):
return f'{self.name}'
The challenges I see to this is to decide what should __str__
return? Which fields should we include? Is something like this acceptable?
def __str__(self):
return f'{field for field in all_defined_fields}'
break
or continue
outside of a loop will result in a SyntaxError
(but it's still parsable by libcst). Many linters have a rule to check for this. In most cases the correct thing is to remove these keywords, but there may be some cases where the user intended to put the keyword elsewhere.
narg=len(sys.argv)
print "@length arg= ", narg
if narg == 1:
print "@Usage: input_filename nelements nintervals"
break
author probably meant to use something like sys.exit
but removing the break
is the right thing here
def print_even_numbers():
for i in range(100):
if i % 2 == 0:
print(i)
else:
continue # [not-in-loop]
intent is really unclear here, did the author mean to put continue
under the if statement? In this case, if we just remove the continue
, it will break the code. We should not do anything or remove the entire else
clause
Last analysis: Feb 22 | Next scheduled analysis: Feb 27
โ Nothing yet, but I'm continuing to monitor your PRs.
โ
You merged improvements I recommended View
โ
I also hardened PRs for you View
What would you like to see here? Let us know!
๐ Quick links
Pixee Docs | Codemodder by Pixee
๐งฐ Tools I work with
Sonar, CodeQL, Semgrep
๐ Pixee CLI
The power of my codemods in your local development environment. Learn more
๐ฌ Reach out
Feedback | Support
โค๏ธ Follow, share, and engage with Pixee: GitHub | LinkedIn | Slack
I've noticed a few changes recently where code is explicitly using loader=yaml.FullLoader
in yaml.load
.
What I'm wondering is whether we should respect such explicit usage of FullLoader
and leave it as-is? The current behavior of the codemod is to replace this with SafeLoader
, which is obviously safer. However it seems to me that there are potentially valid uses of FullLoader
as well, and if it is being used explicitly, we should respect it.
The PyYaml documentation has this to say:
Loads the full YAML language. Avoids arbitrary code execution. This is currently (PyYAML 5.1+) the default loader called by yaml.load(input) (after issuing the warning).
It also says:
WARNING: As of pyyaml-5.3.1 there are still trivial exploits. Do not use this on untrusted data for now.
So clearly this shouldn't be used on completely untrusted data. However it seems like there are also valid use cases. FullLoader
is still much more restrictive than UnsafeLoader
in terms of preventing arbitrary code execution.
TL;DR: I am suggesting that we may want to consider leaving FullLoader
as-is when it is explicitly used.
We need --codemod-include
and --codemod-exclude
to be able to include/exclude all codemods from a particular origin. For example, it would be useful to give an option like this:
--codemod-include=sonar:*
which could ensure all Sonar codemods are included.
Conversely, it would be useful to give an option like this:
---codemod-exclude=pixee:*
which would exclude all core Pixee codemods.
I think the pattern matching only needs to be supported at the level of origin. We don't need to support globs/patterns for specific names like pixee:python/xml-*
.
Currently our release action has no dependency on the tests completing successfully. We should make sure that the release action runs the tests first in order to prevent us from inadvertently releasing broken code.
Consider this list
bad = [
"hello"
"world"
"my",
"name"
]
It's really unlikely that I wanted to ignore the comma between the first and second element. A codemod that would instead remove str concat inside lists and make the concat separate elements would be what I would want. So:
good = [
"hello",
"world"
"my",
"name"
]
@pixeebot next
#302 Added django-model-without-dunder-str
codemod. there is a corresponding sonar rule for this.
First get an example of what sonar_issues.json would look like with this rule as a finding. This will give us a sense of the start/end position sonar returns for this rule
It looks like Sonar security JSON files may possibly include closed/fixed issues in the results. We should filter the list of issues to make sure that we do not include closed/fixed issues when applying codemods.
The relevant issue fields appear to be resolution
and status
:
"issues": [
{
// ...more fields here...
"resolution": "FIXED",
"status": "CLOSED",
// ...more fields here...
}
]
Let's make sure to ignore issues where resolution == "FIXED"
or status == "CLOSED"
. We may need to account for other values in the future.
Currently classes such as BaseCodemodTest
and BaseIntegrationTest
are defined within our tests
directory which is not part of the published package.
We want to enable external developers to create custom codemod plugins, which means that we should make these test utilities part of the published API.
This means moving these class definitions (and any dependencies) out of the tests
directory (which should never be part of the package distribution) and into some kind of utilities module. My initial thought is that they should go somewhere like src/codemodder/codemods/test_utilities/
although maybe there are better options.
This will enable external packages to import things like BaseCodemodTest
and use them within their own packages. We should do our best to remove any hard-coded assumptions in these base classes about directory paths, etc.
It is valid for for a requirements.txt
file to have lines with trailing comments:
foo>=1.2.3 # very informative comment
Right now we properly filter for lines that begin with comments but we also need to strip any trailing comments before parsing the requirement. This logic occurs here:
The logic above properly strips whitespace and filters lines that begin with comments but does not properly account for trailing comments.
We need to:
This is based on a Sonar rule: https://rules.sonarsource.com/python/RSPEC-5719/
We should implement this as a find and fix codemod but also as a Sonar codemod.
We currently have a codemod to replace deprecated @abstractproperty
. Ideally we should add deprecated @abstractclassmethod
to this as well, although it won't be reflected by the current name of the codemod.
We should be able to use the existing use-defusedxml
codemod to fix this issue. This will require separating the transformer implementation from the existing codemod so that it can be used by the new Sonar codemod.
Rule description: https://rules.sonarsource.com/python/RSPEC-2755/
We may encounter cases where all of the command arguments to a subprocess
call are hard-coded literals. In these cases hardening does not really make any sense and should be skipped.
A simple implementation would be to detect whether the immediate argument is a literal list that contains only literal strings. However a more sophisticated implementation would involve some constant propagation. As part of this issue we should explore the level of effort involved in the latter approach. If it's too difficult, implement the simpler approach for now and open another ticket.
This issue was initially described in pixee/python-security#10
Calling hasattr(some_obj, "__call__")
can yield different results than callable(obj)
in some cases, such as when __getattr__
is defined.
Same behavior:
>>> class One:
... pass
...
>>> one = One()
>>> callable(one)
False
>>> hasattr(one, "__call__")
False
Differing behavior:
>>> class Two:
... def __getattr__(self, name):
... pass
...
>>> two = Two()
>>> callable(two)
False
>>> hasattr(two, "__call__")
True
We should be using ruff
since it's much faster and supports the same rules: https://github.com/astral-sh/ruff
The following code is part of a pull request made by pixie to a setup script in one of my projects after a correctly triggering an AddRequestsTimeouts Codemod.
versions = requests.get(versions_url, timeout=60).json(timeout=60)
This clearly would result in a TypeError
since "timeout"
is not a valid kwarg for JSONDecoder.__init__
.
I assume that this issue is caused by the OpenAI function call outputs that are used to create code edits not being properly linted on the backend before the resulting code is included in a pull request.
A possible solution could be adding a valid function and argument parsing class to file_parsers that gathers the valid function names, argument names and types by checking the args in function definitions, typeing info and calls to kwargs.get/pop in each function for each file in the repo.
Then each function call in the AI written code could be linted for valid arguments+types and an error could be raised that triggers a subsequent OpenAI call requesting a revision and providing the valid args+types to the model to insure code is valid before including it a pull request.
As a backup in cases where multiple revisions produce invalid code, the valid args for functions in a given code snippet could be specified as an enum in the OpenAI function/tool definitions used to edit code.
Alternatively an Assistants API instance with the code_interpreter tool enabled could be prompted to test code snippets to prevent this issue but this would of course increase costs.
Good luck with this project absolutely awesome idea!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.