pixee / codemodder-python Goto Github PK

View Code? Open in Web Editor NEW

27.0 27.0 9.0 2.18 MB

Python implementation of the Codemodder framework

License: GNU Affero General Public License v3.0

Dockerfile 0.01% Python 99.92% Makefile 0.07%

codemodder-python's People

Contributors

Stargazers

Watchers

Forkers

talhazafarjutt amaanabbasi ik-dc-pxi sahilrl stephencsnow

codemodder-python's Issues

Handle SQL queries that are formatting using string format operator

We need to enhance the SQL parameterization codemod so that it can handle queries that are (improperly) formatted using the string format operator.

Codemod to replace deprecated `pkg_resources`

Setuptools now has a deprecation warning for pkg_resources:

Use of pkg_resources is deprecated in favor of importlib.resources, importlib.metadata and their backports (importlib_resources, importlib_metadata). Some useful APIs are also provided by packaging (e.g. requirements and version parsing). Users should refrain from new usage of pkg_resources and should work to port to importlib-based solutions.

We should write a codemod that switches to the new APIs.

New dependencies in `requirements.txt` should include `--hash`

Details

Using --hash when describing dependencies in requirements.txt would help to ensure that the right version is being used and validate against any potential supply chain attacks. In requirements.txt it ends up looking something like this:

SomeLibrary==1.2.3 --hash=sha256:abcdef123456...

We should make sure to add the --hash value when updating requirements.txt files. This means that it will be necessary to encode the sha256 from PyPI in the dependency object itself. We should definitely do this for our security package and we should probably do it for defusedxml as well.

As far as I can tell, using --hash is not valid with our other dependency locations such as setup.py and pyproject.toml, but it bears a bit of further investigation.

This idea was originally proposed in pixee/python-security#10

Configure SonarCloud to ignore our test code

Sonar gets pretty upset about some of our test code. Probably because many of these tests have examples that are deliberately intended to trigger Sonar rules 😅

We need to make sure that test code is ignored by Sonar. This includes

tests/samples
integration_tests
Maybe others?

Sonar: fix "JWT should be signed and verified" with `jwt-decode-verify`

We should be able to use the jwt-decode-verify codemod to remediate this sonar rule.

https://rules.sonarsource.com/python/RSPEC-5659/

Add GitHub action to upload Sonar results to Pixee platform

This will enable the @pixeebot integration to provide hardening suggestions for Sonar rules. This is a good way for us to dogfood our own Sonar codemods.

We will likely need some guidance from @ryandens with defining the action.

IGNORE: Test Webhook

Remediate potential XPath injection

Based on this Sonar rule: https://rules.sonarsource.com/python/RSPEC-2091/

It would be useful to see whether we can generalize the existing SQL parameterization codemod to see how much of it could be reused to implement this.

Fix `dataclass` defaults

See this StackOverflow issue for a description of the problem: https://stackoverflow.com/questions/53632152/why-cant-dataclasses-have-mutable-defaults-in-their-class-attributes-declaratio

It would be useful to have a codemod that could fix this proactively. Even though it results in a runtime error, if it lives on a codepath that is not thoroughly tested, it might get missed.

`subprocess-shell-false` should honor `# noqa: S602`

Description

There are some valid use cases for shell=True in subprocess calls. In such cases, a developer may annotate this code with # noqa: S602 in order to indicate to bandit and other tools that this code should not be considered insecure.

We should also honor this annotation when processing this codemod and we should not make changes to lines where this annotation is present.

Implementation

We need to generalize the existing annotation detector in remove-unused-imports so that it can be used in multiple codemods
Make this particular codemod sensitive to the aforementioned annotation
Add some good unit tests

Resources:

https://docs.astral.sh/ruff/rules/subprocess-popen-with-shell-equals-true/

Handle more cases for `combine-startswith-endswith`

It was recently reported that the combine-startswith-endswith codemod would be more effective if it handled some additional cases, including:

Cases where more than two calls are involved:

if x.startswith("foo") or x.startswith("bar") or x.startswith("baz"): ...

Cases where one call already uses a tuple:
if x.startswith(("foo", "bar")) or x.startswith("baz"): ...

* Cases where more than one call uses a tuple:
if x.startswith(("foo", "bar")) or x.startswith(("baz", "biz")): ...

We should add test cases and handle these cases where possible.

Handle unused assignments in `use-walrus-if` codemod

Details

In some cases the use-walrus-if codemod makes a change such that the involved variable is no longer used at all. In other words, the original conditional is the only place where the result of the assignment was used.

We can use the scope provider and parent metadata in libCST to determine which such cases occur. In these cases we should not rewrite with the walrus operator at all but simply use the RHS of the original assign expression as the conditional value without any assignment at all.

We can actually see an example where the current behavior causes our own pipelines to fail in #258.

`fix-mutable-params` results in `AttributeError`

Description

The fix-mutable-params codemod appears to result in an AttributeError in certain circumstances. This was originally detected while processing this open source project: https://github.com/zulip/zulip

After checking out use the following command to reproduce:

codemodder /PATH/TO/zulip --codemod-include=fix-mutable-params

It will be necessary to come up with a unit test that minimally reproduces the problem and then fix it.

Traceback

 File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/matchers/_visitors.py", line 515, in on_leave
    retval = CSTTransformer.on_leave(self, original_node, updated_node)
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_visitors.py", line 71, in on_leave
    updated_node = leave_func(original_node, updated_node)
  File "/Users/danieldavella/codemodder-python/src/core_codemods/fix_mutable_params.py", line 180, in leave_FunctionDef
    body=updated_node.body.with_changes(body=new_body),
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/base.py", line 325, in with_changes
    return replace(self, **changes)
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/dataclasses.py", line 1453, in replace
    return obj.__class__(**changes)
  File "<string>", line 6, in __init__
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/base.py", line 117, in __post_init__
    self._validate()
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/statement.py", line 390, in _validate
    if small_stmt.semicolon is None:
AttributeError: 'SimpleStatementLine' object has no attribute 'semicolon'

`enable-jinja2-autoescape` should not change if `autoescape=select_autoescape()`

When investigating the jinja2 autoescape enabled codemod, we discovered some confusion around jinja2 docs and code. We opened a bug report for them pallets/jinja#1890

However, we don't have to wait until that is resolved. We should still not make this change

-env = Environment(autoescape=select_autoescape())
+env = Environment(autoescape=True)

We should at the very least fix this particular case, and if possible, fix any autoescape=callable_that_returns_True()

Codemod to add a `str` method to a Django model that didn't define it

Django users write lots of models and may forget that not defining a __str__ will make the model really hard to read in admin tools.
It would be good to change this model

from django.db import models

class User(models.Model):
    name = models.CharField(max_length=100)

   ....

To include:

    def __str__(self):
        return f'{self.name}'

The challenges I see to this is to decide what should __str__ return? Which fields should we include? Is something like this acceptable?

    def __str__(self):
        return f'{field for field in all_defined_fields}'

Codemod to remove `break` or `continue` outside of a loop

break or continue outside of a loop will result in a SyntaxError (but it's still parsable by libcst). Many linters have a rule to check for this. In most cases the correct thing is to remove these keywords, but there may be some cases where the user intended to put the keyword elsewhere.

examples

https://stackoverflow.com/questions/2462566/python-break-outside-loop

narg=len(sys.argv)
print "@length arg= ", narg
if narg == 1:
        print "@Usage: input_filename nelements nintervals"
        break

author probably meant to use something like sys.exit but removing the break is the right thing here

https://pylint.readthedocs.io/en/stable/user_guide/messages/error/not-in-loop.html

def print_even_numbers():
    for i in range(100):
        if i % 2 == 0:
            print(i)
    else:
        continue  # [not-in-loop]

intent is really unclear here, did the author mean to put continue under the if statement? In this case, if we just remove the continue, it will break the code. We should not do anything or remove the entire else clause

🧚🤖 Pixeebot Activity Dashboard

👋 This dashboard summarizes my activity on the repository, including available improvement opportunities.

Recommendations

Last analysis: Feb 22 | Next scheduled analysis: Feb 27

Open

#299

Available

✅ Nothing yet, but I'm continuing to monitor your PRs.

Completed

✅ You merged improvements I recommended View
✅ I also hardened PRs for you View

Metrics

What would you like to see here? Let us know!

Resources

📚 Quick links
Pixee Docs | Codemodder by Pixee

🧰 Tools I work with
Sonar, CodeQL, Semgrep

🚀 Pixee CLI
The power of my codemods in your local development environment. Learn more

💬 Reach out
Feedback | Support

❤️ Follow, share, and engage with Pixee: GitHub | LinkedIn | Slack

Should `harden-pyyaml` respect use of `FullLoader`?

I've noticed a few changes recently where code is explicitly using loader=yaml.FullLoader in yaml.load.

What I'm wondering is whether we should respect such explicit usage of FullLoader and leave it as-is? The current behavior of the codemod is to replace this with SafeLoader, which is obviously safer. However it seems to me that there are potentially valid uses of FullLoader as well, and if it is being used explicitly, we should respect it.

The PyYaml documentation has this to say:

Loads the full YAML language. Avoids arbitrary code execution. This is currently (PyYAML 5.1+) the default loader called by yaml.load(input) (after issuing the warning).

It also says:

WARNING: As of pyyaml-5.3.1 there are still trivial exploits. Do not use this on untrusted data for now.

So clearly this shouldn't be used on completely untrusted data. However it seems like there are also valid use cases. FullLoader is still much more restrictive than UnsafeLoader in terms of preventing arbitrary code execution.

TL;DR: I am suggesting that we may want to consider leaving FullLoader as-is when it is explicitly used.

Codemodder include and exclude flags should understand patterns

We need --codemod-include and --codemod-exclude to be able to include/exclude all codemods from a particular origin. For example, it would be useful to give an option like this:

--codemod-include=sonar:*

which could ensure all Sonar codemods are included.

Conversely, it would be useful to give an option like this:

---codemod-exclude=pixee:*

which would exclude all core Pixee codemods.

I think the pattern matching only needs to be supported at the level of origin. We don't need to support globs/patterns for specific names like pixee:python/xml-*.

Tests should run prior to release

Currently our release action has no dependency on the tests completing successfully. We should make sure that the release action runs the tests first in order to prevent us from inadvertently releasing broken code.

Codemod to fix string concat inside lists

Consider this list

bad = [
"hello"
"world"
"my",
"name"
]

It's really unlikely that I wanted to ignore the comma between the first and second element. A codemod that would instead remove str concat inside lists and make the concat separate elements would be what I would want. So:

good = [
"hello",
"world"
"my",
"name"
]

delete me

@pixeebot next

Sonar: `django-model-without-dunder-str`

#302 Added django-model-without-dunder-str codemod. there is a corresponding sonar rule for this.
First get an example of what sonar_issues.json would look like with this rule as a finding. This will give us a sense of the start/end position sonar returns for this rule

IGNORE: Testing integration

Ignore closed/fixed issues in Sonar security JSON results

It looks like Sonar security JSON files may possibly include closed/fixed issues in the results. We should filter the list of issues to make sure that we do not include closed/fixed issues when applying codemods.

The relevant issue fields appear to be resolution and status:

"issues": [
    {
      // ...more fields here...
      "resolution": "FIXED",
      "status": "CLOSED",
      // ...more fields here...
    }
]

Let's make sure to ignore issues where resolution == "FIXED" or status == "CLOSED". We may need to account for other values in the future.

Make test utilities part of the published package API

Currently classes such as BaseCodemodTest and BaseIntegrationTest are defined within our tests directory which is not part of the published package.

We want to enable external developers to create custom codemod plugins, which means that we should make these test utilities part of the published API.

This means moving these class definitions (and any dependencies) out of the tests directory (which should never be part of the package distribution) and into some kind of utilities module. My initial thought is that they should go somewhere like src/codemodder/codemods/test_utilities/ although maybe there are better options.

This will enable external packages to import things like BaseCodemodTest and use them within their own packages. We should do our best to remove any hard-coded assumptions in these base classes about directory paths, etc.

Requirements.txt parser needs to handle lines with trailing comments

It is valid for for a requirements.txt file to have lines with trailing comments:

foo>=1.2.3   # very informative comment

Right now we properly filter for lines that begin with comments but we also need to strip any trailing comments before parsing the requirement. This logic occurs here:

codemodder-python/src/codemodder/project_analysis/file_parsers/requirements_txt_file_parser.py

Line 31 in 2cdc91c

dependencies = set(line.strip() for line in lines if not line.startswith("#"))

The logic above properly strips whitespace and filters lines that begin with comments but does not properly account for trailing comments.

We need to:

Add a good test case
Update the parser to handle these kinds of lines

Add missing `self` or `cls` parameter to method

This is based on a Sonar rule: https://rules.sonarsource.com/python/RSPEC-5719/

We should implement this as a find and fix codemod but also as a Sonar codemod.

Codemod for deprecated `@abstractclassmethod`

We currently have a codemod to replace deprecated @abstractproperty. Ideally we should add deprecated @abstractclassmethod to this as well, although it won't be reflected by the current name of the codemod.

Sonar: fix "XML parsers should not be vulnerable to XXE attacks" with `use-defusedxml` codemod

We should be able to use the existing use-defusedxml codemod to fix this issue. This will require separating the transformer implementation from the existing codemod so that it can be used by the new Sonar codemod.

Rule description: https://rules.sonarsource.com/python/RSPEC-2755/

Do not apply `sandbox-process-creation` when arguments are literals

Description

We may encounter cases where all of the command arguments to a subprocess call are hard-coded literals. In these cases hardening does not really make any sense and should be skipped.

A simple implementation would be to detect whether the immediate argument is a literal list that contains only literal strings. However a more sophisticated implementation would involve some constant propagation. As part of this issue we should explore the level of effort involved in the latter approach. If it's too difficult, implement the simpler approach for now and open another ticket.

This issue was initially described in pixee/python-security#10

Codemod to replace use of `hasattr(obj, "call") with `callable`

Calling hasattr(some_obj, "__call__") can yield different results than callable(obj) in some cases, such as when __getattr__ is defined.

Same behavior:

>>> class One:
...     pass
...
>>> one = One()
>>> callable(one)
False
>>> hasattr(one, "__call__")
False

Differing behavior:

>>> class Two:
...      def __getattr__(self, name):
...          pass
...
>>> two = Two()
>>> callable(two)
False
>>> hasattr(two, "__call__")
True

Use `ruff` instead of `pylint`

We should be using ruff since it's much faster and supports the same rules: https://github.com/astral-sh/ruff

Add requests timeout codemod is adding timeout kwarg to other methods

Example of Issue

The following code is part of a pull request made by pixie to a setup script in one of my projects after a correctly triggering an AddRequestsTimeouts Codemod.

versions = requests.get(versions_url, timeout=60).json(timeout=60)

This clearly would result in a TypeError since "timeout" is not a valid kwarg for JSONDecoder.__init__.

Potential Cause

I assume that this issue is caused by the OpenAI function call outputs that are used to create code edits not being properly linted on the backend before the resulting code is included in a pull request.

Potential Solution

A possible solution could be adding a valid function and argument parsing class to file_parsers that gathers the valid function names, argument names and types by checking the args in function definitions, typeing info and calls to kwargs.get/pop in each function for each file in the repo.

Then each function call in the AI written code could be linted for valid arguments+types and an error could be raised that triggers a subsequent OpenAI call requesting a revision and providing the valid args+types to the model to insure code is valid before including it a pull request.

As a backup in cases where multiple revisions produce invalid code, the valid args for functions in a given code snippet could be specified as an enum in the OpenAI function/tool definitions used to edit code.

Alternatively an Assistants API instance with the code_interpreter tool enabled could be prompted to test code snippets to prevent this issue but this would of course increase costs.

Good luck with this project absolutely awesome idea!

pixee / codemodder-python Goto Github PK

codemodder-python's People

Contributors

Stargazers

Watchers

Forkers

codemodder-python's Issues

Details

Description

Implementation

Resources:

Details

Description

Traceback

examples

Recommendations

Open

Available

Completed

Metrics

Resources

Description

Example of Issue

Potential Cause

Potential Solution

Recommend Projects

Recommend Topics

Recommend Org