Code Monkey home page Code Monkey logo

codemodder-python's People

Contributors

andrecsilva avatar clavedeluna avatar dependabot[bot] avatar drdavella avatar nahsra avatar pixeebot avatar pixeebot[bot] avatar ryandens avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

codemodder-python's Issues

Codemod to replace deprecated `pkg_resources`

Setuptools now has a deprecation warning for pkg_resources:

Use of pkg_resources is deprecated in favor of importlib.resources, importlib.metadata and their backports (importlib_resources, importlib_metadata). Some useful APIs are also provided by packaging (e.g. requirements and version parsing). Users should refrain from new usage of pkg_resources and should work to port to importlib-based solutions.

We should write a codemod that switches to the new APIs.

New dependencies in `requirements.txt` should include `--hash`

Details

Using --hash when describing dependencies in requirements.txt would help to ensure that the right version is being used and validate against any potential supply chain attacks. In requirements.txt it ends up looking something like this:

SomeLibrary==1.2.3 --hash=sha256:abcdef123456...

We should make sure to add the --hash value when updating requirements.txt files. This means that it will be necessary to encode the sha256 from PyPI in the dependency object itself. We should definitely do this for our security package and we should probably do it for defusedxml as well.

As far as I can tell, using --hash is not valid with our other dependency locations such as setup.py and pyproject.toml, but it bears a bit of further investigation.

This idea was originally proposed in pixee/python-security#10

Configure SonarCloud to ignore our test code

Sonar gets pretty upset about some of our test code. Probably because many of these tests have examples that are deliberately intended to trigger Sonar rules ๐Ÿ˜…

We need to make sure that test code is ignored by Sonar. This includes

  • tests/samples
  • integration_tests
  • Maybe others?

`subprocess-shell-false` should honor `# noqa: S602`

Description

There are some valid use cases for shell=True in subprocess calls. In such cases, a developer may annotate this code with # noqa: S602 in order to indicate to bandit and other tools that this code should not be considered insecure.

We should also honor this annotation when processing this codemod and we should not make changes to lines where this annotation is present.

Implementation

  • We need to generalize the existing annotation detector in remove-unused-imports so that it can be used in multiple codemods
  • Make this particular codemod sensitive to the aforementioned annotation
  • Add some good unit tests

Resources:

https://docs.astral.sh/ruff/rules/subprocess-popen-with-shell-equals-true/

Handle more cases for `combine-startswith-endswith`

It was recently reported that the combine-startswith-endswith codemod would be more effective if it handled some additional cases, including:

  • Cases where more than two calls are involved:
if x.startswith("foo") or x.startswith("bar") or x.startswith("baz"): ...
  • Cases where one call already uses a tuple:
    if x.startswith(("foo", "bar")) or x.startswith("baz"): ...
* Cases where more than one call uses a tuple:
if x.startswith(("foo", "bar")) or x.startswith(("baz", "biz")): ...

We should add test cases and handle these cases where possible.

Handle unused assignments in `use-walrus-if` codemod

Details

In some cases the use-walrus-if codemod makes a change such that the involved variable is no longer used at all. In other words, the original conditional is the only place where the result of the assignment was used.

We can use the scope provider and parent metadata in libCST to determine which such cases occur. In these cases we should not rewrite with the walrus operator at all but simply use the RHS of the original assign expression as the conditional value without any assignment at all.

We can actually see an example where the current behavior causes our own pipelines to fail in #258.

`fix-mutable-params` results in `AttributeError`

Description

The fix-mutable-params codemod appears to result in an AttributeError in certain circumstances. This was originally detected while processing this open source project: https://github.com/zulip/zulip

After checking out use the following command to reproduce:

codemodder /PATH/TO/zulip --codemod-include=fix-mutable-params

It will be necessary to come up with a unit test that minimally reproduces the problem and then fix it.

Traceback

 File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/matchers/_visitors.py", line 515, in on_leave
    retval = CSTTransformer.on_leave(self, original_node, updated_node)
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_visitors.py", line 71, in on_leave
    updated_node = leave_func(original_node, updated_node)
  File "/Users/danieldavella/codemodder-python/src/core_codemods/fix_mutable_params.py", line 180, in leave_FunctionDef
    body=updated_node.body.with_changes(body=new_body),
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/base.py", line 325, in with_changes
    return replace(self, **changes)
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/dataclasses.py", line 1453, in replace
    return obj.__class__(**changes)
  File "<string>", line 6, in __init__
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/base.py", line 117, in __post_init__
    self._validate()
  File "/Users/danieldavella/miniconda3/envs/test-python-codemod-package/lib/python3.10/site-packages/libcst/_nodes/statement.py", line 390, in _validate
    if small_stmt.semicolon is None:
AttributeError: 'SimpleStatementLine' object has no attribute 'semicolon'

`enable-jinja2-autoescape` should not change if `autoescape=select_autoescape()`

When investigating the jinja2 autoescape enabled codemod, we discovered some confusion around jinja2 docs and code. We opened a bug report for them pallets/jinja#1890

However, we don't have to wait until that is resolved. We should still not make this change

-env = Environment(autoescape=select_autoescape())
+env = Environment(autoescape=True)

We should at the very least fix this particular case, and if possible, fix any autoescape=callable_that_returns_True()

Codemod to add a `__str__` method to a Django model that didn't define it

Django users write lots of models and may forget that not defining a __str__ will make the model really hard to read in admin tools.
It would be good to change this model

from django.db import models

class User(models.Model):
    name = models.CharField(max_length=100)

   .... 

To include:

    def __str__(self):
        return f'{self.name}'

The challenges I see to this is to decide what should __str__ return? Which fields should we include? Is something like this acceptable?

    def __str__(self):
        return f'{field for field in all_defined_fields}'

Codemod to remove `break` or `continue` outside of a loop

break or continue outside of a loop will result in a SyntaxError (but it's still parsable by libcst). Many linters have a rule to check for this. In most cases the correct thing is to remove these keywords, but there may be some cases where the user intended to put the keyword elsewhere.

examples

narg=len(sys.argv)
print "@length arg= ", narg
if narg == 1:
        print "@Usage: input_filename nelements nintervals"
        break

author probably meant to use something like sys.exit but removing the break is the right thing here

def print_even_numbers():
    for i in range(100):
        if i % 2 == 0:
            print(i)
    else:
        continue  # [not-in-loop]

intent is really unclear here, did the author mean to put continue under the if statement? In this case, if we just remove the continue, it will break the code. We should not do anything or remove the entire else clause

๐Ÿงš๐Ÿค– Pixeebot Activity Dashboard

DashList

๐Ÿ‘‹ This dashboard summarizes my activity on the repository, including available improvement opportunities.

Recommendations

Last analysis: Feb 22 | Next scheduled analysis: Feb 27

Open

Available

โœ… Nothing yet, but I'm continuing to monitor your PRs.

Completed

โœ… You merged improvements I recommended View
โœ… I also hardened PRs for you View

Metrics

What would you like to see here? Let us know!

Resources

๐Ÿ“š Quick links
Pixee Docs | Codemodder by Pixee

๐Ÿงฐ Tools I work with
Sonar, CodeQL, Semgrep

๐Ÿš€ Pixee CLI
The power of my codemods in your local development environment. Learn more

๐Ÿ’ฌ Reach out
Feedback | Support


โค๏ธ Follow, share, and engage with Pixee: GitHub | LinkedIn | Slack

Should `harden-pyyaml` respect use of `FullLoader`?

I've noticed a few changes recently where code is explicitly using loader=yaml.FullLoader in yaml.load.

What I'm wondering is whether we should respect such explicit usage of FullLoader and leave it as-is? The current behavior of the codemod is to replace this with SafeLoader, which is obviously safer. However it seems to me that there are potentially valid uses of FullLoader as well, and if it is being used explicitly, we should respect it.

The PyYaml documentation has this to say:

Loads the full YAML language. Avoids arbitrary code execution. This is currently (PyYAML 5.1+) the default loader called by yaml.load(input) (after issuing the warning).

It also says:

WARNING: As of pyyaml-5.3.1 there are still trivial exploits. Do not use this on untrusted data for now.

So clearly this shouldn't be used on completely untrusted data. However it seems like there are also valid use cases. FullLoader is still much more restrictive than UnsafeLoader in terms of preventing arbitrary code execution.

TL;DR: I am suggesting that we may want to consider leaving FullLoader as-is when it is explicitly used.

Codemodder include and exclude flags should understand patterns

We need --codemod-include and --codemod-exclude to be able to include/exclude all codemods from a particular origin. For example, it would be useful to give an option like this:

--codemod-include=sonar:*

which could ensure all Sonar codemods are included.

Conversely, it would be useful to give an option like this:

---codemod-exclude=pixee:*

which would exclude all core Pixee codemods.

I think the pattern matching only needs to be supported at the level of origin. We don't need to support globs/patterns for specific names like pixee:python/xml-*.

Tests should run prior to release

Currently our release action has no dependency on the tests completing successfully. We should make sure that the release action runs the tests first in order to prevent us from inadvertently releasing broken code.

Codemod to fix string concat inside lists

Consider this list

bad = [
"hello"
"world"
"my",
"name"
]

It's really unlikely that I wanted to ignore the comma between the first and second element. A codemod that would instead remove str concat inside lists and make the concat separate elements would be what I would want. So:

good = [
"hello",
"world"
"my",
"name"
]

Sonar: `django-model-without-dunder-str`

#302 Added django-model-without-dunder-str codemod. there is a corresponding sonar rule for this.
First get an example of what sonar_issues.json would look like with this rule as a finding. This will give us a sense of the start/end position sonar returns for this rule

Ignore closed/fixed issues in Sonar security JSON results

It looks like Sonar security JSON files may possibly include closed/fixed issues in the results. We should filter the list of issues to make sure that we do not include closed/fixed issues when applying codemods.

The relevant issue fields appear to be resolution and status:

"issues": [
    {
      // ...more fields here...
      "resolution": "FIXED",
      "status": "CLOSED",
      // ...more fields here...
    }
]

Let's make sure to ignore issues where resolution == "FIXED" or status == "CLOSED". We may need to account for other values in the future.

Make test utilities part of the published package API

Currently classes such as BaseCodemodTest and BaseIntegrationTest are defined within our tests directory which is not part of the published package.

We want to enable external developers to create custom codemod plugins, which means that we should make these test utilities part of the published API.

This means moving these class definitions (and any dependencies) out of the tests directory (which should never be part of the package distribution) and into some kind of utilities module. My initial thought is that they should go somewhere like src/codemodder/codemods/test_utilities/ although maybe there are better options.

This will enable external packages to import things like BaseCodemodTest and use them within their own packages. We should do our best to remove any hard-coded assumptions in these base classes about directory paths, etc.

Requirements.txt parser needs to handle lines with trailing comments

It is valid for for a requirements.txt file to have lines with trailing comments:

foo>=1.2.3   # very informative comment

Right now we properly filter for lines that begin with comments but we also need to strip any trailing comments before parsing the requirement. This logic occurs here:

dependencies = set(line.strip() for line in lines if not line.startswith("#"))

The logic above properly strips whitespace and filters lines that begin with comments but does not properly account for trailing comments.

We need to:

  1. Add a good test case
  2. Update the parser to handle these kinds of lines

Codemod for deprecated `@abstractclassmethod`

We currently have a codemod to replace deprecated @abstractproperty. Ideally we should add deprecated @abstractclassmethod to this as well, although it won't be reflected by the current name of the codemod.

Do not apply `sandbox-process-creation` when arguments are literals

Description

We may encounter cases where all of the command arguments to a subprocess call are hard-coded literals. In these cases hardening does not really make any sense and should be skipped.

A simple implementation would be to detect whether the immediate argument is a literal list that contains only literal strings. However a more sophisticated implementation would involve some constant propagation. As part of this issue we should explore the level of effort involved in the latter approach. If it's too difficult, implement the simpler approach for now and open another ticket.

This issue was initially described in pixee/python-security#10

Codemod to replace use of `hasattr(obj, "__call__") with `callable`

Calling hasattr(some_obj, "__call__") can yield different results than callable(obj) in some cases, such as when __getattr__ is defined.

Same behavior:

>>> class One:
...     pass
...
>>> one = One()
>>> callable(one)
False
>>> hasattr(one, "__call__")
False

Differing behavior:

>>> class Two:
...      def __getattr__(self, name):
...          pass
...
>>> two = Two()
>>> callable(two)
False
>>> hasattr(two, "__call__")
True

Add requests timeout codemod is adding timeout kwarg to other methods

image

Example of Issue

The following code is part of a pull request made by pixie to a setup script in one of my projects after a correctly triggering an AddRequestsTimeouts Codemod.

versions = requests.get(versions_url, timeout=60).json(timeout=60)

This clearly would result in a TypeError since "timeout" is not a valid kwarg for JSONDecoder.__init__.

Potential Cause

I assume that this issue is caused by the OpenAI function call outputs that are used to create code edits not being properly linted on the backend before the resulting code is included in a pull request.

Potential Solution

A possible solution could be adding a valid function and argument parsing class to file_parsers that gathers the valid function names, argument names and types by checking the args in function definitions, typeing info and calls to kwargs.get/pop in each function for each file in the repo.

Then each function call in the AI written code could be linted for valid arguments+types and an error could be raised that triggers a subsequent OpenAI call requesting a revision and providing the valid args+types to the model to insure code is valid before including it a pull request.

As a backup in cases where multiple revisions produce invalid code, the valid args for functions in a given code snippet could be specified as an enum in the OpenAI function/tool definitions used to edit code.

Alternatively an Assistants API instance with the code_interpreter tool enabled could be prompted to test code snippets to prevent this issue but this would of course increase costs.

Good luck with this project absolutely awesome idea!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.