felix-hilden / pyfactor

Source dependency visualisation for Python

Home Page: https://pyfactor.rtfd.org

License: MIT License

Python 100.00%

pyfactor's Issues

Improve top level statement discovery

Currently, only statements directly under the root node count as top level statements. But I think only class or function definitions should break the top level. So everything else, like statements inside ifs and withs, should be included on its own.

How dependencies are structured should be thought out. For example, are variables considered to be influenced by if statement dependencies? Does a with statement potentially produce a new variable? I think the answer is yes. For loops as well.
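
A minimal sketch of the discovery idea, using only the standard ast module. The function name iter_top_level and the choice of which compound statements to descend into are assumptions for illustration, not pyfactor's actual implementation. It only flattens the statements; names bound by the headers themselves (for example the target of a with or for) would still need separate handling.

```python
import ast

# Compound statements whose bodies are treated as part of the top level.
# This selection is an assumption based on the issue text.
_TRANSPARENT = (ast.If, ast.With, ast.For, ast.While, ast.Try, ast.ExceptHandler)
_DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def iter_top_level(node):
    """Yield statements reachable without entering a function or class body."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, _DEFINITIONS):
            yield child  # a definition is a statement of its own; do not descend
        elif isinstance(child, _TRANSPARENT):
            yield from iter_top_level(child)  # look inside if/with/for/...
        elif isinstance(child, ast.stmt):
            yield child  # plain statement such as an assignment or import
```

For a module with an import, an assignment inside an if, an assignment inside a with and a function definition, this yields the import, both assignments and the function definition, but nothing from inside the function body.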

Graphviz syntax error in nodes with reserved words

I was experimenting with pyfactor to see how it operates and if it might help me decipher large codebases. I decided to test it out on the googler.py codebase as my guinea pig. Googler.py runs without a problem on my machine. However, when I run pyfactor against this source file, this is what my terminal looks like.

(py37) mark@mark-XPS-8910:/mnt/Data/CMR/PythonPrograms$ pyfactor googler.py
Error: : syntax error in line 118 near '->'

I suspect this error may be a red herring, but I don't have a clue where to begin debugging the problem because there doesn't appear to be a syntax problem around that line in the source file. I also tried skipping imports and all it did was change the line number of the syntax error as shown below:

(py37) mark@mark-XPS-8910:/mnt/Data/CMR/PythonPrograms$ pyfactor -si googler.py
Error: : syntax error in line 65 near '->'

Finally, I ran the googler code through black to reformat it - figuring it would change the line numbers and I might be able to deduce a pattern. Black increased the line count of googler.py from 3,659 to 4,100 lines. When I ran pyfactor against the blacked code it still showed the syntax error as being on line 118.

I thought you might be interested in seeing this edge case. If not, please just close this issue as I was just exploring.

I am using version 4.3.2 of googler.py in case that would be necessary to reproduce the error.

Good Luck with the project,
over and out,
Mark R.
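
Going by the issue title, the line number in the error most likely refers to the generated Graphviz source rather than to googler.py, which would explain why reformatting the Python file did not move it. DOT treats node, edge, graph, digraph, subgraph and strict as keywords, so a Python name such as node cannot appear unquoted as a node ID. A minimal sketch of one possible fix is to always emit double-quoted IDs. The helper names quote_dot_id and edge_line are hypothetical and are not part of pyfactor.

```python
def quote_dot_id(name: str) -> str:
    """Return name as a double-quoted DOT ID, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def edge_line(source: str, target: str) -> str:
    """Format one DOT edge statement with both endpoints quoted."""
    return f"{quote_dot_id(source)} -> {quote_dot_id(target)};"
```

Quoting every ID avoids having to maintain a keyword list at all.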

Improve command line interface

There are loads of Graphviz options we could expose, and the documentation could be made clearer. So let's give users the freedom.

Add option to choose graph root

Unlike choosing a parsing root (#7), this option would choose a root node after all nodes have been parsed. This way users would be able to drill down into specific parts of the graph.

The option would select child nodes of the root recursively, and delete all other nodes. But I think coloring shouldn't be changed.
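
The selection step could be sketched as a plain graph traversal. The adjacency mapping (node name to list of child names) and the function name subgraph_from_root are assumptions for illustration; pyfactor's internal graph representation may differ.

```python
def subgraph_from_root(edges, root):
    """Keep root and everything reachable from it; drop all other nodes."""
    keep = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in keep:
            continue  # already visited, also guards against cycles
        keep.add(node)
        stack.extend(edges.get(node, ()))
    # Children of kept nodes are always kept, so edges need no filtering.
    return {node: list(edges.get(node, ())) for node in keep}
```

Node attributes such as color would be left untouched, so the drilled-down graph keeps the coloring computed on the full graph.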

Correctly handle shadowing variables and assignment dependencies

Currently shadowing names are not excluded from dependencies. This might be the top thing to fix right away.

Two basic things should be taken into account: function definitions and class bodies. Nested functions could keep track of variables being defined before usage, nonlocal and global statements and variables in outer scopes. Class bodies are a bit of an exception, having the global scope, but not being propagated to method definitions. Then, lambda arguments can shadow variables too, as do for loops, withs and the walrus operator. Anything else?

Functionality to specify parsing root

I found this worked nicely when the target file was a collection of functions.
When running pyfactor on a file which contains a class, I was hoping to see a graph of relations between the members and variables of the class, but it seems this is not possible? Instead it just lists the class itself and its relation to imported libraries or variables defined globally.

Improve handling of loop elses and exception handlers

From #3: we should improve the dependency handling on control flow statements like for or try whose sections depend on what happens in other sections, i.e. raising or breaking. I think the sections should depend on the whole main body.

Use tooltips for useful information like docstrings

Graphviz can do tooltips for SVG images, which would allow us to, for example, show docstrings in the graph, giving much needed context for nodes or even edges.

Two approaches could be tried:

ast parsing can be used to detect constant string statements, but it is harder to link them back to the functions and particularly variables that use them. We would need to track the level of a node and then when a doc is detected on an appropriate level, extract it. But I'm not sure if ast can handle comments...
Source parsing could be the easier way out, and more flexible with various doc comments and such, and it would be as easy for variables as anything else once the rules are established.
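
A minimal sketch of the ast approach, for top-level names only. Note that ast discards comments, so only string literals can be picked up this way; comment-style docs would need source or token parsing. Treating a string literal directly after an assignment as that variable's doc is a convention assumed here, and collect_docs is a hypothetical name.

```python
import ast


def collect_docs(source: str) -> dict:
    """Map top-level names to docstrings found in source."""
    body = ast.parse(source).body
    docs = {}
    for i, node in enumerate(body):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)  # None when there is no docstring
            if doc:
                docs[node.name] = doc
        elif isinstance(node, ast.Assign) and i + 1 < len(body):
            nxt = body[i + 1]
            # Assumed convention: a bare string right after an assignment
            # documents the assigned variable(s).
            if (
                isinstance(nxt, ast.Expr)
                and isinstance(nxt.value, ast.Constant)
                and isinstance(nxt.value.value, str)
            ):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        docs[target.id] = nxt.value.value
    return docs
```

The resulting strings could then be passed to Graphviz as the tooltip attribute of each node.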

Provide graph analysis

It could be useful to be able to analyse the graphs, not just view them. We could use color in the graphs in two ways, having hue reflect the type of node in question, and intensity as some sort of importance factor based on the number of connections to and from a node. A node can be assigned many colors and bg vs. border color can be used as well.

For example these kinds of nodes could be interesting:

Root node (brown): easy to spot, but could be included for completeness.
Leaf node (green): particularly ones that have many connections to different parts of the script could be good targets for refactoring out and importing. They can then be ignored and the graph simplifies.
Sub tree node (?): a node whose children are only children to that node means that the sub tree is isolated. Such things could be very easily refactored to a sub module. There is some sub graph functionality in Graphviz, maybe we could use that as well.
Hub node (?): a generalisation from leaf nodes, a node with many parents
Collector node (?): a generalisation from root nodes, a node with many children

The definitions above would need some sort of significance analysis, because every node is a hub or a collector to some degree, and not all sub trees are important. For sub trees the procedure could be something like this: when a low-level sub tree is found, if its parent is also one, the parent is preferred, recursively. So we would get the largest possible sub tree.
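
As a starting point for the significance analysis, the root, leaf, hub and collector notions can all be derived from parent and child counts. The sketch below assumes the graph is a mapping from a node to the nodes it depends on (its children); the function name classify and the returned tuple layout are illustrative only, and no thresholds for "many" are chosen here.

```python
def classify(edges):
    """Return {node: (kind, n_parents, n_children)} for a dependency graph."""
    nodes = set(edges) | {c for children in edges.values() for c in children}
    n_parents = {node: 0 for node in nodes}
    for children in edges.values():
        for child in set(children):  # count each parent once
            n_parents[child] += 1
    result = {}
    for node in nodes:
        n_children = len(set(edges.get(node, ())))
        if n_parents[node] == 0:
            kind = "root"
        elif n_children == 0:
            kind = "leaf"
        else:
            kind = "inner"
        result[node] = (kind, n_parents[node], n_children)
    return result
```

The parent count is the "hub" score and the child count is the "collector" score, so color intensity could be scaled from either number.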

Multi-file and recursive module analysis

When the library is a bit more mature, we could consider providing multi-file analysis. The first use case could be analysing the dependencies in a package with sub modules. It should be relatively straightforward to use the single-file functionality to connect names in multiple files together. But I bet there are some special cases that we should account for.

Multiple definitions for a single name

From #3. We have to decide how multiple definitions for a single name are handled. Consider these examples:

a = 1
b = 2

c = a
c = b

def foo():
    return a

def foo():
    return b

def bar():
    return c

bar = a

Here are my current thoughts: 1) the dependencies are merged; 2) if all the definitions have the same type, that type is given in the visualisation, but if not, then we should use some unknown or multiple type.
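
The two rules can be sketched as follows. This is an illustration under simplifying assumptions, not pyfactor's parser: it only looks at top-level function definitions and simple assignments, collects used names naively with ast.walk, and uses the hypothetical labels "function", "variable" and "multiple".

```python
import ast
from collections import defaultdict


def merge_definitions(source: str) -> dict:
    """Return {name: (type_label, merged_dependencies)} for top-level names."""
    deps = defaultdict(set)
    kinds = defaultdict(set)
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            names, kind = [node.name], "function"
        elif isinstance(node, ast.Assign):
            names = [t.id for t in node.targets if isinstance(t, ast.Name)]
            kind = "variable"
        else:
            continue
        # Naive collection: every name that is read anywhere in the statement.
        used = {
            n.id
            for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
        }
        for name in names:
            deps[name] |= used  # rule 1: dependencies are merged
            kinds[name].add(kind)
    return {
        # rule 2: a single type is reported as-is, otherwise "multiple"
        name: (next(iter(kinds[name])) if len(kinds[name]) == 1 else "multiple", deps[name])
        for name in deps
    }
```

Applied to the example above, c becomes a variable depending on a and b, foo a function depending on a and b, and bar gets the "multiple" type with dependencies c and a.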

Color waypoint nodes

We can do better than coloring bridges: waypoint nodes (as I'm calling them just now) are nodes whose children can only be accessed via that node, making them a perfect candidate for separating from the rest of the source. So let's color them in.
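
One way to read the definition above: a node is a waypoint if every node below it has all of its parents inside the same sub graph. The sketch assumes an adjacency mapping from node to children; the names descendants and waypoints are hypothetical. It is a straightforward quadratic check, not an optimised algorithm.

```python
def descendants(edges, node):
    """Return all nodes reachable from node, excluding node unless on a cycle."""
    seen = set()
    stack = list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def waypoints(edges):
    """Return nodes whose descendants can only be reached through them."""
    parents = {}
    for parent, children in edges.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)
    result = set()
    for node in edges:
        below = descendants(edges, node)
        if not below or node in below:
            continue  # skip leaves and nodes that sit on a cycle
        inside = below | {node}
        if all(parents[d] <= inside for d in below):
            result.add(node)
    return result
```

Under this reading a root that reaches the whole graph trivially qualifies, so a real implementation would probably exclude roots or rank waypoints by sub graph size.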

Add option to collapse waypoints

It would be nice to be able to collapse the children of nodes after bridges to condense the visual representation. The nodes should meet specific criteria to ensure that they really are just children, recursively. Also, the nodes could be marked in some way, perhaps a doubled edge.

Stop naïvely collecting names from nodes

What was enough is not enough anymore. There is no place for naïve collection of AST names, because comprehensions can be used pretty much everywhere. We should instead use scoped methods, like we do in assignments already.
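
To illustrate the problem: a plain ast.walk reports the loop variable of a comprehension as if it were an outer dependency. The sketch below contrasts that with a slightly more careful collection that subtracts names bound by comprehension targets. Both function names are hypothetical, and the second one is still a simplification: it would wrongly drop an outer name that is reused as the comprehension variable, as in [x for x in x].

```python
import ast


def naive_names(node):
    """Collect every name that is read anywhere under node."""
    return {
        n.id
        for n in ast.walk(node)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
    }


def scoped_names(node):
    """Like naive_names, but ignore names bound by comprehensions under node."""
    bound = set()
    for sub in ast.walk(node):
        if isinstance(sub, ast.comprehension):
            bound |= {n.id for n in ast.walk(sub.target) if isinstance(n, ast.Name)}
    return naive_names(node) - bound
```

For the statement total = sum(v * scale for v in values), the naive version includes v, while the scoped version reports only sum, scale and values.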

Handle new Python constructs

We have to address constructs added in the newest versions of Python. I can't remember the full list, but we should create that full list here at some point and then look into implementing support for each item.

walrus assignment
positional-only parameters
...

Unknown assignment type

Our assignment parsing makes lots of assumptions about the statement structure. Thus, parsing fails with lots of builtin and third party modules. For example: globals()[name] = thing in the builtin collections module.
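
In the globals()[name] = thing example the assignment target is an ast.Subscript rather than an ast.Name. A tolerant parser could collect the names it understands and record the rest instead of failing. The following is a minimal sketch under that assumption; assigned_names is a hypothetical helper, not pyfactor's current code.

```python
import ast


def assigned_names(stmt: ast.Assign):
    """Return (sorted names bound by stmt, type names of unsupported targets)."""
    names, skipped = [], []
    pending = list(stmt.targets)
    while pending:
        target = pending.pop()
        if isinstance(target, ast.Name):
            names.append(target.id)
        elif isinstance(target, (ast.Tuple, ast.List)):
            pending.extend(target.elts)  # unpacking: a, (b, c) = ...
        elif isinstance(target, ast.Starred):
            pending.append(target.value)  # starred unpacking: *rest
        else:
            # Subscript, Attribute, ...: no new module-level name we can see
            skipped.append(type(target).__name__)
    return sorted(names), skipped
```

Whether a skipped target should produce a warning, an "unknown" node, or nothing at all is a design decision left open here.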

Import inside function

Imports inside functions (and presumably classes) are not registered as dependencies. This is because they technically do not create variables in the outer scope, nor depend on them. But with the multi-file analysis in #5, we ought to think about this.
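
Detecting such imports is straightforward with ast, whatever we then decide to do with them. The sketch below maps each function or class name to the modules imported anywhere inside it; nested_imports is a hypothetical name, and an import in a nested function is attributed to both the inner and the outer definition.

```python
import ast


def nested_imports(source: str) -> dict:
    """Map function/class names to modules imported inside their bodies."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        for sub in ast.walk(node):
            if isinstance(sub, ast.Import):
                modules = [alias.name for alias in sub.names]
            elif isinstance(sub, ast.ImportFrom):
                # Relative imports without a module name are shown as dots.
                modules = [sub.module or "." * sub.level]
            else:
                continue
            found.setdefault(node.name, set()).update(modules)
    return found
```

Module-level imports are deliberately ignored, since those are already registered as dependencies.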
