felix-hilden / pyfactor
Source dependency visualisation for Python
Home Page: https://pyfactor.rtfd.org
License: MIT License
Currently, top-level statements are only the ones directly under the root node. But I think only class or function definitions should stop the descent. So everything else, like statements inside ifs and withs, should be included on its own.
How dependencies are structured should be thought out. For example, are variables considered to be influenced by if statement dependencies? Does a with statement potentially produce a new variable? I think the answer is yes. For loops as well.
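As a rough illustration of the idea above, here is a minimal sketch that treats statements nested in ifs, withs, loops and try blocks as top level, while leaving definitions whole. The helper `iter_top_level` and the `BLOCKS` tuple are hypothetical names, not part of pyfactor, and the sketch deliberately leaves open whether the enclosing condition or context manager should count as a dependency.

```python
import ast

# Hypothetical helper, not pyfactor's API: block statements we descend into.
BLOCKS = (ast.If, ast.With, ast.For, ast.While, ast.Try)


def iter_top_level(body):
    """Yield statements, descending into blocks but not into definitions."""
    for stmt in body:
        if isinstance(stmt, BLOCKS):
            # Visit every statement list the block may carry.
            for field in ("body", "orelse", "finalbody"):
                yield from iter_top_level(getattr(stmt, field, []))
            for handler in getattr(stmt, "handlers", []):
                yield from iter_top_level(handler.body)
        else:
            # Definitions and simple statements are yielded whole.
            yield stmt


source = '''
import sys
if sys.platform == "win32":
    a = 1
else:
    with open("conf") as f:
        b = f.read()
def foo():
    c = 1
'''
kinds = [type(s).__name__ for s in iter_top_level(ast.parse(source).body)]
print(kinds)
```

Note that the assignment inside `foo` is not visited, because the function definition is yielded as a single unit.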
I was experimenting with pyfactor to see how it operates and if it might help me decipher large codebases. I decided to test it out on the googler.py codebase as my guinea pig. Googler.py runs without a problem on my machine. However, when I run pyfactor against this source file, this is what my terminal looks like.
(py37) mark@mark-XPS-8910:/mnt/Data/CMR/PythonPrograms$ pyfactor googler.py
Error: : syntax error in line 118 near '->'
I suspect this error may be a red herring, but I don't have a clue where to begin debugging the problem because there doesn't appear to be a syntax problem around that line in the source file. I also tried skipping imports and all it did was change the line number of the syntax error as shown below:
(py37) mark@mark-XPS-8910:/mnt/Data/CMR/PythonPrograms$ pyfactor -si googler.py
Error: : syntax error in line 65 near '->'
Finally, I ran the googler code through black to reformat it - figuring it would change the line numbers and I might be able to deduce a pattern. Black increased the line count of googler.py from 3,659 to 4,100 lines. When I ran pyfactor against the blacked code it still showed the syntax error as being on line 118.
I thought you might be interested in seeing this edge case. If not, please just close this issue as I was just exploring.
I am using version 4.3.2 of googler.py in case that would be necessary to reproduce the error.
Good Luck with the project,
over and out,
Mark R.
Graphviz has loads of options that we could expose, and we could make the documentation clearer. So let's give users the freedom.
This option would not choose a parsing root (#7), but would instead choose a root node after all nodes have been parsed. This way users would be able to drill down into specific parts of the graph.
The option would select child nodes of the root recursively, and delete all other nodes. But I think coloring shouldn't be changed.
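The selection described above could look roughly like the following sketch. It assumes the graph is a plain dict mapping each node to a list of its children, which is an illustrative representation and not necessarily how pyfactor stores its graph; `select_subgraph` is a hypothetical name.

```python
from collections import deque


def select_subgraph(edges, root):
    """Keep only `root` and the nodes reachable from it."""
    keep = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, ()):
            if child not in keep:
                keep.add(child)
                queue.append(child)
    # Drop every node outside the selection, along with edges pointing to it.
    return {
        node: [c for c in children if c in keep]
        for node, children in edges.items()
        if node in keep
    }


edges = {
    "main": ["parse", "draw"],
    "parse": ["read"],
    "draw": [],
    "read": [],
    "unused": ["read"],
}
sub = select_subgraph(edges, "parse")
print(sub)
```

Colouring would be computed on the full graph before this step if it should stay unchanged.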
Currently shadowing names are not excluded from dependencies. This might be the top thing to fix right away.
Two basic things should be taken into account: function definitions and class bodies. Nested functions could keep track of variables being defined before usage, nonlocal and global statements and variables in outer scopes. Class bodies are a bit of an exception, having the global scope, but not being propagated to method definitions. Then, lambda arguments can shadow variables too, as do for loops, withs and the walrus operator. Anything else?
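To make the function case concrete, here is a simplified sketch of excluding shadowed names. `outer_dependencies` is a hypothetical helper, and it deliberately ignores nested functions, lambdas, comprehensions and class bodies, all of which would need their own scopes.

```python
import ast


def outer_dependencies(func):
    """Names a function reads from enclosing scopes (simplified sketch)."""
    declared = set()  # names listed in global / nonlocal statements
    bound = {a.arg for a in ast.walk(func.args) if isinstance(a, ast.arg)}
    loaded = set()
    # Caveat: ast.walk also enters nested scopes, which this sketch ignores.
    for node in ast.walk(func):
        if isinstance(node, (ast.Global, ast.Nonlocal)):
            declared.update(node.names)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            else:
                loaded.add(node.id)
    # A name assigned in the function is local unless declared otherwise.
    return loaded - (bound - declared)


source = '''
scale = 2
def area(width, height):
    scale = 10
    return width * height * scale * offset
def bump():
    global counter
    counter = counter + step
'''
tree = ast.parse(source)
area_deps = outer_dependencies(tree.body[1])
bump_deps = outer_dependencies(tree.body[2])
print(area_deps, bump_deps)
```

Here `area` shadows the module-level `scale`, so only `offset` remains as an outer dependency, while the `global` statement in `bump` keeps `counter` as one.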
I found this worked nicely when the target file was a collection of functions.
When running pyfactor on a file which contains a class object I was hoping to see a graph of relations between the members and variables of the class but it seems this is not possible? Instead it just lists the class itself and relation to imported libraries or variables defined globally.
From #3: we should improve the dependency handling on control flow statements like for or try whose sections depend on what happens in other sections, i.e. raising or breaking. I think the sections should depend on the whole main body.
Graphviz can do tooltips for SVG images, which would allow us to for example have docstrings in the graph, giving much needed context for nodes or even edges.
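A minimal sketch of the idea, assuming docstrings are read with `ast.get_docstring` and written into Graphviz's `tooltip` node attribute. The DOT text is built by hand here only to keep the example self-contained; `dot_with_tooltips` is a hypothetical name and this is not how pyfactor currently renders graphs.

```python
import ast


def dot_with_tooltips(source):
    """Build DOT text with one node per top-level definition."""
    lines = ["digraph G {"]
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node) or ""
            # Escape the text for use inside a double-quoted DOT string.
            doc = doc.replace("\\", "\\\\").replace('"', '\\"')
            doc = doc.replace("\n", "\\n")
            lines.append(f'    "{node.name}" [tooltip="{doc}"];')
    lines.append("}")
    return "\n".join(lines)


source = 'def foo():\n    "Return one."\n    return 1\n'
dot = dot_with_tooltips(source)
print(dot)
```

Rendering the result with `dot -Tsvg` should show the docstring when hovering over the node.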
Two approaches could be tried:
ast parsing can be used to detect constant string statements, but it is harder to link them back to the functions and particularly variables that use them. We would need to track the level of a node and then when a doc is detected on an appropriate level, extract it. But I'm not sure if ast can handle comments...
It could be useful to be able to analyse the graphs, not just view them. We could use color in the graphs in two ways, having hue reflect the type of node in question, and intensity as some sort of importance factor based on the number of connections to and from a node. A node can be assigned many colors and bg vs. border color can be used as well.
For example these kinds of nodes could be interesting:
The definitions above would need some sort of significance analysis, because every node is a hub or a collector to some degree, and not all sub trees are important. For sub trees the procedure could be something like this: when a low-level sub tree is found, if its parent is also one, it is preferred, recursively. So we would get the largest possible sub tree.
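One simple starting point for an importance factor based on the number of connections to and from a node is a normalised degree count. The sketch below assumes edges are given as `(source, target)` pairs; `connection_intensity` is a hypothetical name, and a real significance analysis would likely need more than raw degree.

```python
from collections import Counter


def connection_intensity(edges):
    """Map each node to its degree, scaled so the busiest node gets 1.0."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1  # outgoing connection
        degree[target] += 1  # incoming connection
    top = max(degree.values(), default=1)
    return {node: count / top for node, count in degree.items()}


edges = [
    ("main", "parse"),
    ("main", "draw"),
    ("main", "read"),
    ("parse", "read"),
]
intensity = connection_intensity(edges)
print(intensity)
```

The resulting value could then drive colour saturation, with hue still reserved for the node type.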
When the library is a bit more mature, we could consider providing multi-file analysis. The first use case could be analysing the dependencies in a package with sub modules. It should be relatively straightforward to use the single-file functionality to connect names in multiple files together. But I bet there are some special cases that we should account for.
From #3. We have to decide how multiple definitions for a single name are handled. Consider these examples:
a = 1
b = 2

c = a
c = b

def foo():
    return a
def foo():
    return b

def bar():
    return c
bar = a
Here are my current thoughts: 1) the dependencies are merged 2) if all the definitions have the same type, that type is given in the visualisation, but if not, then we should use some unknown or multiple type
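The two rules above can be sketched as follows. This is only an illustration under simplifying assumptions: `merge_definitions` is a hypothetical helper, it handles only plain function definitions and single-name assignments, and the labels "function", "variable" and "multiple" are placeholders.

```python
import ast


def merge_definitions(source):
    """Map each top-level name to (type label, merged dependency set)."""
    merged = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.FunctionDef):
            name, kind, scope = stmt.name, "function", stmt
        elif isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            name, kind, scope = stmt.targets[0].id, "variable", stmt.value
        else:
            continue
        deps = {
            n.id
            for n in ast.walk(scope)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
        }
        kinds, all_deps = merged.setdefault(name, (set(), set()))
        kinds.add(kind)        # rule 2: remember every type seen
        all_deps.update(deps)  # rule 1: merge the dependencies
    return {
        name: (next(iter(kinds)) if len(kinds) == 1 else "multiple", deps)
        for name, (kinds, deps) in merged.items()
    }


source = '''
a = 1
b = 2
c = a
c = b
def foo():
    return a
def foo():
    return b
def bar():
    return c
bar = a
'''
result = merge_definitions(source)
print(result)
```

With the examples from this issue, `c` and `foo` each end up depending on both `a` and `b`, and `bar` gets the "multiple" label.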
We can do better than coloring bridges: waypoint nodes (as I'm calling them just now) are nodes whose children can only be accessed via that node, making them a perfect candidate for separating from the rest of the source. So let's color them in.
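One possible reading of that definition, as a sketch: a node is a waypoint if every node below it has parents only inside the node's own subtree. The graph is assumed to be a dict of node to children, and `descendants` and `waypoints` are hypothetical names rather than pyfactor functions.

```python
def descendants(edges, root):
    """All nodes reachable from `root`, excluding `root` itself."""
    seen, stack = set(), [root]
    while stack:
        for child in edges.get(stack.pop(), ()):
            if child not in seen and child != root:
                seen.add(child)
                stack.append(child)
    return seen


def waypoints(edges):
    """Nodes whose descendants are reachable only through them."""
    parents = {}
    for node, children in edges.items():
        for child in children:
            parents.setdefault(child, set()).add(node)
    result = set()
    for node in edges:
        below = descendants(edges, node)
        inside = below | {node}
        # Every descendant must have all of its parents inside the subtree.
        if below and all(parents[d] <= inside for d in below):
            result.add(node)
    return result


edges = {
    "main": ["load", "report"],
    "cli": ["report"],
    "load": ["read", "decode"],
    "report": ["format"],
    "read": [],
    "decode": [],
    "format": [],
}
found = waypoints(edges)
print(found)
```

In this example `main` is not a waypoint because `report` is also reachable from `cli`. Under this reading, a root with no outside callers would trivially qualify, so it may need to be excluded separately.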
It would be nice to be able to collapse the children of nodes after bridges to condense the visual representation. The nodes should meet specific criteria to ensure that they really are just children, recursively. Also, the nodes could be marked in some way, perhaps a doubled edge.
What was enough is not enough anymore. There is no place for naïve collection of AST names, because comprehensions can be used pretty much everywhere. We should instead use scoped methods, like we do in assignments already.
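The problem can be shown with a small sketch comparing naive name collection with a comprehension-aware one. Both helpers are hypothetical, and the scoped version is still too coarse: it would wrongly drop a name that is also used outside the comprehension within the same expression.

```python
import ast


def naive_names(expr):
    """Every name in the expression, regardless of scope."""
    return {n.id for n in ast.walk(expr) if isinstance(n, ast.Name)}


def scoped_names(expr):
    """Names in the expression, minus those bound by comprehensions in it."""
    bound = set()
    for node in ast.walk(expr):
        if isinstance(node, ast.comprehension):
            # The loop target introduces names local to the comprehension.
            bound.update(
                n.id for n in ast.walk(node.target) if isinstance(n, ast.Name)
            )
    return naive_names(expr) - bound


expr = ast.parse("[x * factor for x in values]", mode="eval").body
naive = naive_names(expr)
scoped = scoped_names(expr)
print(naive, scoped)
```

The naive version reports `x` as a dependency even though it exists only inside the comprehension.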
We have to address constructs added in the newest versions of Python. I can't remember the full list, but we should create that full list here at some point and then look to implementing it.
Our assignment parsing makes lots of assumptions about the statement structure. Thus, parsing fails with lots of builtin and third party modules. For example: globals()[name] = thing in the builtin collections module.
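A more defensive way to read assignment targets might look like the sketch below, which returns no names for targets that mutate an object instead of binding a variable. `target_names` is a hypothetical helper, not pyfactor's parser.

```python
import ast


def target_names(target):
    """Names bound by an assignment target; empty if no name is bound."""
    if isinstance(target, ast.Name):
        return [target.id]
    if isinstance(target, (ast.Tuple, ast.List)):
        return [name for elt in target.elts for name in target_names(elt)]
    if isinstance(target, ast.Starred):
        return target_names(target.value)
    # Subscript and Attribute targets modify an object rather than bind a name.
    return []


subscript = ast.parse("globals()[name] = thing").body[0]
unpacking = ast.parse("a, (b, *c) = values").body[0]
print(target_names(subscript.targets[0]))
print(target_names(unpacking.targets[0]))
```

Whether a subscript assignment should still register a dependency on the object being modified is a separate design question.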
Imports inside functions (and presumably classes) are not registered as dependencies. This is because they technically do not create variables in the outer scope, nor depend on them. But with the multi-file analysis in #5, we ought to think about this.