saltudelft / libsa4py Goto Github PK
View Code? Open in Web Editor NEWLibSA4Py: Light-weight static analysis for extracting type hints and features
License: Apache License 2.0
LibSA4Py: Light-weight static analysis for extracting type hints and features
License: Apache License 2.0
The class NumberRemover
in cst_transformers.py
should replace imaginary numbers with the [number]
special token.
type_annot_cove
JSONOutput.md
file accordinglyTo uniquely identify classes within a file, a qualified name is required.
To add this feature, a new field, i.e., q_name
, will be added to both class representation and the final JSON output.
MRE:
import textwrap
from pprint import pprint
import libcst
from libcst import metadata
from libsa4py.cst_extractor import Extractor
from libsa4py.cst_transformers import TypeAnnotationRemover, TypeApplier
mod = libcst.parse_module(
textwrap.dedent(
"""
def f():
x: int = 10
def g():
x: str = "Hello World"
"""
)
)
mod_annotations = Extractor.extract(mod.code).to_dict()
pprint(mod_annotations["funcs"])
annoless = mod.visit(TypeAnnotationRemover())
annotated = metadata.MetadataWrapper(annoless, unsafe_skip_copy=True).visit(
TypeApplier(mod_annotations, apply_nlp=False)
)
new_annotations = Extractor.extract(annotated.code).to_dict()
pprint(new_annotations["funcs"])
Output:
[{'docstring': {'func': None, 'long_descr': None, 'ret': None},
'fn_lc': ((2, 0), (3, 24)),
'fn_var_ln': {'x': ((3, 4), (3, 5))},
'fn_var_occur': {'x': [['x', 'builtins', 'int']]},
'name': 'f',
'params': {},
'params_descr': {},
'params_occur': {},
'q_name': 'f',
'ret_exprs': [],
'ret_type': '',
'variables': {'x': 'builtins.int'}},
{'docstring': {'func': None, 'long_descr': None, 'ret': None},
'fn_lc': ((5, 0), (6, 35)),
'fn_var_ln': {'x': ((6, 4), (6, 5))},
'fn_var_occur': {'x': [['x', 'builtins', 'str']]},
'name': 'g',
'params': {},
'params_descr': {},
'params_occur': {},
'q_name': 'g',
'ret_exprs': [],
'ret_type': '',
'variables': {'x': 'builtins.str'}}]
[{'docstring': {'func': None, 'long_descr': None, 'ret': None},
'fn_lc': ((3, 0), (4, 24)),
'fn_var_ln': {'x': ((4, 4), (4, 5))},
'fn_var_occur': {'x': [['x', 'builtins', 'int']]},
'name': 'f',
'params': {},
'params_descr': {},
'params_occur': {},
'q_name': 'f',
'ret_exprs': [],
'ret_type': '',
'variables': {'x': 'builtins.int'}},
{'docstring': {'func': None, 'long_descr': None, 'ret': None},
'fn_lc': ((6, 0), (7, 21)),
'fn_var_ln': {'x': ((7, 4), (7, 5))},
'fn_var_occur': {'x': []},
'name': 'g',
'params': {},
'params_descr': {},
'params_occur': {},
'q_name': 'g',
'ret_exprs': [],
'ret_type': '',
'variables': {'x': ''}}]
Note that the second dict for the function g
does not contain a type for the variable x
, despite the original clearly finding the anntotation builtins.str
A tentative fix that I applied was to add the following to the leave_FunctionDef
method of TypeApplier
, but I am unsure of its correctness.
self.last_visited_assign_t_name = None
self.last_visited_assign_t_count = 0
Some resetting of self.last_visited_assign_t_name
seems to be in order, or perhaps this should be assigned a fully qualified name instead of just the variable's name.
I hope this helps (or that I have made a mistake :) )
The SpaceAdder
CST transformer doesn't add an extra space in some rare cases like ~
operator.
Line and column numbers should be added to the final JSONOutput in the field fn_lc
.
Currently, type names are extracted from its slot without considering the module from which it's imported. For instance:
from typing import List
x: List[int] = []
The type of x
should be extracted as typing.List[int]
.
The Visitor
class should also include self variables in multiple assignments in the variables of a class method. For instance:
self.a, self.b = 45, 10
Currently, the walrus operator (:=
) can only be extracted and tested if Python 3.8+ is present in a user's system.
A workaround is needed to at least skip the unit test for the walrus operator if the Python version is lower than 3.8.
Type aliases should be resolved when extracting type annotations. For instance:
Vector = list[float]
Currently, Vector
is extracted, not the original type (i.e. list[float]
).
In the pipeline, an optional argument can be added to type-check processed source code file.
Here are the required changes:
tc
to modules in JSON files. Has either True
or False
value.--tc
to the process
command.type-check.py
.Pipeline
class to run the type-checker (e.g. mypy)It is expected to see all usages of a typed variable in a file to have its types. See the unit test test_propagated_types_file
and the file exp_outputs/propagated_types.py
in tests
branch.
The TypeAdder
class in cst_transformations.py
should be improved.
merge.py
moduleThe calculated type annotation coverage for both classes and modules is not the same as the manual calculation of these for a small code snippet.
In the Pipeline
class, skip empty files when processing. Also, a CLI arg might be useful for this.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.