
autosoft-dev / tree-hugger


A light-weight, extendable, high level, universal code parser built on top of tree-sitter

License: MIT License

Python 80.65% PHP 0.79% Java 0.74% JavaScript 0.85% C++ 0.50% Jupyter Notebook 16.44% Shell 0.03%
python parsing tree-sitter cli ast languages parser universal python-binding machine-learning-on-source-code programming-language-theory data-mining code-mining java javascript cpp php

tree-hugger's Issues

Java Queries not Included in Pypi Package

tree-hugger version: 0.10.0

I tried to use the JavaParser:

from tree_hugger.core import JavaParser

jp = JavaParser(library_loc="/content/my-languages.so")

However, I ran into this error:

QueryFileNotFoundError                    Traceback (most recent call last)
<ipython-input-3-c60a404b4311> in <module>()
      2 from tree_hugger.core import JavaParser
      3 
----> 4 jp = JavaParser(library_loc="/content/my-languages.so")

2 frames
/usr/local/lib/python3.6/dist-packages/tree_hugger/core/queries.py in fromFile(query_file_path)
     21     def fromFile(query_file_path: str):
     22         if not Path(query_file_path).exists() or not Path(query_file_path).is_file():
---> 23             raise QueryFileNotFoundError(f"Cound not find {query_file_path}")
     24 
     25         with open(query_file_path) as f:

QueryFileNotFoundError: Cound not find /usr/local/lib/python3.6/dist-packages/tree_hugger/core/parser/java/queries.yml

It seems that the PyPI package somehow doesn't include the queries.yml file for the Java parser. The Python parser works just fine, and when I copy the Java queries.yml into the location that tree-hugger expects, the Java parser works as expected. I think something in the PyPI packaging is leaving this file out.
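A quick way to confirm which parsers are affected is to walk the installed package and look for per-language directories without a queries.yml. The sketch below assumes the layout implied by the traceback (tree_hugger/core/parser/<language>/queries.yml); the helper name missing_query_files is hypothetical and not part of tree-hugger.

```python
from pathlib import Path
from typing import List


def missing_query_files(parser_root: Path, filename: str = "queries.yml") -> List[str]:
    """Return the language sub-directories of parser_root that lack `filename`.

    Directories starting with "_" or "." (for example __pycache__) are skipped.
    """
    return sorted(
        child.name
        for child in parser_root.iterdir()
        if child.is_dir()
        and not child.name.startswith(("_", "."))
        and not (child / filename).is_file()
    )
```

With tree-hugger installed, calling it on Path(tree_hugger.__file__).parent / "core" / "parser" would be expected to list "java" for the package version described here.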

You can reproduce this using this notebook: https://colab.research.google.com/drive/1v7np3pmSIuig-Js1xVP62MbTrc9ru6c2?usp=sharing

BTW love the library! I was actually gonna create something like this, but glad to see I now have a place I can directly contribute to :D.

Normalize the API

Despite our best efforts, the primary API of tree-hugger is not yet normalized. We need to write a detailed description of the standard method names and apply it consistently across language implementations.

Can we make parsers stateless?

Right now Parser objects are stateful, in the sense that each object internally keeps the parse tree and the raw code of the file or code string that it parsed last.

Questions

  • Is it necessary to make them stateless? (Imagine creating a single parser and using it in parallel on multiple files with a thread-based or multi-process architecture.)

  • If the answer to the above is Yes then how do we do that?


This may need a significant redesign of how tree-hugger works at the moment. So, if we do that, we need to keep in mind that we must not break backward compatibility.
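One possible shape for a stateless design is sketched below. It is only an illustration: the standard-library ast module stands in for tree-sitter so that the example runs on its own, and the names ParsedSource, StatelessPythonParser and function_names_for_many are hypothetical. The idea is that parsing returns an immutable result object and every query takes that object as an argument, so a single parser instance can be shared across threads.

```python
import ast
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParsedSource:
    """Immutable result of one parse; the parser keeps no reference to it."""
    source: str
    tree: ast.AST  # stand-in for a tree-sitter tree (assumption)


class StatelessPythonParser:
    """Holds no per-file state, so one instance can serve many callers."""

    def parse_code(self, source: str) -> ParsedSource:
        return ParsedSource(source=source, tree=ast.parse(source))

    def get_all_function_names(self, parsed: ParsedSource) -> List[str]:
        return [
            node.name
            for node in ast.walk(parsed.tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        ]


def function_names_for_many(sources: List[str]) -> List[List[str]]:
    """Parse several code strings in parallel with one shared parser."""
    parser = StatelessPythonParser()

    def work(source: str) -> List[str]:
        return parser.get_all_function_names(parser.parse_code(source))

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(work, sources))
```

Backward compatibility could presumably be kept by leaving the current stateful classes in place as thin wrappers that store the last ParsedSource and forward to the stateless methods.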

Add a factory function to generate the proper parser object

As an alternative to doing this

from tree_hugger.core import PythonParser

pp = PythonParser()

We also suggest having something like this

from tree_hugger.language_factory import get_parser

pp = get_parser('Python')

Most of the parsers expose a very similar API, so having a factory makes sense here.
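A minimal sketch of such a factory follows. It assumes that the parser classes are importable from tree_hugger.core under the names used elsewhere in these issues (PythonParser, JavaParser, JavascriptParser); the helper parser_class_name and the lookup table are hypothetical, and keyword arguments such as library_loc are simply passed through to the constructor.

```python
import importlib

# Class names as they appear in the issues on this page; extend as needed.
_PARSER_CLASS_NAMES = {
    "python": "PythonParser",
    "java": "JavaParser",
    "javascript": "JavascriptParser",
}


def parser_class_name(language: str) -> str:
    """Map a case-insensitive language name to a parser class name."""
    key = language.strip().lower()
    if key not in _PARSER_CLASS_NAMES:
        supported = ", ".join(sorted(_PARSER_CLASS_NAMES))
        raise ValueError(
            f"Unsupported language {language!r}; expected one of: {supported}"
        )
    return _PARSER_CLASS_NAMES[key]


def get_parser(language: str, **kwargs):
    """Import tree_hugger.core lazily and build the matching parser."""
    core = importlib.import_module("tree_hugger.core")
    return getattr(core, parser_class_name(language))(**kwargs)
```

Importing tree_hugger.core inside get_parser keeps the lookup table usable, and testable, even where the package or its compiled library is not available.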

Issue: Incompatible for Mac Version

I get the following error while trying to run on a Mac with an M1 chip:

from tree_hugger.core import JavascriptParser
jsp = JavascriptParser(library_loc='py_php_js_cpp_java_darwin_64.so')
jsp.parse_file("index.js")
jsp.get_all_function_names()

File ~/miniforge3/envs/env_tensor/lib/python3.9/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    371 self._FuncPtr = _FuncPtr
    373 if handle is None:
--> 374     self._handle = _dlopen(self._name, mode)
    375 else:
    376     self._handle = handle

OSError: dlopen(py_php_js_cpp_java_darwin_64.so, 0x0006): tried: '/Users/zunaira/miniforge3/envs/env_tensor/lib/python3.9/lib-dynload/../../py_php_js_cpp_java_darwin_64.so' (no such file), '/Users/zunaira/miniforge3/envs/env_tensor/bin/../lib/py_php_js_cpp_java_darwin_64.so' (no such file), 'py_php_js_cpp_java_darwin_64.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e))), '/usr/local/lib/py_php_js_cpp_java_darwin_64.so' (no such file), '/usr/lib/py_php_js_cpp_java_darwin_64.so' (no such file), '/Users/zunaira/Desktop/JScode2vec/CodeSummarizer/py_php_js_cpp_java_darwin_64.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)
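The OSError above reports an x86_64 library being opened by an arm64 process. A small pre-flight check, sketched below with only the standard library, can surface that kind of mismatch before a parser is constructed. The helper name try_load_library is hypothetical and not part of tree-hugger.

```python
import ctypes
import platform
from typing import Tuple


def try_load_library(path: str) -> Tuple[bool, str]:
    """Try to dlopen `path`; report the host platform alongside any error."""
    host = f"{platform.system()}/{platform.machine()}"
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, f"could not load {path} on {host}: {exc}"
    return True, f"loaded {path} on {host}"
```

If the check fails with an architecture message like the one above, the shared library was built for a different CPU than the running Python interpreter.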

Add PHP to the mix

Add PHP language API (class) to the mix

Goal -

  • Create all the APIs that PythonParser supports for PHPParser also

  • Write docstrings for each parser method.

No such attribute error

When I try to use tree-hugger's Python parser on a file, following this example, methods like "get_all_function_names" and "get_all_function_docstrings" work, but "get_all_class_methods" and other methods from the API reference do not, even though the README lists them as available. I get a "No such attribute" error.

import pandas as pd
import numpy as np
from tree_hugger.core import PythonParser

pp = PythonParser(library_loc="/home/xyz/my-languages.so")

pp.parse_file('FeatureV.py')

pp.get_all_class_methods()

This produces the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-d1fcd6b98ce4> in <module>
----> 1 pp.get_all_class_methods()

AttributeError: 'PythonParser' object has no attribute 'get_all_class_methods'

But other methods work in the same code, for example:

import pandas as pd
import numpy as np
from tree_hugger.core import PythonParser

pp = PythonParser(library_loc="/home/xyz/my-languages.so")

pp.parse_file('FeatureV.py')

pp.get_all_class_names()

Output:

['FeatureView', 'ResetListener', 'ShowSrcListener', 'LinkListener']
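When the README and the installed version disagree, the object itself can be asked which methods it actually has. The helper below is a generic sketch built on Python's dir() and callable(); the name public_methods is hypothetical.

```python
from typing import List


def public_methods(obj, prefix: str = "get_") -> List[str]:
    """List callable attributes of obj whose names start with prefix."""
    return sorted(
        name
        for name in dir(obj)
        if name.startswith(prefix) and callable(getattr(obj, name))
    )
```

Calling public_methods(pp) on the parser from the example would show the names available in the installed version, which can then be compared with the README.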

Better documentation

Well, the title of the issue says it all, but just to clarify: we need better documentation (on Read the Docs or something similar, or a GitHub page).

It is going to be tricky work, but very necessary.

"Parser library path is 'None'."

After doing pip install tree-hugger, installing the VC++ build tools, rebooting and running create_libs python, everything seemed to be ready on my Windows 10 PC (which has WSL installed - not sure if that's important).

However, the following example produces an exception:

from tree_hugger.core import PythonParser

def my_func():
    pp = PythonParser()
    pp.parse_file("try_tree_hugger.py")
    print(pp.get_all_function_names())

my_func()

  File ".\try_tree_hugger.py", line 8, in <module>
    my_func()
  File ".\try_tree_hugger.py", line 4, in my_func
    pp = PythonParser()
  File "C:\Users\Family\AppData\Local\Programs\Python\Python38-32\lib\site-packages\tree_hugger\core\python_parser.py", line 31, in __init__
    super(PythonParser, self).__init__('python', 'python_quaries', library_loc, query_file_path)
  File "C:\Users\Family\AppData\Local\Programs\Python\Python38-32\lib\site-packages\tree_hugger\core\code_parser.py", line 33, in __init__
    raise ParserLibraryNotFoundError("Parser library path is 'None'. Please either set up the environment or call the constructor with the path")
tree_hugger.exceptions.ParserLibraryNotFoundError: Parser library path is 'None'. Please either set up the environment or call the constructor with the path

Make queries.yml file a part of package (so that it is installed automatically)

We keep it external, which means people won't be able to use tree-hugger without downloading or creating it first. That is not good UX, and it does not serve the basic design philosophy well. So we need to make it part of the main code base.

This means we need to create an assets directory under tree_hugger, with a sub-directory called queries that contains a file called queries.yml.

We can start by copying the example_queries.yml file.

We will also need to make sure that it becomes a part of the core code.
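As a rough sketch of that layout: setuptools accepts a package_data mapping in setup(), which is one common way to ship non-Python files with a package. The names PACKAGE_DATA and bundled_query_file below are hypothetical, and the directory names simply follow the proposal above.

```python
from pathlib import Path

# Would be passed to setuptools as setup(..., package_data=PACKAGE_DATA)
# so that the YAML file is copied into the installed package (assumption).
PACKAGE_DATA = {"tree_hugger": ["assets/queries/*.yml"]}


def bundled_query_file(package_dir: Path) -> Path:
    """Return the default queries.yml location inside the installed package."""
    return package_dir / "assets" / "queries" / "queries.yml"
```

Inside the package, package_dir would typically be Path(__file__).parent of a module at the top level of tree_hugger, so the default query file is found without any setup by the user.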

Windows support

The pip install works on Windows. However, even when create_libs runs successfully, the generated library does not seem to be the right one. We need to figure that out and bring support to Windows.
