
codequestion's People

Contributors

davidmezzetti

codequestion's Issues

Migrate from word vector models to sentence transformers models

Since the original release in January 2020, there has been a lot of progress! sentence-transformers models now outperform the models currently in codequestion at similar speed (even on CPUs!).

Models in codequestion 2.0 should move to sentence-transformers.

Vector model file not found

Hello,

Thank you very much for the project. I have one small issue: when you run python -m codequestion.download, it downloads a configuration file that codequestion then uses to load the model.

The path to the model seems to be hardcoded to /home/dmezzett/.codequestion/vectors/stackexchange-300d.magnitude.
How can we tell codequestion to use our own home directory, or modify the config file?

Best
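One possible workaround, until the path is no longer hardcoded, is to rewrite the stale absolute path so it points under the current user's home directory. This is a minimal sketch that assumes the model location is stored under a `path` key (the traceback in a related report reads `self.config["path"]`) and that files live under `~/.codequestion`. The `rebase_path` helper is hypothetical, and the `config` dict stands in for however the downloaded configuration is actually loaded and saved, which the report does not show.

```python
from pathlib import Path
from typing import Optional


def rebase_path(path: str, marker: str = ".codequestion", home: Optional[str] = None) -> str:
    """Move the part of `path` starting at `marker` under `home` (default: current user's home)."""
    parts = Path(path).parts
    if marker not in parts:
        # Nothing to rewrite; leave unrelated paths untouched
        return path
    base = Path(home) if home else Path.home()
    return str(base.joinpath(*parts[parts.index(marker):]))


# Hypothetical config dict holding the hardcoded path from the report
config = {"path": "/home/dmezzett/.codequestion/vectors/stackexchange-300d.magnitude"}
config["path"] = rebase_path(config["path"])
```

Only the segment from `.codequestion` onward is kept, so the same helper works for any user name baked into the original path.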

Upgrade to txtai 5.x

txtai 5.0 was recently released and much has happened since the last version of codequestion!

The next release of codequestion should drop questions.db and store content directly in the index. Topics and path traversal should also be added via semantic graphs.

Add code quality checks

Add the following standard processes and procedures.

  • Unit tests
  • Test coverage
  • GitHub actions workflow
  • Pre-commit code quality checks
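The unit-test item could start from something as small as the sketch below, which uses Python's built-in `unittest` module. Running the suite under coverage.py (`coverage run -m unittest`, then `coverage report`) would address the coverage item. The `normalize` function is a hypothetical stand-in and not codequestion code.

```python
import unittest


def normalize(text: str) -> str:
    """Hypothetical helper: lowercase a query and collapse runs of whitespace."""
    return " ".join(text.lower().split())


class TestNormalize(unittest.TestCase):
    def test_lowercases(self):
        self.assertEqual(normalize("Python LIST"), "python list")

    def test_collapses_whitespace(self):
        self.assertEqual(normalize("  sort   a\tdict "), "sort a dict")


if __name__ == "__main__":
    unittest.main()
```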

UserWarning: Trying to unpickle estimator TruncatedSVD from version 0.23.1 when using version 0.23.2

~/miniconda3/envs/deepl/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator TruncatedSVD from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(
codequestion query shell

Received the warning above when launching the codequestion shell after a fresh install.

System details:

  • Ubuntu 18.04
  • Miniconda Python 3.8
  • CPU-only pytorch 1.6
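The warning means the stored model was pickled with scikit-learn 0.23.1 and is being loaded with 0.23.2. Installing the matching version is the safer fix. If you have confirmed that results are unaffected, a sketch like the one below silences only this message using the standard `warnings` module. The message pattern is copied from the log above; whether suppressing it is acceptable is a judgment call, and the helper name is hypothetical.

```python
import warnings


def suppress_unpickle_warning() -> None:
    """Ignore scikit-learn's version-mismatch warning when unpickling estimators."""
    # `message` is a regular expression matched against the start of the warning text
    warnings.filterwarnings(
        "ignore",
        message=r"Trying to unpickle estimator",
        category=UserWarning,
    )


# Scope the filter so other warnings keep their normal behavior after the block
with warnings.catch_warnings():
    suppress_unpickle_warning()
    # ... load the codequestion model here ...
```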

52177 segmentation fault codequestion

➜ python3.10 -m pip install codequestion

sformers, torch, txtai, codequestion
Successfully installed MarkupSafe-2.1.2 codequestion-2.0.0 faiss-cpu-1.7.3 html2markdown-0.1.7 huggingface-hub-0.13.4 jinja2-3.1.2 mpmath-1.3.0 networkx-3.1 python-louvain-0.16 scipy-1.10.1 sympy-1.11.1 tokenizers-0.13.3 torch-2.0.0 transformers-4.28.1 txtai-5.5.0

➜ python3.10 -m codequestion.download

Downloading model from https://github.com/neuml/codequestion/releases/download/v2.0.0/cqmodel.zip to /var/folders/4b/fykz7dvx2fj550ml_6t5qkww0000gn/T/cqmodel.zip
100%|
Decompressing model to /Users/tonis/.codequestion
Download complete

➜ codequestion

Loading model from /Users/tonis/.codequestion/models/stackexchange
[1]    58256 segmentation fault  codequestion

I also tried in a venv, but I'm not a Python expert.
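A segmentation fault happens in native code, so Python normally prints no traceback at all. As a diagnostic step (not a fix), Python's built-in `faulthandler` module can dump the Python stack at the moment of the crash, which may show which library is involved. Setting the environment variable `PYTHONFAULTHANDLER=1` before running `codequestion` has the same effect without any code changes. The sketch below shows the in-code variant.

```python
import faulthandler
import sys

# On fatal signals such as SIGSEGV, write Python tracebacks for all threads.
# faulthandler needs a real file descriptor, so the process's original stderr
# (sys.__stderr__) is used rather than a possibly redirected sys.stderr.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# ... import and start the code that crashes after this point ...
```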

File not found: /home/dmezzett/.codequestion/vectors/stackexchange-300d.magnitude

root@0497bd526f2b:/# codequestion
Loading model from /root/.codequestion/models/stackexchange
/usr/local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator TruncatedSVD from version 0.22.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/local/bin/codequestion", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/codequestion/shell.py", line 35, in main
    Shell().cmdloop()
  File "/usr/local/lib/python3.8/cmd.py", line 105, in cmdloop
    self.preloop()
  File "/usr/local/lib/python3.8/site-packages/codequestion/shell.py", line 21, in preloop
    self.embeddings, self.db = Query.load()
  File "/usr/local/lib/python3.8/site-packages/codequestion/query.py", line 127, in load
    embeddings.load(path)
  File "/usr/local/lib/python3.8/site-packages/codequestion/embeddings.py", line 332, in load
    self.vectors = self.loadVectors(self.config["path"])
  File "/usr/local/lib/python3.8/site-packages/codequestion/embeddings.py", line 104, in loadVectors
    raise IOError(ENOENT, "Vector model file not found", path)
FileNotFoundError: [Errno 2] Vector model file not found: '/home/dmezzett/.codequestion/vectors/stackexchange-300d.magnitude'

Test requires specifying a source

The example in the README simply runs python -m codequestion.evaluate, but that gives an error about a missing -s {SOME SOURCE} or --source {SOME SOURCE} argument.

I was able to run it with python -m codequestion.evaluate -s test (assuming it ran after following the rest of the steps).
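Errors like this usually come from a command-line option declared as required. A minimal sketch with Python's `argparse`, mirroring the `-s/--source` flag from the report; the parser is hypothetical and is not codequestion's actual evaluate code, and `test` is simply the value that worked above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -s/--source flag from the report."""
    parser = argparse.ArgumentParser(prog="codequestion.evaluate")
    # required=True makes argparse print usage and exit with an error when the flag is omitted
    parser.add_argument("-s", "--source", required=True, help="data source to evaluate")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse(["-s", "test"])
print(args.source)  # test
```

Documenting the flag in the README example (or giving it a default) would avoid the error for first-time users.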

pip install fails with: No matching distribution found for torch>=1.4.0 (from txtai>=1.2.0->codequestion)

System: Windows 10 (x64) running Python 3.8.1 and pip 20.2.3.

ERROR: Could not find a version that satisfies the requirement torch>=1.4.0 (from txtai>=1.2.0->codequestion) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.4.0 (from txtai>=1.2.0->codequestion)
(env) D:\code\codequestion>python -m pip install --upgrade pip
Collecting pip
  Using cached https://files.pythonhosted.org/packages/4e/5f/528232275f6509b1fff703c9280e58951a81abe24640905de621c9f81839/pip-20.2.3-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 19.2.3
    Uninstalling pip-19.2.3:
      Successfully uninstalled pip-19.2.3
Successfully installed pip-20.2.3

Here's what the full run looks like.

(env) D:\code\codequestion>pip install codequestion
Collecting codequestion
  Using cached codequestion-1.1.0-py3-none-any.whl (17 kB)
Collecting tqdm==4.48.0
  Using cached tqdm-4.48.0-py2.py3-none-any.whl (67 kB)
Collecting txtai>=1.2.0
  Using cached txtai-1.2.0-py3-none-any.whl (20 kB)
Collecting mdv>=1.7.4
  Using cached mdv-1.7.4.tar.gz (54 kB)
Collecting html2text>=2020.1.16
  Using cached html2text-2020.1.16-py3-none-any.whl (32 kB)
Collecting numpy>=1.18.4
  Downloading numpy-1.19.2-cp38-cp38-win_amd64.whl (13.0 MB)
     |████████████████████████████████| 13.0 MB 6.4 MB/s
Collecting annoy>=1.16.3
  Downloading annoy-1.16.3.tar.gz (644 kB)
     |████████████████████████████████| 644 kB 6.4 MB/s
Collecting pymagnitude-lite>=0.1.43
  Downloading pymagnitude_lite-0.1.143-py3-none-any.whl (34 kB)
Collecting nltk>=3.5
  Using cached nltk-3.5.zip (1.4 MB)
Collecting sentence-transformers>=0.3.3
  Using cached sentence-transformers-0.3.6.tar.gz (62 kB)
Collecting fasttext>=0.9.2
  Downloading fasttext-0.9.2.tar.gz (68 kB)
     |████████████████████████████████| 68 kB 4.8 MB/s
Collecting hnswlib>=0.4.0
  Downloading hnswlib-0.4.0.tar.gz (17 kB)
Collecting scikit-learn>=0.23.1
  Downloading scikit_learn-0.23.2-cp38-cp38-win_amd64.whl (6.8 MB)
     |████████████████████████████████| 6.8 MB 3.3 MB/s
Collecting regex>=2020.5.14
  Using cached regex-2020.7.14-cp38-cp38-win_amd64.whl (264 kB)
Collecting transformers==3.0.2
  Downloading transformers-3.0.2-py3-none-any.whl (769 kB)
     |████████████████████████████████| 769 kB 6.4 MB/s
ERROR: Could not find a version that satisfies the requirement torch>=1.4.0 (from txtai>=1.2.0->codequestion) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.4.0 (from txtai>=1.2.0->codequestion)

Vector model file not found (cord19-300d.magnitude)

Hi,

I get the following error when running python -m paperai.index:

raise IOError(ENOENT, "Vector model file not found", path)
FileNotFoundError: [Errno 2] Vector model file not found: 'C:\\Users\\x\\.cord19\\vectors\\cord19-300d.magnitude'

P.S. I am quite new to all this, so apologies if the mistake is on my end.

Upgrade to txtai 6.0

This change will update the minimum dependency for codequestion to txtai 6.0.

The main code change needed here is in the scoring package. With the addition of term indexing, checks need to be added to determine whether a scoring index is used for term indexing or for word vector weighting.
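Such a check could look roughly like the sketch below. It assumes the scoring settings arrive either as a bare method name or as a plain dict, and that term indexing is signalled by a truthy `terms` entry; both the config shape and the `scoring_mode` helper are assumptions for illustration, not codequestion's or txtai's actual code.

```python
from typing import Any, Dict, Optional, Union


def scoring_mode(scoring: Optional[Union[str, Dict[str, Any]]]) -> str:
    """Classify a scoring config (assumed shape) as term indexing or word vector weighting."""
    if not scoring:
        return "none"
    if isinstance(scoring, str):
        # A bare method name (for example "bm25") carries no terms flag
        return "weights"
    # Assumption: term indexing is enabled by a truthy "terms" key
    return "terms" if scoring.get("terms") else "weights"
```

Code that builds word vector weights would then only run when the result is `"weights"`, leaving term indexes untouched.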

ImportError: Faiss library is not installed

While setting this up on Windows 10, I seem to have gotten everything installed, but I get this traceback when I run it:

(keras-gpu-2) C:\Users\bbate>codequestion
The system cannot find the path specified.
2020-09-15 13:57:44.515137: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Loading model from C:\Users\bbate\.codequestion\models\stackexchange
Traceback (most recent call last):
  File "c:\users\bbate\miniconda3\envs\keras-gpu-2\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\bbate\miniconda3\envs\keras-gpu-2\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\bbate\Miniconda3\envs\keras-gpu-2\Scripts\codequestion.exe\__main__.py", line 7, in <module>
  File "c:\users\bbate\miniconda3\envs\keras-gpu-2\lib\site-packages\codequestion\shell.py", line 48, in main
    Shell().cmdloop()
  File "c:\users\bbate\miniconda3\envs\keras-gpu-2\lib\cmd.py", line 105, in cmdloop
    self.preloop()
  File "c:\users\bbate\miniconda3\envs\keras-gpu-2\lib\site-packages\codequestion\shell.py", line 22, in preloop
    self.embeddings, self.db = Query.load()
  File "c:\users\bbate\miniconda3\envs\keras-gpu-2\lib\site-packages\codequestion\query.py", line 127, in load
    embeddings.load(path)
  File "c:\users\bbate\miniconda3\envs\keras-gpu-2\lib\site-packages\txtai\embeddings.py", line 258, in load
    self.embeddings = ANN.create(self.config)
  File "c:\users\bbate\miniconda3\envs\keras-gpu-2\lib\site-packages\txtai\ann.py", line 51, in create
    raise ImportError("Faiss library is not installed")
ImportError: Faiss library is not installed
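The traceback shows the error being raised while txtai creates its approximate nearest neighbor index (`ANN.create`). A quick pre-flight check can confirm which optional native libraries the current environment can actually import before launching codequestion. This is a minimal sketch using only `importlib`; the helper name and the module list are assumptions. Actually importing each module, rather than only locating it, also surfaces native-library load failures, which Python reports as `ImportError`.

```python
import importlib
from typing import Dict, Iterable


def check_imports(names: Iterable[str]) -> Dict[str, str]:
    """Try importing each module and map every failure to its error message."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as error:
            # Covers both missing packages and libraries that fail to load
            failures[name] = str(error)
    return failures


if __name__ == "__main__":
    for name, error in check_imports(["faiss"]).items():
        print(f"{name}: {error}")
```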