codeqai

Search your codebase semantically or chat with it from the CLI. Quickly sync the vector database with the latest code changes. 100% local support without any data leaks.
Built with langchain, treesitter, sentence-transformers, instructor-embedding, faiss, llama.cpp, Ollama, Streamlit.

Features

  • Semantic code search
  • GPT-like chat with your codebase
  • Synchronize vector store and latest code changes with ease
  • 100% local embeddings and llms
    • sentence-transformers, instructor-embeddings, llama.cpp, Ollama
  • OpenAI and Azure OpenAI support
  • Treesitter integration

Note

There will be better results if the code is well documented. You might consider doc-comments-ai for code documentation generation.

Usage

Start semantic search:

codeqai search

Start chat dialog:

codeqai chat

Synchronize vector store with current git checkout:

codeqai sync

Start Streamlit app:

codeqai app

Note

At first usage, the repository will be indexed with the configured embeddings model which might take a while.

Requirements

  • Python >=3.9,<3.12
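
The version constraint can be checked from Python before installing. This is a minimal sketch; the helper name `python_version_supported` is hypothetical and not part of codeqai.

```python
import sys


def python_version_supported(version=sys.version_info):
    """Return True if the (major, minor) version satisfies >=3.9,<3.12."""
    # Only major and minor matter for this range check.
    major_minor = tuple(version[:2])
    return (3, 9) <= major_minor < (3, 12)


if __name__ == "__main__":
    if python_version_supported():
        print("This interpreter is within the supported range.")
    else:
        print("This interpreter is outside the supported range >=3.9,<3.12.")
```

Running it with the interpreter that pipx or pip will use shows whether that interpreter falls inside the supported range.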

Installation

Install in an isolated environment with pipx:

pipx install codeqai

Make sure pipx is using Python >=3.9,<3.12.
To specify the Python version explicitly with pipx, activate the desired Python version (e.g. with pyenv shell 3.X.X) and install with:

pipx install codeqai --python $(which python)

If you are still facing issues with pipx, you can also install directly from PyPI with pip:

pip install codeqai

However, pipx is recommended because it installs the dependencies in an isolated environment.
Visit the Troubleshooting section for solutions to known installation issues.

Note

Some packages are not installed by default. On first use, you will be asked to install faiss-cpu or faiss-gpu. faiss-gpu is recommended if the hardware supports CUDA 7.5+. If local embeddings and llms are used, you will additionally be asked to install sentence-transformers, instructor or llama.cpp.

Configuration

At first usage or by running

codeqai configure

the configuration process is initiated, where the embeddings and llms can be chosen.

Important

If you want to change the embeddings model in the configuration later, delete the cached files in ~/.cache/codeqai. The vector store files are then recreated with the newly configured embeddings model. This is necessary because similarity search does not work if the model used for indexing differs from the model used for querying.
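
Clearing the cache can also be scripted. The sketch below assumes that removing the whole ~/.cache/codeqai directory is acceptable, which also discards any other cached state and forces a full re-index on the next run. The function name `clear_cache` is hypothetical and not part of codeqai.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir=None):
    """Delete the cache directory. Return True if something was removed."""
    # Default location taken from the note above; pass another path to override.
    target = Path(cache_dir) if cache_dir is not None else Path.home() / ".cache" / "codeqai"
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

Calling `clear_cache()` with no argument targets ~/.cache/codeqai. The next codeqai run then rebuilds the vector store with the currently configured embeddings model.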

Remote models

If remote models are used, the following environment variables are required. If they are already set, they will be used; otherwise you will be prompted to enter them, and they are then stored in ~/.config/codeqai/.env.

OpenAI

export OPENAI_API_KEY="your OpenAI api key"

Azure OpenAI

export OPENAI_API_TYPE="azure"
export AZURE_OPENAI_ENDPOINT="https://<your-endpoint>.openai.azure.com/"
export OPENAI_API_KEY="your Azure OpenAI api key"
export OPENAI_API_VERSION="2023-05-15"

Note

To change the environment variables later, update the ~/.config/codeqai/.env manually.
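
To see in advance which variables would be prompted for, the lists above can be checked against the current environment. This is a minimal sketch, not codeqai's own logic: the provider keys `"openai"` and `"azure"` and the function name `missing_env_vars` are hypothetical, and only the variable names come from this section.

```python
import os

# Variable names as listed in this section; the dictionary keys are illustrative.
REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": [
        "OPENAI_API_TYPE",
        "AZURE_OPENAI_ENDPOINT",
        "OPENAI_API_KEY",
        "OPENAI_API_VERSION",
    ],
}


def missing_env_vars(provider, environ=None):
    """Return the required variables that are unset or empty for a provider."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV_VARS:
        print(provider, "missing:", missing_env_vars(provider))
```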

Supported Languages

  • Python
  • TypeScript
  • JavaScript
  • Java
  • Rust
  • Kotlin
  • Go
  • C++
  • C
  • C#
  • Ruby

How it works

The entire git repo is parsed with treesitter to extract all methods together with their documentation. The extracted methods are embedded with either sentence-transformers, instructor-embeddings or OpenAI's text-embedding-ada-002 and saved to a local FAISS vector database.
The vector database is saved to a file on your system and is loaded again on later runs. Once the repository is indexed, the codebase can be searched semantically using the configured embeddings model.
To chat with the codebase locally, llama.cpp or Ollama is used with the model you specify. With llama.cpp, the specified model needs to be available on the system in advance. With Ollama, the Ollama container with the desired model needs to be running locally on port 11434 in advance. OpenAI or Azure OpenAI can also be used as remote chat models.
To synchronize recent changes in the repository, the git commit hash of each file is saved to a cache along with the vector IDs. When the vector database is synchronized with the latest git state, the cached commit hashes are compared to the current git hash of each file in the repository. If the hashes differ, the related vectors are deleted from the database and inserted again after the vector embeddings are recreated.
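
The synchronization step described above can be sketched as a comparison between the cached and the current commit hashes. This is an illustrative sketch under stated assumptions, not codeqai's actual implementation: `CacheEntry` and `plan_sync` are hypothetical names, and the handling of new and deleted files is an assumption that the description does not cover.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CacheEntry:
    commit_hash: str          # commit hash of the file at the last sync
    vector_ids: List[str]     # IDs of the vectors created from that file


def plan_sync(
    cache: Dict[str, CacheEntry], current_hashes: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (vector IDs to delete, files to embed again)."""
    stale_vector_ids: List[str] = []
    files_to_embed: List[str] = []

    for path, commit_hash in current_hashes.items():
        entry = cache.get(path)
        if entry is None:
            # Assumption: a file without a cache entry is new and needs embedding.
            files_to_embed.append(path)
        elif entry.commit_hash != commit_hash:
            # Hash changed: drop the old vectors and embed the file again.
            stale_vector_ids.extend(entry.vector_ids)
            files_to_embed.append(path)

    for path, entry in cache.items():
        if path not in current_hashes:
            # Assumption: vectors of files removed from the repo are deleted.
            stale_vector_ids.extend(entry.vector_ids)

    return stale_vector_ids, files_to_embed
```

Files whose hash is unchanged are skipped entirely, so only modified files pay the cost of recomputing embeddings.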

FAQ

Where do I get models for llama.cpp?

Install the huggingface-cli and download your desired model from the model hub. For example

huggingface-cli download TheBloke/CodeLlama-13B-Python-GGUF codellama-13b-python.Q5_K_M.gguf

will download the codellama-13b-python.Q5_K_M model. After the download has finished, the absolute path of the model's .gguf file is printed to the console.

Important

llama.cpp compatible models must be in the .gguf format.
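
A quick way to check a downloaded file is to look at its first four bytes, since GGUF files begin with the ASCII magic `GGUF`. This is a minimal sketch that only inspects the magic bytes and does not validate the rest of the file. The function name `looks_like_gguf` is hypothetical.

```python
GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def looks_like_gguf(path):
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC
```

Pass it the absolute path printed by huggingface-cli. A result of False means the file is not in the .gguf format.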

Troubleshooting

  • During installation with pipx

    pip failed to build package: tiktoken
    
    Some possibly relevant errors from pip install:
      error: subprocess-exited-with-error
      error: can't find Rust compiler
    

    Make sure the Rust compiler is installed on your system.

  • During installation of faiss

    × Building wheel for faiss-cpu (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [12 lines of output]
        running bdist_wheel
        ...
    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for faiss-cpu
    Failed to build faiss-cpu
    ERROR: Could not build wheels for faiss-cpu, which is required to install pyproject.toml-based projects
    

    Make sure to have codeqai installed with Python <3.12. There is no faiss wheel available yet for Python 3.12.

Contributing

If you are missing a feature or facing a bug, don't hesitate to open an issue or raise a PR. Any kind of contribution is highly appreciated!
