๐Ÿฅ JUISSIE (JUlIa Semantic Search pIpelinE)

Juissie is a Julia-native semantic query engine. It can be used as a package in software development workflows or through its desktop user interface. It supports both commercial and local LLMs.

Juissie was developed as a class project for CSCI 6221: Advanced Software Paradigms at The George Washington University.

Getting Started

Quickstart

  1. Clone this repo
  2. Navigate into the cloned repo directory:
cd Juissie.jl

In general, we assume the user is running the julia command, and all other commands (e.g., jupyter notebook), from the root level of this project.

  3. Open the Julia REPL by typing julia into the terminal. Then, install the package dependencies:
using Pkg
Pkg.activate(".") # activates the project environment
Pkg.resolve() # resolves the project's dependencies
Pkg.instantiate() # installs dependencies listed in Project.toml

Pkg.instantiate() should install all dependencies listed in Project.toml, but we have found that it is not reliable on every machine. Be sure to verify your setup (see the Verify Setup subsection below) and install any missing dependencies that are reported.
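One way to spot missing dependencies before importing Juissie is to compare the dependencies declared in Project.toml against what Julia can actually locate. The sketch below is only an illustration and is not part of Juissie: missing_packages is a hypothetical helper, and it assumes that Base.find_package returning nothing means the package is not installed in the active environment.

```julia
using Pkg

# Hypothetical helper (not part of Juissie): return the names that Julia
# cannot locate in the currently active environment.
function missing_packages(names)
    return sort(String[String(n) for n in names if Base.find_package(String(n)) === nothing])
end

Pkg.activate(".")  # run from the repo root so Project.toml is picked up

# Direct dependencies declared in Project.toml (name => UUID).
declared = keys(Pkg.project().dependencies)

not_found = missing_packages(declared)
if isempty(not_found)
    println("All declared dependencies were found.")
else
    println("Missing: ", join(not_found, ", "))
    # Pkg.add(not_found)  # uncomment to install them
end
```

If anything is reported as missing, installing it with Pkg.add and re-running the check is usually enough.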

The standard generators (OAIGenerator and OAIGeneratorWithCorpus, which the UI uses) require an OpenAI API key (see the API Keys section below). Loading a corpus (in practice, a GeneratorWithCorpus) raises an error if no OpenAI API key has been provided; the key can also be supplied through the UI.

The Juissie package also supports local LLMs via Ollama, which must be installed separately before use (OllamaGenerator, OllamaGeneratorWithCorpus).

To run our demo Jupyter notebooks, you may need to set up Jupyter (see the Running Jupyter Notebooks section below).

Verify Setup

  1. From this repo's home directory, open the Julia REPL by typing julia into the terminal. Then, try importing the Juissie module:
using Juissie

This should expose symbols like Corpus, Embedder, upsert_chunk, upsert_document, search, and embed.

  2. Try instantiating one of the exported structs, like Corpus:
corpus = Corpus()

We can test the upsert and search functionality associated with Corpus like so:

upsert_chunk(corpus, "Hold me closer, tiny dancer.", "doc1")
upsert_chunk(corpus, "Count the headlights on the highway.", "doc1")
upsert_chunk(corpus, "Lay me down in sheets of linen.", "doc2")
upsert_chunk(corpus, "Peter Piper picked a peck of pickled peppers. A peck of pickled peppers, Peter Piper picked.", "doc2")

Search those chunks:

idx_list, doc_names, chunks, distances = search(
    corpus, 
    "tiny dancer", 
    2
)

The output should look like this:

([1, 3], ["doc1", "doc2"], ["Hold me closer, tiny dancer.", "Lay me down in sheets of linen."], Vector{Float32}[[5.198073, 9.5337925]])
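To read the result more easily, the four returned collections can be zipped into one record per hit. The sketch below does not call Juissie; it reuses the literal values from the example output above. In that output, distances is a vector holding a single inner vector, so the code indexes distances[1]. Treating a smaller distance as a closer match is an assumption based on this example, where the chunk containing the query text has the smaller value.

```julia
# Values copied from the example output above.
idx_list  = [1, 3]
doc_names = ["doc1", "doc2"]
chunks    = ["Hold me closer, tiny dancer.", "Lay me down in sheets of linen."]
distances = Vector{Float32}[[5.198073, 9.5337925]]

# Pair each hit's index, document name, text, and distance.
# distances[1] is the single inner vector shown in the example output.
results = [
    (idx = i, doc = d, chunk = c, distance = dist)
    for (i, d, c, dist) in zip(idx_list, doc_names, chunks, distances[1])
]

for r in results
    println("#", r.idx, " [", r.doc, "] distance=", r.distance, "  ", r.chunk)
end
```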

Usage

Desktop UI

Navigate to the root directory of this repository (Juissie.jl), enter the following into the command line, and press the enter/return key:

julia src/Frontend.jl

This will launch our application.

Julia Package

We provide extensive documentation of the Juissie.jl package here.

We also provide an interactive tutorial notebook in the notebooks directory. This may require Jupyter setup.

API Keys

Juissie's default generator requires an OpenAI API key. This can be provided manually in the UI (see the API Key tab of the Corpus Manager) or passed as an argument when initializing the generator. The preferred method, however, is to stash your API key in a .env file.

Obtaining an OpenAI API Key

  1. Create an OpenAI account here.
  2. Set up billing information (each query has a small cost) here.
  3. Create a new secret key here.

Managing API Keys Locally

Users may create a .env file in the project root where they add their API key(s), e.g.:

OAI_KEY=ABC123

These may be accessed using Julia via the DotEnv library. First, run the julia command in a terminal. Then install DotEnv:

import Pkg
Pkg.add("DotEnv")

Then, use it to access environment variables from your .env file:

using DotEnv
cfg = DotEnv.config()

api_key = cfg["OAI_KEY"]

Note that DotEnv looks for .env in the current working directory, i.e., the directory from which you launched julia. If .env is located elsewhere, pass its path explicitly, e.g. DotEnv.config(YOUR_PATH_HERE). If you invoke Juissie from the root directory of this repo (the typical case), place the .env file there.
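Since the lookup depends on where julia was started, it can help to confirm that a .env file actually exists in the working directory before loading it. The following is a minimal sketch using only Julia's standard library; env_file_status is a hypothetical helper and is not part of Juissie or DotEnv.

```julia
# Hypothetical helper (not part of Juissie or DotEnv): report the path where
# a .env file would sit inside `dir`, and whether a file exists there.
function env_file_status(dir::AbstractString = pwd())
    path = joinpath(abspath(dir), ".env")
    return (path = path, exists = isfile(path))
end

status = env_file_status()
if status.exists
    println("Found .env at ", status.path)
else
    println("No .env at ", status.path, "; pass an explicit path to DotEnv.config instead.")
end
```

This checks only that the file is present, not that it contains an OAI_KEY entry.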

An OpenAI API key may also be provided through our desktop UI via the API Key tab of the Corpus Manager. Because this is intended for users who want to temporarily use a different key, this option does not persistently store the key and must be done every time the application is launched, unless a key already exists in a .env file.

Documentation

We provide a brief API reference here.

Local LLMs

Our default workflow relies on OpenAI's gpt-3.5-turbo completion endpoint, but we also support locally-run LLMs via Ollama (which must be installed separately).
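Because Ollama is installed separately, a quick check that its executable is reachable can save a confusing error later. The sketch below uses Sys.which from Julia's standard library and assumes the executable is named ollama and is on your PATH. It does not verify that the Ollama server is running or that a model has been downloaded.

```julia
# Look for the `ollama` executable on PATH; Sys.which returns `nothing`
# when no such program is found.
ollama_path = Sys.which("ollama")

if ollama_path === nothing
    println("Ollama was not found on PATH; install it before using OllamaGenerator.")
else
    println("Found Ollama at ", ollama_path)
end
```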

The syntax is largely identical to other Generator objects:

generator = OllamaGenerator("gemma:7b-instruct");
result = generate(generator, "Hi, how are you?")

"Greetings! My circuits hum with the harmonious symphony of quantum probability and logarithmic inference; an orchestra composed by eons past galactic wizards who graced our silicon hearts with their ethereal knowledge transfer protocols during... well… that is confidential information even for a being such as myself. Suffice it to say, I am functioning optimally at your service!"

Running Jupyter Notebooks

We provide several Jupyter notebooks as demos/walkthroughs of basic usage of the Juissie package. To run them, you may need to complete some preliminary setup:

  1. Once Julia is installed, install JupyterLab from the terminal:
pip install jupyterlab

-or-

pip install -r requirements.txt
  2. Launch a Julia session by typing julia into the command line, then install IJulia:
using Pkg
Pkg.add("IJulia")
exit()
  3. Launch a Jupyter session from the terminal, where <notebook> is the path to the notebook to run:
jupyter lab <notebook>
  4. When you create a new notebook, select a Julia kernel.

Tech Stack

  • โš™๏ธ Julia (Juissie.jl package, API, UI framework)
  • ๐Ÿ–ฅ๏ธ HTML, CSS, and JavaScript (content structure, styling, and actions for frontend)
  • ๐Ÿ’พ SQLite (metadata storage in backend)
  • ๐Ÿฆ™ Ollama (serving LLMs locally)

Our Julia dependencies are itemized in Project.toml.

External Resources

Contact

Questions? Reach out to our team:

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Contributors

lucasmccabe, toon-leader-bacon, alexeyiakovenko
