Virtual PI

This script reads all of the PDFs in a directory, and uses a Large Language Model to make the document content available for answering natural language prompts, via Slack.

The script is trivial: it simply stands on the shoulders of giants such as PaperQA, OpenAI Embeddings, FAISS, langchain, and Slack Bolt.

Why the name? When your Principal Investigator goes on holidays, you need a Virtual PI to answer the difficult questions!

This work was first inspired by a conversation with the authors of Galactic ChitChat: Using Large Language Models to Converse with Astronomy Literature, who implemented a similar tool, using a similar software stack. Virtual PI was first implemented and used for querying documentation for an astronomical instrument, MAVIS.

Configuration

API keys

# create your .env file:
cp .env-example .env 
# set your environment variables:
vim .env

Launching the bot

To run the script, you require:

  • A directory containing the PDFs you wish the expert system to ingest (e.g., ./pdfs/*.pdf).
  • A working Python 3 environment with the required packages installed:
    • pip3 install -r requirements.txt
  • An OpenAI API key.
  • A Slack app. You can create a new Slack app that is preconfigured with the necessary permissions by pressing the green 'Create App' button on that link.
    • You can change the name of your app/bot (you'll use this name to interact with it on Slack) by editing the 'manifest' file when the option is presented.
    • You will need to copy the App and Bot Tokens to set as environment variables, as described below.

The three API tokens you have generated should be exported to your shell environment at runtime:

export OPENAI_API_KEY="sk-M...M"
export SLACK_APP_TOKEN="xapp-1...d"
export SLACK_BOT_TOKEN="xoxb-2...C"

e.g., by sourcing the .env file after modifying it.
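A missing token usually only shows up as an error once the bot tries to connect. As a minimal sketch (not part of the script itself), the three variable names listed above can be checked before launching. The helper name missing_env_vars is hypothetical.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("OPENAI_API_KEY", "SLACK_APP_TOKEN", "SLACK_BOT_TOKEN")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Running this in the same shell you will launch the bot from shows whether the .env file was actually sourced.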

Then you can start the app as follows.

python3 virtualpi.py /path/to/your/PDF/directory/

Recording Reactions

In some cases, you may wish to gather the reactions to bot messages (e.g., for further optimisation of the bot) by scanning a channel. Assuming the .env file is set up correctly, you can save this data to disk as bot_messages.json by running the scan_messages.py script:

python scan_messages.py

To get the bot's user ID (required in .env), find the bot's profile in your Slack workspace and copy the ID shown (starting with U...).
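For readers curious what such a scan might involve, here is a rough sketch. It is not the contents of scan_messages.py. It assumes a slack_sdk WebClient, that the bot's messages carry its user ID in the "user" field, and that the app may read channel history. The function names and the record layout are hypothetical; only the bot_messages.json file name comes from this README.

```python
import json


def collect_bot_reactions(messages, bot_user_id):
    """Keep only the bot's messages and summarise the reactions on each."""
    records = []
    for msg in messages:
        if msg.get("user") != bot_user_id:
            continue
        records.append({
            "ts": msg.get("ts"),
            "text": msg.get("text", ""),
            # Slack attaches a "reactions" list only to messages that have some.
            "reactions": {r["name"]: r["count"] for r in msg.get("reactions", [])},
        })
    return records


def scan_channel(client, channel_id, bot_user_id, out_path="bot_messages.json"):
    """Page through channel history with a WebClient and write the records to disk."""
    messages = []
    cursor = None
    while True:
        page = client.conversations_history(channel=channel_id, cursor=cursor, limit=200)
        messages.extend(page["messages"])
        cursor = (page.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            break
    records = collect_bot_reactions(messages, bot_user_id)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
    return records
```

With slack_sdk installed, scan_channel could be called with WebClient(token=os.environ["SLACK_BOT_TOKEN"]), a channel ID, and the bot's user ID.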

Saving State

When the script starts, it checks whether a pickled copy of the FAISS dense vector store containing the documents already exists in the PDF directory. If found, it reuses that state (which saves time and the cost of API calls); otherwise it parses the PDFs, embeds them into the FAISS dense vector store, and saves this state for next time.
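The load-or-build pattern described here can be sketched as follows. This is an illustration rather than the script's actual code: the build argument is a hypothetical callable standing in for the PDF parsing and embedding step, and docs.pkl is the state file name used in this section.

```python
import pickle
from pathlib import Path

STATE_FILENAME = "docs.pkl"  # state file name used in this README


def load_or_build(pdf_dir, build):
    """Return cached state if present; otherwise call build(pdf_paths) and cache it."""
    pdf_dir = Path(pdf_dir)
    state_path = pdf_dir / STATE_FILENAME
    if state_path.exists():
        # Reuse the saved state: no parsing and no embedding API calls.
        with state_path.open("rb") as fh:
            return pickle.load(fh)
    # No saved state yet: build it from the PDFs and save it for next time.
    state = build(sorted(pdf_dir.glob("*.pdf")))
    with state_path.open("wb") as fh:
        pickle.dump(state, fh)
    return state
```

Because pickle can execute arbitrary code on load, a state file should only be loaded if it comes from a source you trust.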

NB: If you add/remove PDFs you will need to remove the state file!

rm /path/to/your/PDF/directory/docs.pkl
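The script does not detect a changed PDF set by itself, which is why the file must be removed by hand. One possible way to notice the change automatically is to store a fingerprint of the PDF directory next to the state file. The sketch below is only a suggestion: the docs.fingerprint file name and all three function names are hypothetical.

```python
import hashlib
from pathlib import Path


def pdf_fingerprint(pdf_dir):
    """Hash the names, sizes and modification times of the PDFs in pdf_dir."""
    digest = hashlib.sha256()
    for path in sorted(Path(pdf_dir).glob("*.pdf")):
        info = path.stat()
        digest.update(f"{path.name}|{info.st_size}|{info.st_mtime_ns}\n".encode())
    return digest.hexdigest()


def state_is_stale(pdf_dir, fingerprint_name="docs.fingerprint"):
    """True if the stored fingerprint is missing or no longer matches the PDFs."""
    stored = Path(pdf_dir) / fingerprint_name
    return not stored.exists() or stored.read_text().strip() != pdf_fingerprint(pdf_dir)


def record_fingerprint(pdf_dir, fingerprint_name="docs.fingerprint"):
    """Save the current fingerprint; call this right after the state file is written."""
    (Path(pdf_dir) / fingerprint_name).write_text(pdf_fingerprint(pdf_dir))
```

If state_is_stale returns True at startup, the state file can be deleted and rebuilt, followed by a call to record_fingerprint.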

Add to Slack Workspace

By now your app should be happily running. The final step is to actually add it to your Slack workspace.

  • In Slack, click '... More' at the top left.
  • Select 'Apps'.
  • Select the new app you created above.
  • Then go to a Slack channel and tag the app with a question e.g. @WhateverYouCalledYourApp what is the meaning of this?

NB: The app will only respond to mentions in a Slack channel, not to DMs.
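The mention-only behaviour follows naturally if the app subscribes to mention events and not to direct-message events. The sketch below shows how that might look with Slack Bolt. It assumes Socket Mode (suggested by the xapp- app token, but not confirmed here) and is not the script's actual handler. The helper question_from_mention and the placeholder reply are hypothetical.

```python
import os
import re

# Matches Slack user mentions such as <@U0123ABCD> or <@U0123ABCD|name>.
MENTION = re.compile(r"<@[A-Z0-9]+(?:\|[^>]+)?>")


def question_from_mention(text):
    """Strip Slack user mentions and surrounding whitespace from a message."""
    return MENTION.sub("", text).strip()


def main():
    # Imported here so the helper above can be used without Slack Bolt installed.
    from slack_bolt import App
    from slack_bolt.adapter.socket_mode import SocketModeHandler

    app = App(token=os.environ["SLACK_BOT_TOKEN"])

    # Only app_mention events are handled, so direct messages get no reply.
    @app.event("app_mention")
    def handle_mention(event, say):
        question = question_from_mention(event["text"])
        # Placeholder reply; a real bot would query the document store here.
        say(f"You asked: {question}", thread_ts=event["ts"])

    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```

Calling main() with both Slack tokens exported starts the listener. Replying with thread_ts keeps each answer threaded under the question that triggered it.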

An example interaction is shown below:

[Image: example interaction]

Docker

Running with Docker is probably the easiest all-round solution, but it can make debugging a bit more tedious. To run with Docker, use:

docker build -t virtualpi:latest .
docker run --restart=unless-stopped -d -v "$(pwd)/pdfs:/app/pdfs" --env-file=./.env virtualpi

This has the benefit of allowing multiple bots to run on different PDF sources. You can build the image once, then spin up a new container for each bot (changing the ./pdfs directory and probably the .env file).

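For Docker, remove the export keyword and the quotation marks from the .env file, because Docker's --env-file option expects plain KEY=value lines and treats quotes as part of the value. TODO: fix this hack.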
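Until that is fixed, the conversion can be automated so that a single shell-style .env remains the source of truth. The following is a minimal sketch, assuming each variable sits on its own line in the form export KEY="value". The function name to_docker_env is hypothetical.

```python
import re

# Optional "export", then a shell-style variable name, "=", and the raw value.
LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def to_docker_env(text):
    """Convert shell-style `export KEY="value"` lines to plain KEY=value lines."""
    out = []
    for raw in text.splitlines():
        match = LINE.match(raw)
        if not match:
            out.append(raw)  # keep comments and blank lines unchanged
            continue
        key, value = match.group(1), match.group(2).strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

The result could be written to a separate file (say, a hypothetical .env.docker) and passed to docker run through --env-file, leaving the original .env usable for sourcing in a shell.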

Contributors

davidbrodrick, jcranney
