Virtual PI

This script reads all of the PDFs in a directory, and uses a Large Language Model to make the document content available for answering natural language prompts, via Slack.

The script itself is trivial: it stands on the shoulders of giants such as PaperQA, OpenAI Embeddings, FAISS, langchain, and Slack Bolt.

Why the name? When your Principal Investigator goes on holidays, you need a Virtual PI to answer the difficult questions!

This work was first inspired by a conversation with the authors of Galactic ChitChat: Using Large Language Models to Converse with Astronomy Literature, who implemented a similar tool, using a similar software stack. Virtual PI was first implemented and used for querying documentation for an astronomical instrument, MAVIS.

Configuration

API keys

# create your .env file:
cp .env-example .env 
# set your environment variables:
vim .env

Launching the bot

To run the script, you require:

A directory with the PDFs you wish the expert system to ingest (e.g., ./pdfs/*.pdf)
A working Python3 environment with the following packages available:
- pip3 install -r requirements.txt
An OpenAI API key.
You can create a new Slack app that is preconfigured with the necessary permissions by pressing the green 'Create App' button on that link.
- You can change the name of your app/bot (you'll use this to interact with it on Slack) by editing the 'manifest' file when the option is presented.
- You will need to copy the App and Bot Tokens to set as environment variables, as described below.

The three API tokens you have generated should be exported to your shell environment at runtime:

export OPENAI_API_KEY="sk-M...M"
export SLACK_APP_TOKEN="xapp-1...d"
export SLACK_BOT_TOKEN="xoxb-2...C"

e.g., by sourcing the .env file after modifying it.
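A quick way to confirm that the tokens actually reached the environment is a small check like the one below. This is only a sketch: the helper name `missing_tokens` is hypothetical and not part of virtualpi.py; only the three variable names come from the export lines above.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("OPENAI_API_KEY", "SLACK_APP_TOKEN", "SLACK_BOT_TOKEN")


def missing_tokens(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling `missing_tokens()` in a Python shell started from the same terminal should return an empty list once the .env file has been sourced.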

Then you can start the app as follows.

python3 virtualpi.py /path/to/your/PDF/directory/

Recording Reactions

In some cases, you may wish to gather the reactions to bot messages (e.g., for further optimisation of the bot) by scanning a channel. Assuming the .env is set up correctly, you can save this data to disk as bot_messages.json by running the scan_messages.py script:

python scan_messages.py
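For orientation, the kind of filtering such a scan performs might look roughly like the sketch below. It is not the code in scan_messages.py: the function names are hypothetical, and it assumes each message is a dictionary shaped like the entries in Slack's channel history (a `user` id, a `ts` timestamp, a `text`, and an optional `reactions` list of `name`/`count` entries).

```python
import json


def collect_bot_reactions(messages, bot_user_id):
    """Keep only the bot's messages and summarise their reactions."""
    records = []
    for message in messages:
        if message.get("user") != bot_user_id:
            continue  # ignore messages written by anyone else
        records.append({
            "ts": message.get("ts"),
            "text": message.get("text", ""),
            # Map each reaction name to how many users added it.
            "reactions": {
                r["name"]: r["count"] for r in message.get("reactions", [])
            },
        })
    return records


def save_records(records, path="bot_messages.json"):
    """Write the collected records to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
```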

To get the bot's user id (required in .env), find the bot's profile in your Slack workspace and copy the id shown (starting with U...).

Saving State

When the script starts, it checks whether a pickled version of the dense vector index containing the documents is already available in the PDF directory. If found, it uses that existing state (which saves time and the cost of API calls); otherwise it parses the PDFs, embeds them into the FAISS dense vector index, and then saves this state for next time.

NB: If you add/remove PDFs you will need to remove the state file!

rm /path/to/your/PDF/directory/docs.pkl
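The load-or-build pattern described above can be sketched as follows. This is an illustration rather than the code in virtualpi.py: `load_or_build` and the `build` callable are hypothetical names, and only the docs.pkl file name is taken from the command above.

```python
import pickle
from pathlib import Path

STATE_FILENAME = "docs.pkl"  # file name taken from the rm command above


def load_or_build(pdf_dir, build):
    """Return cached state if present, else call build(pdf_dir) and cache it."""
    state_path = Path(pdf_dir) / STATE_FILENAME
    if state_path.exists():
        # Cached state found: skip parsing and embedding entirely.
        with state_path.open("rb") as handle:
            return pickle.load(handle)
    state = build(pdf_dir)  # the slow, paid step: parse and embed the PDFs
    with state_path.open("wb") as handle:
        pickle.dump(state, handle)
    return state
```

Because the cache is keyed only on the file's existence, it cannot notice added or removed PDFs, which is why the state file has to be deleted by hand.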

Add to Slack Workspace

By now your app should be happily running. The final step is to actually add it to your Slack workspace.

In Slack, click '... More' at the top left.
Select 'Apps'.
Select the new app you created above.
Then go to a Slack channel and tag the app with a question, e.g. @WhateverYouCalledYourApp what is the meaning of this?

NB: The app will only respond to mentions in a Slack channel, not to DMs.
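When Slack delivers a mention event, the message text contains the mention itself encoded as `<@USERID>`, so a handler would typically strip it before treating the rest as the question. The helper below is a hypothetical sketch of that step, not code from virtualpi.py.

```python
import re

# Slack encodes a user mention in message text as <@USERID>.
MENTION = re.compile(r"<@[^>]+>")


def extract_question(text):
    """Remove mention markup and collapse the remaining whitespace."""
    return " ".join(MENTION.sub(" ", text).split())
```

In a Slack Bolt app, a function like this would be applied to the `text` field of an `app_mention` event before the question is passed on.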

An example interaction is shown below:

Docker

Running with Docker is probably the easiest all-round solution, but it can make debugging a bit more tedious. To run with Docker, use:

docker build -t virtualpi:latest .
docker run --restart=unless-stopped -d -v ./pdfs:/app/pdfs --env-file=./.env virtualpi

This has the benefit of allowing multiple bots to run on different PDF sources. You can build the image once, then spin up a new container for each bot (changing the ./pdfs directory and probably the .env file).

For Docker, remove the export and quotation marks from the .env file. TODO: fix this hack.
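Until that is fixed, the conversion can be scripted rather than done by hand. The sketch below is one possible approach; `to_docker_env` is a hypothetical helper that is not part of the repository, and it assumes each entry is a simple one-line `export KEY="value"` or `KEY=value` assignment.

```python
def to_docker_env(lines):
    """Convert shell-style `export KEY="value"` lines to KEY=value lines."""
    converted = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        converted.append(f"{key.strip()}={value}")
    return converted
```

Writing the returned lines to a separate file (for example a hypothetical .env.docker) and passing that to --env-file keeps the shell-friendly .env intact.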
