CodexDB

Overview

CodexDB allows users to specify natural language instructions, together with their SQL queries. It uses OpenAI's GPT-3 Codex model to generate code for query processing that complies with those instructions. This enables far-reaching customization, ranging from the selection of frameworks for query processing to custom logging output. In doing so, CodexDB blurs the line between user and developer.

Setup

The following instructions have been tested on an EC2 instance of type t2.medium with Ubuntu 22.04 OS and 25 GB of disk space.

  1. After logging into the EC2 instance, run the following command (from /home/ubuntu):
git clone https://github.com/itrummer/CodexDB
  2. Switch into the CodexDB root directory:
cd CodexDB/
  3. Install pip if it is not yet installed, e.g., run:
sudo apt update
sudo apt install python3-pip
  4. Use pip to install required dependencies (make sure to use sudo):
sudo pip install -r requirements.txt
  5. Download and unzip the SPIDER dataset for benchmarking:
cd ..
sudo pip install gdown
gdown 1iRDVHLr4mX2wQKSgA9J8Pire73Jahh0m
sudo apt install unzip
unzip spider.zip
  6. Pre-process the SPIDER dataset:
cd CodexDB
PYTHONPATH=src python3 src/codexdb/prep/spider.py /home/ubuntu/spider
  7. Set the following environment variables:
  • CODEXDB_TMP designates a working directory into which CodexDB writes temporary files (e.g., Python code for query execution).
  • CODEXDB_PYTHON is the name (or path) of the Python interpreter CodexDB uses to test the Python code it generates.
E.g., set the two variables using the following commands:
export CODEXDB_TMP=/tmp
export CODEXDB_PYTHON=python3
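
Before starting CodexDB, it can help to confirm that both variables are set to usable values. The following is a minimal sketch, not part of CodexDB itself; the function name check_codexdb_env and the wording of the messages are assumptions made for illustration.

```python
import os
import shutil


def check_codexdb_env(env=None):
    """Return a list of problems found in the CodexDB environment settings.

    Hypothetical helper (not part of CodexDB). Pass a dict to check
    settings other than the current process environment.
    """
    env = os.environ if env is None else env
    problems = []

    # CODEXDB_TMP must point to an existing, writable directory.
    tmp_dir = env.get("CODEXDB_TMP")
    if not tmp_dir:
        problems.append("CODEXDB_TMP is not set")
    elif not os.path.isdir(tmp_dir):
        problems.append(f"CODEXDB_TMP is not a directory: {tmp_dir}")
    elif not os.access(tmp_dir, os.W_OK):
        problems.append(f"CODEXDB_TMP is not writable: {tmp_dir}")

    # CODEXDB_PYTHON must be an interpreter name on PATH or an executable path.
    interpreter = env.get("CODEXDB_PYTHON")
    if not interpreter:
        problems.append("CODEXDB_PYTHON is not set")
    elif shutil.which(interpreter) is None:
        problems.append(f"CODEXDB_PYTHON not found: {interpreter}")

    return problems


if __name__ == "__main__":
    for problem in check_codexdb_env():
        print("Problem:", problem)
```

An empty list means both variables look usable; each entry otherwise names the setting that needs attention.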

Running CodexDB

WARNING: CodexDB generates Python code for query execution via large language models. Since CodexDB cannot guarantee to generate correct code, it is highly recommended to avoid running CodexDB on your primary machine. Instead, run CodexDB on a temporary EC2 instance and log into the Web interface from your primary machine.
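
To illustrate why isolation matters, the sketch below shows one generic way to execute a piece of generated Python code in a separate interpreter process with a timeout. It is an assumption-laden illustration, not CodexDB's actual implementation: the function name run_generated_code and its parameters are hypothetical, and only the two environment variables come from the setup instructions. A separate process with a timeout limits runaway code but is not a sandbox, so the generated code can still read and write files on the machine.

```python
import os
import subprocess
import sys
import tempfile


def run_generated_code(code, interpreter=None, tmp_dir=None, timeout=30):
    """Run code in a separate interpreter and return (returncode, stdout, stderr).

    Hypothetical helper (not CodexDB's implementation). Defaults fall back
    to CODEXDB_PYTHON and CODEXDB_TMP, then to the current interpreter and
    the system temporary directory.
    """
    interpreter = interpreter or os.environ.get("CODEXDB_PYTHON", sys.executable)
    tmp_dir = tmp_dir or os.environ.get("CODEXDB_TMP", tempfile.gettempdir())

    # Write the code to a temporary file inside the working directory.
    fd, path = tempfile.mkstemp(suffix=".py", dir=tmp_dir)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(code)
        # subprocess.run raises subprocess.TimeoutExpired if the limit is hit.
        result = subprocess.run(
            [interpreter, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.returncode, result.stdout, result.stderr
    finally:
        # Remove the temporary file whether or not execution succeeded.
        os.remove(path)
```

Even with such a wrapper, the recommendation above still applies: run the code on a disposable machine rather than on your primary one.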

  1. Start the CodexDB Web interface (replace [OPENAI_API_ACCESS_KEY] with your OpenAI access key!):
streamlit run src/codexdb/gui.py [OPENAI_API_ACCESS_KEY] /home/ubuntu/spider
  2. After executing the command above, you should see two URLs on the console:
  • Network URL
  • External URL

If using CodexDB on your local machine, open the first URL in your Web browser. If using CodexDB on a remote machine, open the second URL in your local Web browser. You may have to enable external access in the second case. E.g., when running CodexDB on Amazon EC2, make sure to add an inbound rule allowing TCP access on port 8501.
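
If the external URL does not load, a quick check from your local machine can show whether port 8501 is reachable at all. The following is a minimal sketch using only the Python standard library; the function name port_is_open is an assumption, and the host address in the usage note is a placeholder.

```python
import socket


def port_is_open(host, port=8501, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper; 8501 is the port named in the instructions above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open("203.0.113.10") would test the default port on a placeholder address; substitute the public address of your own instance. A result of False while CodexDB is running suggests that the inbound rule is missing or that a firewall blocks the port.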

Troubleshooting

CodexDB only works with a specific version of the sqlglot SQL parsing library: 1.16.1. If you encounter frequent errors in plan.py, check the installed version of sqlglot by running pip show sqlglot in the terminal. If you see a different version number, uninstall sqlglot (sudo pip uninstall sqlglot) and reinstall the required version (e.g., by running sudo pip install sqlglot==1.16.1).
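
The same version check can be done from Python, which is convenient if you want to verify the interpreter that CodexDB actually uses. This is a minimal sketch based on the standard importlib.metadata module; the helper names installed_version and sqlglot_status are assumptions made for illustration.

```python
from importlib import metadata

REQUIRED_SQLGLOT = "1.16.1"  # version named in the troubleshooting note


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def sqlglot_status(found, required=REQUIRED_SQLGLOT):
    """Describe whether the found sqlglot version matches the required one."""
    if found is None:
        return f"sqlglot is not installed; install sqlglot=={required}"
    if found != required:
        return f"sqlglot {found} is installed; reinstall sqlglot=={required}"
    return "ok"


if __name__ == "__main__":
    print(sqlglot_status(installed_version("sqlglot")))
```

Run the script with the interpreter named in CODEXDB_PYTHON so that the check covers the same installation CodexDB uses.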

CodexDB only supports a restricted class of SQL queries via the "plan" prompt. In particular, it only supports the specific join syntax used in the queries of the SPIDER benchmark. If your query falls outside of the class of supported queries, you can switch to the "query" prompt by selecting the corresponding prompt style in the "Prompt Configuration" section (see buttons on the left side of the Web interface). This prompt style does not integrate a summary of processing steps into the prompt and may therefore degrade the quality of the generated code.

How to cite

@article{Trummer2022b,
author = {Trummer, Immanuel},
journal = {PVLDB},
number = {11},
pages = {2921 -- 2928},
title = {{CodexDB: Synthesizing code for query processing from natural language instructions using GPT-3 Codex}},
volume = {15},
year = {2022}
}
