Inline mapper for splitting documents and calculating OpenAI embeddings, for the purpose of building a vector store knowledge base usable by GPT and ChatGPT.
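The sketch below makes the splitting step concrete. It uses fixed-size character windows with overlap, and each chunk inherits the source document's metadata plus a chunk index. The record field names (`page_content`, `metadata`) and the `chunk_size` and `chunk_overlap` defaults are hypothetical. The mapper's actual splitting strategy may differ, and the embedding call is omitted.

```python
from typing import Any, Dict, Iterator, List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def split_record(
    record: Dict[str, Any],
    text_key: str = "page_content",  # hypothetical field name
    metadata_key: str = "metadata",  # hypothetical field name
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
) -> Iterator[Dict[str, Any]]:
    """Yield one output record per chunk, copying the document metadata onto each chunk."""
    metadata = record.get(metadata_key, {})
    for index, chunk in enumerate(split_text(record[text_key], chunk_size, chunk_overlap)):
        yield {text_key: chunk, metadata_key: {**metadata, "chunk_index": index}}
```

In a full pipeline, each chunk would then be sent to the embeddings model and the resulting vector attached to the output record.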
Built with the Meltano Tap SDK for Singer Taps.
- https://github.com/meltanolabs/tap-beautifulsoup - [Coming soon!] A tap for scraping web content from ReadTheDocs and other websites.
- https://github.com/meltanolabs/target-chromadb - A Singer target that can be used to load documents and embeddings created by this library.
- https://github.com/meltanolabs/gpt-ext - A Meltano utility which can be used to chat with the documents after they are loaded into the vector store.
Before using tap-beautifulsoup to scrape a ReadTheDocs site, you first need to download the site locally using wget:
% MY_RTD_SITE=sdk.meltano.com
% wget -r -A.html https://${MY_RTD_SITE}/en/latest/
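By default, `wget -r` saves pages under a directory named after the host (for example `./sdk.meltano.com/en/latest/`). The following is a minimal sketch, not part of any of the projects above, for checking which HTML files were downloaded. The function name `find_html_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def find_html_files(site_dir: str) -> List[Path]:
    """Return every downloaded .html file under site_dir, sorted for a stable order."""
    return sorted(path for path in Path(site_dir).rglob("*.html") if path.is_file())
```

For example, `find_html_files("sdk.meltano.com")` lists the pages fetched by the wget command above.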
A full list of supported settings and capabilities for this mapper is available by running:
map-gpt-embeddings --about
This Singer mapper will automatically import any environment variables within the working directory's `.env` file if the `--config=ENV` option is provided. Config values are then taken from any matching environment variable, whether it is set in the terminal context or in the `.env` file.
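The sketch below roughly illustrates how an `--config=ENV` lookup can resolve settings. It assumes the Meltano SDK's usual naming convention. Under that convention, each declared setting is read from an environment variable named after the plugin, upper-cased with dashes replaced by underscores, followed by the upper-cased setting name. The setting name `openai_api_key` is a hypothetical example; run `map-gpt-embeddings --about` for the real list. This simplified sketch is not the SDK's own code, and it does not read the `.env` file itself.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def config_from_env(
    plugin_name: str,
    setting_names: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Collect values for known settings from prefixed environment variables (assumed convention)."""
    env = os.environ if env is None else env
    # e.g. "map-gpt-embeddings" -> "MAP_GPT_EMBEDDINGS_"
    prefix = plugin_name.upper().replace("-", "_") + "_"
    config: Dict[str, str] = {}
    for name in setting_names:
        env_var = prefix + name.upper()  # e.g. MAP_GPT_EMBEDDINGS_OPENAI_API_KEY
        if env_var in env:
            config[name] = env[env_var]
    return config
```

Only settings that are declared are picked up, so unrelated variables in the environment are ignored.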
You will need an OpenAI API key to calculate embeddings using OpenAI's models. Free accounts are rate limited to 60 calls per minute. API access is separate from a ChatGPT Plus subscription and requires a per-API-call billing method set up with OpenAI.
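Staying under a fixed per-minute limit means spacing requests at least 60 / N seconds apart. At 60 calls per minute, that works out to one call per second. The sketch below is a hypothetical client-side throttle that illustrates this arithmetic. It is not part of this mapper, and the class name `MinIntervalThrottle` is made up for illustration. The clock and sleep functions are injectable so the throttle can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Block so that successive calls are at least 60 / calls_per_minute seconds apart."""

    def __init__(
        self,
        calls_per_minute: int = 60,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough to respect the minimum interval, then record the call time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self._last_call + self.min_interval - now
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last_call = now
```

Call `throttle.wait()` before each embeddings request. OpenAI's embeddings endpoint also accepts a list of input strings in a single request, so sending several chunks per call is another way to reduce the number of calls.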
You can easily run `map-gpt-embeddings` by itself or in a pipeline using Meltano.
map-gpt-embeddings --version
map-gpt-embeddings --help
map-gpt-embeddings --config CONFIG --discover > ./catalog.json
Follow these instructions to contribute to this project.
pipx install poetry
poetry install
Create tests within the `map_gpt_embeddings/tests` subfolder and then run:
poetry run pytest
You can also test the `map-gpt-embeddings` CLI interface directly using `poetry run`:
poetry run map-gpt-embeddings --help
Testing with Meltano
Note: This mapper will work in any Singer environment and does not require Meltano. The examples here are for convenience and to streamline end-to-end orchestration scenarios.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd map-gpt-embeddings
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke map-gpt-embeddings --version
# OR run a test `elt` pipeline:
meltano elt map-gpt-embeddings target-jsonl
See the Meltano SDK dev guide for more instructions on how to use the SDK to develop your own taps and targets.