Pull high-quality, efficient embeddings for PubMed, arXiv, and Wikipedia from Hugging Face and use them for local LLM inference / Retrieval-Augmented Generation (RAG)
The current Jupyter notebook advises a Python 3.10 environment, but in a fresh conda environment several dependencies (torch*) likely need to be built rather than installed from prebuilt wheels, and version conflicts occur between beam and dill (for starters).
It appears that both environment configuration and requirements need to be updated.
Whilst a Docker container may be the easiest way to distribute dataclysm, I would still love a repeatable way to deploy with conda or venv alone, as I usually need a few drinks to be fully up for docking.
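As a starting point for that repeatable conda path, something like the following `environment.yml` might sidestep the source builds and the beam/dill clash. To be clear, every pin and package name here is an assumption on my part, not tested against dataclysm's actual requirements:

```yaml
# Hypothetical environment.yml sketch -- pins are guesses, untested against dataclysm
name: dataclysm
channels:
  - pytorch
  - conda-forge
dependencies:
  - python=3.10
  - pytorch        # prebuilt binaries from the pytorch channel avoid building torch from source
  - pip
  - pip:
      # installing apache-beam first lets pip resolve a dill version inside
      # beam's (notoriously narrow) dill constraint, instead of clashing later
      - apache-beam
```

Something equivalent with plain venv would just be `python3.10 -m venv` followed by installing `apache-beam` before anything else that pulls in `dill`.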