Prompt. Generate Synthetic Data. Train & Align Models.
DataDreamer is a powerful open-source Python library for prompting, synthetic data generation, and training workflows. It is designed to be simple, extremely efficient, and research-grade.
Installation: `pip3 install datadreamer.dev`

| demo.py | Result of demo.py |
|---|---|
| See the full demo script | See the synthetic dataset and the trained model |

🚀 For more demonstrations and recipes, see the Quick Tour page.
With DataDreamer you can:
- 💬 Create Prompting Workflows: Create and run complex, multi-step prompting workflows easily with major open-source or API-based LLMs.
- 📊 Generate Synthetic Datasets: Generate synthetic datasets for novel tasks or augment existing datasets with LLMs.
- ⚙️ Train Models: Align models. Fine-tune models. Instruction-tune models. Distill models. Train on existing data or synthetic data.
- ... learn more about what's possible in the Overview Guide
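To make the idea of a multi-step prompting workflow that yields a synthetic dataset more concrete, here is a minimal plain-Python sketch. It does not use DataDreamer's API: `fake_llm` and `run_workflow` are hypothetical names invented for illustration, and `fake_llm` is a deterministic stand-in for a real model call.

```python
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; echoes the prompt deterministically."""
    return f"[generated for: {prompt}]"


def run_workflow(topics: List[str], llm: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run a two-step prompting workflow per topic and collect rows of a dataset."""
    rows = []
    for topic in topics:
        # Step 1: generate a synthetic abstract for the topic.
        abstract = llm(f"Write an abstract about {topic}.")
        # Step 2: feed the output of step 1 into a second prompt.
        summary = llm(f"Summarize in one sentence: {abstract}")
        rows.append({"topic": topic, "abstract": abstract, "summary": summary})
    return rows


# Each row pairs an input (abstract) with a target (summary) that a model
# could later be trained on.
dataset = run_workflow(["caching", "distillation"], fake_llm)
```

In a real workflow the stand-in would be replaced by an actual model, and the resulting rows would feed a training step.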
DataDreamer is:
- 🧩 Simple: Simple and approachable to use with sensible defaults, yet powerful with support for bleeding-edge techniques.
- 🔬 Research-Grade: Built for researchers, by researchers, but accessible to all. A focus on correctness, best practices, and reproducibility.
- 🏎️ Efficient: Aggressive caching and resumability built-in. Support for techniques like quantization, parameter-efficient training (LoRA), and more.
- 🔄 Reproducible: Workflows built with DataDreamer are easily shareable, reproducible, and extendable.
- 🤝 Makes Sharing Easy: Publishing datasets and models is simple. Automatically generate data cards and model cards with metadata. Generate a list of any citations required.
- ... learn more about the motivation and design principles behind DataDreamer.
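The caching and resumability point above can be illustrated with a small, generic sketch. This is not how DataDreamer implements caching; it only shows the general technique of keying a step's output on its name and inputs so that a re-run skips work already done. `cached_step` and `expensive` are hypothetical names, and the cache layout (one JSON file per step) is an assumption made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_step(
    name: str, inputs: List[str], fn: Callable[[str], str], cache_dir: Path
) -> List[str]:
    """Run fn over inputs, reusing results on disk if this exact step already ran."""
    # The cache key depends on the step name and its inputs.
    payload = json.dumps([name, inputs]).encode("utf-8")
    key = hashlib.sha256(payload).hexdigest()[:16]
    path = cache_dir / f"{name}-{key}.json"
    if path.exists():
        # Resume: load the stored outputs instead of recomputing them.
        return json.loads(path.read_text(encoding="utf-8"))
    outputs = [fn(x) for x in inputs]
    path.write_text(json.dumps(outputs), encoding="utf-8")
    return outputs


calls: List[str] = []


def expensive(text: str) -> str:
    """Hypothetical costly operation (e.g. a model call); records each invocation."""
    calls.append(text)
    return text.upper()


cache_dir = Path(tempfile.mkdtemp())
first = cached_step("uppercase", ["a", "b"], expensive, cache_dir)
second = cached_step("uppercase", ["a", "b"], expensive, cache_dir)  # served from cache
```

After both calls, `expensive` has run only for the first invocation; the second call reads the stored result, which is the behavior that makes interrupted workflows cheap to restart.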
coming soon...
Please reach out to us via email ([email protected]) or on Discord if you have any questions, comments, or feedback.
Copyright © 2024, Ajay Patel. Released under the MIT License.
Thank you to the maintainers at Hugging Face and LiteLLM for accepting contributions necessary for DataDreamer and providing upstream support.