This repository contains a Python script for ingesting JSON data from ZIP files into Elasticsearch, optimized for bulk upload and duplicate prevention.
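The core idea of reading JSON records out of a ZIP archive can be sketched with the standard library alone (the helper name `iter_json_records` is hypothetical, not the script's actual function):

```python
import io
import json
import zipfile

def iter_json_records(zip_bytes):
    """Yield (member_name, parsed_json) for every .json file in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                with zf.open(name) as fh:
                    yield name, json.load(fh)

# Build a small in-memory ZIP to demonstrate.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("events.json", json.dumps({"id": 1, "event": "login"}))

records = list(iter_json_records(buf.getvalue()))
print(records)  # [('events.json', {'id': 1, 'event': 'login'})]
```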
- Bulk upload to Elasticsearch
- Duplicate record prevention
- Configuration via `.ini` file
- Comprehensive logging and error handling
- Supports multiple operations (ingest, enrich_download, enrich_upload)
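One common way to get bulk upload and duplicate prevention at the same time is to derive a deterministic `_id` from each record, so re-ingesting the same ZIP overwrites documents instead of duplicating them. This is only a sketch of that pattern (the actual script may use a different scheme); the resulting actions are the shape consumed by `elasticsearch.helpers.bulk`:

```python
import hashlib
import json

def doc_id(record):
    """Stable ID: hash of the record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def bulk_actions(records, index):
    """Generate bulk-API actions with deterministic IDs."""
    for rec in records:
        yield {"_index": index, "_id": doc_id(rec), "_source": rec}

a = doc_id({"user": "alice", "event": "login"})
b = doc_id({"event": "login", "user": "alice"})  # key order does not matter
print(a == b)  # True
```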
Create a `.ini` file with the following structure:
```ini
[elasticsearch]
host = your-elasticsearch-host
port = 9200
scheme = https
username = your-username
password = your-password

[cases]
case1 = /path/to/folder1
case2 = /path/to/folder2

[logging]
level = INFO
```
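A config like the one above can be read with the standard library's `configparser`; this sketch shows how the Elasticsearch connection URL could be assembled from it (the section and option names come from the example, the assembly itself is an assumption about the script):

```python
import configparser

SAMPLE = """
[elasticsearch]
host = your-elasticsearch-host
port = 9200
scheme = https
username = your-username
password = your-password

[cases]
case1 = /path/to/folder1

[logging]
level = INFO
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

es = config["elasticsearch"]
es_url = f"{es['scheme']}://{es['host']}:{config.getint('elasticsearch', 'port')}"
print(es_url)  # https://your-elasticsearch-host:9200
```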
- Python 3.9+
- Elasticsearch 7.x
- pip package manager
```bash
git clone https://github.com/yourusername/elasticsearch-json-ingestion.git
cd elasticsearch-json-ingestion
pip install -r requirements.txt
```
Run the script:

```bash
python ingest.py --config path/to/your/config.ini
```
or specify an operation (`ingest`, `enrich_download`, or `enrich_upload`):

```bash
python ingest.py --config path/to/your/config.ini --operation <operation>
```
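The command-line interface above maps naturally onto `argparse`. In this sketch the flag names and operation choices come from the README; the default operation is an assumption:

```python
import argparse

parser = argparse.ArgumentParser(
    description="Ingest JSON data from ZIP files into Elasticsearch"
)
parser.add_argument("--config", required=True,
                    help="Path to the .ini configuration file")
parser.add_argument("--operation",
                    choices=["ingest", "enrich_download", "enrich_upload"],
                    default="ingest",  # assumed default, not confirmed by the README
                    help="Operation to run")

# Parse an example argument list instead of sys.argv for demonstration.
args = parser.parse_args(["--config", "config.ini", "--operation", "enrich_upload"])
print(args.config, args.operation)  # config.ini enrich_upload
```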
```
veloastic-main/
│
├── ingest.py
├── config.ini
├── requirements.txt
└── README.md
```
The script logs its activity per JSON file, including errors and warnings.
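Per-file logging of this kind can be wired up with the standard `logging` module; the setup below is a minimal sketch (logger name and message formats are assumptions, and the level string mirrors the `[logging]` section of the config):

```python
import io
import logging

def make_logger(level="INFO"):
    """Build a logger that writes to an in-memory stream for demonstration."""
    logger = logging.getLogger("ingest_demo")
    logger.setLevel(getattr(logging, level))
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [handler]
    return logger, stream

logger, stream = make_logger()
logger.info("processed file %s: %d records", "case1.json", 42)
logger.warning("skipped malformed record in %s", "case2.json")
print(stream.getvalue())
```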
- Fork the repository.
- Create a branch.
- Make your changes.
- Commit and push.
- Submit a pull request.
Licensed under the MIT License.