CLIP (Contrastive Language–Image Pre-training) training code for Bangla.
Live Demo: HuggingFace Space
Requires Python >= 3.9. Install the dependencies:

```shell
pip install -r requirements.txt
```
The model pairs an EfficientNet / ResNet image encoder with a BERT text encoder and was trained on multiple Bangla image-text datasets. To start training:

```shell
python train_clip_bangla.py
```
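Conceptually, the two encoders are trained with a symmetric contrastive objective: image and text embeddings of matching pairs are pulled together while mismatched pairs in the batch are pushed apart. A minimal sketch of that loss in PyTorch (the function name and `temperature` default are illustrative, not taken from this repository's code):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) outputs of the image and text encoders.
    """
    # Project embeddings onto the unit sphere so logits are cosine similarities
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature
    logits = image_emb @ text_emb.t() / temperature

    # Matching image-text pairs sit on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_img_to_txt = F.cross_entropy(logits, targets)      # retrieve text given image
    loss_txt_to_img = F.cross_entropy(logits.t(), targets)  # retrieve image given text
    return (loss_img_to_txt + loss_txt_to_img) / 2
```

The batch itself supplies the negatives, so larger batches generally give a stronger training signal.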
- Search App Code: bangla-image-search
- Article: medium