Linhui Xiao's Projects
Huggingface Transformers + Adapters = ❤️
Code for ALBEF: a new vision-language pre-training method
Simple Optical Character Recognizer (english-ocr-image-to-text-recognition-sample-trainig-alphabet-photo-data-database-dataset)
An open autonomous driving platform
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently; pull requests are welcome.
Awesome Knowledge-Distillation. A categorized collection of knowledge distillation papers (2014-2021).
:sparkles::sparkles: Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
(TPAMI 2024) A Survey on Open Vocabulary Learning
A curated list of reinforcement learning with human feedback resources (continually updated)
A Survey on Open Visual Grounding
My book list
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Caffe: a fast open framework for deep learning.
Extrinsic Calibration of a Camera and 2D Laser
A rule-based tunnel in Go.
Contrastive Language-Image Pretraining
[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
Simple image captioning model
Video embeddings for retrieval with natural language queries
Prompt Learning for Vision-Language Models
Assignments for Stanford's cs231n course. An excellent course that I strongly recommend here as well.
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
Data Structures (《数据结构》) by Yan Weimin (严蔚敏) and Wu Weimin (吴伟民): textbook source code and exercise solutions
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Dynamic Early Exit for Image Captioning
Deep learning for image processing, including classification, object detection, etc.