Comments (6)
Hi @fuweifu-vtoo
Sorry for the late reply. Essentially, the text encoder in CLIP is also a Transformer, much like BERT, so it is not much different from the one in Grounding DINO. The difference lies in how the two models represent the text prompt. Grounding DINO feeds the whole sentence into BERT and then takes the embeddings of the tokens belonging to each phrase as its representation. In T-Rex2, we feed each phrase into the text encoder on its own and take only the output of the CLS token as the text representation. The reason is that T-Rex2 is a late-fusion architecture: the text embedding does not interact with the image features and is only used to compute similarity with the queries at the final output layer. We therefore want the text representation to be as simple as possible: no matter how long the input phrase is, it is represented as a single embedding.
from t-rex.
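To make the late-fusion idea above concrete, here is a minimal sketch in plain Python. It is not the T-Rex2 implementation: `fake_encoder` is a hypothetical stand-in that returns random per-token vectors, position 0 is assumed to play the role of the CLS token, and the dimensions, phrases, and token counts are made up for illustration. It only shows that phrases of any length collapse to one embedding each, which is then compared with the queries by a dot product.

```python
import math
import random


def fake_encoder(num_tokens, dim, rng):
    """Hypothetical stand-in for a text encoder: random per-token outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]


def pool_phrase(token_embeddings):
    """Keep a single token's output (position 0, assumed to be the CLS token)."""
    return token_embeddings[0]


def l2_normalize(vec):
    """Scale a vector to unit length so dot products behave like cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_logits(queries, text_embeddings):
    """One row per query, one column per phrase; each entry is a dot product."""
    return [
        [sum(q_i * t_i for q_i, t_i in zip(q, t)) for t in text_embeddings]
        for q in queries
    ]


rng = random.Random(0)
dim = 8

# Illustrative phrases mapped to assumed token counts (short vs. long phrase).
phrases = {"dog": 3, "a red traffic light on a pole": 9}

# Each phrase is encoded separately and reduced to exactly one vector.
text_embeddings = [
    l2_normalize(pool_phrase(fake_encoder(num_tokens, dim, rng)))
    for num_tokens in phrases.values()
]

# Four hypothetical decoder queries; text never touches image features here.
queries = [
    l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(4)
]

logits = similarity_logits(queries, text_embeddings)
print(len(logits), "queries x", len(logits[0]), "phrases")
```

Because every phrase ends up as one vector of the same size, the similarity matrix has a fixed shape of queries by phrases regardless of how long any phrase is.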
Thank you for your detailed explanation~
Another question I hope you can answer: does T-Rex2 freeze the CLIP text encoder during training?
Also, how long does it take to train T-Rex2 with a Swin Transformer Tiny backbone on 16 NVIDIA A100 GPUs with a total batch size of 128?
The CLIP text encoder is not frozen during training. It takes around 3 days to train a Swin-T model.
@Mountchicken How many iterations was the T-Rex2 Swin-T model trained for? Three days is much shorter than my estimate.
T-Rex2 went through multiple rounds of training. We first train with text prompts, then load those weights before training with visual prompts. The last training phase took about 3 days and ran for 100,000 iterations.
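For anyone comparing this with their own estimate, the figures in the thread can be turned into a rough throughput number. This is only back-of-the-envelope arithmetic: it assumes the last phase used the 16 GPUs and total batch size of 128 mentioned in the question (the reply does not confirm this) and that "about 3 days" means exactly 72 hours.

```python
# Figures from the thread; batch size and GPU count are assumed from the question.
iterations = 100_000
total_batch_size = 128
num_gpus = 16
days = 3

seconds = days * 24 * 3600                   # wall-clock time of the last phase
seconds_per_iteration = seconds / iterations
samples_seen = iterations * total_batch_size
samples_per_second = samples_seen / seconds
per_gpu_batch = total_batch_size // num_gpus

print(f"{seconds_per_iteration:.2f} s per iteration")
print(f"{samples_seen:,} samples seen in total")
print(f"{samples_per_second:.1f} samples per second")
print(f"{per_gpu_batch} samples per GPU per iteration")
```

Under those assumptions the last phase works out to roughly 2.6 seconds per iteration. The earlier text-prompt phase is not included, so total training time across all rounds is longer than 3 days.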