visionlearninggroup / text-to-clip_retrieval
Implementation for "Multilevel Language and Vision Integration for Text-to-Clip Retrieval"
License: MIT License
Dear author:
The legacy shape accessors of the blob defined in /caffe3d/include/caffe/blob.hpp support at most 4 dimensions:
inline int num() const { return LegacyShape(0); }
/// @brief Deprecated legacy shape accessor channels: use shape(1) instead.
inline int channels() const { return LegacyShape(1); }
/// @brief Deprecated legacy shape accessor height: use shape(2) instead.
inline int height() const { return LegacyShape(2); }
/// @brief Deprecated legacy shape accessor width: use shape(3) instead.
inline int width() const { return LegacyShape(3); }
inline int LegacyShape(int index) const {
  CHECK_LE(num_axes(), 4)
      << "Cannot use legacy accessors on Blobs with > 4 axes.";
  CHECK_LT(index, 4);
  CHECK_GE(index, -4);
but the data layer of the RPN model has 5 dimensions (/experiments/Text_to_Clip/test_fast/test_rpn.prototxt):
layer {
  name: "data"
  top: 'data'
  type: "Input"
  input_param {
    shape {dim: 1 dim: 3 dim: 768 dim: 112 dim: 112}
  }
}
When I use the RPN model, a check fails.
What do I need to change? Looking forward to your reply!
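For anyone hitting the same check, here is a minimal pure-Python sketch of what the quoted accessor does. It is not Caffe code: `legacy_shape` and `axis_size` are hypothetical helpers that only mimic the checks quoted above, and reading axis 2 as the frame axis is an assumption based on the prototxt shape. It shows why a 5-axis blob trips the legacy accessors while index-based access (the `shape(i)` form the deprecation comments recommend) does not.

```python
def legacy_shape(shape, index):
    """Simplified mimic of the quoted LegacyShape checks (not Caffe itself)."""
    if len(shape) > 4:
        # Corresponds to CHECK_LE(num_axes(), 4) in the quoted code.
        raise ValueError("Cannot use legacy accessors on Blobs with > 4 axes.")
    if not -4 <= index < 4:
        # Corresponds to CHECK_LT(index, 4) and CHECK_GE(index, -4).
        raise IndexError("legacy index must be in [-4, 4)")
    return shape[index]


def axis_size(shape, index):
    """Index-based access, analogous to shape(index); no limit on axis count."""
    return shape[index]


# Shape of the "data" layer in test_rpn.prototxt.
data_shape = [1, 3, 768, 112, 112]

try:
    legacy_shape(data_shape, 2)
    legacy_failed = False
except ValueError:
    legacy_failed = True

# Assumption: axis 2 is the temporal (frame) axis of the 5-D input.
frames = axis_size(data_shape, 2)
print(legacy_failed, frames)
```

If this reading is right, the failure would come from code that still calls num(), channels(), height() or width() on the 5-D blob, so those call sites are the place to look.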
Hi,
I am not familiar with Caffe and I have a question about your query-guided Segment Proposal Network.
As you mentioned in README.md, there are three stages in the pipeline:
Your paper uses an LSTM to embed the query. However, I cannot find where the LSTM is used in the first stage (training the SPN). Where can I find the code for this LSTM module?
Hi,
Thanks for putting this together and making it public.
Are the following files required to run this project?
caffe3d/python/caffe/__init__.pyc
caffe3d/python/caffe/classifier.pyc
caffe3d/python/caffe/detector.pyc
caffe3d/python/caffe/io.pyc
caffe3d/python/caffe/net_spec.pyc
caffe3d/python/caffe/pycaffe.pyc
experiments/Text_to_Clip/test_fast/_init_paths.pyc
lib/nms/__init__.pyc
lib/tdcnn/__init__.pyc
lib/tdcnn/config.pyc
lib/tdcnn/nms_wrapper.pyc
lib/tdcnn/twin_transform.pyc
lib/utils/__init__.pyc
lib/utils/blob.pyc
lib/utils/timer.pyc
If not, it would be better to remove them. It is usually good practice not to track them, by adding *.pyc to a .gitignore file in the root folder.
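A small sketch of that cleanup, in case it helps. Both helpers (`find_compiled_files`, `ensure_ignored`) are hypothetical and not part of this repo; they only list the compiled files under a checkout and append the ignore pattern if it is missing.

```python
from pathlib import Path


def find_compiled_files(root):
    """Return sorted relative paths of all *.pyc files below root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.pyc"))


def ensure_ignored(gitignore_path, pattern="*.pyc"):
    """Append pattern to the .gitignore file unless it is already listed.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(gitignore_path)
    lines = path.read_text().splitlines() if path.exists() else []
    if pattern in lines:
        return False
    lines.append(pattern)
    path.write_text("\n".join(lines) + "\n")
    return True
```

Note that a .gitignore entry only affects untracked files; the files already committed would still need to be removed from the index (for example with git rm --cached) before the pattern takes effect.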
I managed to reproduce all the testing/inference results; my output is below.
There is a discrepancy of about 0.1 percentage points in recall@1 at tIoU 0.7: I get 15.48% vs. the 15.6% claimed in the README. That looks reasonable to me. Could you please confirm?
Namespace(gt_file='../../../../preprocess/caption_gt_test.json', pred_file='../sim_iter_5000.p', recall=[1, 5, 10], tiou=[0.1, 0.3, 0.5, 0.7])
tIoU  recall@1        recall@5        recall@10
0.1   0.639247311828  0.99247311828   0.99623655914
0.3   0.513709677419  0.948924731183  0.989784946237
0.5   0.337096774194  0.764784946237  0.922043010753
0.7   0.154838709677  0.447043010753  0.618279569892
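For readers unfamiliar with the metric, here is a minimal sketch of how recall at K for a given tIoU threshold is commonly computed. This is not the repo's evaluation script: `temporal_iou`, `recall_at_k` and the toy segments are illustrative assumptions, with each query given one ground-truth (start, end) segment and a ranked list of predicted segments.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments on the time axis."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_preds, gts, k, tiou):
    """Fraction of queries with at least one top-k prediction at IoU >= tiou."""
    hits = 0
    for preds, gt in zip(ranked_preds, gts):
        if any(temporal_iou(p, gt) >= tiou for p in preds[:k]):
            hits += 1
    return hits / len(gts)


# Toy data (made up for illustration): two queries, two ranked predictions each.
gts = [(0.0, 10.0), (5.0, 15.0)]
ranked_preds = [
    [(0.0, 8.0), (20.0, 30.0)],   # the top-1 prediction overlaps the ground truth
    [(20.0, 30.0), (5.0, 12.0)],  # only the second prediction overlaps
]

for k in (1, 2):
    print(k, recall_at_k(ranked_preds, gts, k, tiou=0.5))
```

Under this definition recall can only grow with K and shrink with the tIoU threshold, which matches the trend in the numbers above.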
Is the recursive clone really needed in step 1 of the installation?
Apparently, caffe3d was added to this repo as a whole. Please confirm whether the recursive clone is unneeded.
Hi,
There are many versions of the Charades dataset. May I know which one I should choose?
VU17_Charades.zip (Annotations and evaluation scripts)
Training and Validation videos (scaled to 480p, 13 GB)
Training and Validation videos (original size) (55 GB)
Training and Validation videos as RGB frames at 24fps (76 GB)
Training and Validation videos as Optical Flow at 24fps (45 GB)
Training and Validation videos as Two-Stream FC7 features (RGB stream, 12 GB)
Training and Validation videos as Two-Stream FC7 features (Flow stream, 15 GB)
Thanks and regards
Dear author:
Could you provide the Text-to-Clip model trained on the ActivityNet dataset?
Thanks!