
learningtocounteverything's Introduction

Learning To Count Everything


This is the official implementation of the following CVPR 2021 paper:

Learning To Count Everything
Viresh Ranjan, Udbhav Sharma, Thu Nguyen and Minh Hoai
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

Link to arXiv preprint: https://arxiv.org/pdf/2104.08391.pdf

Short presentation video


Dataset download

Images can be downloaded from here: https://drive.google.com/file/d/1ymDYrGs9DSRicfZbSCDiOu0ikGDh5k6S/view?usp=sharing

Precomputed density maps can be found here: https://archive.org/details/FSC147-GT

Place the unzipped image directory and density map directory inside the data directory.
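After unzipping, the data directory should look roughly like this (the image and density map folder names follow common FSC-147 conventions and may differ depending on the archive version; the pretrained model path is the one demo.py reports, so verify all names against the paths the code expects):

data/
    images_384_VV/                       <- unzipped image directory
    gt_density_map_adaptive_384_VV/      <- unzipped density map directory
    pretrainedModels/
        FamNet_Save1.pth                 <- used by demo.py and test.py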

Installation with Conda

conda create -n fscount python=3.7 -y

conda activate fscount

python -m pip install matplotlib opencv-python notebook tqdm

conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.0 -c pytorch
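A quick sanity check (not part of the original instructions) that the pinned versions and CUDA support are in place:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"

This should print 1.4.0, 0.5.0, and True on a machine with a working CUDA 10.0 setup.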

Quick demo

Provide the input image along with a text file specifying the bounding boxes of the exemplar objects:

python demo.py --input-image orange.jpg --bbox-file orange_box_ex.txt 
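The text file lists one exemplar box per line as four space-separated integer coordinates; for example, a box file for three exemplars could look like the following (values taken from the demo output quoted in the issues below; check the coordinate order against demo.py):

71 49 104 83
134 119 169 151
7 200 44 236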

Alternatively, use the provided interface to specify the bounding boxes of the exemplar objects:

python demo.py --input-image orange.jpg

Evaluation

We provide a pretrained FamNet model, so the evaluation code can be used without training.

Testing on validation split without adaptation

python test.py --data_path /PATH/TO/YOUR/FSC147/DATASET/ --test_split val

Testing on validation split with adaptation

python test.py --data_path /PATH/TO/YOUR/FSC147/DATASET/ --test_split val --adapt
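With --adapt, the model additionally performs the paper's test-time adaptation: for a number of gradient steps (100 by default in the demo) the prediction is refined by minimizing the Min-Count and Perturbation losses computed from the exemplar boxes. As a hedged sketch of the Min-Count term only (illustrative code, not the repository's implementation; it assumes a density tensor of shape (..., H, W) and boxes in (y1, x1, y2, x2) order):

import torch

def min_count_loss(density, boxes):
    # Min-Count loss (per the paper's description): the summed density inside
    # each exemplar box should be at least 1, because every exemplar box
    # contains one whole object.
    loss = density.new_zeros(())
    for y1, x1, y2, x2 in boxes:  # assumed coordinate order
        box_count = density[..., y1:y2, x1:x2].sum()
        loss = loss + torch.clamp(1.0 - box_count, min=0.0)
    return loss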

Training

python train.py --gpu 0

Citation

If you find the code useful, please cite:

@inproceedings{m_Ranjan-etal-CVPR21,
  author = {Viresh Ranjan and Udbhav Sharma and Thu Nguyen and Minh Hoai},
  title = {Learning To Count Everything},
  year = {2021},
  booktitle = {Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition (CVPR)},
}


learningtocounteverything's Issues

About training time

Hello, I just came across your work. After trying to run the code, I estimated the running time for 1500 epochs to be about 10 days. Is this normal?

Training on new dataset

Hi,
I tried to train your model on my own dataset. I labeled the images in CVAT (as you recommended), but the annotations are in XML format while your code expects a .json file. How did you create the JSON file from the XML? I converted it manually, but it does not work. I would be thankful for your guidance.
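(A hedged sketch for anyone with the same conversion problem; it is not the authors' tooling.) Assuming a "CVAT for images 1.1" XML export with point annotations, and an FSC-147-style JSON keyed by image name with "points" and "box_examples_coordinates" entries, a converter could look like this; verify both assumptions against your export and the repository's annotation.json before relying on it:

import json
import xml.etree.ElementTree as ET

def cvat_points_to_fsc_json(xml_path, out_path):
    # CVAT 1.1 stores each annotated image as an <image name="..."> element
    # with <points points="x1,y1;x2,y2;..."> children.
    root = ET.parse(xml_path).getroot()
    ann = {}
    for image in root.iter("image"):
        pts = []
        for p in image.iter("points"):
            for pair in p.get("points").split(";"):
                x, y = map(float, pair.split(","))
                pts.append([x, y])
        ann[image.get("name")] = {
            "points": pts,
            "box_examples_coordinates": [],  # add exemplar boxes separately
        }
    with open(out_path, "w") as f:
        json.dump(ann, f)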

Divisible by 8

Hi @Viresh-R,
Why should the resized image dimensions be divisible by 8?
Can I discard this constraint and use, e.g., scale_factor * W instead?

class resizeImageWithGT_org(object):
    """
    If either the width or height of an image exceeds a specified value, resize the image so that:
        1. The maximum of the new height and new width does not exceed a specified value
        2. The new height and new width are divisible by 8
        3. The aspect ratio is preserved
    No resizing is done if both height and width are smaller than the specified value
    By: Minh Hoai Nguyen ([email protected])
    Modified by: Viresh
    """

COCO-VAL & COCO-TEST

Hi~
I am very interested in your work and am following it. I notice that you also ran experiments on a subset of the COCO dataset; however, I do not know which images make up that subset, so it is hard for me to follow your work. Could you please provide the COCO-VAL and COCO-TEST splits?

About annotation.json

Dear authors, I found that the annotation format in your FSC-147 annotation.json file does not match any of the formats the CVAT tool provides. Your JSON file does not contain category labels; the categories are in a separate txt file. Can you tell me how you produced annotation.json? I am trying to construct annotations for other datasets in the same format as FSC-147, but I have no idea how to reproduce the annotation.json format. I would really appreciate any hints.

Details for training few-shot detectors on FSC 147 dataset.

Thank you for putting together a comprehensive dataset and a well-designed class-agnostic counting model. We are particularly interested in some details on training few-shot detectors on the FSC-147 dataset.

As mentioned in the paper, the FR and FSOD few-shot detectors were trained on the FSC-147 dataset for comparison. Since training a detector requires bounding box annotations, did you annotate all bounding boxes on the training set, or did you swap the detectors' task heads for density map regressors?

We are currently testing few-shot detectors on the FSC-147 dataset and want to follow your settings. It would be great if you could release the training code for the few-shot detectors.

Thanks again for such fundamental and solid work.

Continue training after crash or shutdown

Hello, is there a way to continue training after my PC crashes or gets disconnected from the network? Quick feedback would be very much appreciated.
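(Not an official answer.) If train.py does not already save intermediate checkpoints, a standard PyTorch pattern can be added to its training loop; the names below (model, optimizer, checkpoint path) are illustrative:

import torch

CKPT = "checkpoint.pth"  # hypothetical path

def save_ckpt(model, optimizer, epoch):
    # Call at the end of every epoch, so a crash loses at most one epoch.
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, CKPT)

def load_ckpt(model, optimizer):
    # Call once at startup to resume; returns the epoch to continue from.
    state = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1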

About FSC-147 dataset image licensing?

Hi @Viresh-R

What is the licensing for the 6135 images in the dataset? Can I annotate the images and train the model described in your paper from scratch for commercial purposes?

Thanks for the answer in advance.

Regards,
Lakshmi Narayan

Batch Size

Hello,
We noticed that the input to FamNet is the whole image and the batch size equals 1.
Have you tried random cropping and increasing the batch size?

Best wishes! Thank you for your answers in advance!
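(A hedged sketch, not the authors' code.) Batching variable-sized images requires fixed-size inputs, e.g. by random-cropping each image together with its density map; the crop's ground-truth count is simply the density sum inside it, so supervision stays consistent. The main complication, which this sketch ignores, is keeping the exemplar boxes inside the crop:

import torch

def random_crop_batch(images, densities, size=384):
    # Crop each (C, H, W) image and its (H, W) density map to a fixed
    # square so the samples can be stacked into one batch.
    # Assumes every image satisfies H >= size and W >= size.
    crops_i, crops_d = [], []
    for img, den in zip(images, densities):
        _, h, w = img.shape
        top = torch.randint(0, h - size + 1, (1,)).item()
        left = torch.randint(0, w - size + 1, (1,)).item()
        crops_i.append(img[:, top:top + size, left:left + size])
        crops_d.append(den[top:top + size, left:left + size])
    return torch.stack(crops_i), torch.stack(crops_d)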

Using pretrained MobileNet V3 as backbone

Hey,

The problem of counting objects of interest in everyday scenes looks like a mobile vision use case (low-compute, high-speed constraints). Did you consider using MobileNet V3 as the backbone of your network during training instead of ResNet-50? If so, which layers of the MobileNet V3 blocks did you use to generate the feature maps, and how well did it perform in terms of validation MAE?

Thanks for the answers in advance.

Regards,
Lakshmi Narayan

Aspect ratio not preserved on some images

Dear authors,

I am participating with my team in the ML Reproducibility Challenge 2021, and as part of that we analysed how you resized your images to height 384. We noticed that for some images (there are 1200 such images; one example is 1386.jpg) there is a significant difference (more than 0.01) between the height and width ratios listed in the .json file. As a result, the width of your resized images differs (in pixels) from the width they would have if the exact aspect ratio of the original size (up to a rounding error) had been preserved. These width differences are significant (around 128.5 pixels on average). Here is the distribution of those differences:

[figure: histogram of the width differences]
We were wondering if there is a particular reason that those images were resized this way and would appreciate any insight.

Thanks and best regards,
Domen

About the baselines in your CVPR2021 paper

Hello, in your paper you compare FamNet with object detectors, including Faster R-CNN, RetinaNet, and Mask R-CNN, in Table 2. In my view, Mask R-CNN is essentially Faster R-CNN with an additional segmentation head. I wonder what the main differences between Faster R-CNN and Mask R-CNN are in your experiments.

Error in installation of torchvision

The following error was observed while setting up:

(fscount) cs@lai-cs:~/Desktop/SemAug$ conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.0 -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • torchvision==0.5.0

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.
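(Unofficial workaround.) The pinned conda builds are old and can no longer be resolved on some platforms and Python versions. One option is to install the matching wheels from PyPI instead:

python -m pip install torch==1.4.0 torchvision==0.5.0

Check that the CUDA runtime bundled with the wheel is compatible with your GPU driver, since it may differ from the cudatoolkit=10.0 pinned above.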

The standard deviation of the Gaussian kernel

Hi,

Can I ask why the standard deviation of the Gaussian kernel is a quarter of the window size, as stated in the second paragraph of Section 3.2?

I understand that the Gaussian kernel design adapts to the sizes of the objects, but why use a quarter of the window size for the standard deviation specifically?

Any help would be highly appreciated.
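(One plausible reading, not an authors' statement.) With σ equal to a quarter of the window size, the window spans roughly ±2σ, so about 95% of the Gaussian's mass falls inside it; choosing σ as a quarter of the support is a standard heuristic for fitting a kernel snugly into a fixed window. A small illustration:

import numpy as np

def gaussian_window(size):
    # sigma = size / 4 => the window covers about +/- 2 sigma, i.e. roughly
    # 95% of the (unnormalized) Gaussian mass lies inside the window.
    sigma = size / 4.0
    xs = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-xs ** 2 / (2 * sigma ** 2))
    return np.outer(g, g)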

Some questions about the FamNet ...

  1. I notice the backbone is fixed in your work. Why not train the backbone during training and adaptation?
  2. Why not use lower-level features from the backbone, like "feat_map2"?

Experiments on CARPK

Hello
I used the pretrained model you provided to run 3-shot experiments on CARPK, and the result is MAE 43.02, RMSE 58.12. Is this normal? How many exemplars are used in the CARPK experiment in the paper?

Pre-trained model

Hey,

I am wondering how many epochs you pretrained the provided model for. I noticed that the default value is 1500; was the provided model trained for 1500 epochs?

Also, have you considered training with max pooling instead of mean pooling, and if so, what were the results? And have you considered training at multiple scales or with multiple feature maps?

Thank you for your answers in advance!

Best,
Matija

About other benchmarks

Hi~

Your work is very interesting, and I am following it. I need to compare the performance of GMN, FamNet, and our model; your FamNet works well.
However, GMN always diverges. I searched GitHub and found two repositories for GMN, but both diverge.
I wonder if you could share your GMN code with me.

My email address is [email protected].

Thanks in advance.

Available platform plugins are: eglfs, minimal, minimalegl, offscreen, vnc, xcb.

Thank you very much for your marvelous work!

However, after installing PyQt5 and then PyQt6 and running the following command in each case:

python demo.py --input-image t3.jpg

I received the same error messages:


...
Got keys from plugin meta data ("offscreen")
QFactoryLoader::QFactoryLoader() looking at "/root/miniconda3/envs/abc/plugins/platforms/libqvnc.so"
Found metadata in lib /root/miniconda3/envs/abc/plugins/platforms/libqvnc.so, metadata=
{
"IID": "org.qt-project.Qt.QPA.QPlatformIntegrationFactoryInterface.5.3",
"MetaData": {
"Keys": [
"vnc"
]
},
"className": "QVncIntegrationPlugin",
"debug": false,
"version": 329991
}

Got keys from plugin meta data ("vnc")
QFactoryLoader::QFactoryLoader() looking at "/root/miniconda3/envs/abc/plugins/platforms/libqxcb.so"
Found metadata in lib /root/miniconda3/envs/abc/plugins/platforms/libqxcb.so, metadata=
{
"IID": "org.qt-project.Qt.QPA.QPlatformIntegrationFactoryInterface.5.3",
"MetaData": {
"Keys": [
"xcb"
]
},
"className": "QXcbIntegrationPlugin",
"debug": false,
"version": 329991
}

Got keys from plugin meta data ("xcb")
QFactoryLoader::QFactoryLoader() checking directory path "/root/miniconda3/envs/abc/bin/platforms" ...
loaded library "/root/miniconda3/envs/abc/plugins/platforms/libqoffscreen.so"
QObject::moveToThread: Current thread (0x557d8a72e500) is not the object's thread (0x557d8cbb0630).
Cannot move to target thread (0x557d8a72e500)

qt.qpa.plugin: Could not load the Qt platform plugin "offscreen" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, minimal, minimalegl, offscreen, vnc, xcb.

Aborted (core dumped)


Any help is appreciated.
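(Unofficial workaround.) This class of Qt errors is often caused by the pip package opencv-python bundling its own Qt platform plugins, which can shadow the ones PyQt expects. Two things worth trying: force a platform plugin explicitly, e.g.

QT_QPA_PLATFORM=xcb python demo.py --input-image t3.jpg

or swap in the GUI-free OpenCV build (python -m pip uninstall opencv-python, then python -m pip install opencv-python-headless). Note that the headless build disables cv2's own windows, so use it only if the demo's interface does not rely on them.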

Dot annotations

Dear Authors,

I didn't find the dot annotations of FSC147 dataset.
Could you provide a download link for it?

Thank you for your answers in advance!

What is the prediction count for the canned demo?

[attached: demo output visualizations orange_out and IMG_1092_out]

Thank you for your innovative solution to an important problem.
Reproducing the "eval" portion, I get a prediction count of 29.15.

Is that what I should expect?

Invocation and output below.

(pyt1.2) auro@auro-ml:~/LearningToCountEverything$ python demo.py --input-image orange.jpg --bbox-file orange_box_ex.txt

Namespace(adapt=False, bbox_file='orange_box_ex.txt', gpu_id=0, gradient_steps=100, input_image='orange.jpg', learning_rate=1e-07, model_path='./data/pretrainedModels/FamNet_Save1.pth', output_dir='.', weight_mincount=1e-09, weight_perturbation=0.0001)

Bounding boxes: [[71, 49, 104, 83], [134, 119, 169, 151], [7, 200, 44, 236]]

/home/auro/anaconda3/envs/pyt1.2/lib/python3.8/site-packages/torch/nn/functional.py:3060: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
warnings.warn("Default upsampling behavior when mode={} is changed "

===> The predicted count is: 29.15
===> Visualized output is saved to ./orange_out.png

About testing on new images

Hi, could we use it to test on images without providing the exemplar bounding boxes?
Thanks in advance.
