
learningtocounteverything's Introduction

Learning To Count Everything


This is the official implementation of the following CVPR 2021 paper:

Learning To Count Everything
Viresh Ranjan, Udbhav Sharma, Thu Nguyen and Minh Hoai
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

Link to arXiv preprint: https://arxiv.org/pdf/2104.08391.pdf

Short presentation video


Dataset download

Images can be downloaded from here: https://drive.google.com/file/d/1ymDYrGs9DSRicfZbSCDiOu0ikGDh5k6S/view?usp=sharing

Precomputed density maps can be found here: https://archive.org/details/FSC147-GT

Place the unzipped image directory and density map directory inside the data directory.
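After unzipping, the data directory should look roughly like this (the image and density map folder names follow common FSC-147 conventions and may differ depending on the archive version; the pretrained model path is the one demo.py reports, so verify all names against the paths the code expects):

data/
    images_384_VV/                       <- unzipped image directory
    gt_density_map_adaptive_384_VV/      <- unzipped density map directory
    pretrainedModels/
        FamNet_Save1.pth                 <- used by demo.py and test.py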

Installation with Conda

conda create -n fscount python=3.7 -y

conda activate fscount

python -m pip install matplotlib opencv-python notebook tqdm

conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.0 -c pytorch
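A quick sanity check (not part of the original instructions) that the pinned versions and CUDA support are in place:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"

This should print 1.4.0, 0.5.0, and True on a machine with a working CUDA 10.0 setup.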

Quick demo

Provide the input image along with a text file specifying the bounding boxes of the exemplar objects:

python demo.py --input-image orange.jpg --bbox-file orange_box_ex.txt 
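The text file lists one exemplar box per line as four space-separated integer coordinates; for example, a box file for three exemplars could look like the following (values taken from the demo output quoted in the issues below; check the coordinate order against demo.py):

71 49 104 83
134 119 169 151
7 200 44 236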

Alternatively, use the provided interface to specify the bounding boxes of the exemplar objects:

python demo.py --input-image orange.jpg

Evaluation

We provide a pretrained FamNet model, so the evaluation code can be used without training.

Testing on validation split without adaptation

python test.py --data_path /PATH/TO/YOUR/FSC147/DATASET/ --test_split val

Testing on validation split with adaptation

python test.py --data_path /PATH/TO/YOUR/FSC147/DATASET/ --test_split val --adapt
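With --adapt, the model additionally performs the paper's test-time adaptation: for a number of gradient steps (100 by default in the demo) the prediction is refined by minimizing the Min-Count and Perturbation losses computed from the exemplar boxes. As a hedged sketch of the Min-Count term only (illustrative code, not the repository's implementation; it assumes a density tensor of shape (..., H, W) and boxes in (y1, x1, y2, x2) order):

import torch

def min_count_loss(density, boxes):
    # Min-Count loss (per the paper's description): the summed density inside
    # each exemplar box should be at least 1, because every exemplar box
    # contains one whole object.
    loss = density.new_zeros(())
    for y1, x1, y2, x2 in boxes:  # assumed coordinate order
        box_count = density[..., y1:y2, x1:x2].sum()
        loss = loss + torch.clamp(1.0 - box_count, min=0.0)
    return loss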

Training

python train.py --gpu 0

Citation

If you find the code useful, please cite:

@inproceedings{m_Ranjan-etal-CVPR21,
  author = {Viresh Ranjan and Udbhav Sharma and Thu Nguyen and Minh Hoai},
  title = {Learning To Count Everything},
  year = {2021},
  booktitle = {Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition (CVPR)},
}


learningtocounteverything's Issues

About training time

Hello, I just came across your work. After trying to run the code, I estimated the running time for 1500 epochs to be about 10 days. Is this normal?

Training on new dataset

Hi,
I tried to train your model on my own dataset. I labeled the images in CVAT (as you recommended), but the annotations are in XML format while your code expects a .json file. How did you create the JSON file from the XML? I converted it manually, but it does not work. I would be thankful for your guidance.
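(A hedged sketch for anyone with the same conversion problem; it is not the authors' tooling.) Assuming a "CVAT for images 1.1" XML export with point annotations, and an FSC-147-style JSON keyed by image name with "points" and "box_examples_coordinates" entries, a converter could look like this; verify both assumptions against your export and the repository's annotation.json before relying on it:

import json
import xml.etree.ElementTree as ET

def cvat_points_to_fsc_json(xml_path, out_path):
    # CVAT 1.1 stores each annotated image as an <image name="..."> element
    # with <points points="x1,y1;x2,y2;..."> children.
    root = ET.parse(xml_path).getroot()
    ann = {}
    for image in root.iter("image"):
        pts = []
        for p in image.iter("points"):
            for pair in p.get("points").split(";"):
                x, y = map(float, pair.split(","))
                pts.append([x, y])
        ann[image.get("name")] = {
            "points": pts,
            "box_examples_coordinates": [],  # add exemplar boxes separately
        }
    with open(out_path, "w") as f:
        json.dump(ann, f)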

Divisible by 8

Hi @Viresh-R,
Why should the resized image dimensions be divisible by 8?
Can I discard this constraint and use, e.g., scale_factor * W instead?

class resizeImageWithGT_org(object):
    """
    If either the width or height of an image exceeds a specified value, resize the image so that:
        1. The maximum of the new height and new width does not exceed a specified value
        2. The new height and new width are divisible by 8
        3. The aspect ratio is preserved
    No resizing is done if both height and width are smaller than the specified value
    By: Minh Hoai Nguyen ([email protected])
    Modified by: Viresh
    """

COCO-VAL & COCO-TEST

Hi~
I am very interested in your work and am following it. I notice that you also ran experiments on a subset of the COCO dataset; however, I do not know which images make up that subset, so it is hard for me to follow your work. Could you please provide the COCO-VAL and COCO-TEST splits?

About annotation.json

Dear authors, I found that the annotation format in your FSC-147 annotation.json file does not match any of the formats the CVAT tool provides. Your JSON file does not contain category labels; the categories are in a separate txt file. Can you tell me how you produced annotation.json? I am trying to construct annotations for other datasets in the same format as FSC-147, but I have no idea how to reproduce the annotation.json format. I would really appreciate any hints.

Details for training few-shot detectors on FSC 147 dataset.

Thank you for putting together a comprehensive dataset and a well-designed class-agnostic counting model. We are particularly interested in some details on training few-shot detectors on the FSC-147 dataset.

As mentioned in the paper, the FR and FSOD few-shot detectors were trained on the FSC-147 dataset for comparison. Since training a detector requires bounding box annotations, did you annotate all bounding boxes on the training set, or did you swap the detectors' task heads for density map regressors?

We are currently testing few-shot detectors on the FSC-147 dataset and want to follow your settings. It would be great if you could release the training code for the few-shot detectors.

Thanks again for such fundamental and solid work.

Continue training after crash or shutdown

Hello, is there a way to continue training after my PC crashes or gets disconnected from the network? Quick feedback would be very much appreciated.
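(Not an official answer.) If train.py does not already save intermediate checkpoints, a standard PyTorch pattern can be added to its training loop; the names below (model, optimizer, checkpoint path) are illustrative:

import torch

CKPT = "checkpoint.pth"  # hypothetical path

def save_ckpt(model, optimizer, epoch):
    # Call at the end of every epoch, so a crash loses at most one epoch.
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, CKPT)

def load_ckpt(model, optimizer):
    # Call once at startup to resume; returns the epoch to continue from.
    state = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1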

About FSC-147 dataset image licensing?

Hi @Viresh-R

What is the licensing for the 6135 images in the dataset? Can I annotate the images and train the model described in your paper from scratch for commercial purposes?

Thanks for the answer in advance.

Regards,
Lakshmi Narayan

Batch Size

Hello,
We noticed that the input to FamNet is the whole image and the batch size equals 1.
Have you tried random cropping and increasing the batch size?

Best wishes! Thank you for your answers in advance!
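(A hedged sketch, not the authors' code.) Batching variable-sized images requires fixed-size inputs, e.g. by random-cropping each image together with its density map; the crop's ground-truth count is simply the density sum inside it, so supervision stays consistent. The main complication, which this sketch ignores, is keeping the exemplar boxes inside the crop:

import torch

def random_crop_batch(images, densities, size=384):
    # Crop each (C, H, W) image and its (H, W) density map to a fixed
    # square so the samples can be stacked into one batch.
    # Assumes every image satisfies H >= size and W >= size.
    crops_i, crops_d = [], []
    for img, den in zip(images, densities):
        _, h, w = img.shape
        top = torch.randint(0, h - size + 1, (1,)).item()
        left = torch.randint(0, w - size + 1, (1,)).item()
        crops_i.append(img[:, top:top + size, left:left + size])
        crops_d.append(den[top:top + size, left:left + size])
    return torch.stack(crops_i), torch.stack(crops_d)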

Using pretrained MobileNet V3 as backbone

Hey,

The problem of counting objects of interest in everyday scenes looks like a mobile vision use case (low-compute, high-speed constraints). Did you consider using MobileNet V3 as the backbone of your network during training instead of ResNet-50? If so, which layers of the MobileNet V3 blocks did you use to generate the feature maps, and how well did it perform in terms of validation MAE?

Thanks for the answers in advance.

Regards,
Lakshmi Narayan

Aspect ratio not preserved on some images

Dear authors,

I am participating with my team in the ML Reproducibility Challenge 2021, and as part of that we analysed how you resized your images to height 384. We noticed that for some images (there are 1200 such images; one example is 1386.jpg) there is a significant difference (more than 0.01) between the height and width ratios listed in the .json file. As a result, the width of your resized images differs (in pixels) from the width they would have if the exact aspect ratio of the original size (up to a rounding error) had been preserved. These width differences are significant (around 128.5 pixels on average). Here is the distribution of those differences:

[figure: histogram of the width differences]
We were wondering if there is a particular reason that those images were resized this way and would appreciate any insight.

Thanks and best regards,
Domen

About the baselines in your CVPR2021 paper

Hello, in your paper you compare FamNet with object detectors, including Faster R-CNN, RetinaNet, and Mask R-CNN, in Table 2. In my view, Mask R-CNN is essentially Faster R-CNN with an additional segmentation head. I wonder what the main differences between Faster R-CNN and Mask R-CNN are in your experiments.

Error in installation of torchvision

The following error was observed while setting up:

(fscount) cs@lai-cs:~/Desktop/SemAug$ conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.0 -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • torchvision==0.5.0

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.
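(Unofficial workaround.) The pinned conda builds are old and can no longer be resolved on some platforms and Python versions. One option is to install the matching wheels from PyPI instead:

python -m pip install torch==1.4.0 torchvision==0.5.0

Check that the CUDA runtime bundled with the wheel is compatible with your GPU driver, since it may differ from the cudatoolkit=10.0 pinned above.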

The standard deviation of the Gaussian kernel

Hi,

Can I ask why the standard deviation of the Gaussian kernel is a quarter of the window size, as stated in the second paragraph of Section 3.2?

I understand that the Gaussian kernel design adapts to the sizes of the objects, but why use a quarter of the window size for the standard deviation specifically?

Any help would be highly appreciated.
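(One plausible reading, not an authors' statement.) With σ equal to a quarter of the window size, the window spans roughly ±2σ, so about 95% of the Gaussian's mass falls inside it; choosing σ as a quarter of the support is a standard heuristic for fitting a kernel snugly into a fixed window. A small illustration:

import numpy as np

def gaussian_window(size):
    # sigma = size / 4 => the window covers about +/- 2 sigma, i.e. roughly
    # 95% of the (unnormalized) Gaussian mass lies inside the window.
    sigma = size / 4.0
    xs = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-xs ** 2 / (2 * sigma ** 2))
    return np.outer(g, g)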

Some questions about the FamNet ...

  1. I notice the backbone is fixed in your work. Why not train the backbone during training and adaptation?
  2. Why not use lower-level features from the backbone, like "feat_map2"?

Experiments on CARPK

Hello
I used the pretrained model you provided to run 3-shot experiments on CARPK, and the result is MAE 43.02, RMSE 58.12. Is this normal? How many exemplars are used in the CARPK experiment in the paper?

Pre-trained model

Hey,

I am wondering how many epochs you pretrained the provided model for. I noticed that the default value is 1500; was the provided model trained for 1500 epochs?

Also, have you considered training with max pooling instead of mean pooling, and if so, what were the results? And have you considered training at multiple scales or with multiple feature maps?

Thank you for your answers in advance!

Best,
Matija

About other benchmarks

Hi~

Your work is very interesting, and I am following it. I need to compare the performance of GMN, FamNet, and our model; your FamNet works well.
However, GMN always diverges. I searched GitHub and found two repositories for GMN, but both diverge.
I wonder if you could share your GMN code with me.

My email address is [email protected].

Thanks in advance.

Available platform plugins are: eglfs, minimal, minimalegl, offscreen, vnc, xcb.

Thank you very much for your marvelous work!

However, after installing PyQt5 and then PyQt6 and running the following command in each case:

python demo.py --input-image t3.jpg

I received the same error messages:


...
Got keys from plugin meta data ("offscreen")
QFactoryLoader::QFactoryLoader() looking at "/root/miniconda3/envs/abc/plugins/platforms/libqvnc.so"
Found metadata in lib /root/miniconda3/envs/abc/plugins/platforms/libqvnc.so, metadata=
{
"IID": "org.qt-project.Qt.QPA.QPlatformIntegrationFactoryInterface.5.3",
"MetaData": {
"Keys": [
"vnc"
]
},
"className": "QVncIntegrationPlugin",
"debug": false,
"version": 329991
}

Got keys from plugin meta data ("vnc")
QFactoryLoader::QFactoryLoader() looking at "/root/miniconda3/envs/abc/plugins/platforms/libqxcb.so"
Found metadata in lib /root/miniconda3/envs/abc/plugins/platforms/libqxcb.so, metadata=
{
"IID": "org.qt-project.Qt.QPA.QPlatformIntegrationFactoryInterface.5.3",
"MetaData": {
"Keys": [
"xcb"
]
},
"className": "QXcbIntegrationPlugin",
"debug": false,
"version": 329991
}

Got keys from plugin meta data ("xcb")
QFactoryLoader::QFactoryLoader() checking directory path "/root/miniconda3/envs/abc/bin/platforms" ...
loaded library "/root/miniconda3/envs/abc/plugins/platforms/libqoffscreen.so"
QObject::moveToThread: Current thread (0x557d8a72e500) is not the object's thread (0x557d8cbb0630).
Cannot move to target thread (0x557d8a72e500)

qt.qpa.plugin: Could not load the Qt platform plugin "offscreen" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, minimal, minimalegl, offscreen, vnc, xcb.

Aborted (core dumped)


Any help is appreciated.
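(Unofficial workaround.) This class of Qt errors is often caused by the pip package opencv-python bundling its own Qt platform plugins, which can shadow the ones PyQt expects. Two things worth trying: force a platform plugin explicitly, e.g.

QT_QPA_PLATFORM=xcb python demo.py --input-image t3.jpg

or swap in the GUI-free OpenCV build (python -m pip uninstall opencv-python, then python -m pip install opencv-python-headless). Note that the headless build disables cv2's own windows, so use it only if the demo's interface does not rely on them.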

Dot annotations

Dear Authors,

I didn't find the dot annotations of FSC147 dataset.
Could you provide a download link for it?

Thank you for your answers in advance!

What is the prediction count for the canned demo?

[attached: demo output visualizations orange_out and IMG_1092_out]

Thank you for your innovative solution to an important problem.
Reproducing the "eval" portion, I get a prediction count of 29.15.

Is that what I should expect?

Invocation and output below.

(pyt1.2) auro@auro-ml:~/LearningToCountEverything$ python demo.py --input-image orange.jpg --bbox-file orange_box_ex.txt

Namespace(adapt=False, bbox_file='orange_box_ex.txt', gpu_id=0, gradient_steps=100, input_image='orange.jpg', learning_rate=1e-07, model_path='./data/pretrainedModels/FamNet_Save1.pth', output_dir='.', weight_mincount=1e-09, weight_perturbation=0.0001)

Bounding boxes: [[71, 49, 104, 83], [134, 119, 169, 151], [7, 200, 44, 236]]

/home/auro/anaconda3/envs/pyt1.2/lib/python3.8/site-packages/torch/nn/functional.py:3060: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
warnings.warn("Default upsampling behavior when mode={} is changed "

===> The predicted count is: 29.15
===> Visualized output is saved to ./orange_out.png

About testing on new images

Hi, could we use it to test on images without providing the exemplar bounding boxes?
Thanks in advance.
