alexolsen / deepweeds

A Multiclass Weed Species Image Dataset for Deep Learning

Home Page: https://www.nature.com/articles/s41598-018-38343-3

License: Apache License 2.0

Python 36.92% Makefile 3.52% C++ 59.57%

dataset deep-learning weed-species weed image-dataset queensland resnet-50 inceptionv3

deepweeds's Introduction

DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning

This repository makes available the source code and public dataset for the work, "DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning", published with open access by Scientific Reports: https://www.nature.com/articles/s41598-018-38343-3. The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighbouring flora. In our work, the dataset was classified to an average accuracy of 95.7% with the ResNet50 deep convolutional neural network.

The source code, images and annotations are licensed under the CC BY 4.0 license. The contents of this repository are released under an Apache 2 license.

Download the dataset images and our trained models

images.zip (468 MB)
models.zip (477 MB)

Due to the size of the images and models, they are hosted outside of the GitHub repository. The images and models must be downloaded into directories named "images" and "models", respectively, at the root of the repository. If you execute the Python script (deepweeds.py), as instructed below, this step will be performed for you automatically.

TensorFlow Datasets

Alternatively, you can access the DeepWeeds dataset with TensorFlow Datasets, TensorFlow's official collection of ready-to-use datasets. DeepWeeds was officially added to the TensorFlow Datasets catalog in August 2019.
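As a minimal sketch of the TensorFlow Datasets route, the helper below wraps `tfds.load`. The catalog name `deep_weeds` and the helper name `load_deepweeds` are assumptions made for illustration; check the name against the TensorFlow Datasets catalog before relying on it. The import is done lazily so the snippet can be read and imported without the package installed.

```python
# Assumed catalog name for DeepWeeds in TensorFlow Datasets; verify in the catalog.
DATASET_NAME = "deep_weeds"


def load_deepweeds(split="train"):
    """Load DeepWeeds via TensorFlow Datasets as (image, label) pairs.

    Hypothetical helper, not part of this repository. Requires the
    `tensorflow-datasets` package; the data is downloaded on first use.
    """
    import tensorflow_datasets as tfds  # lazy import: only needed when called

    dataset, info = tfds.load(
        DATASET_NAME,
        split=split,
        as_supervised=True,  # yield (image, label) tuples instead of dicts
        with_info=True,      # also return the dataset metadata object
    )
    return dataset, info
```

Calling `load_deepweeds()` should return a `tf.data.Dataset` together with its metadata object, which can then be batched and fed to a Keras model in the usual way.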

Weeds and locations

The selected weed species are local to pastoral grasslands across the state of Queensland. They include: "Chinee apple", "Snake weed", "Lantana", "Prickly acacia", "Siam weed", "Parthenium", "Rubber vine" and "Parkinsonia". The images were collected from weed infestations at the following sites across Queensland: "Black River", "Charters Towers", "Cluden", "Douglas", "Hervey Range", "Kelso", "McKinlay" and "Paluma". The table and figure below break down the dataset by weed, location and geographical distribution.

Table 1. The distribution of DeepWeeds images by weed species (row) and location (column).

Figure 2. The geographical distribution of DeepWeeds images across northern Australia (Data: Google, SIO, NOAA, U.S. Navy, NGA, GEBCO; Image © 2018 Landsat / Copernicus; Image © 2018 DigitalGlobe; Image © 2018 CNES / Airbus).

Data organization

Images are assigned unique filenames that include the date/time the image was photographed and an ID number for the instrument which produced the image. The format is like so: YYYYMMDD-HHMMSS-ID, where the ID is simply an integer from 0 to 3. The unique filenames are strings of 17 characters, such as 20170320-093423-1.
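The naming scheme can be checked and decoded with a few lines of Python. This is only a sketch: `ImageName` and `parse_image_name` are hypothetical names introduced here, not part of the repository.

```python
from datetime import datetime
from typing import NamedTuple


class ImageName(NamedTuple):
    captured_at: datetime  # date/time the image was photographed
    instrument_id: int     # ID of the instrument, an integer from 0 to 3


def parse_image_name(name: str) -> ImageName:
    """Parse a name such as '20170320-093423-1', with or without '.jpg'."""
    stem = name[:-4] if name.lower().endswith(".jpg") else name
    if len(stem) != 17:
        raise ValueError(f"expected 17 characters, got {len(stem)}: {name!r}")
    date_part, time_part, id_part = stem.split("-")
    captured_at = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    instrument_id = int(id_part)
    if not 0 <= instrument_id <= 3:
        raise ValueError(f"instrument ID out of range 0-3: {instrument_id}")
    return ImageName(captured_at, instrument_id)


parsed = parse_image_name("20170320-093423-1")
print(parsed.captured_at.isoformat(), parsed.instrument_id)
# prints: 2017-03-20T09:34:23 1
```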

labels

The labels.csv file assigns species labels to each image. It is a comma separated text file in the format:

Filename,Label,Species
...
20170207-154924-0.jpg,7,Snake weed
20170610-123859-1.jpg,1,Lantana
20180119-105722-1.jpg,8,Negative
...

Note: The specific label subsets of training (60%), validation (20%) and testing (20%) for the five-fold cross validation used in the paper are also provided here as CSV files in the same format as "labels.csv".
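Because the file has a header row, it can be read with Python's standard `csv` module. The sketch below parses a few sample rows in the layout shown above and groups filenames by species; the function names are illustrative and the sample is not the full file.

```python
import csv
import io
from collections import defaultdict

# A few rows in the labels.csv layout (sample only, not the full file).
SAMPLE = """Filename,Label,Species
20170207-154924-0.jpg,7,Snake weed
20170610-123859-1.jpg,1,Lantana
20180119-105722-1.jpg,8,Negative
"""


def read_labels(csv_file):
    """Return a list of (filename, integer label, species) tuples."""
    return [
        (row["Filename"], int(row["Label"]), row["Species"])
        for row in csv.DictReader(csv_file)
    ]


def group_by_species(rows):
    """Return {species: [filenames]} for rows produced by read_labels."""
    groups = defaultdict(list)
    for filename, _label, species in rows:
        groups[species].append(filename)
    return dict(groups)


rows = read_labels(io.StringIO(SAMPLE))
print(group_by_species(rows)["Lantana"])
# prints: ['20170610-123859-1.jpg']
```

To read the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="")` with `path` pointing at your copy of labels.csv.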

models

We provide the most successful ResNet50 and InceptionV3 models saved in Keras' hdf5 model format. The ResNet50 model, which provided the best results, has also been converted to UFF format in order to construct a TensorRT inference engine.

resnet.hdf5
inception.hdf5
resnet.uff

deepweeds.py

This Python script trains and evaluates Keras' base implementations of ResNet50 and InceptionV3, pre-trained with ImageNet weights, on the DeepWeeds dataset. The performance of the networks is cross-validated over five folds. The final classification accuracy is taken to be the average across the five folds. Similarly, the final confusion matrix from the associated paper aggregates across the five independent folds. The script can also measure inference speeds within the TensorFlow environment.
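The aggregation across folds can be illustrated in plain Python. This is a sketch of the idea only, not the code in deepweeds.py: `aggregate_folds` is a hypothetical helper, and the two small 2x2 matrices stand in for the five 9x9 per-fold confusion matrices.

```python
def aggregate_folds(fold_matrices):
    """Average per-fold accuracy and sum per-fold confusion matrices.

    Each matrix is a square list of lists: rows are true classes,
    columns are predicted classes.
    """
    n = len(fold_matrices[0])
    total = [[0] * n for _ in range(n)]
    accuracies = []
    for matrix in fold_matrices:
        correct = sum(matrix[i][i] for i in range(n))  # diagonal = correct predictions
        count = sum(sum(row) for row in matrix)        # all predictions in this fold
        accuracies.append(correct / count)
        for i in range(n):
            for j in range(n):
                total[i][j] += matrix[i][j]
    mean_accuracy = sum(accuracies) / len(accuracies)
    return mean_accuracy, total


# Toy example with two folds and two classes.
folds = [
    [[3, 1], [1, 3]],  # fold accuracy 6/8 = 0.75
    [[2, 2], [2, 2]],  # fold accuracy 4/8 = 0.50
]
mean_accuracy, combined = aggregate_folds(folds)
print(mean_accuracy, combined)
# prints: 0.625 [[5, 3], [3, 5]]
```

Note that the mean of per-fold accuracies and the accuracy of the summed matrix coincide only when every fold has the same number of test images.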

The script can be executed to carry out these computations using the following commands.

To train and evaluate the ResNet50 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model resnet.
To train and evaluate the InceptionV3 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model inception.
To measure inference times for the ResNet50 model, use python3 deepweeds.py inference --model models/resnet.hdf5.
To measure inference times for the InceptionV3 model, use python3 deepweeds.py inference --model models/inception.hdf5.
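If you want to time a model yourself, a generic pattern is to discard one or more warm-up calls before measuring, since the first call in frameworks such as TensorFlow often includes one-off setup cost. The sketch below is an assumption-laden illustration and not what deepweeds.py does internally: `time_inference` is a hypothetical helper and `predict` stands in for any callable, such as a loaded Keras model's `predict` method.

```python
import time
from statistics import mean


def time_inference(predict, sample, runs=10, warmup=1):
    """Return the mean seconds per call of predict(sample).

    The first `warmup` calls are executed but not timed.
    """
    for _ in range(warmup):
        predict(sample)  # untimed: may include one-off initialisation
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Stand-in workload so the sketch runs without a trained model.
def fake_predict(values):
    return sum(v * v for v in values)


seconds = time_inference(fake_predict, list(range(1000)), runs=5, warmup=2)
print(f"mean time per call: {seconds:.6f} s")
```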

Dependencies

The required Python packages to execute deepweeds.py are listed in requirements.txt.

tensorrt

This folder includes C++ source code for creating and executing a ResNet50 TensorRT inference engine on an NVIDIA Jetson TX2 platform. To build and run on your Jetson TX2, execute the following commands:

cd tensorrt/src
make -j4
cd ../bin
./resnet_inference

Citations

If you use the DeepWeeds dataset in your work, please cite it as:

IEEE style citation: A. Olsen, D. A. Konovalov, B. Philippa, P. Ridd, J. C. Wood, J. Johns, W. Banks, B. Girgenti, O. Kenny, J. Whinney, B. Calvert, M. Rahimi Azghadi, and R. D. White, "DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning," Scientific Reports, vol. 9, no. 2058, Feb. 2019. [Online]. Available: https://doi.org/10.1038/s41598-018-38343-3

BibTeX

@article{DeepWeeds2019,
  author = {Alex Olsen and
    Dmitry A. Konovalov and
    Bronson Philippa and
    Peter Ridd and
    Jake C. Wood and
    Jamie Johns and
    Wesley Banks and
    Benjamin Girgenti and
    Owen Kenny and 
    James Whinney and
    Brendan Calvert and
    Mostafa {Rahimi Azghadi} and
    Ronald D. White},
  title = {{DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning}},
  journal = {Scientific Reports},
  year = 2019,
  number = 2058,
  month = 2,
  volume = 9,
  issue = 1,
  day = 14,
  url = "https://doi.org/10.1038/s41598-018-38343-3",
  doi = "10.1038/s41598-018-38343-3"
}

deepweeds's Issues

Enhancement / scale / rotation invariance built into network

I was reading your paper:

"Then, each image was randomly scaled both vertically and horizontally in the range of [0.5, 1]. Each colour channel was randomly shifted within the range of ±25 (i.e. approximately ±10% of the maximum available 8-bit colour encoding range [0, 255]). To account for illumination variance, pixel intensity was randomly shifted within the [−25, +25] range, shifting all colour channels uniformly. In addition, pixel intensity was randomly scaled within the [0.75, 1.25] range."

Are you familiar with this repo https://github.com/tueimage/SE2CNN ?
It may be more fruitful to optimize the network by automagically providing rotation and scale invariance.

Can you help me with a TypeError?

Good luck with your work. I encountered an error while training the model according to your instructions: I get a TypeError in "train_data_generator" in "deepweeds.py". How can I solve this error? I would be glad of any help. Thank you, and good work.

Request: Original or higher resolution dataset

Hi Alex and Team, thanks for your great work.

Would it be possible to obtain the original dataset of images?

I've found that a model can be trained and tested with high accuracy after replicating your process with Resnet-50 and PyTorch, however I'm struggling with inference on images outside the dataset - they're generally much poorer (in particular, the confusion matrix results between Lantana, Snake Weed, and Rubber Vine). I would like to experiment with different transform techniques as I believe preserving aspect ratio of the weed, management of color/contrast (etc) may help.

Cheers,
Mitch

EDIT: Some samples, confusion matrix, etc.
DeepWeeds_Ten_Samples_OutofdatasetInference_1Oct2019.pdf

not an issue - related FasterRCNN

https://github.com/johndpope/FasterRCNNTutorial

Location data for images

Hi,

Thanks for providing this dataset.

Since you've collected your dataset from different named locations (Black River, Charters Towers, etc.), I would like to test how well my model generalises by learning about a particular species from one location and testing it in another location. Do you have data on the location where each image was taken? Is it perhaps something that could be inferred from the date?

Inference on single images

Hi Alex, thanks so much for sharing the code! I am new to deep learning and found your commented code very helpful and clear.

I am attempting to run inference on single images from your study, using one of your pre-trained models. I think I’m running into some issues though. I downloaded the ResNet-50 model (resnet.hdf5) and loaded it to make predictions on single images at a time (using deepweeds.inference() ). However all the predictions are for the Negative class, since this probability is always much higher than the remaining classes.

Also tried running model.predict_generator() (from deepweeds.cross_validate() ) on just a test subset of data (‘test_subset0.csv’), to see if the predictions turned out differently. This was done on a Google Colab notebook with GPU, but it seems to be hanging and not completing with both ~3500 images and ~10 images (to see if runtime was the issue).

Do you know what I might be doing wrong?

Thanks!

Source code and model request

I read your research paper and downloaded your images. It's a great effort compared to other available datasets. Could you please share some source code and a model to replicate the results presented in the paper? I need it for a class project, and it would be a huge task if I were to build the system from basic frameworks. Thanks.

Issues with global variables

@AlexOlsen, thank you for putting this dataset together. After playing with your code and reading your publication, I think I see an error in your global variables in deepweeds.py:

Line 37: Should MAX_EPOCH = 32?
Line 40: Should STOPPING_PATIENCE = 2?
Line 43: Shouldn't these be strings? CLASSES = [str(i) for i in range(9)]

Paper details clarification

Hi Alex,

This is really a very good work. Congrats !!

I would like to ask for your help in understanding two points in your work.

I'm a little confused because you are using transfer learning, and I have seen in other works that for transfer learning in general we only train a new 'top'. But you make all the layers of the pre-trained ResNet 'trainable' and then train the model for just two epochs. How could just two epochs give you good accuracy with only approx. 1k images per class when training the full network? Also, there is just a single Dense layer responsible for the prediction (binary_crossentropy).
You are using a 'negative' class. But the number of negative examples is bigger than the total of all the other classes. If someone decides to use a negative class as well, how should they calculate (or balance) the negative class against the other classes?

Best Regards.
Kleyson Rios.

No labels in the images.zip

Hi there, I am interested in working on the data. Thanks for your work.

However, I cannot find the labels in images.zip.

There are no folders in the unzipped folder. All images are in one single folder.

Thanks

long time for the first inference

Hi,
I've bumped into this dataset and think it's pretty cool ! Nice work indeed 👍

I gave it a quick try here to measure inference times:
https://github.com/alext234/deep-weeds-experiments/blob/master/inference-times.ipynb

I noticed that the first inference is always much slower. Is there a particular reason for this? Perhaps this behaviour is specific to Keras/TensorFlow.