
phenocamcnn2's Introduction

This is the repository for the following study:

Taylor S.D., Browning D.M. 2022. Classification of Daily Crop Phenology in PhenoCams Using Deep Learning and Hidden Markov Models. Remote Sensing. 14(2):286. https://doi.org/10.3390/rs14020286 [Preprint, Data & Code Archive]

Note: the initial version was titled "Deep learning models for identifying crop and field attributes from near surface cameras"; the title was changed during the review process.

File Structure

A. Initial Preparation

phenocam_data_prep/

  1. generate_site_list.R - uses the phenocamr package to make a list of cameras to use in the study and writes it to site_list.csv.
  2. download_phenocam_data.R - for each camera, download the Gcc and transition dates to data/phenocam_gcc/.
  3. generate_training_image_list.R - for each camera, use the Gcc transition dates to partition each calendar year into distinct periods of senesced, growth, peak, and senescing. Randomly choose images among these periods for each camera to make a list of training images. Creates the file data/images_for_annotation.csv. Also creates the file data/full_image_list.csv, which lists the mid-day image for every available day for all sites in site_list.csv.
  4. download_training_phenocam_images.R - download all images in data/images_for_annotation.csv.
  5. generate_extra_image_list.R - for each image in data/images_for_annotation.csv get the download link of all images from the respective day. These are the ~80k additional images taken from 0900-1500 described in the text. Creates the file data/extra_images_for_fitting.csv
  6. download_extra_phenocam_images.R - download all images in data/extra_images_for_fitting.csv to the folder data/extra_phenocam_train_images/
  7. download_all_phenocam_images.R - download all images in data/full_image_list.csv to data/phenocam_all_images/
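
The period-based sampling in step 3 can be pictured with the minimal Python sketch below. It is not the code in generate_training_image_list.R: the four transition dates, the function names, and the rule for where each period starts and ends are assumptions made only for illustration.

```python
import random
from datetime import date, timedelta

def label_period(day, growth_start, peak_start, senescing_start, senesced_start):
    """Return the period name for a day, given four hypothetical transition dates."""
    if day < growth_start or day >= senesced_start:
        return "senesced"
    if day < peak_start:
        return "growth"
    if day < senescing_start:
        return "peak"
    return "senescing"

def sample_days_per_period(year, transitions, n_per_period, seed=0):
    """Randomly pick up to n_per_period days from each period of one calendar year."""
    rng = random.Random(seed)
    days_by_period = {}
    day = date(year, 1, 1)
    while day.year == year:
        days_by_period.setdefault(label_period(day, *transitions), []).append(day)
        day += timedelta(days=1)
    return {
        period: sorted(rng.sample(days, min(n_per_period, len(days))))
        for period, days in days_by_period.items()
    }

# Made-up transition dates for one camera-year (not taken from the study).
transitions = (date(2019, 5, 1), date(2019, 7, 1), date(2019, 9, 1), date(2019, 10, 15))
chosen = sample_days_per_period(2019, transitions, n_per_period=5)
```

Sampling within each period, rather than uniformly over the year, keeps short periods such as senescing from being swamped by the long senesced period.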

B. Training Image Annotation

train_image_annotation/

  1. imageant_config2.ias - configuration file for the annotation software, imageant. https://gitlab.com/stuckyb/imageant. Note I used an older version than is currently available, and this ias file will not work with the current version. In fact, the one I used is so old it does not have a version number, but it lives at commit 3c2fd39.
  2. imageant_session2.csv - this is a session file for imageant
  3. image_classes.csv - pairing for annotation numeric and text labels. (eg. dominant cover class 1 = vegetation).
  4. merge_new_crop_types.R - A little needed data munging. See file for details.
  5. image_annotations.csv - The final annotation file from imageant, i.e. the file with all the training/validation data labels. This is used in model fitting and final evaluation.

C. Model Fitting and prediction

  1. fit_keras_model.py - VGG16 model fitting. Uses all annotated and "extra" images to fit the model and writes the keras file data/vgg16_v4_20epochs.h5. Excluding images due to low class prevalence, the train/test split, and resampling using weights are all done here.
  2. apply_keres_model.py - Using the fitted VGG16 model, make predictions on everything in data/extra_phenocam_train_images/ and data/phenocam_all_images/. Writes those predictions to data/vgg16_v4_20epochs_predictions.csv
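
The split and weighted resampling mentioned for fit_keras_model.py could look roughly like the sketch below. This is an illustration and not the script's code: the inverse-class-frequency weighting, the function name, and the toy labels are all assumptions, since the exact weighting used is not described here.

```python
import random
from collections import Counter

def split_and_resample(labels, validation_fraction, n_train_samples, seed=0):
    """Hold out a validation set, then resample training indices with
    inverse-class-frequency weights (an assumed weighting scheme)."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_val = int(len(indices) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    # Each training image is weighted by 1 / (count of its class in the training set),
    # so every class has the same total weight.
    counts = Counter(labels[i] for i in train_idx)
    weights = [1.0 / counts[labels[i]] for i in train_idx]
    resampled = rng.choices(train_idx, weights=weights, k=n_train_samples)
    return resampled, val_idx

# Toy labels: one common class and one rare class.
labels = ["vegetation"] * 90 + ["water"] * 10
train_sample, val_idx = split_and_resample(labels, 0.2, 1000)
```

Splitting before resampling matters: if the resampling happened first, copies of the same image could land in both the training and validation sets.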

D. Postprocessing

classification_postprocessing/

  1. prep_predictions_for_hmm.R - takes the file data/vgg16_v4_20epochs_predictions.csv and preps the predictions for the HMM aspect (see apply_hmm_model.py). Creates several files data/image_predictions_for_*.
  2. final_processing.R - Takes output from the HMM model (./data/hmm_output.csv) and applies the final post-processing steps (see text) producing ./data/final_predictions.csv.
  3. postprocessing_tools.R - helper functions.

E. Hidden Markov Model (HMM)

  1. hmm_stuff/hmm_model_definitions.py - This describes the HMM models using the pomegranate package. https://pomegranate.readthedocs.io.
  2. apply_hmm_model.py - Applies the HMM models to the data/image_predictions_for_* files and creates ./data/hmm_output.csv.
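
To give a feel for what the HMM step does to a daily series of classifier output, here is a small pure-Python Viterbi sketch. It is not the pomegranate model from hmm_stuff/hmm_model_definitions.py: the two states, the transition matrix, and the use of classifier probabilities directly as emission likelihoods are assumptions chosen only for illustration.

```python
import math

def viterbi(daily_probs, transition, initial):
    """Most likely state sequence, treating classifier probabilities as
    emission likelihoods (a simplifying assumption for this sketch)."""
    n_states = len(initial)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    score = [log(initial[s]) + log(daily_probs[0][s]) for s in range(n_states)]
    back = []
    for probs in daily_probs[1:]:
        prev_best, new_score = [], []
        for s in range(n_states):
            # Best previous state for arriving in state s today.
            best = max(range(n_states), key=lambda r: score[r] + log(transition[r][s]))
            prev_best.append(best)
            new_score.append(score[best] + log(transition[best][s]) + log(probs[s]))
        back.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state example: 0 = growth, 1 = senescing.
transition = [[0.9, 0.1], [0.1, 0.9]]   # states tend to persist from day to day
initial = [0.5, 0.5]
daily_probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.8, 0.2]]
smoothed = viterbi(daily_probs, transition, initial)
```

Taking the per-day maximum would flip to state 1 on the third day; with a transition matrix that favours staying in the same state, the decoded sequence ignores that single-day blip.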

F. Analysis

analysis/

  1. evaluate_predictions.R - calculate error metrics and create manuscript F1/precision/recall figures. This uses all files in the process: data/vgg16_v4_20epochs_predictions.csv, image_annotations.csv, and ./data/final_predictions.csv.
  2. timeseries_figures.R - produces the colorful timeseries figures for each site year.
  3. site_map_and_table.R - produces supplemental map and site table.
  4. single_image_diagnostic_plots.R - produces the supplemental figures where prediction probabilities for single images are displayed.

Workflow

The modelling workflow was as follows.

  1. Determine the needed images and download all of them (A)
  2. Annotate all the images (B).
  3. Fit the vgg16 model and predict on the full image dataset (C)
  4. Apply post processing (D,E). With the VGG16 output post-processing is in the following order:
    1. prep_predictions_for_hmm.R
    2. apply_hmm_model.py
    3. final_processing.R
  5. Analyze and visualize (F).

Data

None of the phenocam images are in the repo, but they can be downloaded with the scripts in phenocam_data_prep/.
The following files are not in the GitHub repo because they are too large, but they can be found in the Zenodo repo (https://doi.org/10.5281/zenodo.5579796):

  • data/vgg16_v4_20epochs.h5 - this is the fitted keras classification model.
  • data/vgg16_v4_20epochs_predictions.csv - this contains the initial image classifications prior to post-processing.
  • data/final_predictions.csv - the final predictions after post-processing.

Using the predictions

If you'd like to use the predictions in remote sensing models or elsewhere you need the following two files:

  • data/final_predictions.csv - the final predictions after post-processing.
  • site_list.csv - site metadata.

final_predictions.csv has, for all available sites, a date and predicted status for the three categories. site_list.csv has the latitude and longitude of all sites, and other metadata from the phenocam database. The files are joined via the phenocam_name column.
The final_predictions.csv file has predictions for sites in site_list.csv, given the constraints described in the paper, through 2021-09-27.
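
As a minimal sketch of the join described above, the Python below attaches site metadata to each prediction row via phenocam_name. Only the phenocam_name column is taken from the description; the other column names, the site name, and the values in the in-memory stand-in files are hypothetical.

```python
import csv
import io

def join_predictions_with_sites(predictions_file, sites_file):
    """Attach site metadata to each prediction row via phenocam_name."""
    sites = {row["phenocam_name"]: row for row in csv.DictReader(sites_file)}
    joined = []
    for row in csv.DictReader(predictions_file):
        site = sites.get(row["phenocam_name"])
        if site is None:
            continue  # prediction for a site missing from the metadata
        joined.append({**site, **row})
    return joined

# Tiny in-memory stand-ins; every column except phenocam_name is hypothetical.
predictions_csv = io.StringIO(
    "phenocam_name,date,crop_status\n"
    "examplesite1,2021-06-01,growth\n"
    "examplesite1,2021-09-27,senescing\n"
)
sites_csv = io.StringIO("phenocam_name,lat,lon\nexamplesite1,40.0,-100.0\n")
rows = join_predictions_with_sites(predictions_csv, sites_csv)
```

With the real files, the two arguments would be open("data/final_predictions.csv", newline="") and open("site_list.csv", newline="").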

phenocamcnn2's Issues

Journal to submit to

Note: submitting Oct 1 to put it in FY22

Environmental Research Letters - https://iopscience.iop.org/journal/1748-9326
classification stuff may be too specific for here, but maybe worth a shot. Scope seems to emphasize policy relevant stuff, but also management. Agriculture is well within scope.

Science of remote sensing - fits scope but would need to justify novelty. ie "beyond Gcc with phenocams"

Agricultural and Forest Meteorology - absolutely fits here, $3750

Applications in Plant Sciences - $1500
https://bsapubs.onlinelibrary.wiley.com/journal/21680450
EIC sent invite based on preprint

MDPI Sensors - $2400 - several special issues related to machine learning/AI in AG.
machine learning in ag special issue, due oct 2021
Application of Artificial Neural Network and Sensing in Advanced Agriculture - due march 2022
Deep Learning Methods for Remote Sensing - due march 2022

multi-label model classification notes

Vgg16 model using 6714 annotated images. 20k train image sample size, 0.2 validation fraction, (224,224) image size.

lr = 0.01, epsilon = 0.1, 50 epochs

in confusion matrices the columns are predicted and rows are true labels

dominant cover

              precision    recall  f1-score   support

     unknown       0.59      0.71      0.65        14
  vegetation       0.90      0.91      0.91       793
     residue       0.67      0.65      0.66       293
        soil       0.57      0.53      0.55       144
        snow       0.90      0.77      0.83        82
       water       0.41      0.81      0.54        16

    accuracy                           0.80      1342
   macro avg       0.67      0.73      0.69      1342
weighted avg       0.81      0.80      0.80      1342

class_description  unknown  vegetation  residue  soil  snow  water
class_description                                                 
unknown                 10           1        3     0     0      0
vegetation               0         725       40    19     5      4
residue                  1          61      190    33     2      6
soil                     0          15       45    77     0      7
snow                     6           1        5     5    63      2
water                    0           0        2     1     0     13
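
As a cross-check on how the report and the matrix relate, the sketch below recomputes per-class precision, recall, and F1 from the dominant cover confusion matrix above (rows are true labels, columns are predicted). The function and variable names are illustrative only.

```python
labels = ["unknown", "vegetation", "residue", "soil", "snow", "water"]
# Rows are true labels, columns are predicted labels (dominant cover matrix above).
confusion = [
    [10, 1, 3, 0, 0, 0],
    [0, 725, 40, 19, 5, 4],
    [1, 61, 190, 33, 2, 6],
    [0, 15, 45, 77, 0, 7],
    [6, 1, 5, 5, 63, 2],
    [0, 0, 2, 1, 0, 13],
]

def per_class_metrics(matrix, names):
    """Precision = diagonal / column sum; recall = diagonal / row sum."""
    metrics = {}
    for i, name in enumerate(names):
        true_positive = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum (support)
        precision = true_positive / predicted_total
        recall = true_positive / actual_total
        f1 = 2 * precision * recall / (precision + recall)
        metrics[name] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual_total}
    return metrics

metrics = per_class_metrics(confusion, labels)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(labels))) / total
```

For example, water has 13 correct predictions out of 32 images predicted as water (precision 0.41) and out of 16 true water images (recall 0.81), matching the report.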

crop type

               precision    recall  f1-score   support

       uknown       0.56      0.71      0.63        14
unknown_plant       0.74      0.63      0.68       327
  large_grass       0.67      0.94      0.78       238
  small_grass       0.78      0.80      0.79       258
        other       0.67      0.54      0.60       188
       fallow       0.38      0.87      0.53        15
         none       0.86      0.75      0.80       302

     accuracy                           0.73      1342
    macro avg       0.67      0.75      0.69      1342
 weighted avg       0.75      0.73      0.73      1342

class_description  uknown  unknown_plant  large_grass  small_grass  other  fallow  none
class_description                                                                      
uknown                 10              1            0            0      0       0     3
unknown_plant           0            205           48           15     22       7    30
large_grass             0              9          223            4      2       0     0
small_grass             0             12           16          206     21       2     1
other                   0             18           38           27    101       1     3
fallow                  0              0            0            0      1      13     1
none                    8             33            7           12      3      11   228

crop status

             precision    recall  f1-score   support

     unknown       0.56      0.71      0.63        14
   emergence       0.70      0.58      0.63       190
      growth       0.80      0.91      0.85       438
     flowers       0.76      0.69      0.72       160
   senescing       0.58      0.64      0.60       138
    senesced       0.56      0.55      0.55       100
     no_crop       0.87      0.78      0.83       302

    accuracy                           0.75      1342
   macro avg       0.69      0.70      0.69      1342
weighted avg       0.75      0.75      0.75      1342

class_description  unknown  emergence  growth  flowers  senescing  senesced  no_crop
class_description                                                                   
unknown                 10          0       1        0          0         0        3
emergence                0        110      53        1          4         2       20
growth                   0         22     400        8          5         3        0
flowers                  0          0      26      110         23         1        0
senescing                0          0      11       25         88        12        2
senesced                 0          5       2        1         28        55        9
no_crop                  8         20       6        0          5        26      237

image classifier improvements

use some sort of time series segmentation net.

some potential useful things:

timeseries + machine learning package https://dmbee.github.io/seglearn/index.html
several img segmentation models https://github.com/divamgupta/image-segmentation-keras
the penguin video CNN paper https://www.biorxiv.org/content/10.1101/2020.06.29.177261v1.full
in depth segmentation explainer: https://divamgupta.com/image-segmentation/2019/06/06/deep-learning-semantic-segmentation-keras.html

Classification Categories

BBCH categories, straight from Wikipedia: https://en.wikipedia.org/wiki/BBCH-scale

Principal growth stages

  0. Germination, sprouting, bud development
  1. Leaf development
  2. Formation of side shoots, tillering
  3. Stem elongation or rosette growth, shoot development
  4. Development of harvestable vegetative plant parts, bolting
  5. Inflorescence emergence, heading
  6. Flowering
  7. Development of fruit
  8. Ripening or maturity of fruit and seed
  9. Senescence, beginning of dormancy

Being used here:
See updated multiple category classifications below

class value  name                           description
0            Emergence                      First shoots and/or leaves are visible.
1            Growth State                   Plants have several distinct leaves and/or tillers visible, but no visible tassels, flowers, or fruit.
2            Tassels or Flowers             Plants have distinct tassels, flowers, or fruit.
3            Senescing                      10% or more of visible plants are senescing.
4            Fully Senesced                 90% or more of visible plants are senesced.
5            Harvested and/or Plowed Field  Over 50% of the primary field has been harvested or plowed.
6            Snow Covered Field             Over 10% of the camera field of view has snow.
7            Flooded Field                  Over 10% of the camera field of view has standing water.
8            Unknown                        The image is blurry, obstructed, or otherwise indiscernible.

TODO:

manuscript:

  • do data/software citations
  • need a table/map of sites
  • need some example images
  • need license statement for all phenocam images
  • cite Borowiec 2021 - https://doi.org/10.32942/osf.io/nt3as
  • new figure 1
  • fix precision: dom cover in accuracy figs

rerun everything with:

  • sdtaylor/PhenocamCNN#6 sites excluded from the get go
  • other/fallow combined? Yes, only 86 images of fallow, 700+ of other
  • set the seed on held out data so they can be used later, or mark it in R somewhere
  • save val data frame for easier eval later

Other things

  • deal with snow issue. If snow or water in dom category, then remove no crop in crop type and status just like removing blurry. Or actually changing to unknown sounds better.
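
One way the snow/water idea above might be expressed is sketched below. The dictionary keys and the exact label strings are assumptions for illustration, and this is a proposal sketch, not code from the repository.

```python
def apply_snow_water_rule(prediction):
    """If dominant cover is snow or water, relabel 'no crop' calls as unknown.

    The keys (dominant_cover, crop_type, crop_status) are hypothetical names.
    """
    updated = dict(prediction)  # leave the input untouched
    if updated["dominant_cover"] in {"snow", "water"}:
        if updated["crop_type"] == "none":
            updated["crop_type"] = "unknown"
        if updated["crop_status"] == "no_crop":
            updated["crop_status"] = "unknown"
    return updated

snowy = {"dominant_cover": "snow", "crop_type": "none", "crop_status": "no_crop"}
fixed = apply_snow_water_rule(snowy)
```

Changing the label to unknown, instead of dropping the row, keeps one record per image per day, so the time series stays complete.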

organize folders/files

ie put A, B, C on folders for primary steps. prepend files with 1, 2, 3, etc.

unsuitable sites

These are cameras with 2+ distinct plots in the field of view. They're really annoying to classify and aren't actually representative of ag land outside research areas.

sites with numerous experimental plots

arsmnswanlake1
kelloggcorn2
kelloggcorn3
kelloggmiscanthus
kelloggoldfield
kingmanfarm
NEON.D10.STER
silverton
tworfta

Pasture sites

uiefprairie2
archboldavir
archboldavirx
archboldbahia
archboldpnot
archboldpnotx
tworfpr
harvardfarmnorth
harvardfarmsouth
harvardgarden
rosemountnprs
sweetbriargrass
meadpasture
NEON.D04.LAJA.DP1.00033
wolfesneckfarm

Other weird ones

turkeypointenf02
tidmarshplymouth
