agrifoodpy's Introduction

AgriFoodPy

AgriFoodPy is a collection of methods for manipulating and modelling agrifood data. It provides modelling for a variety of aspects of the food system, including food consumption patterns, environmental impact and emissions data, population and land use. It also provides an interface to run external models by using xarray as the data container.

In addition to this package, we have also pre-packaged some datasets for use with AgriFoodPy. These can be found in the agrifoodpy_data repository: https://github.com/FixOurFood/agrifoodpy-data

Installation:

AgriFoodPy can be installed using pip, by running

pip install agrifoodpy

UK data for testing the package is available from the agrifoodpy_data repository, which can currently be installed using

pip install git+https://github.com/FixOurFood/agrifoodpy-data.git@importable

Usage:

Each of the four basic modules in AgriFoodPy (Food, Land, Impact, Population) has its own set of basic array manipulation functionality, a set of modelling methods to extract basic metrics from datasets, and interfaces with external modelling packages and code.

AgriFoodPy employs xarray accessors to provide additional functionality on top of the array manipulation provided by xarray.
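As a rough illustration of the accessor mechanism itself, the sketch below registers a toy accessor with xarray's register_dataarray_accessor decorator. The accessor name "demo", the class DemoAccessor and its total method are made up for this example and are not part of AgriFoodPy. The point it tries to show is that an accessor only becomes available once the module defining it has been imported.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor, registered under the made-up name "demo"
@xr.register_dataarray_accessor("demo")
class DemoAccessor:
    def __init__(self, xarray_obj):
        # xarray passes the DataArray the accessor is attached to
        self._obj = xarray_obj

    def total(self, dim):
        """Sum the underlying array along one dimension."""
        return self._obj.sum(dim=dim)


da = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    dims=["Year", "Item"],
    coords={"Year": [2019, 2020], "Item": ["beef", "wheat"]},
)

# Once the class above has been defined, every DataArray exposes ".demo"
totals = da.demo.total("Item")
```

This is also why the examples in this README import the accessor class even though the class name is never used directly afterwards.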

Basic usage of the accessors depends on the type of array being manipulated. The following example uses the food module with the importable UK data mentioned above:

# import the FoodBalanceSheet accessor and FAOSTAT from agrifoodpy_data
from agrifoodpy.food.food import FoodBalanceSheet
from agrifoodpy_data.food import FAOSTAT
import matplotlib.pyplot as plt

# Extract data for the UK (Region=229)
food_uk = FAOSTAT.sel(Region=229)

# Compute the Self-sufficiency ratio using the fbs accessor SSR function
SSR = food_uk.fbs.SSR(per_item=True)

# Plot the results using the fbs accessor plot_years function
SSR.fbs.plot_years()
plt.show()

To use the specific models and interfaces to external code, these need to be imported:

# import the FoodBalanceSheet accessor and FAOSTAT from agrifoodpy_data
from agrifoodpy.food.food import FoodBalanceSheet
from agrifoodpy_data.food import FAOSTAT
import agrifoodpy.food.model as food_model
import matplotlib.pyplot as plt

# Extract data for the UK in 2020 (Region=229, Year=2020)
food_uk = FAOSTAT.sel(Region=229, Year=2020)

# Scale consumption of meat to 50%
food_uk_scaled = food_model.balanced_scaling(food_uk,
                                            items=2731,
                                            element="food",
                                            origin="production",
                                            scale=0.5,
                                            constant=True)

# Plot bar summary of resultant food quantities
food_uk_scaled.fbs.plot_bars(elements=["production","imports"],
                            inverted_elements=["exports","food"])
plt.show()

In the future, we plan to implement a pipeline manager to automate certain aspects of the agrifood execution, and to simulate a comprehensive model where all aspects of the food system are considered simultaneously.

Examples and documentation

Examples demonstrating the functionality of AgriFoodPy can be found in the package documentation. These include the use of accessors to manipulate data and access to basic models.

Contributing

AgriFoodPy is an open-source project which aims to improve the transparency of evidence-based food system interventions and policy making. As such, we are happy to hear input and ideas from the community.

If you want to contribute, have a look at the discussions page or open a new issue.

For a comprehensive guide, please refer to the contributing guidelines to open a pull request to contribute new functionality.

agrifoodpy's Issues

Reading in impact data

The example to read impact data (GHGE, land use, eutrophication) is too complex and doesn't generate an easy to read file, nor is it read from one.

An alternative could be to generate a file with each of the values from e.g. PN18 using an assignment matrix to match to FAOSTAT items, and notes to each of the values when modifications are applied (i.e. production corrections)

  • Rename notebook to be more general
  • Include other impacts (Land use, eutrophication, etc.)
  • Do for regional data

Add in ALC category 3a, 3b

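BMV = best and most versatile = grades 1, 2 and 3a

There is another dataset, "likelihood of best and most versatile agricultural land" from Natural England, which covers grades 1, 2 and 3a.

So to find 3a, intersect this dataset with the ALC grades 1, 2, 3, 4, 5: grade 3 land that falls inside the BMV area is 3a, and the remainder of grade 3 is 3b.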
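The intersection could be sketched with boolean masks, as below. This is only a toy illustration: the array names, the integer encoding of grades 1 to 5, and the treatment of the Natural England layer as a simple True/False mask are all assumptions, not the format of the real datasets.

```python
import numpy as np

# Hypothetical ALC grade map with integer grades 1-5 (toy values)
alc = np.array([[1, 2, 3],
                [3, 4, 5]])

# Hypothetical mask: True where land is likely best and most versatile
bmv = np.array([[True, True, True],
                [False, False, False]])

# Grade 3 pixels inside the BMV mask are 3a, the rest of grade 3 is 3b
is_3a = (alc == 3) & bmv
is_3b = (alc == 3) & ~bmv

# Store grades as strings so "3a" and "3b" can sit next to 1, 2, 4 and 5
alc_split = alc.astype(str)
alc_split[is_3a] = "3a"
alc_split[is_3b] = "3b"
```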

Add examples, including a dashboard

In order to keep development for the codebase and dashboard coordinated and contained within a single repository, add the dashboard code to this repository as an example under the 'examples' folder.

Assigning values to label coordinates when copying from existing values in add_years, add_regions, add_items

Currently, when adding new years, regions or items to a DataArray or Dataset, the non-dimension coordinates are not set or copied from the template given by "copy_from". This typically means these values have to be set manually afterwards, which adds a few lines of code.

da = da.fbs.add_items("apples", copy_from="beef")
da["item_type"].loc[{"Item":"apples"}] = "fruit"
da["item_origin"].loc[{"Item":"apples"}] = "plant"

To avoid having to set these manually every time, it would be nice to have the option to pass a dictionary with the new values for selected non-dimension coordinates:

coord_dict = {"item_type":"fruit", "item_origin":"plant"}
da = da.fbs.add_items("apples", copy_from="beef", coords=coord_dict)
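One possible shape for this behaviour, written with plain xarray calls only, is sketched below. The helper add_item_with_coords is hypothetical and is not the existing add_items implementation; it assumes the item dimension is called "Item" and that the label coordinates are defined along it.

```python
import numpy as np
import xarray as xr


def add_item_with_coords(da, new_item, copy_from, coords=None):
    """Hypothetical helper: append `new_item` along "Item", copying values and
    labels from `copy_from`, then overriding the labels listed in `coords`."""
    # Select the template item as a length-1 slice and relabel it
    new = da.sel(Item=[copy_from]).assign_coords(Item=[new_item])
    if coords:
        # Overwrite the selected non-dimension coordinates for the new item
        new = new.assign_coords(
            {name: ("Item", [value]) for name, value in coords.items()}
        )
    return xr.concat([da, new], dim="Item")


da = xr.DataArray(
    np.array([1.0, 2.0]),
    dims=["Item"],
    coords={
        "Item": ["beef", "milk"],
        "item_type": ("Item", ["meat", "dairy"]),
        "item_origin": ("Item", ["animal", "animal"]),
    },
)

coord_dict = {"item_type": "fruit", "item_origin": "plant"}
out = add_item_with_coords(da, "apples", copy_from="beef", coords=coord_dict)
```

Coordinates that are not listed in the dictionary keep the value copied from the template item.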

This would be especially useful with label coordinates including names and/or numerical IDs.

Use rasterio or rioxarray for Land Data Arrays

The LandDataArray class implements a few basic map data manipulation methods, with various degrees of success.
There are alternatives already developed which we can use rather than trying to reinvent the wheel.

rasterio has all the functionality we have tried to implement, and some more tried and tested methods we can wrap around.

rioxarray uses xarray accessors to work with rasterio, which makes it even more suitable for our purposes.

JOSS review: Please add link somewhere to the actual examples that are available on the repo

The authors have added really useful examples here- https://github.com/FixOurFood/AgriFoodPy/tree/main/examples/modules

However, users have no way of knowing there are some detailed examples available already. I would recommend that the authors add a link somewhere in the README to these examples, so the users would know that they exist.

Else this gives the impression that no examples are available, which is clearly untrue.

openjournals/joss-reviews#6305

Feature: Host notebook examples on Binder

Binder could provide an easy interface for users to access AgriFoodPy examples without having to download any data.

A quick example is here at this URL (will take a wee while to spin up):
https://mybinder.org/v2/gh/FixOurFood/AgriFoodPy/HEAD

TODOs to make this a working solution:

  • Ensure that notebooks install all packages from current directory structure
  • Make agrifoodpy an installable package and install it in the Binder environment
  • Add Binder badges to the repo README
  • Configure the Binder with the additional features we want e.g. Voila support
  • Assess whether the free infrastructure meets our user interface demands

Refactor: Reduce disk size of git history

At the time of writing, cloning the repo involved downloading ~165MB of data, which is large for a git repository. In addition, by far the largest culprit is the .git/ directory, which would imply that there are a lot of relatively large files which were committed in the past but are no longer part of the project. Given the high number of data files stored in this repository, I suspect them to be the main source of this bloat.

I don't suggest we do anything about this yet, but once the repository starts approaching a state in which it looks most like the finished thing, we can look at refactoring the git history to remove this bloat.

For posterity, here are some links to potential solutions:

Land use class and methods

Create a class to integrate land use data, following a compatible format to the ones used in population and food data

Make it possible to disaggregate plot_bars by non-dimension coordinates

The current implementation of plot_bars allows disaggregation of the data using the show keyword.
If the name of a dimension coordinate is given, then the bars are segmented and coloured according to the individual coordinates of that dimension.

But when choosing a non-dimension coordinate (Here we employ these as labelling coordinates), the function doesn't detect these coordinates and returns a plot with a solid color for all the quantities summed.

A way to solve this would be to create an array with a different coordinate system, based on the labelling coordinates.
To do this, we can use the group_sum accessor method we have already implemented in the base accessor class

if show not in fbs.dims and show in fbs.coords:
    fbs = fbs.fbs.group_sum(show)

and then proceed as usual.
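For readers without the package at hand, the same idea can be approximated with plain xarray, using groupby as a stand-in for the group_sum accessor method. The array contents and the label name "item_origin" below are made up for illustration.

```python
import numpy as np
import xarray as xr

# Toy array with a non-dimension label coordinate along "Item"
fbs = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    dims=["Year", "Item"],
    coords={
        "Year": [2019, 2020],
        "Item": ["beef", "apples", "wheat"],
        "item_origin": ("Item", ["animal", "plant", "plant"]),
    },
)

show = "item_origin"

# If "show" is a label rather than a dimension, sum items sharing a label.
# The label then becomes a dimension that can be segmented as usual.
if show not in fbs.dims and show in fbs.coords:
    fbs = fbs.groupby(show).sum()
```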

Bug in land plot function

The order of the extent array passed to imshow is incorrect, which results in plots having inverted dimensions.
Should be an easy fix
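For reference, matplotlib's imshow expects extent in the order (left, right, bottom, top). A minimal sketch of building it from coordinate values follows; the helper name imshow_extent and the use of plain x and y sequences are assumptions for illustration, not the package's actual plotting code.

```python
def imshow_extent(x, y):
    """Hypothetical helper: build an imshow extent from coordinate values.

    The order follows matplotlib's convention: (left, right, bottom, top).
    Bounds are taken directly from the coordinate values, without adding
    half a pixel of padding on each side.
    """
    return (min(x), max(x), min(y), max(y))


# Toy coordinates: three columns along x, two rows along y
extent = imshow_extent([0, 1, 2], [10, 20])
```

The resulting tuple can be passed as the extent argument of imshow, typically together with origin="lower" when y increases upwards.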

ndarray version of FAOSTAT Food Balance Sheets

Current output of the FAO data reading notebook is a CSV file, which doesn't address the issue of complex navigation as a function of the different slices through the data: year, food item, element, region.

Options exist to add multidimensionality to the arrays: a NumPy ndarray is the simplest one, but alternatives include the use of labels for human-readable indexing, like xarray and pandas MultiIndex.

  • Duplicate the Reading_FAOSTAT_data notebook and implement a multidimensional, label based indexing
  • Output an array that can be read in the same format by the package
  • Store values in different units. Does it make sense to store all units ({g, kcal, gprot, gfat} / capita / day), or should those be computed by the user?
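As a rough sketch of the label-based option, a long-form table such as the CSV output can be turned into a labelled multidimensional dataset with pandas and xarray. The column names and values below are made up and do not reflect the actual FAOSTAT file layout.

```python
import pandas as pd

# Toy long-form table, similar in spirit to a CSV export (values are made up)
df = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020],
    "Item": ["beef", "wheat", "beef", "wheat"],
    "Region": [229, 229, 229, 229],
    "production": [1.0, 2.0, 3.0, 4.0],
    "imports": [0.5, 0.6, 0.7, 0.8],
})

# Index by the label columns and let pandas build an xarray Dataset
fbs = df.set_index(["Year", "Item", "Region"]).to_xarray()

# Label-based selection replaces manual row filtering
beef_2020 = fbs["production"].sel(Year=2020, Item="beef", Region=229).item()
```

Each element (production, imports) becomes a data variable, and year, item and region become dimensions that can be sliced by label.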

Impact of changing Production or Food Supply of Items on Import /Export Quantity?

  • Calculate impact of changing Production on Import / Export Quantity - should be general to any vector change in Item Production, but demonstrate in a notebook for a simple example e.g. 50% drop in animal production (equal across the board for all animal products)

  • Ditto for Consumption - very nearly done - roughly 3 lines of code left to write to balance kcal to be constant i.e. decrease animal kcal according to plant kcal increase

Improvements to scale_add

Currently, the scale_add function transfers quantities between elements in a FoodBalanceSheet dataset, but doesn't pay much attention to element_out after the delta has been added or subtracted.
Something I've found myself doing very often is to check whether any quantity in element_out falls below zero. This is useful if, for instance, negative quantities do not make sense and / or the excess must be transferred to another element

ds_out = ds.fbs.scale_add(element_in="food", element_out="production", scale=scale, items=items)
excess = ds_out["production"].where(ds_out["production"] < 0, other=0)
ds_out["production"] -= excess
ds_out["imports"] += excess

An "excess" or "fallback" parameter could be used to transfer these quantities to an alternative element, in a similar way to the balanced_scaling model we already have.
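A minimal sketch of what such a fallback step might look like is given below. The helper move_negative and its parameter names are hypothetical, and it follows the same sign convention as the snippet above: the negative part is removed from one element and added to the fallback element.

```python
import xarray as xr


def move_negative(ds, element, fallback):
    """Hypothetical helper: clip `element` at zero and add the negative
    part to `fallback`, leaving the input dataset untouched."""
    out = ds.copy()
    # Negative part of the element, zero wherever the element is non-negative
    excess = out[element].where(out[element] < 0, other=0)
    out[element] = out[element] - excess
    out[fallback] = out[fallback] + excess
    return out


ds = xr.Dataset(
    {
        "production": ("Item", [5.0, -2.0]),
        "imports": ("Item", [1.0, 3.0]),
    },
    coords={"Item": ["a", "b"]},
)

balanced = move_negative(ds, element="production", fallback="imports")
```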

Continuous Integration

Set up CI on the repository. This includes automatic tests, style and possibly other checks.

Will update this issue with a list of tasks once I dig deeper into this.

Unable to import FoodBalanceSheet

I'm trying AgriFoodPy following the Usage written in README, but I get the following import error. Is there anything I can do to address this error on my end?

(agrifoodpy) LC:~/Desktop/agrifoodpy$ python
Python 3.11.4 (main, Jul  7 2023, 12:16:42) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from agrifoodpy.food import FoodBalanceSheet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'FoodBalanceSheet' from 'agrifoodpy.food' (/Users/xxx/.pyenv/versions/agrifoodpy/lib/python3.11/site-packages/agrifoodpy/food/__init__.py)
>>> from agrifoodpy_data.food import FAOSTAT

plot_years requires importing the FoodBalanceSheet class even for non-food arrays

The plot_years function creates a line plot of quantities over the year coordinate. It is defined in the FoodElementSheet accessor class in the food module, but can often be used to generate similar plots for other types of arrays, and even for different coordinates.

It could be a good idea to move this to the base class and make the function more flexible, allowing the user to define which coordinate is plotted.
This will also eliminate the need to explicitly import the FoodElementSheet class when a similar plot is required.

If an inheriting accessor shouldn't have this function, it can be redefined to raise an exception.

Create unit tests

In order to appropriately set CI (#25) and in preparation for v0.1, unit tests for functions must be written.
No idea how to test plotting functions, but some others shouldn't be difficult to do.

List of tasks coming soon...

Start using poetry for package management?

I'm just pip installing rasterio, but wondering if there is an easy way to log this change / make it simple for future users - perhaps poetry?

[note to self - we're using conda here not pip, don't mix. do conda install rasterio instead]

or how do I edit the FOF setup?

Refactor: Data

It looks like most folders in this repository have data files colocated with code. This I expect is due to this being a work in progress, but I'm putting this issue in as a reminder to tidy up the datasets to be in one location with no duplication.

Also relevant to #20.

plot_years with a size 1 "show" dimension

matplotlib complains when trying to generate a plot_years plot on a DataArray with a size-1 named dimension and using it as the "show" parameter.

To reproduce the error:

from agrifoodpy.food.food import FoodElementSheet
import xarray as xr
import numpy as np

da_1 = xr.DataArray(np.random.rand(5,1),
                        coords=[('Year', [2010, 2011, 2012, 2013, 2014]),
                                ('Region', ['A'])],
                        dims=['Year', 'Region'])
fbs_1 = FoodElementSheet(da_1)

fbs_1.plot_years(show="Region")

I tried using da.squeeze() to remove the size-1 named dimension and it does seem to fix the issue.

Reading spatial data and saving to arrays to be read by the package

What is the current agricultural land use in the UK, and the spatial distribution of its GHGE footprint / kcal / protein?

Add and Fix constructors for Impact and Population arrays

The Impact module has a constructor which generates an xarray dataset with the following coordinates:

Coordinates:
   *Impact       (Impact) ...
   *Item           (Item) ...
   *Region       (Region) ...
Data variables:
   value      (Item, Impact, Region) float32 ...

while the distributed Impact datasets (PN18, PN18_FAOSTAT) are xarray datasets with the following structure:

Coordinates:
   *Item         (Item) ...
   *Region       (Region) ...
Data variables:
   Impact 1      (Item, Region) float32 ...
   Impact 2      (Item, Region) float32 ...
   ...

I think the latter is more appropriate, as each dataset is typically measured in different units. Having a single matrix to generate all sorts of impact measures is not ideal in my opinion.
Either way, one standard has to be chosen. I have decided not to add years to the Impact datasets but this can be easily changed in the future.
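For comparison, xarray can convert between the two layouts, so choosing one as the standard does not lock the data in. The sketch below uses made-up impact names and values; only to_dataset and to_array are real xarray methods.

```python
import numpy as np
import xarray as xr

# Layout 1: a single "value" variable with an extra "Impact" dimension
stacked = xr.Dataset(
    {"value": (("Item", "Impact", "Region"), np.arange(4.0).reshape(2, 2, 1))},
    coords={
        "Item": ["beef", "wheat"],
        "Impact": ["GHGE", "land_use"],
        "Region": [229],
    },
)

# Layout 2: one data variable per impact, each with dims (Item, Region)
per_impact = stacked["value"].to_dataset(dim="Impact")

# Going back from layout 2 to a single array with an "Impact" dimension
restacked = per_impact.to_array(dim="Impact")

# With one variable per impact, units can be recorded per variable
# (the unit strings here are purely illustrative)
per_impact["GHGE"].attrs["units"] = "kg CO2e / kg"
per_impact["land_use"].attrs["units"] = "m2 / kg"
```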

The Population module does not have a constructor. One should be added to follow the structure defined in the distributed datasets (UN, Population_FAOSTAT):

  • Fix the constructor for the Impact module
  • Add a constructor for population data
  • Check constructor for foodsupply data

Add flexibility to the SSR and IDR methods

The current implementation relies exclusively on the definition of the SSR and IDR given by

$$ \begin{align} SSR &= \frac{\text{production}}{\text{production} + \text{imports} - \text{exports}}\\ IDR &= \frac{\text{imports}}{\text{production} + \text{imports} - \text{exports}} \end{align} $$

While the definition is correct and the code allows choosing the specific dataarrays used for each component, the current implementation does not allow using the simpler definitions of SSR and IDR:

$$ \begin{align} SSR &= \frac{\text{production}}{\text{domestic use}}\\ IDR &= \frac{\text{imports}}{\text{domestic use}} \end{align} $$

A solution could be to add a new keyword domestic that defaults to None and takes priority over imports and exports when defined, even if either or both of imports and exports exist.
The defaults of imports and exports do not need to change.
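A sketch of the proposed precedence rule is shown below, written as a standalone function rather than as the accessor methods. The function name ssr_idr and its signature are assumptions made for illustration. It works on plain numbers here, but the same arithmetic would apply to arrays.

```python
def ssr_idr(production, imports, exports=None, domestic=None):
    """Hypothetical sketch: SSR and IDR with an optional `domestic` override."""
    if domestic is None:
        # Default definition: domestic use = production + imports - exports
        domestic = production + imports - exports
    # When `domestic` is given it takes priority, even if exports exist
    ssr = production / domestic
    idr = imports / domestic
    return ssr, idr


# Default definition: 60 / (60 + 50 - 10) and 50 / (60 + 50 - 10)
default_ratios = ssr_idr(60.0, 50.0, exports=10.0)

# Simpler definition, with domestic use supplied directly
domestic_ratios = ssr_idr(50.0, 25.0, exports=10.0, domestic=100.0)
```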

JOSS review: No class named "FAOSTAT" is available as mentioned in example or the same does not work with current instructions

The example in the README is based on a FAOSTAT class that does not exist. Should the example be based on existing available classes such as "FoodBalanceSheet" ?

It seems the authors should add the "pip install git+https://github.com/FixOurFood/agrifoodpy-data.git@importable" instructions explicitly in the README for the example to work correctly

openjournals/joss-reviews#6305

Prepare setup.py, pyproject.toml, setup.cfg for version 0.1 distribution

In order to package AgriFoodPy for publication on PyPI, we must decide on the backend to build the package and create or modify the configuration files accordingly.

  • Define the backend
  • Create setup.cfg if necessary
  • Create pyproject.toml if necessary
  • Modify setup.py to reflect version number and correct metadata

Feature: READMEs in every folder

This project appears to be well organised into folders, but it would be useful for people new to the project if there were README files in most folders explaining what is there and including references or sources if things like data are stored there.

Add option to assign output coordinate values from attributes in area_overlap

The area_overlap function returns a 2D array with the corresponding area for each combination of values from two segmentation maps.

The coordinate values assigned to each dimension of the output map are the unique values in the segmentation maps by default, or an input array of dimension values set by the dim_left and dim_right keywords.

It would be good to have an automatic output option which would select the values to use in the coordinates from a well defined attribute entry in each array.
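One way this automatic option might behave is sketched below. The function overlap_with_attr_labels, the attribute name "labels" and the dimension names "left" and "right" are all hypothetical, and the sketch counts pixels instead of computing real areas.

```python
import numpy as np
import xarray as xr


def overlap_with_attr_labels(left, right, attr="labels"):
    """Hypothetical sketch: pixel counts per pair of map values, labelled
    from a value-to-name mapping stored in each array's attrs when present."""
    vals_l = np.unique(left.values)
    vals_r = np.unique(right.values)
    counts = np.array([
        [int(np.sum((left.values == vl) & (right.values == vr)))
         for vr in vals_r]
        for vl in vals_l
    ])
    # Fall back to the raw unique values when no mapping is available
    map_l = left.attrs.get(attr, {})
    map_r = right.attrs.get(attr, {})
    coords_l = [map_l.get(int(v), int(v)) for v in vals_l]
    coords_r = [map_r.get(int(v), int(v)) for v in vals_r]
    return xr.DataArray(
        counts,
        dims=["left", "right"],
        coords={"left": coords_l, "right": coords_r},
    )


# Toy segmentation maps: only the first one carries a labels attribute
land_cover = xr.DataArray(
    np.array([[1, 1], [2, 2]]),
    dims=["y", "x"],
    attrs={"labels": {1: "arable", 2: "pasture"}},
)
zones = xr.DataArray(np.array([[10, 10], [10, 20]]), dims=["y", "x"])

overlap = overlap_with_attr_labels(land_cover, zones)
```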

Repository / Package name. Logo

We need to agree on the name of the package

  • Rename repository
  • Modify contributing guidelines to reflect changes to repository name. This includes instructions to modify remotes in local git repositories
  • Rename the organization
  • Create a logo and add it to the repository

Refactor: CSV to be comma separated

Currently most of the CSV files (but not all) are : separated. Unfortunately, GitHub does not render this in a pretty fashion like it does with CSV files separated with ,.

Would you consider refactoring the CSV files to be consistently comma-separated? Or is this in order to preserve the original data format?
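If the conversion goes ahead, a small script based on the standard csv module could handle it. The sketch below is an illustration, not a tested migration tool; the function name convert_delimiter is made up. Using csv rather than a plain text replacement means fields that already contain a comma get quoted on output.

```python
import csv
import io


def convert_delimiter(src, dst, old=":", new=","):
    """Copy rows from `src` to `dst`, changing the field delimiter."""
    reader = csv.reader(src, delimiter=old)
    writer = csv.writer(dst, delimiter=new, lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# In-memory example, with real files open() would replace io.StringIO
src = io.StringIO("Item:Value\nbeef, fresh:1.5\n")
dst = io.StringIO()
convert_delimiter(src, dst)
converted = dst.getvalue()
```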
