foodpred's Introduction

Source code of foodpred.com, a web application to compute the ecological impact of consumer goods in the food industry. The predictions are classifications of the CIQUAL dataset based on products from openfoodfacts in all languages. The predictions only look at the text of the product. That means factors such as transport are not tailored to your location or the specific brand, but rather taken as an average over all available products.

Use case

For many products in many languages, reliable, good-quality data is already available. The purpose of foodpred is not to replace this data, but to provide an educated guess when such data is not available. Examples include slight variations in product names, typos, and different languages. The main use case is therefore automated matching, where manual classification is not feasible. When you are looking at just a handful of products, I recommend doing the classification yourself.

This project is in development and the public API is subject to change. The project is self-funded.

Repository contents

  • /app: code that enters the Docker image (Dockerfile is at root for wider build context) (onnx, AWS Lambda)
  • /harrygobert: source code for finetuning Transformer model (transformers, torch, lightning)
  • /terraform: infrastructure code (terraform)
  • /frontend: minimal web frontend (react)
  • /.github: CI/CD jobs using OIDC

Evaluation and limitations

Creating a unified benchmark for this task is a work in progress. Right now the selection metric is simply top-1 validation accuracy under 5-fold cross-validation over the entire dataset, using the fixed seed in the code. The hyperparameter configuration with the highest score is then retrained on the whole dataset and packaged into production format.
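The selection procedure described above can be sketched roughly as follows. This is a minimal illustration, not the repository's training code: the names `k_fold_indices`, `cross_val_top1` and `select_config`, the default seed, and the idea of passing each configuration as a plain training function are all assumptions made for the example.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (product text, target label)
Predictor = Callable[[str], str]
TrainFn = Callable[[List[Example]], Predictor]  # hypothetical training interface


def k_fold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle the indices 0..n-1 with a fixed seed and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_top1(data: List[Example], train_fn: TrainFn, k: int = 5, seed: int = 0) -> float:
    """Mean top-1 validation accuracy over k folds (assumes len(data) >= k)."""
    scores = []
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [ex for i, ex in enumerate(data) if i not in held_out]
        valid = [data[i] for i in fold]
        predict = train_fn(train)
        correct = sum(predict(text) == label for text, label in valid)
        scores.append(correct / len(valid))
    return mean(scores)


def select_config(
    data: List[Example], configs: Dict[str, TrainFn], k: int = 5, seed: int = 0
) -> Tuple[str, Dict[str, float]]:
    """Score every configuration with the same folds and return the best one."""
    scores = {name: cross_val_top1(data, fn, k, seed) for name, fn in configs.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because every configuration is scored on the same seeded folds, the scores are directly comparable. The winning configuration would then be retrained on all of `data` before packaging.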

There are several limitations to this approach. Most importantly, the language distribution is skewed: more than half of the products in the dataset are French, so the current model does not perform as well for other languages.
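One way to make this skew visible is to break validation accuracy down per language instead of reporting a single number. The sketch below assumes each validation record carries a language code next to the true and predicted labels. That record layout and the function name `accuracy_by_language` are hypothetical and not taken from the repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (language code, true label, predicted label)
Record = Tuple[str, str, str]


def accuracy_by_language(records: Iterable[Record]) -> Dict[str, float]:
    """Top-1 accuracy computed separately for each language code."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lang, truth, predicted in records:
        total[lang] += 1
        correct[lang] += int(truth == predicted)
    return {lang: correct[lang] / total[lang] for lang in total}
```

A large gap between the French score and the others would point to the skew described above rather than to the model architecture.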

Prod model

The production model is a fine-tuned version of distilbert-base-multilingual-cased. The training curve can be seen via this WANDB report. The hyperparameter configuration used achieves a top-1 accuracy of 0.77 using 5-fold cross-validation as described above.

EcoScore calculation

The displayed score is calculated based on the implementation of OpenFoodFacts (code snippet):

# Formula to transform the Environmental Footprint single score to a 0 to 100 scale
# Note: EF score are for mPt / kg in Agribalyse, we need it in micro points per 100g

# Milk is considered to be a beverage
if (has_tag($product_ref, 'categories', 'en:beverages')
    or (has_tag($product_ref, 'categories', 'en:milks')))
{
    # Beverages case: score = -36 * ln(x + 1) + 150
    $product_ref->{ecoscore_data}{agribalyse}{is_beverage} = 1;
    $product_ref->{ecoscore_data}{agribalyse}{score}
        = round(-36 * log($agribalyse{$agb}{ef_total} * (1000 / 10) + 1) + 150);
}
else {
    # 2021-02-17: new updated formula: 100-(20 * ln(10*x+1))/ln(2+ 1/(100*x*x*x*x))  - with x in MPt / kg.
    $product_ref->{ecoscore_data}{agribalyse}{is_beverage} = 0;
    $product_ref->{ecoscore_data}{agribalyse}{score} = round(
        100 - 20 * log(10 * $agribalyse{$agb}{ef_total} + 1) / log(
            2 + 1 / (
                      100 * $agribalyse{$agb}{ef_total}
                    * $agribalyse{$agb}{ef_total}
                    * $agribalyse{$agb}{ef_total}
                    * $agribalyse{$agb}{ef_total}
            )
        )
    );
}
if ($product_ref->{ecoscore_data}{agribalyse}{score} < 0) {
    $product_ref->{ecoscore_data}{agribalyse}{score} = 0;
}
elsif ($product_ref->{ecoscore_data}{agribalyse}{score} > 100) {
    $product_ref->{ecoscore_data}{agribalyse}{score} = 100;
}

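In short, for most products the Eco-Score formula, with EF as the Environmental Footprint single score in mPt/kg, is:

100 - (20 * ln(10 * EF + 1)) / ln(2 + 1 / (100 * EF^4))

For beverages, the snippet above first converts EF to micro points per 100 g (a factor of 100), which gives:

-36 * ln(100 * EF + 1) + 150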

Additional product-specific attributes are added as positive or negative bonus points to arrive at a final normalised score. This score is thresholded to fall within a range of 0 to 100. On foodpred.com, the bonus points are currently not part of the classification, so both the bonus points and the thresholding step are omitted: the raw score is displayed as is. For reference, after bonus and malus points, a score between 0 and 20 would have a label of E and a score between 80 and 100 would have a label of A.
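For readers who prefer Python over Perl, the raw score from the snippet above can be sketched as follows. This is an illustrative port, not the code that runs on foodpred.com: the function names `raw_ecoscore` and `clamp_score` are made up for the example, and Python's `round` may differ from the Perl `round` for values that fall exactly halfway between two integers.

```python
import math


def raw_ecoscore(ef_total: float, is_beverage: bool = False) -> int:
    """Raw Eco-Score from the EF single score (ef_total in mPt/kg), without bonus points."""
    if ef_total <= 0:
        # The non-beverage formula divides by ef_total^4, so zero is not allowed.
        raise ValueError("ef_total must be positive")
    if is_beverage:
        # Convert mPt/kg to micro points per 100 g (x 1000 / 10), as the Perl snippet does.
        score = -36 * math.log(ef_total * (1000 / 10) + 1) + 150
    else:
        score = 100 - 20 * math.log(10 * ef_total + 1) / math.log(
            2 + 1 / (100 * ef_total ** 4)
        )
    return round(score)


def clamp_score(score: int) -> int:
    """Thresholding step from the Perl snippet: limit the score to the range 0..100."""
    return max(0, min(100, score))
```

Calling `raw_ecoscore` alone corresponds to the unthresholded value described above, while `clamp_score(raw_ecoscore(...))` mirrors the final `if`/`elsif` of the Perl snippet.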
