Code Monkey home page Code Monkey logo

hi-paris-hackathon-2023's Introduction

HI!CKATHON 2023

Hi! Paris logo

GROUP NAME : Group 14

Date : January 14th, 2023

I. OVERVIEW

1. Project Background and Description

Nowadays, while real estate greatly contributes to GHG emissions, investments in insulation works are still far behind their objectives.

The purpose of the project is to use Machine Learning to predict energy consumption of buildings. A better understanding of the importance of insulation works on the energy consumption of buildings will enhance energy transition investments.

2. Project Scope

The project scope will include the Machine Learning regression task and the business pitch.

Therefore, on a technical standpoint, the project will aim to preprocess the data to facilitate learning, designing and training a model capable of predicting the target values.

On a business standpoint, the project will also aim to prepare a business pitch that leverages said ML model and that estimate energy and cost savings.

However, while a proper ML pipeline will be designed using Python libraries, the said application will not be developed during the Hackathon.

The repository containing the code for preprocessing, training and inference will be uploaded on Gitlab, alongside this document.

3. Presentation of the group

First name Last name Year of studies & Profile School Skills Roles/Tasks
Agathe MINARO Master 2 ENSAE ENSAE Data Science Modeling, Training
Guillaume PHILIPPE Master 2 Data Science IP Paris Data Science, Computer Science Training, Testing
Huu Tan MAI Master 2 Telecom Paris Telecom Paris Data Engineering/Science Preprocessing, Refactoring, Writing
Max Cameron WU Master 2 Data Science IP Paris Data Science Feature Engineering, Finetuning
Yassine HARGANE Master 2 HEC Paris HEC Paris Business Engineering Business

4. Task Management

The team set a meeting every morning to discuss the progress and the next steps. Regarding the code, we use a git repository for version control and collaborative working.

On the first day, most of the team effort was directed towards preprocessing. Max and Tan worked on feature engineering and the whole team contributed to preprocessing. Tan further refactored the preprocessing pipeline and Guillaume made the exploratory data analysis and wrote a script to easily train and compare models. Yassine started working on the business pitch.

On the second day, Agathe and Max continued to train and fine-tune models. Tan and Guillaume worked on the report. Afterwards, the whole team worked on the business pitch.

II. PROJECT MANAGEMENT

1. Data Understanding

At the start of the challenge, the team spent time on the exploratory data analysis (EDA).

We noticed that the quality of the data was not optimal. For instance, some features were missing for a large number of observations, and some continuous features contain outliers. Regarding categorical features, some of them were poorly encoded.

Following this analysis, we agreed that the preprocessing step will be very important for the quality of the predictive model.

2. Data Preprocessing

A lot of time and effort was put in data preprocessing.

Many problems were encountered because of inconsistencies in sklearn libraries that did not allow us to recover the feature names after the column transformations in pipelines. As our team have not had time to edit the sklearn classes and to fix the inconsistencies, we decided to manually edit the datasets in order to engineer the features outside of pipelines.

The training data was truncated so to remove the outliers and to include more samples that represent the majority of the data. The test data was not truncated.

Likewise, since our group emphasized interpretability for our model, we refrained from using dimensionality reduction methods such as PCA.

3. Modeling Development

To preface this section, we would like to emphasize that we completely excluded models with little interpretability such as neural networks. Hence, our group prioritized models that allow us to visualize features importance, such as linear models or forest models.

We started by training a linear model, a decision tree and a random forest as baselines. We finally decided to opt for a XGBoost model, which is a forest model with gradient boosting that is known to perform well on tabular data.

We then fine-tuned the model by performing a grid search on the hyperparameters. We also used a cross-validation to evaluate the quality of the model. The metric used to evaluate the model is the explained variance score.

4. Deployment Strategy

In the pre-inference time, the data will be preprocessed using our unique pipeline and the model will be loaded.

We plan to deploy the AI solution as Software as a Service (SaaS) on a cloud platform. This will allow us to easily scale the solution and to make it available to a large number of users. We could deploy it both as an API, to permit users to easily request the service , and also as a platform with an user-friendly interface.

Since the model is explainable, the user could also view the most important features that contribute to the energy consumption of their buildings.

The AI solution can be used either by private and public companies or private individuals.

Furthermore, in the case where we could receive new data, we could retrain the model to keep it up to date.

III. CARBON FOOTPRINT LIMITATION

The group has tried to limit the models to strictly classic Machine Learning approaches (no Deep Learning approach) and to work with relatively lightweight models whenever possible.

Energy consumption of GPUs is notoriously high. Therefore, we decided against using GPUs for training our models. Moreover, we shortened the training time by using a smaller yet better dataset.

We also slept and ate locally :)

IV. CONCLUSION

During this intense 48 hours, we have been able to develop a model that is able to predict the energy consumption of buildings with a good explained variance.

We have also been able to prepare a business pitch that leverages the model to estimate energy and cost savings. We have focused on performance and explainability to enable our future customers to anticipate, identify and solve their problems.

hi-paris-hackathon-2023's People

Contributors

gphilippee avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.