younicam-AI

A machine learning project developed in PySpark. The model is trained on the anonymized presences registered through the Younicam mobile application in the University of Camerino's buildings, and it predicts the number of people in a room during a given time interval.
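To make the prediction target concrete, here is a minimal sketch of how presence records could be turned into a count of people per room and time interval. It is not the project's preprocessing code: the record shape (a room identifier paired with a timestamp), the room names and the 60-minute default interval are all assumptions made for illustration. It also assumes each person registers at most once per interval, so that counting records equals counting people.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_presences(records, interval_minutes=60):
    """Count presence records per (room, interval start).

    `records` is an iterable of (room, timestamp) pairs. Both this record
    shape and the 60-minute default are assumptions, not the project's schema.
    """
    step = timedelta(minutes=interval_minutes)
    counts = Counter()
    for room, timestamp in records:
        # Floor the timestamp to the start of its interval within the day.
        midnight = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
        bucket = midnight + ((timestamp - midnight) // step) * step
        counts[(room, bucket)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records: room names and times are made up.
    sample = [
        ("A1", datetime(2021, 3, 1, 9, 5)),
        ("A1", datetime(2021, 3, 1, 9, 40)),
        ("A1", datetime(2021, 3, 1, 10, 2)),
        ("B2", datetime(2021, 3, 1, 9, 59)),
    ]
    for (room, start), people in sorted(count_presences(sample).items()):
        print(room, start.isoformat(), people)
```

With the sample above, room A1 gets a count of 2 for the interval starting at 09:00 and 1 for the interval starting at 10:00. In the project itself this kind of aggregation would more likely be expressed with PySpark DataFrame operations.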

TPOT is used in the model training phase to find the best combination of ML model and hyperparameters.

Get started

Prerequisites

In your home directory, find the shell startup file, named .bash_profile, .bashrc or .zshrc. The name may differ depending on the operating system and its version. Open the file and paste the lines below, adjusting SPARK_HOME if Spark is installed in a different location:

export SPARK_HOME="/opt/spark"
export PATH="$SPARK_HOME/bin:$PATH"

If you want Jupyter Notebook to open when launching PySpark, also add the variables below:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

After reloading the shell (for example, by opening a new terminal), you can launch PySpark from any directory with the command below:

pyspark
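If the pyspark command is not found, a quick check of the environment can help. The snippet below is a minimal sketch that only inspects the SPARK_HOME and PATH variables described above; the function name check_spark_env is hypothetical and not part of this project or of PySpark.

```python
import os
import shutil


def check_spark_env(env=None):
    """Return a list of human-readable problems with the Spark setup."""
    env = os.environ if env is None else env
    problems = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append(f"SPARK_HOME does not point to a directory: {spark_home}")

    # shutil.which searches the given PATH string for an executable.
    if shutil.which("pyspark", path=env.get("PATH", "")) is None:
        problems.append("pyspark was not found on PATH")

    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    print("Environment looks fine" if not issues else "\n".join(issues))
```

An empty list means both variables look consistent with the setup above. It does not verify that Spark itself starts correctly.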

Install dependencies

To install the project dependencies, run the following command:

pip install -r requirements.txt

Note that the TPOT pipeline needs some additional dependencies, which are listed in the TPOT installation docs.

Usage

Launch PySpark as described above, then navigate to the project directory to run the notebooks.

If the Jupyter Notebook doesn't open automatically with PySpark, open it using the command below:

jupyter notebook /path/to/notebook

The TPOT pipeline notebook was used to find the best combination of ML model and hyperparameters. It outputs a .py pipeline that runs the selected ML model with its configuration. We used the returned pipeline inside the Model Training notebook to perform additional operations around the training (e.g. saving intermediate datasets, evaluation).

Structure

The repository has the following folder structure:

  • data : contains the original dataset plus some intermediate transformations in JSON format
  • notebooks : contains all the notebooks used during experimentation. There is one notebook for the collection and preparation phases, one for the training and evaluation phases, one for visualizing the predictions and one for executing the TPOT pipeline.
  • predictions : contains the final prediction results in CSV format
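As a quick sanity check after cloning, the layout above can be verified with a few lines of Python. This is a minimal sketch: the helper name missing_dirs is hypothetical, and it checks only that the three folders exist, not their contents.

```python
from pathlib import Path

# Top-level folders listed in the repository structure above.
EXPECTED_DIRS = ("data", "notebooks", "predictions")


def missing_dirs(repo_root):
    """Return the expected top-level folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    absent = missing_dirs(".")
    print("All folders present" if not absent else f"Missing: {', '.join(absent)}")
```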

Authors

  • yuripaoloni
