
mathtransformer's Introduction

Challenge

Implement a deep neural network that learns to expand single-variable polynomials. The model input is the factorized sequence and the output is the predicted expanded sequence.

  • (7-3*z)*(-5*z-9)=15*z**2-8*z-63
  • (7-3*z)*(-5*z-9) is the factorized input
  • 15*z**2-8*z-63 is the expanded target

Only the exact expanded form provided as the target is considered correct; a mathematically equivalent expression written differently does not count.
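As a minimal sketch of the data format described above, the snippet below splits one example line into its factorized input and expanded target, then sanity-checks that both sides agree at a few integer points. The helper names `split_example` and `agree_numerically` are hypothetical and are not taken from the repository.

```python
def split_example(line: str) -> tuple[str, str]:
    """Split one 'factorized=expanded' line into (input, target)."""
    factorized, expanded = line.strip().split("=")
    return factorized, expanded


def agree_numerically(factorized: str, expanded: str, variable: str = "z") -> bool:
    """Check that both forms evaluate to the same value at a few integer points.

    eval() works here only because the expressions are valid Python arithmetic;
    it should never be applied to untrusted input.
    """
    for value in range(-3, 4):
        env = {"__builtins__": {}, variable: value}
        if eval(factorized, env) != eval(expanded, env):
            return False
    return True


source, target = split_example("(7-3*z)*(-5*z-9)=15*z**2-8*z-63")
print(source)  # (7-3*z)*(-5*z-9)
print(target)  # 15*z**2-8*z-63
print(agree_numerically(source, target))  # True
```

The numeric check is only a sanity test on the data. Predictions themselves are scored by exact string match against the target.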

Solution

  • The directory ./data contains train.txt, validation.txt and test.txt
  • The source and target sequence vocabulary is stored in the directory ./vocab
  • The trained model (best_model.pt) is present in the directory ./model
  • All predictions made by the model on the test set are stored in the file ./output/predictions.txt
  • A summary of the model and its trainable parameters is stored in network.txt
  • The classes for the transformer model are in backbone.py and transformer.py
  • data.py randomly splits the dataset into train, validation and test sets based on the input split ratio (an already split dataset is provided in the repo)
  • train.py trains the model using the defined configurations
  • test.py runs the trained model on the test data to generate predictions and calculates the accuracy
  • text_EDA.ipynb contains the preliminary exploratory data analysis of the dataset
  • requirements.txt contains dependencies

Suggested setup for running the code

The model was trained on a single NVIDIA RTX 3090 GPU with CUDA 10.2 and torch == 1.11.0. You might have to change the torch version depending on the GPU and CUDA version of your machine.

  • Set up a new conda virtual environment
conda create --name <env_name> python=3.9.2
  • Activate the environment
conda activate <env_name>
  • Install the dependencies (Run this command in the /Attention directory)
pip install -r requirements.txt

Commands to train the model (To only evaluate the model on test.txt, these steps can be skipped)

This solution uses the sacred library for logging, running, configuring and organizing the code.

All the commands should be run only from the parent directory (i.e. /Attention)

  1. Split the data into train, validation and test sets:
python data.py with 'split_ratio=0.8'
  2. Train the model (all configurations are defined in train.py and can also be overridden from the command line as shown):
python train.py with 'hyperparameters.n_iters=20'
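The random split in step 1 could look roughly like the sketch below. It is not the code in data.py: the function name `split_lines`, the fixed seed, and the assumption that `split_ratio` is the training fraction with the remainder divided evenly between validation and test are all illustrative guesses.

```python
import random


def split_lines(lines, split_ratio=0.8, seed=0):
    """Shuffle lines and split them into train / validation / test lists.

    Assumption: split_ratio is the training fraction and the remainder is
    divided evenly between validation and test. data.py may do this differently.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * split_ratio)
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


examples = [f"example_{i}" for i in range(10)]
train, val, test = split_lines(examples, split_ratio=0.8)
print(len(train), len(val), len(test))  # 8 1 1
```

Shuffling a copy with a seeded generator leaves the input untouched and makes the same split reproducible across runs.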

Commands to test the model

Evaluate model on test set (This will utilize GPU if available):

python test.py

Evaluate model on test set (Using CPU only):

python test.py with 'device="cpu"'

Model Accuracy

The model is evaluated by strict equality between the predicted target sequence and the ground truth target sequence of the test dataset. The model achieved an accuracy of 98.63% (trained for 20 epochs in 45 minutes on a single GPU).
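A strict-equality metric of this kind can be sketched as follows. The function name `exact_match_accuracy` and the sample strings are hypothetical; the actual computation lives in test.py and may differ in detail.

```python
def exact_match_accuracy(predictions, targets):
    """Return the fraction of predictions that equal their target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(pred == tgt for pred, tgt in zip(predictions, targets))
    return correct / len(targets)


targets = ["15*z**2-8*z-63", "z**2-1"]
predictions = ["15*z**2-8*z-63", "-1+z**2"]
print(exact_match_accuracy(predictions, targets))  # 0.5
```

The second prediction is mathematically equal to its target but is written in a different order, so under strict string equality it counts as wrong.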

For a more comprehensive description of the solution, parameter choices and loss plots, please refer to the file Solution-Report.pdf

Contributors

  • abhinand20
