
amazon_music_review's Introduction

Amazon Music Review as classification - solution details

This is a PoC solution, targeting Amazon Music Dataset review rating classification.

Installation

Use the package manager pip to install the needed requirements. It is highly suggested to create a virtual environment for this PoC! This can be done with either virtualenv or conda. The one on my machine was created with:

conda create --name amr python=3.8
conda activate amr

Once created and activated, install the needed requirements:

pip install -r requirements.txt

Usage

The code in this repo has two intentions:

  • predict using a fine-tuned BERT-like model, through a web service/endpoint.
  • train a new BERT-like model on a new dataset.

First, you need to get the code. For the purposes of this assignment, the code is available in XXX. Then change into the repo and add it to PYTHONPATH:

cd /path/to/repo
export PYTHONPATH=$(pwd)

All commands are executed from the project's root.

Predict with supplied model

To predict on new datapoints, you need to start the server. Tornado was used for this PoC. To start the server, execute:

python src/server/server.py

The server will start on

http://localhost:5555/

and accepts POST requests. Postman was used to simulate requests during development. To test a dummy round trip, you can also use curl:

curl -L -X POST "http://localhost:5555" -H 'Content-Type: application/json' --data-raw '{"summary": "This little piggy went to the bank.","review": "And this one to the mall. He was a spender."}'

This will return a JSON object containing the predicted target label for the input datapoint:

{"someList": [5]}

Notice the JSON key names in the request (summary and review); they need to be exactly the same in any other request.
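The same round trip can be sketched in Python using only the standard library. This is a minimal sketch, not part of the repo: the function names build_request and predict are hypothetical, and the URL and the two JSON keys are the only details taken from the curl example above.

```python
import json
import urllib.request

# Address taken from the curl example; change it if the server runs elsewhere.
SERVER_URL = "http://localhost:5555"


def build_request(summary, review, url=SERVER_URL):
    """Build a POST request whose JSON body uses the keys the server expects."""
    body = json.dumps({"summary": summary, "review": review}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(summary, review, url=SERVER_URL, timeout=30):
    """Send one review to the running server and return the decoded JSON reply.

    Hypothetical helper: it assumes the server answers with a JSON body,
    as in the example response above.
    """
    request = build_request(summary, review, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, calling predict("This little piggy went to the bank.", "And this one to the mall. He was a spender.") should return the parsed response object. Without a running server the call raises urllib.error.URLError.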

Train a new model

Training a new model is also straightforward. One needs to:

  1. (optionally) Prepare a dataset consisting of 3 columns: the first two columns are strings, following the same pattern as in the train file, and the last column is the target. Pass the file using the -fn parameter. If no file is given, the train data supplied with the assignment will be used;
  2. (optionally) Define a path where the model will be serialized and pass it using the -sn parameter. If no path is given, the model will be saved to BERT_ARTIFACTS specified in definitions.py. It is suggested to supply this path when training on external data to avoid overwriting the model supplied with this solution!
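The fallback behaviour of the two optional parameters can be illustrated with a small argparse sketch. This is not the code in trainer.py: only the -fn and -sn flags and the BERT_ARTIFACTS name come from this README, while the default values, the DEFAULT_TRAIN_FILE name and the parse_args function are placeholders made up for illustration.

```python
import argparse

# Placeholder values, assumed for illustration only. In the repo the real
# serialization path is BERT_ARTIFACTS in definitions.py.
DEFAULT_TRAIN_FILE = "data/train.csv"
BERT_ARTIFACTS = "artifacts/bert"


def parse_args(argv=None):
    """Parse the two optional training parameters, falling back to defaults."""
    parser = argparse.ArgumentParser(description="Train a BERT-like classifier.")
    parser.add_argument(
        "-fn",
        dest="file_name",
        default=DEFAULT_TRAIN_FILE,
        help="path to a 3-column training file",
    )
    parser.add_argument(
        "-sn",
        dest="save_path",
        default=BERT_ARTIFACTS,
        help="folder where the trained model is serialized",
    )
    return parser.parse_args(argv)
```

Calling parse_args([]) yields both defaults, which is the case where a new training run would overwrite the supplied model; passing -sn avoids that.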

Training is executed as follows:

python src/model/bertlike/trainer.py -fn path/to/data -sn path/to/model/serialization/folder

For example, data/training_new.csv is a dummy training dataset that can be passed as the -fn argument.
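A file with the three-column layout described above could be produced along these lines. This is a sketch under assumptions: it writes comma-separated values with no header row and integer targets, which may not match the supplied train file, so check that file's delimiter and header before relying on it. The function name write_training_csv is hypothetical.

```python
import csv


def write_training_csv(path, rows):
    """Write (summary, review, target) tuples as a three-column CSV file.

    Assumes comma-separated values, no header row and integer targets;
    adjust to match the pattern of the supplied train file.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for summary, review, target in rows:
            writer.writerow([summary, review, int(target)])
```

The csv module quotes fields that contain commas or line breaks, so free-text reviews stay in a single column.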

One minor issue at this point is the lack of model versioning and intelligent serialization of the trained model and other artifacts. Due to the time constraints on developing this solution, using non-default paths has not been thoroughly tested! The solution also lacks any batch processing of requests.
