clothes-in-space

Personalization with deep learning in 100 lines of code

Overview

This repository contains the companion code to the "Clothes in Space" blog post.

How to run the notebook

Install dependencies (manual)

Running the full notebook in the original architecture requires Python 3 with the dependencies installed, plus access to Redis and to either Elasticsearch or Coveo. To get Elasticsearch up and running locally and quickly, the Docker setup is recommended.

Docker support (semi-auto)

We also support building the environment with docker-compose. To run the POC with Docker:

  1. docker-compose build to build the Jupyter container;
  2. docker-compose up --force-recreate to start the stack; a link (e.g. http://127.0.0.1) with a temporary token will be printed in the terminal;
  3. docker-compose down to shut down all the containers.

Coding

Once everything is up and running, fill in the variables in the notebook so that the Python clients can connect to the databases (also fill in the catalog-specific variables, based on your own input file). Please note that depending on your exact setup (Redis or memory, Elasticsearch or Coveo index, etc.), you may want to use only specific portions of the notebook: for this reason, different "versions" of key functions are clearly marked in the code.

Data

There are two main embedding examples in the code, product embeddings and word embeddings:

  • product embeddings in the text were generated from real session data from commerce stores. To create your own embeddings you will need to provide session data from your store. For your convenience, a sample session file is included in the repo: if your session data is formatted in the same way, the code can be run with no changes;

  • word embeddings in the text were generated with the 1bn word corpus. A copy of the file can be downloaded here; please note that if you use the Docker setup the file will be downloaded automatically and made available at the /tmp/corpus.txt path. The variable MAX_SENTENCES can be used to limit the number of sentences used for training, greatly speeding up model building.
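As an illustration of how a cap such as MAX_SENTENCES can speed things up, here is a minimal sketch that stops reading the corpus after a fixed number of lines. It is not the notebook's own code: the function name read_sentences, the example value of MAX_SENTENCES, and the assumption that the corpus stores one whitespace-tokenized sentence per line are all hypothetical.

```python
from itertools import islice

# Hypothetical value: pick whatever fits your time and memory budget.
MAX_SENTENCES = 100000


def read_sentences(corpus_path, max_sentences=None):
    """Yield token lists from a text corpus, reading at most max_sentences lines.

    Assumes one sentence per line with tokens separated by whitespace.
    With max_sentences=None the whole file is read.
    """
    with open(corpus_path, encoding="utf-8") as corpus_file:
        # islice stops after max_sentences lines without loading the full file.
        for line in islice(corpus_file, max_sentences):
            tokens = line.split()
            if tokens:  # skip blank lines (they still count toward the cap)
                yield tokens
```

With the Docker setup, a call such as read_sentences("/tmp/corpus.txt", MAX_SENTENCES) would produce a sentence iterator that could then be passed to whichever embedding trainer the notebook uses.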

To run the notebook end-to-end you will also need to provide a "catalog" file and some test SKUs (i.e. product identifiers) to visualize analogies and similarities. For your convenience, a sample catalog file is included in the repo: if you put the catalog and sessions files in the data folder, you can then use /notebooks/data as your DATA_FOLDER in the notebook.

Please note that the sample files (catalog and sessions) are NOT enough to run the notebook end-to-end: those files are just provided as syntactic examples of *real* files you may want to use to reproduce the personalization effect explained in the original blog post.

Data format

To use your own catalog and sessions files, follow the sample files provided in the repo. In particular:

  • catalog.csv is a CSV file with four columns (sku, name, target, image), where sku is the product identifier;
  • sessions.txt is a TAB-separated text file storing one session per line; each session starts with a numerical id, followed by the list of SKUs (matching the content of catalog.csv) that were viewed in that session.
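To check that your own files follow this format before running the notebook, a small loader along these lines may help. This is only a sketch under stated assumptions: it assumes catalog.csv has a header row with the four column names, and that each SKU in sessions.txt is its own TAB-separated field. The function names load_catalog, load_sessions and unknown_skus are hypothetical and do not come from the notebook.

```python
import csv


def load_catalog(catalog_path):
    """Read catalog.csv into a dict keyed by SKU.

    Assumes a header row naming the columns sku, name, target, image.
    """
    with open(catalog_path, newline="", encoding="utf-8") as catalog_file:
        return {row["sku"]: row for row in csv.DictReader(catalog_file)}


def load_sessions(sessions_path):
    """Read sessions.txt into a list of (session_id, [sku, ...]) tuples.

    Assumes one session per line: the id first, then one SKU per TAB field.
    """
    sessions = []
    with open(sessions_path, encoding="utf-8") as sessions_file:
        for line in sessions_file:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank lines and sessions with no SKUs
            sessions.append((fields[0], fields[1:]))
    return sessions


def unknown_skus(catalog, sessions):
    """Return the SKUs that appear in sessions but not in the catalog."""
    seen = {sku for _, skus in sessions for sku in skus}
    return seen - set(catalog)
```

If unknown_skus returns a non-empty set, some sessions reference products that are missing from the catalog, which is worth fixing before training embeddings.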

Acknowledgments

Thanks to Luca Bigon for adding docker-compose support and for the usual helpful comments; thanks to Francis Turgeon-Boutin for helping with Coveo's setup and general feedback on the project.

License

All the code is released "as is" under an MIT license.
