API server for word embeddings for the Russian language

Docker images for various Russian word2vec-style models. Currently there is only one model (Araneum), a FastText model created with the Gensim package. The model was obtained from RusVectores.

Built images are stored on Docker Hub: https://hub.docker.com/r/rhangelxs/russian_embeddings. Each model is marked with its own tag; the Araneum model's image tag is rhangelxs/russian_embeddings:araneum_none_fasttextcbow_300_5_2018

GitHub: https://github.com/rhangelxs/russian_embeddings

Idea

The Docker images are built with Flask and Arrested. By default the Flask development server is used, but the app can also be served with Gunicorn.

Each model can have its own requirements.txt and Dockerfile; see araneum_none_fasttextcbow_300_5_2018 for an example.

API

The default port is 8080.

Transform word to vector

Individual words can be sent with either GET or POST requests. Below, {word} is a placeholder for the word of interest.

  • GET endpoint: /api/araneum_none_fasttextcbow_300_5_2018/v1/inference/{word}

    Example: /api/araneum_none_fasttextcbow_300_5_2018/v1/inference/тест

  • POST endpoint: /api/araneum_none_fasttextcbow_300_5_2018/v1/inference

    Payload: {"token": "{word}"}

    Example of POST payload: {"token": "тест"}
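Both inference calls can be sketched as a small Python client that uses only the standard library. This is a minimal sketch, assuming the container runs on localhost with the default port 8080 and that the POST endpoint accepts a JSON body sent with a Content-Type: application/json header. BASE_URL, inference_get_url, inference_post_request and fetch_json are hypothetical helper names, not part of this project. Note that a Cyrillic word must be percent-encoded in the GET path.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the container runs locally on the default port 8080.
BASE_URL = "http://localhost:8080"
MODEL = "araneum_none_fasttextcbow_300_5_2018"


def inference_get_url(word: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL, percent-encoding the (possibly Cyrillic) word as UTF-8."""
    return f"{base_url}/api/{MODEL}/v1/inference/{urllib.parse.quote(word)}"


def inference_post_request(word: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying the {"token": word} JSON payload."""
    body = json.dumps({"token": word}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/{MODEL}/v1/inference",
        data=body,
        # Assumption: the server expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(req):
    """Send a request (URL string or Request object) and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, fetch_json(inference_get_url("тест")) and fetch_json(inference_post_request("тест")) should both return the vector for тест. The exact JSON layout of the response is not documented here, so inspect it before relying on a particular field.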

You can also send multiple words to the POST endpoint; the response is then the mean of their vectors, which has the same dimensionality as a single word vector.
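To illustrate what the mean vector is, the sketch below averages equally sized vectors element-wise, so the result has the same dimensionality as each input. It assumes a plain unweighted mean (the text above says only "mean"), uses toy 3-dimensional vectors rather than real model output, and mean_vector is a hypothetical helper, not the server's code.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise, unweighted mean of equally sized word vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimensionality")
    # Average each coordinate across all vectors; output length equals dim.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

For example, mean_vector([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]) returns [2.0, 3.0, 4.0].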

Calculate Word Mover's Distance

The API also exposes Word Mover's Distance (WMD) similarity; see the Gensim documentation for details.
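Word Mover's Distance treats each document as a normalized bag of words and finds the cheapest way to "move" the words of one document onto the words of the other, where the cost is the distance between word vectors. Gensim solves this exactly as an optimal-transport problem. The sketch below instead uses the cheaper relaxed bound (RWMD) from the original WMD paper by Kusner et al. (2015), in which each word simply moves to its nearest counterpart; it is a lower bound on WMD, not what the server computes. TOY_EMB, nbow, relaxed_wmd and best_matches are hypothetical names, and the 2-D vectors are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Tuple

# Hypothetical toy embeddings: 2-D vectors chosen by hand for illustration only.
# Real Araneum vectors come from the model and have many more dimensions.
TOY_EMB: Dict[str, Tuple[float, float]] = {
    "привет": (0.0, 0.0),
    "здравствуйте": (0.0, 1.0),
    "пока": (3.0, 4.0),
    "слово": (6.0, 8.0),
    "второе": (8.0, 6.0),
    "третье": (0.0, 10.0),
}


def nbow(doc: Sequence[str]) -> Dict[str, float]:
    """Normalized bag-of-words: each distinct word's share of the tokens."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def one_sided_cost(src: Dict[str, float], dst: Dict[str, float],
                   emb: Mapping[str, Sequence[float]]) -> float:
    """Move all mass of each src word to its nearest dst word (Euclidean)."""
    return sum(
        weight * min(math.dist(emb[w], emb[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(doc_a, doc_b, emb) -> float:
    """RWMD: the larger of the two one-sided costs, a lower bound on WMD."""
    a, b = nbow(doc_a), nbow(doc_b)
    return max(one_sided_cost(a, b, emb), one_sided_cost(b, a, emb))


def best_matches(query, corpus, emb, num_best: int) -> List[int]:
    """Indexes of corpus documents closest to query, at most num_best of them."""
    order = sorted(range(len(corpus)),
                   key=lambda i: relaxed_wmd(query, corpus[i], emb))
    return order[:num_best]
```

With these toy vectors and the corpus from the payload example below, best_matches(["привет"], corpus, TOY_EMB, num_best=5) returns [1, 2, 0]: здравствуйте is nearest, then пока, then the three-word document. Because RWMD only bounds the true distance from below, its ranking can differ from the exact WMD that Gensim computes.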

  • POST endpoint: /api/araneum_none_fasttextcbow_300_5_2018/v1/wmdsimilarity

    Payload:

    {
      "corpus": [
          [
              "слово",
              "второе",
              "третье"
          ], 
          [
              "здравствуйте"
          ],
          [
              "пока"
          ]
      ],
      "query": [
          "привет"
      ],
      "num_best": 5
    }

    The response contains a list of the indexes of the best-matching documents in corpus, limited to num_best entries.
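A client call to the WMD endpoint can be sketched the same way with the standard library. This is a minimal sketch, assuming a local container on port 8080 and a JSON request body; wmd_request and send are hypothetical helper names, and the check that num_best is positive is an assumption added for illustration, not documented server behaviour.

```python
import json
import urllib.request
from typing import List

# Assumption: the container runs locally on the default port 8080.
BASE_URL = "http://localhost:8080"
WMD_PATH = "/api/araneum_none_fasttextcbow_300_5_2018/v1/wmdsimilarity"


def wmd_request(corpus: List[List[str]], query: List[str], num_best: int = 5,
                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for the WMD similarity endpoint."""
    if num_best < 1:
        raise ValueError("num_best must be positive")
    payload = {"corpus": corpus, "query": query, "num_best": num_best}
    return urllib.request.Request(
        base_url + WMD_PATH,
        # Keep Cyrillic readable in the body; encode as UTF-8 bytes.
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send(wmd_request([["слово", "второе", "третье"], ["здравствуйте"], ["пока"]], ["привет"])) posts the payload shown above. With only three documents in corpus, at most three indexes can come back even though num_best is 5.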

Credits

  1. Kutuzov, A., & Kuzmenko, E. (2016, April). WebVectors: a toolkit for building web interfaces for vector semantic models. In International Conference on Analysis of Images, Social Networks and Texts (pp. 155-161). Springer, Cham.
