dcs_experiments's Introduction

NLP Experiments using Digital Corpus of Sanskrit

As part of a Sanskrit parser project that I am collaborating on, we have been investigating different language models and their applicability to parsing Sanskrit. I am particularly interested in deep learning approaches to language modeling, such as Seq2Seq with attention.

This Git repository contains the scripts and IPython notebooks from my experiments.

Word2vec using sentence roots

A building block in many of these deep learning approaches is the embedding of words in a vector space using word2vec or GloVe. This notebook contains some of my experiments with word2vec using the Digital Corpus of Sanskrit to investigate the feasibility of using word2vec on just root words (prAtipadikas/dhAtus) in Sanskrit.
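To make the idea of training on root words concrete, here is a minimal sketch of how skip-gram training pairs (the kind of input word2vec learns from) could be generated once each sentence has been reduced to its roots. It is not taken from the notebook: the function name `skipgram_pairs`, the `window` parameter, and the toy tokens are all assumptions made for illustration, and a real experiment would more likely hand the root-word sentences to an existing word2vec implementation.

```python
from typing import Iterator, List, Tuple


def skipgram_pairs(
    sentences: List[List[str]], window: int = 2
) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs for words at most `window` positions apart."""
    for sentence in sentences:
        for i, center in enumerate(sentence):
            start = max(0, i - window)
            stop = min(len(sentence), i + window + 1)
            for j in range(start, stop):
                if j != i:  # a word is not its own context
                    yield center, sentence[j]


# Hypothetical toy input: each inner list stands for one sentence that has
# already been reduced to root forms. These tokens are illustrative only
# and are not drawn from the DCS data.
sentences = [
    ["rAma", "vana", "gam"],
    ["sItA", "rAma", "dfS"],
]

pairs = list(skipgram_pairs(sentences, window=1))
for center, context in pairs:
    print(center, context)
```

Because inflected forms collapse onto a single root, each root appears in more pairs than any one surface form would, which is one plausible reason the approach can work on a small corpus.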

The DCS database is quite small from a deep learning perspective (about 30 MB if we count just the root words), so it was unclear how good the results would be or what to expect. (Spoiler: I was pleasantly surprised by the quality of the results for a first pass.)

dcs_experiments's People

Contributors

alvarna, avinashvarna

dcs_experiments's Issues

Raw data of the DCS?

Namaste,

Is the raw DCS data used for this experiment available somewhere? I am currently trying to train my own Sanskrit segmenter, and this data would be extremely useful!
With best wishes,

Sebastian
