
nepali_nlp's Introduction

This project aims to build a library covering the common NLP tasks for the Nepali language.

Getting the module

git clone git@github.com:sushil79g/Nepali_nlp.git
cd Nepali_nlp/nepali_nlp

Loading embeddings

from Embedding import Embeddings
word_vec = Embeddings().load_large_vector()  # large embedding
# word_vec = Embeddings().load_vector()  # small embedding
# Alternatively, load the fastText embedding:
# from fasttext_embedding import Fasttext
# word_vec = Fasttext().load()

Nepali synonyms

from synonym import Synonym
Synonym().raw_synonym(word='माया', word_vec=word_vec)  # method 1
# output -> 'स्नेह', 'प्रेम', 'आदर', 'मायाँ', 'दया', 'मायालु', 'श्रद्धा', 'आत्मियता', 'स्पर्श', 'तिमी'
Synonym().filter_synonym(word='साथी', word_vec=word_vec)  # method 2
# output -> 'भाइहरू', 'सहपाठी', 'प्रेमी', 'दाइ', 'प्रेमि', 'बहिनी'
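One way to understand an embedding-based synonym lookup is to rank every other word in the vocabulary by cosine similarity to the query word. The sketch below assumes `word_vec` behaves like a plain dict from word to vector; the toy vectors and the helper names `cosine` and `nearest_words` are hypothetical and not part of nepali_nlp, whose `raw_synonym` and `filter_synonym` may work differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_words(word, word_vec, topn=3):
    """Return up to `topn` words whose vectors are closest to `word`'s vector."""
    if word not in word_vec:
        return []
    target = word_vec[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in word_vec.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:topn]]


# Hypothetical toy embedding; real vectors have hundreds of dimensions.
toy_vec = {
    "माया": [1.0, 0.1],
    "स्नेह": [0.8, 0.1],
    "प्रेम": [0.9, 0.2],
    "घर": [0.0, 1.0],
}
print(nearest_words("माया", toy_vec, topn=2))
```

A filtered variant (like method 2) would add some rule for discarding candidates after ranking; the library's actual rule is not described in this README.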

Spell corrector

from spellcheck import Corrector
Corrector().corrector(word='सुशल')  # still at a very early stage
#output-> ['सुशील', 'सुशील']
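A common baseline for spell correction is to suggest the dictionary words with the smallest edit (Levenshtein) distance to the misspelled input. The sketch below shows that baseline under stated assumptions, not necessarily what `Corrector` does internally; `edit_distance`, `suggest`, and the tiny vocabulary are hypothetical. Distances are counted over Unicode code points, so a missing vowel sign such as ी counts as one edit.

```python
def edit_distance(a, b):
    """Levenshtein distance over Unicode code points (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest(word, vocabulary, max_distance=2, topn=3):
    """Closest vocabulary words within `max_distance` edits, nearest first."""
    scored = sorted((edit_distance(word, w), w) for w in vocabulary)
    return [w for d, w in scored if d <= max_distance][:topn]


# Hypothetical vocabulary; a real corrector would use a large word list.
vocabulary = ["सुशील", "सुनिल", "घर"]
print(suggest("सुशल", vocabulary))
```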

Nepali text summarizer

from summerization import Summerize
# text: the Nepali document to summarize, as a string
Summerize().show_summary(word_vec, text, length_sentence_predict=5)
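Because `show_summary` takes the word embedding, the summarizer is presumably embedding-based. A minimal sketch of one such extractive method (centroid scoring) follows: average each sentence's word vectors, score sentences by cosine similarity to the document centroid, and keep the top ones in their original order. The function names, the danda-based sentence split, and the toy vectors are assumptions for illustration, not nepali_nlp's implementation.

```python
import math
import re


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average_vector(tokens, word_vec):
    """Mean of the vectors of known tokens, or None if no token is known."""
    vectors = [word_vec[t] for t in tokens if t in word_vec]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def summarize(text, word_vec, num_sentences=2):
    # Split on the danda (।) and on ? / !, dropping empty pieces.
    sentences = [s.strip() for s in re.split(r"[।?!]", text) if s.strip()]
    vectors = [average_vector(s.split(), word_vec) for s in sentences]
    known = [v for v in vectors if v is not None]
    if not known:
        return sentences[:num_sentences]
    centroid = [sum(column) / len(known) for column in zip(*known)]
    # Sentences with no known words get the lowest possible score.
    scores = [cosine(v, centroid) if v is not None else -1.0 for v in vectors]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Restore document order for the selected sentences.
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


# Hypothetical toy embedding.
toy_vec = {"क": [1.0, 0.0], "ख": [1.0, 0.0], "ग": [0.8, 0.2], "घ": [0.0, 1.0]}
print(summarize("क ख। क ग। घ।", toy_vec, num_sentences=2))
```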

Romanized Nepali to Devanagari

from unicode_nepali import Unicode
text = 'ma ghara jaanchhu'
Unicode().unicode_word(text) #output-> 'म घर जान्छु'
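Romanized-to-Devanagari conversion can be sketched as greedy longest-match transliteration: read the longest consonant or vowel spelling at each position, attach vowel signs to the preceding consonant, and add a virama (्) when a consonant is not followed by a vowel. The mapping tables below are a tiny hypothetical subset chosen to cover the example sentence; the library's own scheme and tables may differ.

```python
# Tiny hypothetical mapping tables (a real scheme needs the full alphabet).
CONSONANTS = {"k": "क", "g": "ग", "gh": "घ", "ch": "च", "chh": "छ", "j": "ज",
              "t": "त", "n": "न", "p": "प", "m": "म", "r": "र", "l": "ल",
              "s": "स", "h": "ह"}
VOWEL_SIGNS = {"a": "", "aa": "ा", "i": "ि", "ii": "ी", "u": "ु", "uu": "ू",
               "e": "े", "o": "ो"}
INDEPENDENT_VOWELS = {"a": "अ", "aa": "आ", "i": "इ", "ii": "ई", "u": "उ",
                      "uu": "ऊ", "e": "ए", "o": "ओ"}
VIRAMA = "्"


def longest_match(word, i, table):
    """Return the longest key of `table` that starts at word[i], or None."""
    for size in (3, 2, 1):
        piece = word[i:i + size]
        if len(piece) == size and piece in table:
            return piece
    return None


def transliterate_word(word):
    out = []
    i = 0
    while i < len(word):
        consonant = longest_match(word, i, CONSONANTS)
        if consonant:
            out.append(CONSONANTS[consonant])
            i += len(consonant)
            vowel = longest_match(word, i, VOWEL_SIGNS)
            if vowel:
                # "a" is the inherent vowel, so its sign is the empty string.
                out.append(VOWEL_SIGNS[vowel])
                i += len(vowel)
            else:
                # No vowel follows: suppress the inherent vowel.
                out.append(VIRAMA)
            continue
        vowel = longest_match(word, i, INDEPENDENT_VOWELS)
        if vowel:
            out.append(INDEPENDENT_VOWELS[vowel])
            i += len(vowel)
        else:
            # Unknown character: copy it through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


def transliterate(text):
    return " ".join(transliterate_word(w) for w in text.split())


print(transliterate("ma ghara jaanchhu"))
```

With these rules a bare final consonant keeps its virama (`ghar` would become घर्), which fits the example's spelling `ghara`; a practical converter would need a rule for word-final consonants.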

Preeti-font text to Unicode Devanagari

from preeti_unicode import preeti
preeti_text = 'g]kfnL'
print(preeti(preeti_text))  # output -> नेपाली
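Preeti is a legacy font that draws Devanagari glyphs over ASCII code points, so conversion is largely a character-by-character table lookup. The sketch below uses only the six mappings needed for the example; it is a hypothetical illustration, and a complete converter such as `preeti` also needs the full table plus reordering rules (for example, Preeti types the short-i sign ि before its consonant, whereas Unicode stores it after).

```python
# Hypothetical subset of the Preeti-to-Unicode table (the full table is much larger).
PREETI_TO_UNICODE = {
    "g": "न",
    "]": "े",  # vowel sign e
    "k": "प",
    "f": "ा",  # vowel sign aa
    "n": "ल",
    "L": "ी",  # vowel sign ii
}


def preeti_to_unicode(text):
    """Map each Preeti character to Unicode; unknown characters pass through."""
    return "".join(PREETI_TO_UNICODE.get(ch, ch) for ch in text)


print(preeti_to_unicode("g]kfnL"))  # नेपाली
```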

OCR (optical character recognition)

from ocr import OCR
image_location = 'path/to/image.png'  # placeholder: path to an image containing Nepali text
text = OCR(image_location)

Nepali tokenizer

from Nepali_tokenizer import Tokenizer
Tokenizer().sentence_tokenize(text)  # split text into sentences
Tokenizer().word_tokenize(text)  # split text into words
Tokenizer().character_tokenize(text)  # split text into characters
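Nepali marks sentence ends with the danda (।), so a rule-based tokenizer can split on it along with ? and !. The sketch below is a hypothetical regex-based approximation, not the library's `Tokenizer`. Note that splitting a string into Python characters yields code points, so a vowel sign like ा would be separated from its consonant; true character (grapheme) tokenization needs extra handling.

```python
import re

SENTENCE_END = re.compile(r"[।?!]+")
PUNCTUATION = re.compile(r"[।?!,]")


def sentence_tokenize(text):
    """Split text into sentences on danda, ? and !, dropping empty pieces."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def word_tokenize(text):
    """Replace punctuation with spaces, then split on whitespace."""
    return PUNCTUATION.sub(" ", text).split()


text = "म घर जान्छु। तिमी कहाँ जान्छौ?"
print(sentence_tokenize(text))
print(word_tokenize(text))
```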

Nepali sentence similarity

from sentence_similar import Avg_vector_similar
sentences = ["कुपोषणकै कारण शारीरिक र मानसिक रुपमा कमजोर मात्र होइन, अकालमै ज्यान पनि गुमाउनुको परेको समाचार बग्रेल्ती सुन्न सकिन्छ","कर्णाली प्रदेश सामाजिक विकास मन्त्रालयले उपलब्ध गराएको तथ्यांकले कर्णालीमा प्रत्येक वर्ष जन्मिएका ५ वर्षमुनीका बालबालिका १ हजार जनामध्ये ५८ जनाले ज्यान गुमाउँदै आएको देखाएको छ"]
Avg_vector_similar().pair_similarity(word_vec, sentences) #output-> 0.6817289590835571
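The class name `Avg_vector_similar` suggests the score is the cosine similarity between the averaged word vectors of the two sentences. A minimal sketch of that idea follows, assuming `word_vec` can be indexed like a dict and that sentences are split on whitespace; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def average_vector(tokens, word_vec):
    """Mean of the vectors of known tokens, or None if no token is known."""
    vectors = [word_vec[t] for t in tokens if t in word_vec]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pair_similarity(word_vec, sentences):
    """Cosine similarity of the averaged word vectors of two sentences."""
    v1, v2 = (average_vector(s.split(), word_vec) for s in sentences)
    if v1 is None or v2 is None:
        return 0.0  # no known words in at least one sentence
    return cosine(v1, v2)


# Hypothetical toy embedding.
toy_vec = {"म": [1.0, 0.0], "घर": [0.0, 1.0], "जान्छु": [1.0, 1.0]}
print(pair_similarity(toy_vec, ["म घर", "जान्छु"]))
```

Word order is ignored by averaging, so "म घर" and "घर म" score as identical.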

Nepali news-portal scraper (onlinekhabar and ekantipur for now)

from news_scrap import extract_news
news_link = 'https://www.onlinekhabar.com/2019/12/821094'
title, news = extract_news(news_link)  # onlinekhabar and ekantipur are supported at the moment

Show latest news summary

from news_latest import Update_news
title, links, summerized_news = Update_news().show_latest(word_vec=word_vec,portal='onlinekhabar',number_of_news=5) #ekantipur portal is also supported

TODOs:

  • Nepali Embeddings
  • Tokenizers (sentence, word, character)
  • Stop Words
  • Nepali Words Collection
  • Nepali Word synonym
  • Roman Nepali to Nepali
  • Nepali OCR
  • Summarization
  • POS tagging
  • Sentence similarity score
  • Translation (Nepali <-> English) (in progress)
  • Spell correction (in progress)
  • Named Entity Recognition

Contributors

sushil79g, sharmaanix, yankeexe
