Turkish NLP Resources

Tools, libraries, models, datasets, and other resources for Turkish NLP (Türkçe Doğal Dil İşleme).

Contents:

Tools/Libraries | Models | Datasets | Other Resources


Tools/Libraries

  • ITU Turkish NLP (Web Based & API) : Tools of Istanbul Technical University, Natural Language Processing Group.
  • spaCy Turkish models : Pretrained Turkish pipelines for spaCy.
  • VNLP (Python) : State-of-the-art, lightweight NLP tools for the Turkish language.
  • TDD - Tools (Web based) : Online tools provided by Turkish Data Depository (TDD) project.
  • Zemberek-NLP (Java) : Zemberek-NLP provides Natural Language Processing tools for Turkish.
  • Zemberek-Python (Python) : Python implementation of Zemberek.
  • Zemberek-Server (Docker) : Dockerized REST server built on the Zemberek Turkish NLP Java library.
  • Mukayese (Python) : A benchmarking platform for various Turkish NLP tools and tasks, ranging from spell-checking to NLU.
  • SadedeGel (Python) : A library initially designed for unsupervised extraction-based news summarization using several old and new NLP techniques.
  • Turkish Stemmer (Python) : Stemming algorithm for the Turkish language.
  • sinKAF (Python) : An ML library for profanity detection in Turkish sentences.
  • TrTokenizer (Python) : Sentence and word tokenizers for the Turkish language.
  • Tools for Turkish NLP provided by Starlang (Multi/Python) : Morphological Analysis, Spell Checker, Dependency Parser, Deasciifier, NER.
  • snnclsr/NER (Python) : Named Entity Recognition system for the Turkish Language.

Models

Word Embeddings

Datasets

  • TDD - Türkçe Dil Deposu (Turkish Language Repository) : The Turkish Natural Language Processing Project, one of the main projects of the Turkey Open Source Platform, aims to prepare the datasets needed for the processing of Turkish texts.
  • ITU NLP Group - Datasets : Datasets of Istanbul Technical University, Natural Language Processing Group.
  • Boğaziçi University TABI - NLI-TR : Natural Language Inference in Turkish (NLI-TR) is a set of two large-scale datasets obtained by translating the foundational NLI corpora (SNLI and MultiNLI) using Amazon Translate.
  • Turkish NLP Suite Datasets : The Turkish NLP Suite project offers diverse linguistic resources for Turkish NLP. The repo currently contains several NER datasets, medical NLP datasets, and sentiment analysis datasets including movie reviews, product reviews, and more.

Multilingual Datasets:

  • Amazon MASSIVE : MASSIVE is a parallel dataset of 1M utterances across 51 languages with annotations for the NLU tasks of intent prediction and slot annotation.
  • OPUS: en-tr : OPUS is a growing collection of translated texts from the web. In the OPUS project we try to convert and align free online data, to add linguistic annotation, and to provide the community with a publicly available parallel corpus.
  • CC-100 : Monolingual datasets from web crawl data. This corpus comprises monolingual data for 100+ languages.
  • OSCAR : A huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the Ungoliant architecture.

Treebank:

  • Universal Dependencies : An international cooperative project to create treebanks of the world's languages. The project seeks to develop cross-linguistically consistent treebank annotation of morphology and syntax for multiple languages.
  • UD Turkish Kenet : The Turkish-Kenet UD Treebank consists of 18,700 manually annotated sentences and 178,700 tokens. Its corpus consists of dictionary examples from TDK.
  • UD Turkish BOUN : The BOUN Treebank was created by TABILAB and is supported by TÜBİTAK. This corpus contains 9,761 sentences and 121,214 tokens.
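
Universal Dependencies treebanks are distributed as CoNLL-U files, so sentence and token counts like the ones quoted above can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library. The function name `count_sentences_and_tokens` and the two-sentence `SAMPLE` are hypothetical and written for illustration; the sample is not taken from either treebank. The sketch counts syntactic words and skips multiword-token ranges and empty nodes, which is an assumption; published statistics may follow a different counting convention.

```python
from typing import Iterable, Tuple

# Hypothetical two-sentence CoNLL-U sample (illustration only, not treebank data).
# Each token line has 10 tab-separated fields; a blank line ends a sentence.
SAMPLE = """\
# sent_id = demo-1
# text = Kitap okudum.
1\tKitap\tkitap\tNOUN\t_\tCase=Nom|Number=Sing\t2\tobj\t_\t_
2\tokudum\toku\tVERB\t_\tNumber=Sing|Person=1|Tense=Past\t0\troot\t_\tSpaceAfter=No
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# sent_id = demo-2
# text = Evdeyim.
1-2\tEvdeyim\t_\t_\t_\t_\t_\t_\t_\tSpaceAfter=No
1\tEvde\tev\tNOUN\t_\tCase=Loc|Number=Sing\t0\troot\t_\t_
2\tyim\ti\tAUX\t_\tNumber=Sing|Person=1\t1\tcop\t_\t_
3\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""


def count_sentences_and_tokens(lines: Iterable[str]) -> Tuple[int, int]:
    """Count sentences and syntactic words in CoNLL-U formatted lines."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) carry no tokens.
            continue
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 8.1).
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
    return sentences, tokens


if __name__ == "__main__":
    print(count_sentences_and_tokens(SAMPLE.splitlines()))
```

To run this on a downloaded treebank, open the `.conllu` file with `encoding="utf-8"` and pass the file object to the function in place of `SAMPLE.splitlines()`.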

Other Data:

Other Sources:

Other Resources

Books:

Videos:

Articles:

Sample Notebooks/Snippets:

Blog Posts:

Other Lists:

Contributing

Contributions are welcome. To contribute to this list, send a pull request or open a new issue.
