Stanza: A Python NLP Library for Many Human Languages

A Fork Implementing a Morphologically Informed Prediction Filtering Mechanism in the POS Tagger

The post-filtering mechanism has been developed for Lithuanian and North Sami models, but can also be used with other pretrained taggers in prediction mode.
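The idea of filtering tagger output against a morphological dictionary can be illustrated with a minimal sketch. This is not the fork's implementation: the function name `filter_prediction`, the in-memory dictionary (standing in for the MySQL table), the fallback behaviour, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# An analysis is a (UPOS, FEATS) pair, e.g. ("NOUN", "Case=Nom|Number=Sing").
Analysis = Tuple[str, str]


def filter_prediction(
    form: str,
    ranked: List[Tuple[Analysis, float]],
    morph_dict: Dict[str, Set[Analysis]],
) -> Analysis:
    """Return the best-scoring analysis that the dictionary licenses for `form`.

    Hypothetical helper: falls back to the tagger's top prediction when the
    form is missing from the dictionary or no candidate is licensed.
    `ranked` is assumed to be non-empty.
    """
    # Sort candidates from highest to lowest tagger score.
    ranked = sorted(ranked, key=lambda item: item[1], reverse=True)
    allowed = morph_dict.get(form.lower())
    if allowed:
        for analysis, _score in ranked:
            if analysis in allowed:
                return analysis
    return ranked[0][0]


# Toy data (made up for this example, not taken from the repository).
toy_dict = {"namas": {("NOUN", "Case=Nom|Number=Sing")}}
candidates = [
    (("VERB", "Mood=Ind|Tense=Pres"), 0.6),
    (("NOUN", "Case=Nom|Number=Sing"), 0.4),
]
print(filter_prediction("namas", candidates, toy_dict))
# prints ('NOUN', 'Case=Nom|Number=Sing')
```

In this sketch the dictionary overrides the tagger only when it knows the word form; unknown forms keep the tagger's first choice.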

The filter is activated only when a morphological dictionary in CoNLL-U format is available as a MySQL table; the table is accessed through stanza/stanza/models/pos/morph.py.
To create the MySQL tables, edit stanza/stanza/models/pos/config.properties, then run stanza/data_files/sme/morph_dict/table_filler.py (North Sami) and stanza/data_files/lt/morph_dict/table_filler.py (Lithuanian).
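For orientation, a CoNLL-U line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The sketch below shows one way such lines could be turned into a lookup from word form to its permitted analyses. It is an assumption-laden illustration, not the code in table_filler.py or morph.py: the function name `load_morph_dict` is hypothetical, it builds an in-memory mapping rather than a MySQL table, and the exact columns the fork stores are not stated in this README.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def load_morph_dict(lines: Iterable[str]) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each lowercased FORM to the set of (UPOS, FEATS) pairs seen for it."""
    entries = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and CoNLL-U comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip anything that is not a standard ten-column CoNLL-U line.
        if len(cols) != 10:
            continue
        form, upos, feats = cols[1], cols[3], cols[5]
        entries[form.lower()].add((upos, feats))
    return dict(entries)


# Toy input (made up for this example).
sample = [
    "# toy dictionary entry",
    "1\tnamas\tnamas\tNOUN\t_\tCase=Nom|Number=Sing\t_\t_\t_\t_",
    "",
]
morph = load_morph_dict(sample)
print(morph["namas"])
# prints {('NOUN', 'Case=Nom|Number=Sing')}
```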

To obtain filtered predictions, run python -m stanza.models.tagger from the command line with the following arguments:

  • --wordvec_dir (path to the pretrained embeddings directory)
  • --eval_file (path to a tokenized file to make predictions on)
  • --output_file (path to the predicted output file)
  • --lang (language code, e.g.: lt for Lithuanian, sme for North Sami)
  • --shorthand (language shorthand made up of language code, underscore, and treebank name, e.g.: lt_alksnis, sme_giella)
  • --mode predict
  • --save_dir (path to the directory where the pretrained model is stored)
  • --save_name (name of the pretrained model)
  • --morph_dict (name of the MySQL table storing the morphological dictionary, e.g.: lt for Lithuanian, sme for North Sami)
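The arguments above can be assembled into a single invocation. The following sketch builds the command as a Python list and prints it. The helper names `build_tagger_command` and `run_tagger` are hypothetical, and every path and the model file name are placeholders that must be replaced with the locations on your own machine; only the module name and flag names come from the list above.

```python
import shlex
import subprocess
import sys
from typing import List


def build_tagger_command(
    wordvec_dir: str,
    eval_file: str,
    output_file: str,
    lang: str,
    shorthand: str,
    save_dir: str,
    save_name: str,
    morph_dict: str,
) -> List[str]:
    """Assemble the tagger call in prediction mode with the dictionary filter."""
    return [
        sys.executable, "-m", "stanza.models.tagger",
        "--wordvec_dir", wordvec_dir,
        "--eval_file", eval_file,
        "--output_file", output_file,
        "--lang", lang,
        "--shorthand", shorthand,
        "--mode", "predict",
        "--save_dir", save_dir,
        "--save_name", save_name,
        "--morph_dict", morph_dict,
    ]


def run_tagger(cmd: List[str]) -> None:
    """Launch the tagger; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


# Placeholder values for a Lithuanian model (all paths are hypothetical).
cmd = build_tagger_command(
    wordvec_dir="path/to/wordvec",
    eval_file="path/to/input.conllu",
    output_file="path/to/predictions.conllu",
    lang="lt",
    shorthand="lt_alksnis",
    save_dir="path/to/models",
    save_name="lt_alksnis_tagger.pt",
    morph_dict="lt",
)
print(shlex.join(cmd))
```

Calling `run_tagger(cmd)` would then start the tagger, provided the model, the embeddings, and the MySQL table are in place.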

For pretrained models, refer to https://stanfordnlp.github.io/stanza/download_models.html

Downloading pretrained embeddings without using the .sh script

The repository contains data files collected from the following sources:

LICENSE

Stanza is released under the Apache License, Version 2.0. See the LICENSE file for more details.

Contributors

angledluffa, j38, yuhaozhang, qipeng, yuhui-zh15, arunchaganty, rasimuvaikas, vzhong, manning, lwolfsonkin, mrapacz, mihail911, gawy, dan-zheng, 0xflotus, adrianeboyd, vtuworkshop, tbm, m0re4u, turtlesoupy, timgates42, mgrenander, pltrdy