cocoa's Introduction

StanfordNLP: A Python NLP Library for Many Human Languages

⚠️ Note ⚠️

All development, issues, ongoing maintenance, and support have moved to our new GitHub repository, as the toolkit has been renamed Stanza starting with version 1.0.0. Please visit our new website for more information. You can still download stanfordnlp via pip, but newer versions of this package will be released as stanza. This repository is kept for archival purposes.

The Stanford NLP Group's official Python NLP library. It contains packages for running our latest fully neural pipeline from the CoNLL 2018 Shared Task and for accessing the Java Stanford CoreNLP server. For detailed information please visit our official website.

References

If you use our neural pipeline including the tokenizer, the multi-word token expansion model, the lemmatizer, the POS/morphological features tagger, or the dependency parser in your research, please kindly cite our CoNLL 2018 Shared Task system description paper:

@inproceedings{qi2018universal,
 address = {Brussels, Belgium},
 author = {Qi, Peng  and  Dozat, Timothy  and  Zhang, Yuhao  and  Manning, Christopher D.},
 booktitle = {Proceedings of the {CoNLL} 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},
 month = {October},
 pages = {160--170},
 publisher = {Association for Computational Linguistics},
 title = {Universal Dependency Parsing from Scratch},
 url = {https://nlp.stanford.edu/pubs/qi2018universal.pdf},
 year = {2018}
}

The PyTorch implementation of the neural pipeline in this repository is due to Peng Qi and Yuhao Zhang, with help from Tim Dozat and Jason Bolton.

This release is not the same as Stanford's CoNLL 2018 Shared Task system. The tokenizer, lemmatizer, morphological features, and multi-word term systems are cleaned-up versions of the shared task code, but in the competition we used a TensorFlow version of the tagger and parser by Tim Dozat, which has been approximately reproduced in PyTorch (though with a few deviations from the original) for this release.

If you use the CoreNLP server, please cite the CoreNLP software package and the respective modules as described here ("Citing Stanford CoreNLP in papers"). The CoreNLP client is mostly written by Arun Chaganty, and Jason Bolton spearheaded merging the two projects together.

Issues and Usage Q&A

To ask questions, report issues or request features, please use the GitHub Issue Tracker.

Setup

StanfordNLP supports Python 3.6 or later. We strongly recommend that you install StanfordNLP from PyPI. If you already have pip installed, simply run:

pip install stanfordnlp

This should also resolve all of the dependencies of StanfordNLP, for instance PyTorch 1.0.0 or above.

If you currently have a previous version of stanfordnlp installed, use:

pip install stanfordnlp -U

Alternatively, you can install from the source of this git repository, which gives you more flexibility in developing on top of StanfordNLP and training your own models. For this option, run:

git clone https://github.com/stanfordnlp/stanfordnlp.git
cd stanfordnlp
pip install -e .

Running StanfordNLP

Getting Started with the neural pipeline

To run your first StanfordNLP pipeline, simply follow these steps in your Python interactive interpreter:

>>> import stanfordnlp
>>> stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
# IMPORTANT: The above line prompts you before downloading, which doesn't work well in a Jupyter notebook.
# To avoid a prompt when using notebooks, instead use: >>> stanfordnlp.download('en', force=True)
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

The last command prints each word in the first sentence of the input string (or Document, as it is represented in StanfordNLP), together with the index of the word that governs it in the Universal Dependencies parse of that sentence (its "head") and the dependency relation between the two words. The output should look like:

('Barack', '4', 'nsubj:pass')
('Obama', '1', 'flat')
('was', '4', 'aux:pass')
('born', '0', 'root')
('in', '6', 'case')
('Hawaii', '4', 'obl')
('.', '4', 'punct')
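
To make the head indices easier to read, the triples above can be resolved into word-to-head pairs. The following is a minimal sketch that works on the printed tuples directly rather than on StanfordNLP objects; resolve_heads is a hypothetical helper name, not part of the library.

```python
# Triples copied from the output above: (word, head index, relation).
# Head indices are 1-based; '0' marks the root of the sentence.
triples = [
    ('Barack', '4', 'nsubj:pass'),
    ('Obama', '1', 'flat'),
    ('was', '4', 'aux:pass'),
    ('born', '0', 'root'),
    ('in', '6', 'case'),
    ('Hawaii', '4', 'obl'),
    ('.', '4', 'punct'),
]


def resolve_heads(triples):
    """Replace each head index with the head word itself (hypothetical helper)."""
    resolved = []
    for word, head, relation in triples:
        head_index = int(head)
        head_word = 'ROOT' if head_index == 0 else triples[head_index - 1][0]
        resolved.append((word, relation, head_word))
    return resolved


for word, relation, head_word in resolve_heads(triples):
    print(f"{word} --{relation}--> {head_word}")
```

Reading the result, "Barack" attaches to "born" as its passive subject, and "in" attaches to "Hawaii" as a case marker.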

Note: If you run into issues like OSError: [Errno 22] Invalid argument, you are very likely affected by a known Python issue, and we recommend Python 3.6.8 or later, or Python 3.7.2 or later.
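
The version recommendation in the note can be checked from within Python. This is a minimal sketch, assuming the recommendation means 3.6.8 or later on the 3.6 line and 3.7.2 or later otherwise; meets_recommendation is a hypothetical helper, not part of StanfordNLP.

```python
import sys


def meets_recommendation(version):
    """Return True if a (major, minor, micro) tuple satisfies the note above.

    Assumption: 3.6.x needs micro >= 8; anything else needs >= 3.7.2.
    """
    major, minor, micro = version[:3]
    if (major, minor) == (3, 6):
        return micro >= 8
    return (major, minor, micro) >= (3, 7, 2)


if not meets_recommendation(sys.version_info):
    print("Consider upgrading Python: see the note on OSError [Errno 22].")
```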

We also provide a multilingual demo script that demonstrates how to use StanfordNLP in languages other than English, for example Chinese (traditional):

python demo/pipeline_demo.py -l zh

See our getting started guide for more details.

Access to Java Stanford CoreNLP Server

Aside from the neural pipeline, this project also includes an official wrapper for accessing the Java Stanford CoreNLP Server with Python code.

There are a few initial setup steps.

  • Download Stanford CoreNLP and models for the language you wish to use
  • Put the model jars in the distribution folder
  • Tell the python code where Stanford CoreNLP is located: export CORENLP_HOME=/path/to/stanford-corenlp-full-2018-10-05
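
Before starting the client, it can help to confirm that the setup steps above took effect. The following is a minimal sketch using only the standard library; check_corenlp_home is a hypothetical helper, and looking for .jar files directly inside the folder is an assumption about how the distribution is laid out.

```python
import os


def check_corenlp_home(environ=os.environ):
    """Return a list of problems found with the CORENLP_HOME setting."""
    home = environ.get("CORENLP_HOME")
    if not home:
        return ["CORENLP_HOME is not set"]
    if not os.path.isdir(home):
        return ["CORENLP_HOME does not point to a directory: {}".format(home)]
    # Assumption: the distribution and model jars sit directly in this folder.
    jars = [name for name in os.listdir(home) if name.endswith(".jar")]
    if not jars:
        return ["no .jar files found in {}".format(home)]
    return []


for problem in check_corenlp_home():
    print("Setup problem:", problem)
```

An empty result only means the folder looks plausible; it does not verify that the right model jars are present.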

We provide another demo script that shows how one can use the CoreNLP client and extract various annotations from it.

Online Colab Notebooks

To get you started, we also provide interactive Jupyter notebooks in the demo folder. You can also open these notebooks and run them interactively on Google Colab. To view all available notebooks, follow these steps:

  • Go to the Google Colab website
  • Navigate to File -> Open notebook, and choose GitHub in the pop-up menu
  • Note that you do not need to give Colab access permission to your github account
  • Type stanfordnlp/stanfordnlp in the search bar, and press Enter

Trained Models for the Neural Pipeline

We currently provide models for all of the treebanks in the CoNLL 2018 Shared Task. You can find instructions for downloading and using these models here.

Batching To Maximize Pipeline Speed

To maximize speed, it is essential to run the pipeline on batches of documents. Running a for loop on one sentence at a time will be very slow. The best approach at this time is to concatenate documents together, with each document separated by a blank line (i.e., two line breaks \n\n). The tokenizer will recognize blank lines as sentence breaks. We are actively working on improving multi-document processing.
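
The concatenation step described above can be sketched in a few lines. join_documents is a hypothetical helper; the sketch only builds the input string and assumes the nlp pipeline object from the getting-started example is used to process it.

```python
def join_documents(documents):
    """Join documents with a blank line so the tokenizer sees a break between them."""
    cleaned = [doc.strip() for doc in documents if doc.strip()]
    return "\n\n".join(cleaned)


documents = [
    "Barack Obama was born in Hawaii.",
    "He was elected president in 2008.",
]
batch_text = join_documents(documents)
# With the pipeline from the getting-started example: doc = nlp(batch_text)
print(repr(batch_text))
```

Note that a blank line inside a single document would also be treated as a break, so mapping sentences back to their source documents needs separate bookkeeping that this sketch does not provide.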

Training your own neural pipelines

All neural modules in this library, including the tokenizer, the multi-word token (MWT) expander, the POS/morphological features tagger, the lemmatizer and the dependency parser, can be trained with your own CoNLL-U format data. Currently, we do not support model training via the Pipeline interface. Therefore, to train your own models, you need to clone this git repository and set up from source.
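
For readers unfamiliar with the CoNLL-U format mentioned above, each token line carries ten tab-separated columns, sentences are separated by blank lines, and lines starting with # are comments. The following is a minimal reader sketch for inspecting such data; read_conllu is a hypothetical helper and not the loader used by the training code.

```python
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        else:
            columns = line.split("\t")
            if len(columns) != len(CONLLU_FIELDS):
                raise ValueError("expected 10 columns, got {}: {!r}".format(len(columns), line))
            # Multi-word token ranges (IDs like "1-2") are kept as ordinary rows here.
            current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


sample = (
    "# text = He was born.\n"
    "1\tHe\the\tPRON\tPRP\t_\t3\tnsubj:pass\t_\t_\n"
    "2\twas\tbe\tAUX\tVBD\t_\t3\taux:pass\t_\t_\n"
    "3\tborn\tbear\tVERB\tVBN\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
)
sentences = read_conllu(sample)
print(len(sentences), [token["form"] for token in sentences[0]])
```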

For detailed step-by-step guidance on how to train and evaluate your own models, please visit our training documentation.

LICENSE

StanfordNLP is released under the Apache License, Version 2.0. See the LICENSE file for more details.

cocoa's People

Contributors

anushabala, derekchen14, hhexiy, mihail911, percyliang

cocoa's Issues

KeyError: 'post_id'

read_examples: data/train.json
Traceback (most recent call last):
File "parse_dialogue.py", line 60, in <module>
examples = read_examples(args.transcripts, args.max_examples, Scenario)
File "/home/prateekagarwal/cocoa/cocoa/core/dataset.py", line 120, in read_examples
examples.append(Example.from_dict(raw, Scenario))
File "/home/prateekagarwal/cocoa/cocoa/core/dataset.py", line 29, in from_dict
scenario = Scenario.from_dict(None, raw['scenario'])
File "/home/prateekagarwal/cocoa/craigslistbargain/core/scenario.py", line 34, in from_dict
return Scenario(raw['uuid'], raw['post_id'], raw['category'], None, scenario_attributes, [KB.from_dict(scenario_attributes, kb) for kb in raw['kbs']])
KeyError: 'post_id'

Getting this error on running the step which parses training data.
I simply installed the requirements and then started running the steps mentioned under building the bot.
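
According to the traceback, Scenario.from_dict reads the keys uuid, post_id, category, and kbs from each example's scenario entry. The following is a minimal diagnostic sketch that reports which examples lack any of them; find_missing_keys and check_file are hypothetical helpers, and the assumption that the transcripts file is a JSON list of examples is not confirmed by the issue.

```python
import json

# Keys that Scenario.from_dict reads, according to the traceback above.
REQUIRED_KEYS = ("uuid", "post_id", "category", "kbs")


def find_missing_keys(examples):
    """Return {example index: [missing keys]} for scenarios lacking required keys."""
    report = {}
    for index, raw in enumerate(examples):
        scenario = raw.get("scenario", {})
        missing = [key for key in REQUIRED_KEYS if key not in scenario]
        if missing:
            report[index] = missing
    return report


def check_file(path):
    """Load a transcripts file (assumed to be a JSON list) and report gaps."""
    with open(path) as fin:
        return find_missing_keys(json.load(fin))
```

Calling check_file("data/train.json") would show whether the key is missing from every example or only from some, which narrows down whether the data file or the parsing code is at fault.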

IndexError: index 0 is out of bounds for axis 0 with size 0

Hi,
I tried to run python src/scripts/generate_scenarios.py --schema-path data/schema.json --scenarios-path data/scenarios.json --num-scenarios 500 --random-attributes --random-items --alphas 0.3 1 3

I got this error:
Traceback (most recent call last):
File "src/scripts/generate_scenarios.py", line 158, in <module>
s = generate_scenario(schema)
File "src/scripts/generate_scenarios.py", line 93, in generate_scenario
index = random_multinomial(distrib)
File "/home/ ... /cocoa/src/basic/util.py", line 11, in random_multinomial
accum += probs[i]
IndexError: index 0 is out of bounds for axis 0 with size 0
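
The error message suggests that probs was empty when random_multinomial indexed it. The original function is not shown in the issue, so the following is only a hypothetical reimplementation of cumulative-sum sampling with an explicit guard, meant to show where a clearer error could be raised.

```python
import random


def random_multinomial(probs, rng=random):
    """Sample an index in proportion to probs (hypothetical reimplementation)."""
    if len(probs) == 0:
        raise ValueError("cannot sample from an empty distribution")
    target = rng.random() * sum(probs)
    accum = 0.0
    for i in range(len(probs)):
        accum += probs[i]
        if accum > target:
            return i
    return len(probs) - 1  # guard against floating-point round-off
```

With such a guard the failure points at the empty distribution itself, which would then have to be traced back to how generate_scenario builds it from the schema.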

SL(act)+ Rule model

Hi,

How can I replicate SL(act)+Rule model? Could you please provide me the commands? Thank you!

Integration with AMT for crowdsourcing

Hi,

Thank you for providing this code.

Could you please share some example/script as to how we integrate this system with AMT for crowdsourcing dialogues?

Thanks.

How to solve problems in Chat with the bot

I'm trying the "Chat with the bot" step in both the command line interface and the web interface,
and I ran into some problems.

In the command line interface, a TypeError occurred.
I can't solve this error by myself.
I describe the error in detail below.

In the web interface, no error occurred, but I don't understand
what to do after the output shown below appears.

What should I do in each case?

Preparation before running the commands

I stored the pre-trained models as follows:

craigslistbargain/
 ├ checkpoint/
 │ └ lf2lf/
 │   ├ config.json
 │   └ model_best.pt
 ├ mappings/
 │ └ lf2lf/
 │   ├ kb.glove.pt
 │   └ vocab.pkl
 ├ model.pkl
 ├ price_tracker.pkl
 └ templates.pkl

Command line interface

input:

PYTHONPATH=. python ../scripts/generate_dataset.py --schema-path data/craigslist-schema.json --scenarios-path data/dev-scenarios.json --results-path bot-chat-transcripts.json --max-examples 20 --agents rulebased cmd --price-tracker price_tracker.pkl --agent-checkpoints checkpoint/lf2lf/model_best.pt "" --max-turns 20 --random-seed 1 --sample --temperature 0.2

output:

[nltk_data] Downloading package punkt to /home/mocchaso/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/mocchaso/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Traceback (most recent call last):
  File "../scripts/generate_dataset.py", line 70, in <module>
    for name, model_path in zip(args.agents, args.agent_checkpoints)]
  File "/mnt/c/users/administrator/my_graduation_research/cocoa_src/craigslistbargain/systems/__init__.py", line 14, in get_system
    templates = Templates.from_pickle(args.templates)
  File "/mnt/c/users/administrator/my_graduation_research/cocoa_src/cocoa/model/generator.py", line 81, in from_pickle
    templates = read_pickle(path)
  File "/mnt/c/users/administrator/my_graduation_research/cocoa_src/cocoa/core/util.py", line 28, in read_pickle
    with open(path, 'rb') as fin:
TypeError: coercing to Unicode: need string or buffer, NoneType found
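
The last frames of the traceback show Templates.from_pickle(args.templates) reaching open(path, 'rb') with path set to None, which suggests that no templates path was supplied on the command line. The following is a minimal sketch of failing early with a clearer message. The parser is a hypothetical stand-in for the script's real one, and the --templates option name is an assumption based on the args.templates attribute and on the web command quoted in another issue.

```python
import argparse


def require_paths(args, names):
    """Raise a readable error if any of the named path arguments is None."""
    missing = [name for name in names if getattr(args, name, None) is None]
    if missing:
        flags = ", ".join("--" + name.replace("_", "-") for name in missing)
        raise SystemExit("missing required path argument(s): {}".format(flags))


# Hypothetical stand-in for the script's real argument parser.
parser = argparse.ArgumentParser()
parser.add_argument("--templates")
parser.add_argument("--price-tracker")

# Mirrors the failing command: a price tracker is given, templates are not.
args = parser.parse_args(["--price-tracker", "price_tracker.pkl"])
try:
    require_paths(args, ["templates", "price_tracker"])
except SystemExit as error:
    print(error)
```

If the script does accept such an option, adding it to the command with the stored templates.pkl file would be the first thing to try.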

Web interface

input

PYTHONPATH=. python web/chat_app.py --port 5000 --config web/app_params.json --schema-path data/craigslist-schema.json --scenarios-path data/dev-scenarios.json --output web_output

output:

[nltk_data] Downloading package punkt to /home/mocchaso/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
1 systems loaded
human
App setup complete

NameError when launching web server

NameError: global name 'datetime' is not defined, raised in cocoa/craigslistbargain/options.py.

It occurs when I run the following script:

PYTHONPATH=. python web/chat_app.py --port 8081 --config web/app_params.json --schema-path data/craigslist-schema.json --scenarios-path data/train-scenarios.json --output output

Question: Cannot execute craigslistbargain/web/chat_app.py

Hello.

To try out the cocoa system, I added the paths below to /home/(user_name)/.pyenv/versions/anaconda3-5.3.0/envs/py27/lib/python2.7/site-packages/easy-install.pth.

  • (auto-written by setup.py) /mnt/c/users/(admin_name)/cocoa-master
  • /mnt/c/users/(user_name)/cocoa-master/cocoa
  • /mnt/c/users/(user_name)/cocoa-master/onmt
  • /mnt/c/users/(user_name)/cocoa-master/craigslistbargain
  • /mnt/c/users/(user_name)/cocoa-master/craigslistbargain/web
  • /mnt/c/users/(user_name)/cocoa-master/craigslistbargain/core

After that, I ran craigslistbargain/web/chat_app.py on Python 2.7.15.
However, this execution failed. The error is as follows:

Traceback (most recent call last):
  File "chat_app.py", line 19, in <module>
    from core.scenario import Scenario
ImportError: No module named scenario

What should I do?
I would appreciate it if you could tell me the solution.

Environment

  • Ubuntu 18.04 (Linux Subsystem of Windows 10 Education)
  • Python 2.7.15 on Anaconda virtual environment
    • Pytorch 0.4.1.post2
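
One thing worth checking for the import error above is which core package Python actually finds. Tracebacks elsewhere on this page show both a cocoa/core and a craigslistbargain/core directory, so with several of these paths added at once the name core may resolve to the one that has no scenario module. That this is the cause here is an assumption. The sketch below is a small diagnostic; where_is is a hypothetical helper.

```python
import importlib


def where_is(module_name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


# Run this from the same directory and environment as chat_app.py.
print("core resolves to: {}".format(where_is("core")))
```

If the printed path lies under cocoa/core rather than craigslistbargain/core, the order or choice of the added paths is the likely place to look.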

Modular approach question

Hi,

I am trying to reproduce the SL/RL(act) model and the overall example.

Reading the paper and running the code I noticed you refer to a Hybrid Policy (paper) which is basically the SL(act)+rule and there is also a hybrid type of agents which refers to what I want to reproduce.

When I am going through your code for the Modular approach, I noticed that when RL is applied, you specify the pt-neural type of agents:

mkdir checkpoint/lf2lf-margin;
PYTHONPATH=. python reinforce.py --schema-path data/craigslist-schema.json \
  --scenarios-path data/train-scenarios.json \
  --valid-scenarios-path data/dev-scenarios.json \
  --price-tracker price_tracker.pkl \
  --agent-checkpoints checkpoint/lf2lf/model_best.pt checkpoint/lf2lf/model_best.pt \
  --model-path checkpoint/lf2lf-margin \
  --optim adagrad --learning-rate 0.001 \
  --agents pt-neural pt-neural \
  --report-every 500 --max-turns 20 --num-dialogues 5000 \
  --sample --temperature 0.5 --max-length 20 --reward margin

Later, at the End-to-End approach, you mention that in order to run the RL finetune: "We just need to change the agent type to --agents hybrid hybrid".

So, my question is: shouldn't those two be the other way around, meaning the Modular approach with hybrid type agents and the End-to-End approach with pt-neural?

I might be also missing something here - something that I haven't understood correctly. I would really appreciate your kind help.

Thank you in advance!

What is the solution to ImportError: cannot import name LIWC?

Hi, He He!
I've learned a lot from running and reading your code!
However, I ran into a problem when setting up the web server, i.e. when I run python ../scripts/visualize_transcripts.py.
I get:

Traceback (most recent call last):
File "../scripts/visualize_transcripts.py", line 4, in <module>
from analysis.visualizer import Visualizer
File "/data/linshuai/cocoa/craigslistbargain/analysis/visualizer.py", line 8, in <module>
from analyze_strategy import StrategyAnalyzer
File "/data/linshuai/cocoa/craigslistbargain/analysis/analyze_strategy.py", line 25, in <module>
from liwc import LIWC
ImportError: cannot import name LIWC

I've installed liwc via pip install. However, it doesn't seem to work with LIWC.
I also wonder what this line of code is for:
self.liwc = LIWC.from_pkl(liwc_path)

I appreciate your help!

Natural Language Generation for Dealornodeal

When I run self-play mode in the Dealornodeal task, the two agents communicate at the dialogue act level. How can I turn those dialogue acts into natural language? Thank you!

IndexError: too many indices for array

When I try to chat with the bot in the web interface, I get the error below:

Traceback (most recent call last):
  File "/home/.local/lib/python2.7/site-packages/gevent/pywsgi.py", line 976, in handle_one_response
    self.run_application()
  File "/home/.local/lib/python2.7/site-packages/gevent/pywsgi.py", line 923, in run_application
    self.result = self.application(self.environ, self.start_response)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/.local/lib/python2.7/site-packages/flask_socketio/__init__.py", line 42, in __call__
    start_response)
  File "/home/.local/lib/python2.7/site-packages/engineio/middleware.py", line 67, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/cocoa/cocoa/web/views/chat.py", line 48, in check_inbox
    event = backend.receive(uid)
  File "/home/cocoa/cocoa/web/main/backend.py", line 838, in receive
    controller.step(self)
  File "/home/cocoa/cocoa/core/controller.py", line 109, in step
    event = session.send()
  File "/home/cocoa/cocoa/sessions/timed_session.py", line 60, in send
    self.queued_event.append(self.session.send())
  File "/home/cocoa/craigslistbargain/sessions/neural_session.py", line 64, in send
    tokens = self.generate()
  File "/home/cocoa/craigslistbargain/sessions/neural_session.py", line 158, in generate
    output_data = self.generator.generate_batch(batch, gt_prefix=self.gt_prefix, enc_state=enc_state)
  File "/home/cocoa/cocoa/neural/generator.py", line 119, in generate_batch
    for b in range(batch_size)]
  File "/home/cocoa/cocoa/neural/generator.py", line 109, in get_bos
    bos = batch.decoder_inputs[gt_prefix-1][b].data.cpu().numpy()[0]
IndexError: too many indices for array
2019-03-10T12:46:45Z {'REMOTE_PORT': '49500', 'HTTP_HOST': 'xxx', 'REMOTE_ADDR': '::ffff:xxx', (hidden keys: 24)} failed with IndexError

I don't know the reason for it, and my command is:

PYTHONPATH=. python web/chat_app.py --host 0.0.0.0 --port 5000 --config web/app_params_allsys.json --schema-path data/craigslist-schema.json --scenarios-path data/scenarios.json --price-tracker-model data/price_tracker.pkl --templates data/templates.pkl --policy data/model.pkl

Other information:

System: Ubuntu18.04
Python: 2.7

Please let me know if there is a solution to this problem, thanks.

TypeError: where() takes at most 2 arguments (3 given)

PYTHONPATH=. python src/main.py --schema-path data/schema.json --scenarios-path data/scenarios.json --train-examples-paths data/train.json --test-examples-paths data/dev.json --stop-words data/common_words.txt --min-epochs 10 --checkpoint checkpoint --rnn-type lstm --learning-rate 0.5 --optimizer adagrad --print-every 50 --model attn-copy-encdec --gpu 1 --rnn-size 100 --grad-clip 0 --num-items 12 --batch-size 32 --stats-file stats.json --entity-encoding-form type --entity-decoding-form type --node-embed-in-rnn-inputs --msg-aggregation max --word-embed-size 100 --node-embed-size 50 --entity-hist-len -1 --learned-utterance-decay
/ve_tf0.11_py2/venv/lib/python2.7/site-packages/fuzzywuzzy/fuzz.py:35: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
read_examples: data/train.json
read_examples: data/dev.json
Building lexicon...
Created lexicon: 522092 phrases mapping to 1314 entities, 3.291269 entities per phrase
Using rule-based lexicon...
3.96 s
test: 0 dialogues out of 0 examples
train: 7257 dialogues out of 8967 examples
dev: 878 dialogues out of 1083 examples
Vocabulary size: 8435
Traceback (most recent call last):
File "src/main.py", line 110, in <module>
model = build_model(schema, mappings, model_args)
File "//cocoa/src/model/encdec.py", line 69, in build_model
model = GraphEncoderDecoder(encoder_word_embedder, decoder_word_embedder, graph_embedder, encoder, decoder, pad, select)
File "//cocoa/src/model/encdec.py", line 760, in __init__
super(GraphEncoderDecoder, self).__init__(encoder_word_embedder, decoder_word_embedder, encoder, decoder, pad, select, scope)
File "//cocoa/src/model/encdec.py", line 639, in __init__
self.build_model(encoder_word_embedder, decoder_word_embedder, encoder, decoder, scope)
File "//cocoa/src/model/encdec.py", line 659, in build_model
encoder.build_model(encoder_word_embedder, encoder_input_dict, time_major=False)
File "//cocoa/src/model/encdec.py", line 283, in build_model
super(GraphEncoder, self).build_model(word_embedder, input_dict, time_major=time_major, scope=scope)
File "//cocoa/src/model/encdec.py", line 193, in build_model
inputs = self._build_rnn_inputs(word_embedder, time_major)
File "//cocoa/src/model/encdec.py", line 267, in _build_rnn_inputs
word_embeddings = word_embedder.embed(self.inputs, zero_pad=True)
File "//cocoa/src/model/word_embedder.py", line 17, in embed
embeddings = tf.where(inputs == self.pad, tf.zeros_like(embeddings), embeddings)
TypeError: where() takes at most 2 arguments (3 given)

Modify System class and add Session class

  • Add new_session() function (and any other required functions) to System
  • System should load up the model at startup
  • Session should provide an interface to send/receive messages from the model
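
The split described in the list above could look roughly like the following. This is a minimal sketch of one possible interface, not the project's implementation: the constructor arguments, the history list, and the echo model are all hypothetical stand-ins.

```python
class Session(object):
    """One conversation with a model: receive() consumes, send() produces."""

    def __init__(self, model):
        self.model = model
        self.history = []

    def receive(self, message):
        """Record a message from the other party."""
        self.history.append(message)

    def send(self):
        """Ask the model for the next message given the history so far."""
        reply = self.model(self.history)
        self.history.append(reply)
        return reply


class System(object):
    """Loads the model once at startup and hands out sessions."""

    def __init__(self, load_model):
        self.model = load_model()  # loaded once, shared by all sessions

    def new_session(self):
        return Session(self.model)


def load_echo_model():
    """Hypothetical stand-in model: echoes the last message it received."""
    def model(history):
        return "echo: " + history[-1] if history else "hello"
    return model


system = System(load_echo_model)
session = system.new_session()
session.receive("hi")
print(session.send())
```

Keeping the model on System and the per-dialogue state on Session means that many sessions can share one loaded model without reloading it.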

Model name problem

Could you please clarify what each model name means in the code for the paper "Decoupling Strategy and Generation in Negotiation Dialogues"?

  1. "rulebased"
  2. "hybrid"
  3. "cmd"
  4. "fb-neural"
  5. "pt-neural"

Thank you very much!
