Comments (19)
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,parse,sentiment -parse.binaryTrees -sentiment.model /path/to/your-custom-sentiment-model -file example-sentence.txt -outputFormat text
from corenlp.
Yes, the distributed model is currently trained only on movie reviews. The short answer is:
- You need a treebank in the same format as the one we provide, with a sentiment rating on each node of each binarized tree (creating this is a lot of work!)
- You can then train a new model with a command like the sample command shown on this page: http://nlp.stanford.edu/sentiment/code.html .
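For reference, the training command documented on that page looks roughly like the following; the heap size and file names are placeholders you would adjust. It requires the CoreNLP jars on the classpath:

```shell
# Train a new RNTN sentiment model from a binarized, sentiment-labeled treebank.
# train.txt / dev.txt are PTB-style trees with a sentiment label (0-4) on every node.
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining \
  -numHid 25 \
  -trainPath train.txt \
  -devPath dev.txt \
  -train \
  -model my-sentiment-model.ser.gz
```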
from corenlp.
I tried to download the treebank from http://nlp.stanford.edu/sentiment/treebank.html
but I could not find a download link there. I'd like to create a treebank for the hotel-reviews domain, so a reference showing what the movie-reviews treebank looks like would help.
from corenlp.
You can find the treebank on the right hand side of http://nlp.stanford.edu/sentiment/index.html under "Dataset Download." For instance, http://nlp.stanford.edu/~socherr/stanfordSentimentTreebank.zip, or in Penn Treebank format, http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip.
from corenlp.
Two questions:
- Is there a routine in the code base that converts a sentence into a binary tree?
- And one that then does a post-order traversal to combine the sentiments of the leaf nodes up to the root node?
I assumed such a routine might exist, since this was done for the movie reviews.
from corenlp.
Yes, there is. See:
edu.stanford.nlp.sentiment.BuildBinarizedDataset
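A minimal sketch of invoking that class, assuming its documented `-input` flag; the input file is expected to contain one labeled sentence per line (a sentiment label followed by the sentence text), and the binarized trees are written to stdout:

```shell
# Parse and binarize raw labeled sentences into trees usable by SentimentTraining
java -mx4g edu.stanford.nlp.sentiment.BuildBinarizedDataset \
  -input labeled_sentences.txt > train.txt
```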
from corenlp.
I downloaded the training data from http://nlp.stanford.edu/sentiment/code.html
The training data (train.txt) has 8544 lines and is 2.1M in size. I generated a new model using SentimentTraining.java. The new model (new_model.ser.gz) is 248K, whereas edu/stanford/nlp/models/sentiment/sentiment.ser.gz is 3.6K.
I suspect this is because sentiment.ser.gz was trained on a larger set of training data than the data available for download. Is that correct?
If yes, is there a way to adapt the model to new domains (rather than training from scratch)?
from corenlp.
I guess I was not super clear.
Is the complete training data available for re-training purposes? Or is it proprietary to Stanford and not publicly available?
from corenlp.
@anupamme I have the same question.. 👍
from corenlp.
We certainly don't have any proprietary training data, though I'm not the right person to answer why the model sizes are different (also, I assume 3.6K should be 3.6M?).
from corenlp.
Suppose we already have a model trained via this command:
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz
How do we use this model.ser.gz for prediction? That is, what is the command to pass this model in via the jar?
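As the pipeline command at the top of this thread shows, a custom model can be supplied through the `sentiment.model` property; a sketch, with paths as placeholders:

```shell
# Run the full pipeline, pointing the sentiment annotator at a custom model
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP \
  -annotators tokenize,ssplit,pos,lemma,parse,sentiment \
  -parse.binaryTrees \
  -sentiment.model model.ser.gz \
  -file example-sentence.txt \
  -outputFormat text
```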
from corenlp.
My question is the same as @cfuxiang's.
from corenlp.
Thank you, @J38
from corenlp.
One more question: can I use both models (the Stanford sentiment model and my domain model) to predict sentiment? If yes, how can I use multiple models for prediction?
I know the results may conflict, though.
from corenlp.
No, you can only use one model at a time.
from corenlp.
Thank you
from corenlp.
Thank you :)
from corenlp.
While training on the dataset using the Stanford sentiment tool (RNTN), several models were generated, each, I assume, with a score. Which model should we use?
from corenlp.
Has anyone already created custom models for specific domains? If yes, would it be possible to share them? Thanks.
from corenlp.