Comments (19)

J38 avatar J38 commented on September 27, 2024 3

java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,parse,sentiment -parse.binaryTrees -sentiment.model /path/to/your-custom-sentiment-model -file example-sentence.txt -outputFormat text
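
For scripted use, the same invocation can be assembled as an argument list. This is only a sketch: the helper name `build_sentiment_command` and the default heap size are illustrative, every flag is copied from the command above, and it assumes Java and the CoreNLP jars are already available on the classpath.

```python
def build_sentiment_command(model_path, input_file, heap="8g"):
    """Return the argv list for the CoreNLP sentiment pipeline shown above.

    Hypothetical helper: only the flags come from the original command.
    """
    return [
        "java", f"-Xmx{heap}",
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", "tokenize,ssplit,pos,lemma,parse,sentiment",
        "-parse.binaryTrees",
        "-sentiment.model", model_path,   # your custom model file
        "-file", input_file,              # text file to annotate
        "-outputFormat", "text",
    ]


cmd = build_sentiment_command("model.ser.gz", "example-sentence.txt")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when the model path contains spaces.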

from corenlp.

manning avatar manning commented on September 27, 2024

Yes, the distributed model is at present trained only on movie reviews. A short answer is:

  • You need a treebank in the same format as the one we provide, with sentiment ratings for each binarized tree node (creating this is a lot of work!)
  • You can then train a new model with a command like the sample command shown on this page: http://nlp.stanford.edu/sentiment/code.html

anupamme avatar anupamme commented on September 27, 2024

I tried to download the treebank from http://nlp.stanford.edu/sentiment/treebank.html

but I could not find a download link there. I'd like to create a treebank for the hotel reviews domain, so a reference showing what the movie reviews treebank looks like would help.

gangeli avatar gangeli commented on September 27, 2024

You can find the treebank on the right hand side of http://nlp.stanford.edu/sentiment/index.html under "Dataset Download." For instance, http://nlp.stanford.edu/~socherr/stanfordSentimentTreebank.zip, or in Penn Treebank format, http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip.
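
As a rough picture of what the Penn Treebank format file contains: each line is one bracketed binary tree, and every node, from single words up to the whole sentence, carries a sentiment label from 0 (very negative) to 4 (very positive). The sketch below reads one such line and checks those two properties. The example tree is made up for a hotel review rather than taken from the dataset, and `parse_tree` and `check_tree` are illustrative names, not part of CoreNLP.

```python
import re


def parse_tree(line):
    """Parse '(label child child)' or '(label word)' into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", line)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = int(tokens[pos])
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def check_tree(tree):
    """True if every label is 0-4 and every node is a single word or a binary split."""
    label, children = tree
    if not 0 <= label <= 4:
        return False
    if len(children) == 1 and isinstance(children[0], str):
        return True  # preterminal: (label word)
    if len(children) == 2 and all(isinstance(c, tuple) for c in children):
        return all(check_tree(c) for c in children)
    return False


# Made-up example line, not from the dataset.
example = "(3 (2 The) (3 (2 room) (3 (2 was) (3 clean))))"
print(check_tree(parse_tree(example)))
```

A check like this can be run over a hand-built treebank for a new domain before training, since a node without a label or with more than two children would not match the format described above.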

anupamme avatar anupamme commented on September 27, 2024

Two questions:

  1. Is there a routine in the code base that converts a sentence into a binary tree?
  2. Does it then do a post-order traversal to combine the sentiments of the leaf nodes into the root node?

I assumed such a routine exists because this has already been done for the movie reviews.

manning avatar manning commented on September 27, 2024

Yes, there is. See:

edu.stanford.nlp.sentiment.BuildBinarizedDataset

anupamme avatar anupamme commented on September 27, 2024

I downloaded the training data from http://nlp.stanford.edu/sentiment/code.html

The training data (train.txt) has 8544 lines and is 2.1M in size. I generated a new model using SentimentTraining.java. The new model (new_model.ser.gz) is 248K, whereas edu/stanford/nlp/models/sentiment/sentiment.ser.gz is 3.6K.

I suspect this is because sentiment.ser.gz was trained on a larger set of training data than the one available for download. Is that correct?

If so, is there a way to extend the existing model to new domains (rather than training from scratch)?

anupamme avatar anupamme commented on September 27, 2024

I guess I was not super clear.

Is the complete training data available for re-training purposes? Or is it proprietary to Stanford and not publicly available?

davidaparicio avatar davidaparicio commented on September 27, 2024

@anupamme I have the same question.. 👍

gangeli avatar gangeli commented on September 27, 2024

We certainly don't have any proprietary training data, though I'm not the right person to answer why the model sizes are different (also, I assume 3.6K = 3.6M?)

fuxiang-chen avatar fuxiang-chen commented on September 27, 2024

Suppose we have already trained a model with this command:
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz
How do we use the resulting model.ser.gz for prediction? That is, what command passes this model to the jar?

jageshmaharjan avatar jageshmaharjan commented on September 27, 2024

My question is the same as @cfuxiang's.

jageshmaharjan avatar jageshmaharjan commented on September 27, 2024

Thank you, @J38

jageshmaharjan avatar jageshmaharjan commented on September 27, 2024

One more question: can I use both models (the Stanford sentiment model and my domain model) to predict sentiment? If so, how can I use multiple models for prediction?
I realize the results may conflict, though.

J38 avatar J38 commented on September 27, 2024

No, you can only use one model at a time.

jageshmaharjan avatar jageshmaharjan commented on September 27, 2024

Thanks.

jageshmaharjan avatar jageshmaharjan commented on September 27, 2024

Thank you :)

jageshmaharjan avatar jageshmaharjan commented on September 27, 2024

When I trained on the training dataset using the Stanford sentiment tool (RNTN), several models were generated, each with what I assume is a score. Which model should we use?

saisubramaniam avatar saisubramaniam commented on September 27, 2024

Has anyone already created custom models for specific domains? If so, would it be possible to share them? Thanks.
