nbsvm's Introduction

Note: I don't provide personal support for custom changes to the code, only for the release. For people just starting out, I recommend Treehouse for online learning.

Naive Bayes SVM (NB-SVM)

This code reproduces the performance of NB-SVM on the IMDB reviews reported in the paper:

Sida Wang and Christopher D. Manning: Baselines and Bigrams: Simple, Good Sentiment and Topic Classification; ACL 2012. http://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf

They obtain 91.22%, while this code obtains 91.55% with bigrams and 91.82% with trigrams: small improvements over the paper (+0.33 points with bigrams and +0.60 points with trigrams).
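For readers unfamiliar with the method, the central NB-SVM feature in the cited paper is the log-count ratio r = log((p/||p||_1) / (q/||q||_1)), where p and q are smoothed feature counts over positive and negative documents. The block below is a minimal NumPy sketch of that ratio on a toy count matrix. It is not the code in this repository: the function name `log_count_ratio`, the smoothing value `alpha=1.0`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def log_count_ratio(X, y, alpha=1.0):
    """Return r = log((p / ||p||_1) / (q / ||q||_1)) for labels y in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = alpha + X[y == 1].sum(axis=0)  # smoothed counts over positive documents
    q = alpha + X[y == 0].sum(axis=0)  # smoothed counts over negative documents
    return np.log((p / p.sum()) / (q / q.sum()))

# Toy data (hypothetical): 4 documents, 3 n-gram features.
X = np.array([[2, 0, 1],
              [1, 0, 0],
              [0, 3, 1],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])

r = log_count_ratio(X, y)
X_scaled = X * r  # element-wise scaling; a linear classifier is then trained on these rows
```

Features that occur mostly in positive documents get a positive ratio, and features that occur mostly in negative documents get a negative one.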

To reproduce the results:

git clone git@github.com:mesnilgr/nbsvm.git
cd nbsvm; chmod +x oh_my_go.sh
./oh_my_go.sh

End to end (downloading the data, tokenizing, training the models), this takes 68 mins. Most of that time is spent downloading and tokenizing. Once the data has been downloaded and tokenized, training an NB-SVM takes only ~2 mins for uni+bigrams and <5 mins for uni+bi+trigrams.

Creative Commons License
Naive Bayes SVM by Grégoire Mesnil is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Based on a work at https://github.com/mesnilgr/nbsvm.

nbsvm's People

Contributors

dpressel, mesnilgr, mjewkes

nbsvm's Issues

Install instructions yield ../nbsvm: No such file or directory

When I follow the instructions for this otherwise helpful Naive Bayes Python script, I get an error.

owner at Owners-iMac in ~/sbox/test/nbsvm on master
$ ./oh_my_go.sh
mkdir: nbsvm_run: File exists
dyld: Library not loaded: /usr/local/opt/openssl/lib/libssl.1.0.0.dylib
  Referenced from: /usr/local/bin/wget
  Reason: image not found
./oh_my_go.sh: line 11: 64166 Abort trap: 6           wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
tar: Error opening archive: Failed to open 'aclImdb_v1.tar.gz'
rm: aclImdb_v1.tar.gz: No such file or directory
ls: aclImdb/train/pos: No such file or directory
./oh_my_go.sh: line 3: temp: No such file or directory
mv: rename temp-norm to aclImdb/train/pos/norm.txt: No such file or directory
rm: temp: No such file or directory
ls: aclImdb/train/neg: No such file or directory
./oh_my_go.sh: line 3: temp: No such file or directory
mv: rename temp-norm to aclImdb/train/neg/norm.txt: No such file or directory
rm: temp: No such file or directory
ls: aclImdb/test/pos: No such file or directory
./oh_my_go.sh: line 3: temp: No such file or directory
mv: rename temp-norm to aclImdb/test/pos/norm.txt: No such file or directory
rm: temp: No such file or directory
ls: aclImdb/test/neg: No such file or directory
./oh_my_go.sh: line 3: temp: No such file or directory
mv: rename temp-norm to aclImdb/test/neg/norm.txt: No such file or directory
rm: temp: No such file or directory
mkdir: data: File exists
mv: rename aclImdb/train/pos/norm.txt to data/train-pos.txt: No such file or directory
mv: rename aclImdb/train/neg/norm.txt to data/train-neg.txt: No such file or directory
mv: rename aclImdb/test/pos/norm.txt to data/test-pos.txt: No such file or directory
mv: rename aclImdb/test/neg/norm.txt to data/test-neg.txt: No such file or directory
rm: aclImdb: No such file or directory
dyld: Library not loaded: /usr/local/opt/openssl/lib/libssl.1.0.0.dylib
  Referenced from: /usr/local/bin/wget
  Reason: image not found
./oh_my_go.sh: line 29: 64195 Abort trap: 6           wget https://www.csie.ntu.edu.tw/~cjlin/liblinear/oldfiles/liblinear-1.96.zip
unzip:  cannot find or open liblinear-1.96.zip, liblinear-1.96.zip.zip or liblinear-1.96.zip.ZIP.
rm: liblinear-1.96.zip: No such file or directory
./oh_my_go.sh: line 32: cd: liblinear-1.96: No such file or directory
make: *** No targets specified and no makefile found.  Stop.
BI-GRAM
python: can't open file '../nbsvm/nbsvm.py': [Errno 2] No such file or directory
TRI-GRAM
python: can't open file '../nbsvm/nbsvm.py': [Errno 2] No such file or directory
./oh_my_go.sh: line 40: cd: ../nbsvm: No such file or directory
(base) 
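In the log above, every later error follows from the first one: `wget` aborts on a missing library, so nothing is downloaded and each subsequent step fails. As a rough sketch, a pre-flight check like the following could catch a broken tool before the script runs. The helper names `tool_works` and `broken_tools` are hypothetical, and the check assumes each tool accepts a `--version` flag.

```python
import subprocess

def tool_works(command):
    """Return True if `command --version` starts and exits with status 0."""
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:  # binary missing or not executable
        return False
    # A crash at load time (as with the dyld abort above) gives a non-zero code.
    return result.returncode == 0

def broken_tools(tools):
    """Return the subset of `tools` that fail the --version check."""
    return [tool for tool in tools if not tool_works(tool)]

# Tool names taken from commands visible in the log; the list is an assumption.
print(broken_tools(["wget", "tar", "make"]))
```

An empty list means the checked tools at least start. It does not guarantee that the downloads themselves will succeed.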

What external code do you use?

os.system(trainsvm + " -s 0 train-nbsvm.txt model.logreg")

Is it libsvm? If yes, how do I install it on Windows?
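Judging from the install log quoted in the previous issue, `oh_my_go.sh` downloads liblinear-1.96, so `trainsvm` presumably points at LIBLINEAR's `train` program, where `-s 0` selects L2-regularized logistic regression. The sketch below builds the same call as an argument list for `subprocess.run` instead of a shell string. The binary path `liblinear-1.96/train` and both helper names are assumptions, not taken from the repository.

```python
import subprocess

def liblinear_train_command(train_binary, data_file, model_file, solver=0):
    """Build the argument list for LIBLINEAR's `train` program."""
    return [train_binary, "-s", str(solver), data_file, model_file]

def run_training(command):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)

# Assumed location of the compiled binary; adjust to where `train` was built.
cmd = liblinear_train_command("liblinear-1.96/train", "train-nbsvm.txt", "model.logreg")
# run_training(cmd)  # uncomment once the binary and the training file exist
```

Passing a list avoids shell quoting problems when a path contains spaces.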

Support for multi-class tasks

From your code and the paper, we can see that your method does a very good job on binary text classification tasks. If we need to solve a multi-class problem, how can we extend your algorithm and the code? Thank you~
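One generic way to approach this question is a one-vs-rest scheme: compute one log-count ratio per class, treating that class as positive and all others as negative, then train one binary classifier per class. This is not something the repository implements or the author confirms here. The sketch below is an assumption-laden illustration with a hypothetical function name `one_vs_rest_ratios` and toy data.

```python
import numpy as np

def one_vs_rest_ratios(X, y, alpha=1.0):
    """Return {class_label: log-count ratio vector} using class-vs-rest splits."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    ratios = {}
    for c in np.unique(y):
        p = alpha + X[y == c].sum(axis=0)  # smoothed counts inside class c
        q = alpha + X[y != c].sum(axis=0)  # smoothed counts in all other classes
        ratios[c.item()] = np.log((p / p.sum()) / (q / q.sum()))
    return ratios

# Toy data (hypothetical): each class uses exactly one distinctive feature.
X = np.array([[3, 0, 0],
              [0, 2, 0],
              [0, 0, 4]])
y = np.array([0, 1, 2])

ratios = one_vs_rest_ratios(X, y)
# For class c, a binary model would be trained on X * ratios[c] with labels (y == c).
```

At prediction time, the class whose binary model gives the highest score would be chosen.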
