isunym / sentimentanalysis

This project forked from bmurali3/sentimentanalysis

sentimentanalysis's Introduction

Name:   Bharath Murali
Email:  [email protected]



Sentiment Analysis using Twitter Data

File (Script): SentimentAnalysis.py

               run as
               python SentimentAnalysis.py

Description:    The script trains a Support Vector Machine with a linear kernel to classify test data. Scikit-learn's
                LinearSVC provides a fast implementation of linear-kernel SVMs. The features extracted from the text are
                taken from the tf-idf matrix, a commonly used way to map text into a vector space. Scikit-learn's
                TfidfVectorizer provides a complete and convenient set of functions for this. In addition, TruncatedSVD,
                another scikit-learn class, is used to approximate the feature vectors in a lower dimension (100 to 500
                dimensions). Various combinations of parameters were tried (the penalty parameter C of the SVM, the number
                of components kept by the SVD truncation, and the use of unigrams or bigrams in the feature extraction
                stage), and the best estimator was selected based on its cross-validation score.
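
Example:        A minimal sketch of the pipeline described above, not the author's actual script. The helper name
                build_pipeline, the step names and the toy reviews are assumptions made for illustration; only
                TfidfVectorizer, TruncatedSVD and LinearSVC are taken from the description.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_pipeline(ngram_range=(1, 2), n_components=500, C=100.0):
    """Hypothetical helper: tf-idf features -> truncated SVD -> linear SVM."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=ngram_range)),
        ("svd", TruncatedSVD(n_components=n_components, random_state=0)),
        ("svm", LinearSVC(C=C)),
    ])


# Toy reviews, made up for illustration only (1 = positive, 0 = negative).
texts = [
    "a wonderful, moving film",
    "great acting and a great story",
    "dull plot and terrible pacing",
    "a boring, awful mess",
]
labels = [1, 1, 0, 0]

# The number of SVD components cannot exceed the number of tf-idf features,
# so this toy run keeps 2 components instead of the 100 to 500 used on real data.
model = build_pipeline(n_components=2, C=1.0)
model.fit(texts, labels)
predictions = model.predict(texts)
```

                Wrapping the three stages in one Pipeline means the vectorizer and the SVD are always fitted on the
                same data as the classifier, which keeps later cross-validation free of leakage from held-out folds.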
                
Input:          A TSV or CSV file containing two tab-separated columns: sentiments and reviews.
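
Example:        A rough sketch of reading such a file with the standard csv module. It assumes the sentiment comes
                first, the review second, and that there is no header row; the function name read_labeled_tsv and the
                sample rows are hypothetical.

```python
import csv
import io


def read_labeled_tsv(handle):
    """Return (sentiments, reviews) from a two-column, tab-separated stream."""
    sentiments, reviews = [], []
    # QUOTE_NONE keeps quotation marks inside reviews as ordinary characters.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        sentiments.append(row[0])
        reviews.append(row[1])
    return sentiments, reviews


# An in-memory stand-in for the input file (made-up rows).
sample = io.StringIO(
    "1\tA wonderful, moving film\n"
    "\n"
    "0\tDull plot and terrible pacing\n"
)
sentiments, reviews = read_labeled_tsv(sample)
```

                For a file on disk, the same function can be called on open(path, newline="", encoding="utf-8").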

Implementation: The file movie-reviews-sentiment.tsv was split into 66% training data, with the remainder held out as test
                data. The best estimator was selected by 6-fold cross-validation on the training data. This model was then
                trained on the whole training set and evaluated on the test set.
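
Example:        One way the split and model selection could be sketched with scikit-learn's train_test_split and
                GridSearchCV. The synthetic reviews, the small parameter grid and the random seed are assumptions
                chosen so the sketch runs quickly; they are not the values or data used by the script.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the review file (made-up data, 30 reviews per class).
good = ["great", "wonderful", "moving", "funny", "brilliant"]
bad = ["dull", "terrible", "boring", "awful", "clumsy"]
texts, labels = [], []
for i in range(30):
    texts.append(f"a {good[i % 5]} and {good[(i + 1) % 5]} film")
    labels.append(1)
    texts.append(f"a {bad[i % 5]} and {bad[(i + 1) % 5]} film")
    labels.append(0)

# Hold out 34% for testing, leaving roughly 66% for training.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.34, stratify=labels, random_state=0
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svd", TruncatedSVD(random_state=0)),
    ("svm", LinearSVC()),
])

# Illustrative grid; real data would use far more SVD components.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svd__n_components": [2, 5],
    "svm__C": [1, 100],
}

# cv=6 gives 6-fold cross-validation on the training data only.
# With the default refit=True, the best setting is retrained on all of X_train.
search = GridSearchCV(pipeline, param_grid, cv=6)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
```

                search.best_params_ then holds the selected combination, and test_accuracy is measured on data
                that played no part in choosing it.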

Results:        The best estimator used both unigrams and bigrams, with C=100 and 500 principal components. Its prediction
                accuracy on the held-out test data was 73.86%.

Notes:          An SVM implementation using the cvxopt library was attempted, but solving the constrained optimization
                problem required too much memory and resulted in memory errors. If cvxopt is not installed, the
                corresponding code must be commented out in order to run the script. I apologize that only minimal
                error handling was done.
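
Example:        A possible alternative to commenting code out is to treat cvxopt as an optional import. This is a
                sketch, not part of the script: the function names are hypothetical, and the memory estimate assumes
                the quadratic program is posed with a dense n-by-n matrix of 8-byte floats, which may not match how
                the original attempt was written.

```python
try:
    import cvxopt  # optional dependency, only needed for the experimental solver
except ImportError:
    cvxopt = None


def choose_svm_backend():
    """Hypothetical helper: pick the cvxopt path only when it can be imported."""
    return "cvxopt" if cvxopt is not None else "sklearn"


def dense_gram_matrix_bytes(n_samples, bytes_per_value=8):
    """Memory for a dense n-by-n matrix over the training samples (assumed layout)."""
    return n_samples * n_samples * bytes_per_value


backend = choose_svm_backend()
# Under the dense-matrix assumption, 10,000 training reviews already need about 0.8 GB.
estimate = dense_gram_matrix_bytes(10_000)
```

                Because the matrix grows with the square of the number of training samples, a check like this can
                be used to fall back to LinearSVC before a memory error occurs.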
