priyagunjate / logistic-regression-on-amazon-reviews-data-set
Logistic regression is applied to the Amazon reviews data set to predict whether a review is positive or negative. The procedure is as follows:
• Step1: Apply data pre-processing to the Amazon reviews data set, and take a sample of the data because of computational limitations.
• Step2: Split the data into train and test sets based on time.
• Step3: Apply feature generation techniques (BoW, TF-IDF, avg W2V, TF-IDF W2V).
• Step4: Apply the logistic regression algorithm using each technique.
• Step5: Find lambda using grid search cross-validation and randomized search cross-validation.
• Step6: Apply L1 and L2 regularization.
• Step7: L1 regularization: increase the lambda hyperparameter to generate sparsity in the weight vector "W*".
  1. Report performance metric
  2. Report error
  3. Report sparsity in "W*"
• Step8: Feature importance for positive and negative reviews
  1. Most important feature
  2. Bar plot of the top 15 important features
0.2 Objective:
• To classify the given reviews as positive (rating of 4 or 5) or negative (rating of 1 or 2) using the logistic regression algorithm.
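The time-based split, the bag-of-words features, and the effect of increasing lambda under L1 regularization described in the procedure can be illustrated with a small dependency-free sketch. This is not the repository's code. The toy reviews, the function names, and the learning-rate and epoch settings are all hypothetical, and proximal gradient descent (a gradient step followed by soft-thresholding) is used here only as one simple way to fit an L1-penalized logistic regression.

```python
import math
from collections import Counter

# Hypothetical toy reviews: (timestamp, text, rating). Not taken from the Amazon data.
REVIEWS = [
    (1, "great product works great", 5),
    (2, "terrible waste of money", 1),
    (3, "love it great value", 4),
    (4, "bad quality terrible smell", 2),
    (5, "great taste love it", 5),
    (6, "bad product waste", 1),
    (7, "love the great flavor", 5),
    (8, "terrible bad taste", 2),
]

def label(rating):
    # Ratings 4-5 are positive (1); ratings 1-2 are negative (0).
    return 1 if rating >= 4 else 0

def time_based_split(rows, train_fraction=0.75):
    # Sort by timestamp so every test review is newer than every training review.
    rows = sorted(rows, key=lambda r: r[0])
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def build_vocab(rows):
    # The vocabulary is built from the training rows only.
    words = sorted({w for _, text, _ in rows for w in text.split()})
    return {w: i for i, w in enumerate(words)}

def bow(text, vocab):
    # Bag-of-words count vector; words outside the vocabulary are ignored.
    vec = [0.0] * len(vocab)
    for word, count in Counter(text.split()).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, threshold):
    # Shrink toward zero; anything within the threshold becomes exactly zero.
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def fit_l1_logreg(X, y, lam, lr=0.1, epochs=500):
    # Proximal gradient descent on mean log loss + lam * ||w||_1 (bias unpenalized).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0

train, test = time_based_split(REVIEWS)
vocab = build_vocab(train)
X_train = [bow(text, vocab) for _, text, _ in train]
y_train = [label(rating) for _, _, rating in train]
X_test = [bow(text, vocab) for _, text, _ in test]
y_test = [label(rating) for _, _, rating in test]

# Report accuracy, error, and the number of zero weights for increasing lambda.
for lam in (0.0, 0.01, 0.1, 10.0):
    w, b = fit_l1_logreg(X_train, y_train, lam)
    acc = sum(predict(w, b, x) == y for x, y in zip(X_test, y_test)) / len(y_test)
    zeros = sum(1 for wj in w if wj == 0.0)
    print(f"lambda={lam}: accuracy={acc:.2f}, error={1 - acc:.2f}, "
          f"zero weights={zeros}/{len(w)}")
```

With a large enough lambda every weight is driven to exactly zero, which is the sparsity effect the procedure asks to report. When a library is used instead, note that scikit-learn's `LogisticRegression` takes `C`, the inverse of the regularization strength, so a larger lambda corresponds to a smaller `C`.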