We applied the Multinomial Naive Bayes algorithm to predict sentiment from text data. In this project we built a model that performs sentiment analysis on Amazon Alexa reviews to predict whether customers are happy with the product or not.
Natural language processing (NLP) can be used to build predictive models that perform sentiment analysis on social media posts and reviews and predict whether customers are happy or not. NLP works by converting words into numbers and training a machine learning model on them to make predictions. That way, we can automatically tell whether our customers are happy without manually going through a massive number of tweets or reviews.
In this project, we use NLP to build a predictive model that predicts whether customers are happy or not based on Alexa reviews on Amazon.
We used a customer reviews dataset from Kaggle that contains 3,150 customer reviews of Alexa products. The following are the first two rows of the dataset:
| rating | date | variation | verified_reviews | feedback |
|---|---|---|---|---|
| 5 | 31-Jul-18 | Charcoal Fabric | Love my Echo! | 1 |
| 5 | 31-Jul-18 | Charcoal Fabric | Loved it! | 1 |
- rating: Rating of the product
- date: Date of the review
- variation: Product variation
- verified_reviews: Customer review text
- feedback: Whether the customer is happy or not (1 = positive, 0 = negative)
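As a reference, loading the dataset with pandas might look like the minimal sketch below; the file name `amazon_alexa.tsv` and the tab separator are assumptions about how the Kaggle file is stored locally, and `reviews_df` is an illustrative variable name used in the remaining sketches.

```python
import pandas as pd

# Load the Alexa reviews dataset (file name and separator are assumptions
# about the local copy of the Kaggle file).
reviews_df = pd.read_csv("amazon_alexa.tsv", sep="\t")

print(reviews_df.shape)    # expect (3150, 5)
print(reviews_df.head(2))  # the first two rows shown above
```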
Fortunately, we don't have any missing values.
The majority of customers are happy with the product.
The majority of customers give the product a 5-star rating.
- Walnut Finish and Oak Finish are the product variations with the highest ratings
- White is the product variation with the lowest rating
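These exploratory checks can be reproduced with a short pandas sketch like the one below (the plotting code for the charts is omitted; column names follow the dataset description above):

```python
# Quick exploratory checks on the reviews DataFrame loaded earlier.
print(reviews_df.isnull().sum())              # no missing values expected

print(reviews_df["feedback"].value_counts())  # 1 = positive, 0 = negative
print(reviews_df["rating"].value_counts())    # most reviews are 5 stars

# Average rating per product variation, sorted from highest to lowest.
print(
    reviews_df.groupby("variation")["rating"]
    .mean()
    .sort_values(ascending=False)
)
```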
The word cloud above shows the words that appear most often in the product reviews.
The word cloud above shows the words that appear most often in the negative product reviews.
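The original plots may have been produced differently, but a sketch of how such word clouds can be generated with the `wordcloud` package looks like this (joining all review texts into one string is an assumption about the setup):

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Word cloud over all reviews.
all_text = " ".join(reviews_df["verified_reviews"].astype(str))

# Word cloud over negative reviews only (feedback == 0).
negative_text = " ".join(
    reviews_df.loc[reviews_df["feedback"] == 0, "verified_reviews"].astype(str)
)

for title, text in [("All reviews", all_text), ("Negative reviews", negative_text)]:
    plt.figure()
    plt.imshow(WordCloud().generate(text), interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
plt.show()
```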
We drop the rating and date columns because we don't need them for our predictive model.
We turn the variation column into numerical data by creating dummy variables for it. The following are our variation dummies:
The next step is to drop the variation column from the reviews dataset and concatenate the dataset with our variation dummies.
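A sketch of these two steps with pandas (variable names are illustrative):

```python
# Drop columns we don't need for the model.
reviews_df = reviews_df.drop(columns=["rating", "date"])

# One-hot encode the product variation.
variation_dummies = pd.get_dummies(reviews_df["variation"])

# Replace the original variation column with its dummy columns.
reviews_df = pd.concat(
    [reviews_df.drop(columns=["variation"]), variation_dummies], axis=1
)
```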
We use the NLTK library to define a pipeline to clean up all the reviews. The pipeline performs the following:
- Remove punctuation (e.g. , ! . ? /)
- Remove stopwords (e.g. i, you, them, we)
The following are our customer reviews after applying the pipeline:
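A minimal sketch of such a cleaning function, assuming the punctuation list from Python's `string` module and NLTK's English stopword list (the actual pipeline may differ in details):

```python
import string
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # needed once for the English stopword list
stop_words = set(stopwords.words("english"))

def clean_review(text):
    """Remove punctuation and English stopwords from a review."""
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(w for w in no_punct.split() if w.lower() not in stop_words)

reviews_df["verified_reviews"] = (
    reviews_df["verified_reviews"].astype(str).apply(clean_review)
)
```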
Next, we use a count vectorizer to convert the customer reviews into numerical count data. The following is the result after applying the count vectorizer to our data:
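A sketch of this step with scikit-learn's `CountVectorizer` (applying it to the cleaned `verified_reviews` column is an assumption about the exact setup):

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
review_counts = vectorizer.fit_transform(reviews_df["verified_reviews"])

# Each review becomes a row of word counts over the learned vocabulary.
print(review_counts.shape)
print(vectorizer.get_feature_names_out()[:10])
```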
Finally, we do the following before building the model (sketched in code below):
- Concatenate the reviews dataset with the vectorized review columns
- Drop the verified_reviews and feedback columns from the feature matrix (feedback is kept separately as the target label)
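Putting these pieces together might look like this sketch; the variable names `X` and `y` are illustrative:

```python
# Dense DataFrame of word counts, aligned with the reviews DataFrame index.
vectorized_reviews = pd.DataFrame(
    review_counts.toarray(),
    columns=vectorizer.get_feature_names_out(),
    index=reviews_df.index,
)

# Feature matrix: variation dummies + word counts; target: feedback.
X = pd.concat(
    [reviews_df.drop(columns=["verified_reviews", "feedback"]), vectorized_reviews],
    axis=1,
)
y = reviews_df["feedback"]
```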
We split the dataset into X train, X test, y train, and y test, using 80% of the data for training and 20% for testing. After splitting the dataset, we train a Multinomial Naive Bayes (MultinomialNB) model.
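A sketch of the split and training with scikit-learn (the `random_state` value is an assumption added for reproducibility):

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# 80% of the data for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)
```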
The main evaluation metrics that we use are the confusion matrix and the F1 score. The F1 score is the harmonic mean of precision and recall, F1 = 2 * (precision * recall) / (precision + recall), and summarizes the model's classification performance in a single number. The confusion matrix is a performance measurement for classification problems where the output can be two or more classes; it is a table with the four combinations of predicted and actual values.
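Both metrics can be computed with scikit-learn, roughly as in this sketch:

```python
from sklearn.metrics import confusion_matrix, f1_score, classification_report

y_pred = nb_model.predict(X_test)

print(confusion_matrix(y_test, y_pred))       # rows: actual, columns: predicted
print(f1_score(y_test, y_pred))               # F1 for the positive class
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
```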
Based on the matrix above, we correctly classify around 570 positive feedbacks and 17 negative feedbacks. We misclassify 31 negative feedbacks and 10 positive feedbacks.
The table above shows that our Naive Bayes model has an F1 score of 0.93, i.e. about 93%.
We use the same split of the dataset into X train, X test, y train, and y test, with 80% of the data for training and 20% for testing. After splitting the dataset, we train a Logistic Regression classifier.
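The training sketch mirrors the Naive Bayes one, only swapping the estimator (raising `max_iter` is a practical assumption so the solver converges on the count features):

```python
from sklearn.linear_model import LogisticRegression

# Same 80/20 split as before; only the estimator changes.
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)

y_pred_lr = lr_model.predict(X_test)
```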
Based on the matrix above, we correctly classify around 580 positive feedbacks and 19 negative feedbacks. We misclassify 4 negative feedbacks and 29 positive feedbacks.
The table above shows that our logistic regression model has an F1 score of 0.94, i.e. about 94%.
To predict whether customers are happy or not, we built models with the following two algorithms:
- Naive Bayes classifier, with an F1 score of 0.93 (93%)
- Logistic regression classifier, with an F1 score of 0.94 (94%)
We chose the model with the highest score, which is the logistic regression classifier with an F1 score of 0.94. Our model correctly predicted around 580 positive feedbacks and 19 negative feedbacks. Based on these results, we can tell that customers are quite happy with the product.