In this article we build an automated customer satisfaction analysis based on opinion mining. The purpose of such a system is to provide valuable insight from reviews and social media comments in an automated manner. We use data science and natural language processing techniques to extract and visualize the most interesting topics and to track them over time.
How to measure customer satisfaction with sentiment analysis
The development of machine learning techniques that can interpret natural language offers more than a way to analyse thousands or millions of reviews in seconds.
Unlike surveys, which influence responses through the way questions are posed, reviews provide genuine and authentic feedback from the client.
By extracting the topics or issues that clients mention in their comments, we gain a unique view of how customers see our product or service. With the rising popularity of review sites, this is a source of marketing and business knowledge that should not be ignored.
Nowadays you can post an opinion on the internet about almost everything, from your local grocery shop or barber to a brand of cigarettes or a vacation destination. For the purpose of this study, we used a dataset of hotel reviews.
Before starting the analysis, we dive into the aforementioned dataset and read some samples. The most important difference between structured reviews or articles and tourist portal comments is that the latter tend to be rather short and contain “noise” such as grammar mistakes. We begin the analysis pipeline by cleaning the data, taking a typical approach: removing punctuation and special characters, converting the text to lowercase and removing stopwords.
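The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the exact pipeline we used: the function name clean_review and the tiny stopword list are assumptions made for the example, and a real pipeline would use a fuller stopword list.

```python
import re

# Illustrative stopword list only (an assumption for this sketch);
# a real pipeline would use a much fuller list.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "were", "it",
             "in", "of", "to", "we", "but", "very"}

def clean_review(text):
    """Lowercase, strip punctuation and special characters, drop stopwords."""
    text = text.lower()
    # Replace everything that is not a lowercase letter or whitespace
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    return " ".join(t for t in tokens if t not in STOPWORDS)

cleaned = clean_review("The room was GREAT!!! But the bathroom... very dirty :(")
print(cleaned)
```

Note that the regular expression also drops digits and accented characters, which may or may not be desirable depending on the language of the reviews.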
The preprocessed text is converted to a feature matrix with the TF-IDF method. We set a threshold to remove words that occur too rarely, which also reduces training time and computational cost. The resulting matrix is used to train a sentiment classifier based on SVC (support vector classification). Parameter tuning is done with a grid search. We obtained an accuracy of 86% on test data, which is more than satisfactory for this kind of problem.
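One possible way to wire these steps together is sketched below with scikit-learn. The toy reviews, the min_df value and the parameter grid are all assumptions chosen so the example runs quickly; they are not the settings behind the 86% figure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy, already-cleaned reviews (made up for illustration): 1 = positive, 0 = negative
texts = [
    "great location friendly staff", "great breakfast helpful staff",
    "great room clean comfortable", "great view helpful staff",
    "great hotel clean room", "great bar friendly staff",
    "dirty bathroom noisy room", "dirty room rude staff",
    "dirty bathroom small room", "dirty carpet noisy street",
    "dirty bathroom cold water", "dirty towels rude staff",
]
labels = [1] * 6 + [0] * 6

# Hold out four reviews for testing, keeping the class balance
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=4, stratify=labels, random_state=0
)

pipeline = Pipeline([
    # min_df=2 drops words seen in fewer than two training reviews
    ("tfidf", TfidfVectorizer(min_df=2)),
    ("svc", SVC()),
])

# Hypothetical grid; the real search space depends on the data
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Putting the vectorizer inside the pipeline means the rare-word threshold is recomputed on each cross-validation fold, so no information from the validation part leaks into the vocabulary.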
Sentiment can serve as a gauge of overall user satisfaction, but to draw proper conclusions from that value we need a reference point. If we represent it as a time series, we can track how it evolves over time.
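As a rough sketch of the time series idea, we could aggregate predicted sentiment by month and look at the share of positive reviews. The review dates and labels below are invented for the example, and monthly_positive_share is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Invented (review date, predicted sentiment) pairs: 1 = positive, 0 = negative
reviews = [
    (date(2018, 1, 5), 1), (date(2018, 1, 17), 1), (date(2018, 1, 29), 0),
    (date(2018, 2, 3), 0), (date(2018, 2, 20), 1),
    (date(2018, 3, 11), 1),
]

def monthly_positive_share(dated_labels):
    """Return {(year, month): share of positive reviews}, ordered by month."""
    buckets = defaultdict(list)
    for day, label in dated_labels:
        buckets[(day.year, day.month)].append(label)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}

share = monthly_positive_share(reviews)
for month, value in share.items():
    print(month, round(value, 2))
```

Each month's value can then be compared with the previous ones, which gives the reference needed to interpret a single sentiment score.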
After analysing the sentiment, we focused on opinion mining. Here we explored three different approaches: tag cloud visualisation, a summary generator and topic extraction.
In the first step, we created a preprocessing pipeline and, using the sentiment classifier built earlier, divided the reviews into positive and negative. For each group we created a tag cloud visualisation.
What clients considered positive
...and what negative
Looking at the word clouds, we can immediately spot some important issues: hotel visitors tend to complain about the state of the bathrooms, while the location of the hotel seems to be a definite advantage.
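A tag cloud is essentially a picture of word frequencies, so the data behind the two clouds can be sketched as follows. The reviews and predicted labels are made up, and term_frequencies is a hypothetical helper; the resulting counts could be handed to a rendering library such as the wordcloud package.

```python
from collections import Counter

# Made-up cleaned reviews with labels as predicted by the sentiment classifier
reviews = [
    "great location friendly staff",
    "location perfect clean room",
    "bathroom dirty small",
    "dirty bathroom noisy room",
]
predicted = [1, 1, 0, 0]  # 1 = positive, 0 = negative

def term_frequencies(texts, labels, target_label, top_n=2):
    """Count words in the reviews of one sentiment group, most frequent first."""
    counts = Counter()
    for text, label in zip(texts, labels):
        if label == target_label:
            counts.update(text.split())
    return counts.most_common(top_n)

positive_terms = term_frequencies(reviews, predicted, target_label=1, top_n=1)
negative_terms = term_frequencies(reviews, predicted, target_label=0, top_n=2)
print(positive_terms)
print(negative_terms)
```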
In the second part of the analysis, we created an algorithm which, within a few categories, selects the comments that carry the most information. This could be included in a semi-supervised system that marks the reviews worth reading by the hotel staff.
We give each review a value: the sum of its TF-IDF values across all columns, divided by the review length. Based on this metric, we can pick out only those reviews that are interesting and worth human time.
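The metric above can be sketched with scikit-learn and NumPy. Two details are assumptions on our part for this example: review length is measured as the number of whitespace-separated tokens, and the TF-IDF weights come from TfidfVectorizer with default settings (which L2-normalizes each row).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up cleaned reviews
reviews = [
    "great hotel",
    "great location friendly staff quiet room tram stop nearby",
    "bathroom dirty shower broken reception ignored complaint",
    "nice nice nice nice",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(reviews)  # sparse matrix: reviews x terms

# Sum of TF-IDF weights per review (one value per row)
row_sums = np.asarray(tfidf.sum(axis=1)).ravel()

# Assumption: length = token count; max(..., 1) guards against empty reviews
lengths = np.array([max(len(r.split()), 1) for r in reviews])

scores = row_sums / lengths

# Review indices ordered from highest to lowest score
ranking = np.argsort(scores)[::-1]
for idx in ranking:
    print(round(float(scores[idx]), 3), reviews[idx])
```

How the length is defined and whether the TF-IDF rows are normalized both change the ranking noticeably, so these choices are worth checking against a handful of reviews read by hand.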
As the last element, we propose to model review topics with another method: Non-Negative Matrix Factorization. For each group we generated sets of words that best describe each topic. Unlike in clustering, each opinion can have more than one topic. Below are some examples of topics based on positive comments:
room bed hotel comfy comfortable clean building park big modern
positive wouldn fault experience extra extremely fab fabulous facility fact
helpful staff breakfast bar extremely restaurant kind hotel food decor
location park hotel building room staff centre tram view outside
As we can see, users compliment the view from the rooms, the helpful staff and the hotel restaurant.
We can also plot the topic distribution among all comments:
As we can see, most positive reviews mention the staff and the restaurant. The second largest group consists of "general" compliments.
To summarize, in this post we have shown how Natural Language Processing can automate the analysis of customer opinions. We can calculate the sentiment of a review and visualize it, select the most interesting reviews and extract the topics of the comments. The system can be extended by taking the review date into account, for example to track the evolution of topics over time.