The client is a theme park and production studio located in the United States. It features numerous attractions, live shows and rides based on the entertainment industry, in particular movies and television, and hosts millions of visitors every year. Visitor reviews of the park must be classified to yield meaningful information.
Visitors to the theme park share their feedback (judgements, feelings and reviews) on Google, social networking platforms and travel forums. Feedback from these sources can be positive, negative or neutral, and it is an important input for further analysis and better decision making. Because this feedback is mostly unstructured, it needs processing such as classification or clustering before it can provide meaningful information for future use.
The client wanted to move away from its manual method: consolidating customer reviews from YouTube, Facebook, Google Reviews, TripAdvisor, Yelp and Pissed Consumer; manually cleaning and structuring the data in Excel files; going over each review and tagging it with various themes; and categorising the feedback as positive, negative or neutral.
The theme park had 10 rides for which the client wanted to crawl review data from the last 3 years.
Valiance helped the client develop a machine learning based solution that segments the customer feedback collected from the data sources mentioned above.
We used N-gram analysis to summarise the subjective feedback extracted from the different data sources. An N-gram is a contiguous sequence of N items (here, words) taken from a text; N-gram language models use such sequences to predict the next item in a sequence of words. Bigrams, trigrams and quadgrams (tuples of two, three and four words respectively) were used to capture more of the sentence context while analysing the sentiment of the feedback.
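To make the N-gram idea concrete, here is a minimal sketch in Python that counts the most frequent word N-grams across a set of reviews. The function names and the sample reviews are hypothetical, invented for illustration; the actual tooling used in the project is not described in this case study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(reviews, n, k=3):
    """Count n-grams across all reviews and return the k most frequent."""
    counts = Counter()
    for review in reviews:
        counts.update(ngrams(tokenize(review), n))
    return counts.most_common(k)


# Hypothetical reviews, not taken from the client's data.
sample_reviews = [
    "The wait time was too long for this ride",
    "Wait time was too long but the ride was fun",
]

bigram_counts = top_ngrams(sample_reviews, 2)
quadgram_counts = top_ngrams(sample_reviews, 4)
```

Longer N-grams such as "wait time was too" carry more context than single words, which is why they are useful for summarising what a group of reviews is complaining about or praising.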
Sentiment analysis discovers opinions, classifies the attitude they convey and ultimately categorises them division-wise. Valiance helped the client achieve its objectives with the following approach:
1. Web scraping for data acquisition:
Data acquisition from public forums and social media sites is the first step in executing sentiment analysis successfully.
With its automation capabilities, robustness, speed and flexibility to scale up, web crawling is well suited to acquiring data from these sources. We therefore developed and deployed custom web crawlers for each data source to scrape the user feedback data for each ride. With these crawlers, we were able to extract clean, structured, ready-to-use data with the following variables from each page:
- Review Title
- Review Complete text
- Review URL
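As an illustration of the extraction step, the sketch below parses an already downloaded review page into the three variables listed above, using only Python's standard `html.parser` module. The page markup, the class names (`review`, `review-title`, `review-text`) and the `data-url` attribute are assumptions made for this example; each real source has its own layout, and the project's actual crawlers are not shown in this case study.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect title, full text and URL for each review on a page.

    The markup conventions below are hypothetical, not those of any real site.
    """

    FIELD_BY_CLASS = {"review-title": "title", "review-text": "text"}

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._current = None  # the review record being built, if any
        self._field = None    # the field currently receiving text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class", "")
        if tag == "article" and css_class == "review":
            self._current = {"title": "", "text": "", "url": attrs.get("data-url", "")}
        elif self._current is not None and css_class in self.FIELD_BY_CLASS:
            self._field = self.FIELD_BY_CLASS[css_class]

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag == "article" and self._current is not None:
            # Collapse runs of whitespace so the stored fields are clean.
            cleaned = {k: " ".join(v.split()) for k, v in self._current.items()}
            self.reviews.append(cleaned)
            self._current = None
        self._field = None  # any closing tag ends the current field


def parse_reviews(html):
    """Return a list of {'title', 'text', 'url'} dicts found in the HTML."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


# Hypothetical page fragment used only to exercise the parser.
SAMPLE_PAGE = """
<article class="review" data-url="https://example.com/reviews/1">
  <h2 class="review-title">Great ride</h2>
  <p class="review-text">Loved it, but the queue
     was long.</p>
</article>
"""

records = parse_reviews(SAMPLE_PAGE)
```

Keeping one small parser per source, each returning the same three fields, lets the downstream tagging and modelling steps work on a single consistent record format.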
2. Creation of taxonomy
3. Tagging a total of 20,000 reviews based on the approved taxonomy, along with recognising and classifying the sentiment as positive/negative/neutral
4. Creating a sentiment analysis model with the above tagged reviews
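The case study does not say which model was trained on the tagged reviews. As one possible illustration only, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing on unigram and bigram features. The class name, the feature function and the six training examples are all hypothetical; a real model would be trained on the full set of tagged reviews.

```python
import math
import re
from collections import Counter, defaultdict


def features(text, max_n=2):
    """Return the unigram and bigram features of a review as strings."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    feats = []
    for n in range(1, max_n + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(features(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feat in features(text):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical tagged reviews, standing in for the real tagged set.
train = [
    ("The ride was amazing and fun", "positive"),
    ("Loved the show, great staff", "positive"),
    ("Terrible wait time and rude staff", "negative"),
    ("The queue was too long, awful experience", "negative"),
    ("The park opens at nine", "neutral"),
    ("We visited on a Tuesday", "neutral"),
]

model = NaiveBayesSentiment().fit([t for t, _ in train], [l for _, l in train])
prediction = model.predict("amazing ride, great show")
```

The same structure could be reused for the theme tags from the taxonomy by training on theme labels instead of sentiment labels.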
- 25% increase in ROI due to reduced manual intervention in customer feedback extraction, cleansing and topic identification
- The solution helped the client track customer experience from start to finish and take corrective measures to improve engagement.