In this hyper-competitive era, companies increasingly need to stand out from their competitors. Employees are the voice of an organization, and their feedback shapes how the company as a whole is perceived. This feedback also gives a sense of the company's work culture, both its positive and its negative sides.

Problem Context:

Employee engagement has become critical, and continuous employee feedback is essential to it. Online platforms have emerged as important mediums for sharing feedback, but as the volume of data grows, it becomes difficult for HR teams to review it and identify issues manually. NLP-driven analytics engines can help by extracting the substance of the feedback and pointing to improvement opportunities.

How to solve this problem?

Employee review platforms such as Glassdoor and Google contain thousands of reviews for any given organization. This review data gives in-depth knowledge about the various working aspects of an organization, both positive and negative. Hence, an organization can use it as a feedback system to identify possible improvements.

Data Sources:

We have two main sources from which this review data can be extracted:

  1. Crowdsourced platforms like Glassdoor and Google contain thousands of reviews about the working environment of a company. Because employees can express their views anonymously, these reviews tend to be candid, and organizations can use them to identify possible improvements.
  2. Large enterprises continuously survey their employees, so internal feedback surveys also hold many insights that can be used to find improvement opportunities.

Components of proposed solution:

1) A custom web crawler or data scraping engine to extract the reviews relevant to a particular organization

2) Sentiment analysis: training a deep learning model to identify whether a review is positive or negative

3) Topic modelling (LDA, n-grams): identifying the key themes present in the review data

4) A user-friendly interface that shows the HR manager the different clusters of reviews across the identified themes

Let’s explore all these components in detail one by one:

Data Crawling:

Data crawling is the process of traversing the World Wide Web and extracting relevant details from web pages. In our case, these details are the reviews or feedback from the employees of a particular organization.

We can use open-source frameworks such as Scrapy, PySpider or Selenium to build the web crawler. The data extracted by the crawler is then sent to AWS storage (for example, Amazon S3), from where it can be used for further processing.
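As a rough illustration of the extraction step, the sketch below pulls review text out of an HTML page using only Python's standard library. The page markup (reviews wrapped in `<div class="review">`) and the names `ReviewExtractor` and `extract_reviews` are assumptions made for this example; a real review site will use different markup, and a framework such as Scrapy would normally handle fetching and parsing.

```python
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Collects the text inside <div class="review"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0    # > 0 while we are inside a review <div>
        self._chunks = []  # text fragments of the review being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            # A nested <div> inside a review: track it so we close correctly.
            self._depth += 1
        elif "review" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse repeated whitespace.
                text = " ".join(" ".join(self._chunks).split())
                if text:
                    self.reviews.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_reviews(html):
    parser = ReviewExtractor()
    parser.feed(html)
    parser.close()
    return parser.reviews


# Made-up page content, for illustration only.
sample_page = """
<html><body>
  <div class="review">Great work culture. <b>Supportive</b> team.</div>
  <div class="ad">Sponsored</div>
  <div class="review">Long and hectic working hours.</div>
</body></html>
"""

for review in extract_reviews(sample_page):
    print(review)
```

The extracted strings are what would be written to storage and passed on to the sentiment and theme steps.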

NLP Driven Employee Review Data Mining

Sentiment Analysis:

Sentiment analysis is the process of identifying aspects of a given text such as its polarity (positive, negative or neutral), subject and intent.

For our needs, we will use it to identify whether a given review is negative or positive. Since a single review may mention both positive and negative aspects of an organization, applying a sentiment analysis model to the whole review doesn’t make sense. So we first decompose each review into individual sentences, each of which expresses one opinion. These sentences are then classified as positive or negative based on their structure and vocabulary.
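The decomposition step can be sketched with a simple regular expression. This is a minimal example under the assumption that sentences end with `.`, `!` or `?`; the function name `split_review` is hypothetical, and abbreviations such as "e.g." would be split incorrectly, so a production system would more likely rely on a dedicated sentence tokenizer.

```python
import re

# Split after '.', '!' or '?' when followed by whitespace, and on line breaks.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def split_review(review):
    """Return the non-empty sentences of a review, in order."""
    parts = _SENTENCE_BOUNDARY.split(review)
    return [p.strip() for p in parts if p and p.strip()]


review = (
    "Great work culture and supportive managers. "
    "However, working hours are not flexible! "
    "Food could be better"
)
for sentence in split_review(review):
    print(sentence)
```

Each returned sentence can then be passed to the sentiment classifier on its own.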

Earlier, models such as Naïve Bayes classifiers and Support Vector Machines (SVMs) were commonly used for sentiment analysis. One major drawback of these models, when used with word-count features, is that they classify a review only on the basis of the words appearing in the text, regardless of their context. Since the meaning of a word depends on the words before and after it, treating all words as independent of each other discards important information.
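To make this limitation concrete, the small sketch below builds a word-count ("bag of words") representation of two sentences with opposite meanings. The helper name `bag_of_words` and the two sentences are made up for illustration.

```python
from collections import Counter
import re


def bag_of_words(text):
    """Count word occurrences, ignoring case, punctuation and word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


a = "The food is good, not bad"
b = "The food is bad, not good"

# Both sentences contain exactly the same words, so a model that only
# sees word counts cannot tell them apart.
print(bag_of_words(a) == bag_of_words(b))  # True
```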

This problem was later addressed by Recurrent Neural Networks (RNNs), in particular Long Short-Term Memory (LSTM) networks, which take into account not only the words in a review but also the context in which each word is used. So, for our application, an LSTM model will be trained on reviews of organizations. Once trained and validated, it can be used to classify a given sentence of a review as positive or negative.
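The following sketch shows, in plain Python, how a single LSTM cell carries context from one word to the next. It is only an illustration of the mechanism: the weights are random and untrained, the two-dimensional word vectors are invented, and the function names are hypothetical. A real classifier would be built with a deep learning framework, with learned word embeddings and a final classification layer on top of the last hidden state.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, seed=0):
    """Random (untrained) weights for the four LSTM gates."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    # i = input gate, f = forget gate, o = output gate, g = candidate values
    return {
        gate: {"W": matrix(hidden_size, input_size + hidden_size),
               "b": [0.0] * hidden_size}
        for gate in "ifog"
    }


def lstm_step(params, x, h_prev, c_prev):
    """One time step: combine the current word x with the previous state."""
    z = x + h_prev  # list concatenation: [current input, previous hidden state]

    def affine(gate):
        W, b = params[gate]["W"], params[gate]["b"]
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W, b)]

    i = [sigmoid(v) for v in affine("i")]
    f = [sigmoid(v) for v in affine("f")]
    o = [sigmoid(v) for v in affine("o")]
    g = [math.tanh(v) for v in affine("g")]
    # New cell state: keep part of the old memory, add part of the new candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def encode(params, sequence, hidden_size):
    """Run the cell over a sequence of word vectors; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h


# Invented 2-dimensional word vectors, for illustration only.
vectors = {"not": [1.0, 0.0], "good": [0.0, 1.0]}
params = init_lstm(input_size=2, hidden_size=3)

h_a = encode(params, [vectors["not"], vectors["good"]], hidden_size=3)
h_b = encode(params, [vectors["good"], vectors["not"]], hidden_size=3)
print(h_a)
print(h_b)
```

Unlike a word-count representation, the final hidden state depends on the order of the words, which is what allows a trained model to treat "not good" differently from "good".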

Theme Identification:

Each sentence of a review is usually centered on one topic, such as food, working hours or salary. Identifying the theme of each sentence is therefore essential, so that all sentences with a similar theme can later be grouped together.

For example: 

“The company has got a great work culture” and “There is a supportive and learning environment” can be put under one theme of “Work Culture”. Similarly, “Working hours are not flexible” and “Long and hectic working hours” can be put under the theme of “Working Hours”.

This process of identifying themes in text is called Topic Modelling and, as the name suggests, it assigns a set of topics to a given text. There are several models used for topic modelling, such as Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA). These models are unsupervised: they are fit on the review text itself without theme labels, and the discovered topics are then inspected and given names such as “Work Culture” or “Working Hours”.
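As a much simpler complement to LDA or LSA, the n-gram approach listed among the solution components can be sketched by counting frequent word pairs (bigrams) across sentences. This is not topic modelling in the LDA sense, only a quick way to surface recurring phrases. The stopword list, the function name `top_bigrams` and the last sample sentence are assumptions made for this example.

```python
from collections import Counter
import re

# A tiny, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"the", "a", "an", "is", "are", "and", "has", "got",
             "there", "not", "of", "to"}


def top_bigrams(sentences, n=3):
    """Return the n most frequent bigrams after removing stopwords."""
    counts = Counter()
    for sentence in sentences:
        tokens = [t for t in re.findall(r"[a-z]+", sentence.lower())
                  if t not in STOPWORDS]
        # Pair each remaining token with the one that follows it. Because
        # stopwords were dropped first, some pairs were not adjacent originally.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


sentences = [
    "The company has got a great work culture",
    "There is a supportive and learning environment",
    "Working hours are not flexible",
    "Long and hectic working hours",
    "Work culture is friendly",  # invented sentence, added for illustration
]
for bigram, count in top_bigrams(sentences, n=2):
    print(" ".join(bigram), count)
```

On this toy input the two most frequent bigrams correspond to the "Work Culture" and "Working Hours" themes, each appearing twice.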

Both LDA and LSA rely on the assumption that reviews belonging to the same theme have approximately similar word distributions. Once fit and tuned, the model can be used to identify the theme(s) in a particular review. Here is a flow chart describing the above process:

User Interface (UI) for HR Managers/Team:

Everything we have seen so far works behind the user interface. An attractive and user-friendly interface helps HR managers or teams directly identify the key themes present in the review data, so that they can then take corrective actions to improve employee engagement.

The structure and design of the UI may vary from one organization to another. A typical UI takes as input the URL of the website from which the reviews are to be crawled. It then outputs the total number of reviews found, their distribution into positive and negative ones, and the key themes these reviews fall under.
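The figures such a UI displays could be assembled along the following lines. This is a minimal sketch that assumes each classified sentence arrives as a `(theme, sentiment)` pair; the function name `summarize` and the sample data are invented for illustration.

```python
from collections import defaultdict


def summarize(classified):
    """Aggregate (theme, sentiment) pairs into per-theme counts.

    sentiment must be "positive" or "negative". Themes are returned with
    the highest share of negative sentences first.
    """
    per_theme = defaultdict(lambda: {"positive": 0, "negative": 0})
    for theme, sentiment in classified:
        per_theme[theme][sentiment] += 1

    rows = []
    for theme, counts in per_theme.items():
        total = counts["positive"] + counts["negative"]
        rows.append({
            "theme": theme,
            "positive": counts["positive"],
            "negative": counts["negative"],
            "negative_share": counts["negative"] / total,
        })
    rows.sort(key=lambda r: r["negative_share"], reverse=True)
    return rows


# Invented sample data, for illustration only.
classified = [
    ("Food", "negative"), ("Food", "negative"), ("Food", "positive"),
    ("Work Culture", "positive"), ("Work Culture", "positive"),
    ("Timings", "negative"),
]
for row in summarize(classified):
    print(row["theme"], row["positive"], row["negative"])
```

Sorting by the share of negative sentences puts the themes that most need attention at the top, which is the ordering an HR manager would want from a chart or table.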

The UI can include a bar graph (similar to the one below) showing the proportion of each theme among the negative and the positive reviews. HR managers can look at the themes with more negative reviews and take appropriate action. From the graph below, we can see that food and timings are the themes that should be worked on first:

Conclusion:

Such an application can help the HR team of any organization identify the key areas where employee engagement is lacking. In today’s digital age, where data is constantly being generated, using this data well can give an organization a competitive edge.

Author: Apurv Sharma (Machine Learning Intern), Kavita Yadav (HR SME)
