Data Science in Talent Acquisition: Optimising Resume Screening & Likelihood of Candidate Joining

Authors: Shashank Raj (Business Analyst) and Kavita Yadav (HR)

Human Resource Management (HRM) teams are a vital part of every organisation. They are responsible for the management and development of an organisation's employees. One of the most important tasks for any HRM team is talent acquisition.

Hiring managers spend a lot of time and energy finding the right talent for their organisation. The risks of delayed onboarding include a negative impact on topline business productivity, existing employees’ engagement and company reputation. To hire the most suitable candidates, the HR department must act fast, before they are snapped up by the competition. This is especially critical when hiring for new-age technologies, where the pool of qualified candidates is limited. This is where HR Analytics comes into the picture: it helps streamline the hiring process, resulting in significant cost and effort savings for the organisation.

Problem Statement:  

Currently, HRM teams use a keyword-matching method to sort resumes from the pool of candidates. This fuzzy-match process identifies the resumes that mention the desired keywords anywhere in the profile. It often returns many irrelevant resumes, which then have to be filtered manually, wasting a lot of time.

Case in point:

If there is a need for a Big Data candidate with experience in the Telecom sector, the job portal recommends all profiles that mention Big Data and Telecom. But a candidate may hold only a degree in Telecom and lack relevant experience in that sector, which makes the candidate unsuitable for the role. As a result, the recruitment team ends up wasting a lot of effort only to discover this mismatch.
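
The false positive described above can be reproduced with a minimal sketch of naive keyword matching. The function name and both resume snippets are hypothetical and invented for illustration; they are not taken from any real job portal.

```python
def keyword_match(resume_text, keywords):
    """Return True when every keyword appears somewhere in the resume (case-insensitive)."""
    text = resume_text.lower()
    return all(keyword.lower() in text for keyword in keywords)

# Hypothetical resume snippets, invented for illustration only.
resumes = {
    # Telecom appears only in the degree, not in the work history.
    "resume_a": "B.Tech in Telecom Engineering. 3 years building Big Data pipelines for a retail client.",
    # Telecom appears in the work history.
    "resume_b": "5 years of Big Data engineering for a Telecom operator.",
}

keywords = ["Big Data", "Telecom"]
matches = {name: keyword_match(text, keywords) for name, text in resumes.items()}
print(matches)
```

Both resumes match, even though only the second candidate has worked in the Telecom sector, because the matcher has no notion of where in the resume a keyword occurs.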

Business Need:

The key objective of the solution is to come up with:

  • A model which sorts resumes to find candidates for the desired role. The model can score a resume based on attributes such as projects done, education, current location, job location, previous organisation, internship projects, certifications, skills and work experience, among others.
  • A prediction model which estimates the propensity of a candidate to accept the offer at the end of the recruitment process.

Proposed Solution:

Machine Learning based text mining, combined with Natural Language Processing (NLP), helps in analysing complex documents such as resumes. A Python-based NLP algorithm parses the resumes and identifies the presence of the key attributes sought by the hiring team.

What does NLP do?

Natural Language Processing is the ability of a computer program to understand human language as it is spoken or written. From resume screening to employee engagement, NLP can analyse interactions to accelerate the recruitment of quality candidates.

Terminologies in NLP:

  • Latent Dirichlet Allocation (LDA): LDA is a topic modelling algorithm. It represents each document as a mixture of topics, and each topic as a distribution over words, so that a small set of topics captures most of the document's content.
  • N-Gram Approach: An n-gram is a contiguous sequence of n items (such as words, characters or syllables) in a text. The n-gram approach assigns probabilities to these sequences.
  • Tokenization: Tokenization is the task of chopping a text into pieces, called tokens. These tokens are used to identify patterns and feed the rest of the text mining process.
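
Tokenization and the n-gram approach can be illustrated with a short sketch that uses only the Python standard library. The regular expression, the function names and the sample sentence are assumptions made for this example; LDA is not covered here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into tokens of letters, digits, '+' and '#'."""
    return re.findall(r"[a-z0-9+#]+", text.lower())

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bigram_probability(tokens, first, second):
    """Maximum-likelihood estimate of P(second | first) = count(first, second) / count(first)."""
    bigram_counts = Counter(ngrams(tokens, 2))
    unigram_counts = Counter(tokens)
    if unigram_counts[first] == 0:
        return 0.0
    return bigram_counts[(first, second)] / unigram_counts[first]

# Hypothetical sample text. Sentence boundaries are ignored in this sketch,
# so some bigrams span two sentences.
text = "Built big data pipelines. Tuned big data clusters. Led a big team."
tokens = tokenize(text)
bigrams = ngrams(tokens, 2)
p_data_given_big = bigram_probability(tokens, "big", "data")
print(tokens)
print(p_data_given_big)
```

Here "big" occurs three times and is followed by "data" twice, so the estimated probability of "data" given "big" is 2/3.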


  • NLP model to summarise resume & assign score

We use feature extraction to generate features from resumes, both handcrafted and automatically generated. These features are then used as independent variables in a network that assigns a similarity-matching score.
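
As a rough illustration of a similarity-matching score, the sketch below compares term-count vectors of a job description and each resume using cosine similarity. This is a deliberately simple stand-in and not the network described above; the job text, the resumes and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Turn a text into a bag-of-words feature vector (token -> count)."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical job description and resumes, invented for illustration.
job = "big data engineer telecom spark hadoop"
resumes = {
    "candidate_1": "big data engineer with spark and hadoop experience at a telecom operator",
    "candidate_2": "front end developer with javascript and css experience",
}

job_vector = term_counts(job)
scores = {name: cosine_similarity(job_vector, term_counts(text)) for name, text in resumes.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In practice the feature vector would also carry the handcrafted attributes listed below, rather than raw word counts alone.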

Indicative attributes: Experience, Projects Done, Educational Qualification, Skill Set, Project Duration, Number of Projects, College Tier, GPA etc.

As the next step, the developed algorithm specifies the attributes to be extracted, along with their patterns and extraction method. It also specifies the resume section in which each entity should be searched for.
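
One way to picture such a specification is a table that maps each attribute to a resume section and a pattern. The sketch below is a minimal version under that assumption; the spec entries, section headers, regular expressions and sample resume are all hypothetical and much simpler than a production parser would need.

```python
import re

# Hypothetical extraction spec: attribute -> (section to search, regex with one capture group).
SPEC = {
    "years_experience": ("experience", r"(\d+)\s+years"),
    "degree": ("education", r"\b(B\.?Tech|M\.?Tech|MBA|B\.?Sc|M\.?Sc)\b"),
    "telecom_experience": ("experience", r"\b(telecom)\b"),
}

SECTION_HEADERS = ("education", "experience", "skills")

def split_sections(resume_text):
    """Group resume lines under the most recent recognised section header."""
    sections = {}
    current = None
    for line in resume_text.splitlines():
        header = line.strip().rstrip(":").lower()
        if header in SECTION_HEADERS:
            current = header
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    return {name: " ".join(lines) for name, lines in sections.items()}

def extract(resume_text, spec=SPEC):
    """Search each attribute's pattern only inside the section named in the spec."""
    sections = split_sections(resume_text)
    result = {}
    for attribute, (section, pattern) in spec.items():
        match = re.search(pattern, sections.get(section, ""), flags=re.IGNORECASE)
        result[attribute] = match.group(1) if match else None
    return result

# Hypothetical resume, invented for illustration.
resume = """Education:
B.Tech in Telecom Engineering
Experience:
3 years building Big Data pipelines for a retail client
Skills:
Spark, Hadoop, Python
"""

attributes = extract(resume)
print(attributes)
```

Because "Telecom" is searched for only in the experience section, a candidate whose only Telecom mention is a degree no longer counts as having Telecom experience.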

  • The propensity of joining

The predictive model is designed to predict the propensity of a candidate to accept the offer. We collected the candidate attributes that HRM teams use to gauge how eager a candidate is to join the organisation.

Indicative attributes: Notice Period, Serving Notice, Years of Experience, Last Tenure and Salary, Frequency of Job Changes, Perks, Rewards and Recognition, Length of Commute, Company Policies, Candidate’s Behaviour, Current and Past Employers’ Rating etc.

We ran a multivariate analysis on historical data, relating these attributes to the acceptance of offers. This analysis helps determine which attributes should be considered for model development.

  • Building & validating the model
    • Build the model on the existing dataset, using 70% of the data.
    • Validate the model against the remaining 30% of the dataset. Rebuild the model if its performance characteristics do not match.
    • Algorithms to be tested: Random Forest, Neural Network, SVM, Gradient Boosting, Trees, BNN and Logit Functions

The model rates applicants on a scale of 0 to 1, with 0 being the least likely to accept the offer and 1 being the most likely.
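
The build-and-validate steps can be sketched with a plain logit model trained in pure Python. Everything here is an assumption made for illustration: the data is synthetic and randomly generated, the two features (scaled notice period and commute length) are picked from the indicative attributes, and the learning rate and epoch count are arbitrary. It is not the production model.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function; maps any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into build and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_logit(rows, epochs=200, learning_rate=0.1):
    """Fit logistic regression by stochastic gradient descent; each row is (features, label)."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, features))) - label
            bias -= learning_rate * error
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return weights, bias

def predict_proba(model, features):
    """Propensity score between 0 and 1 for one candidate."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def accuracy(model, rows):
    """Share of rows where the thresholded score (>= 0.5) equals the label."""
    correct = sum((predict_proba(model, f) >= 0.5) == bool(y) for f, y in rows)
    return correct / len(rows)

# Synthetic data, generated only to make the sketch runnable.
rng = random.Random(42)
data = []
for _ in range(200):
    notice = rng.random()   # notice period scaled to [0, 1]
    commute = rng.random()  # commute length scaled to [0, 1]
    signal = 2.0 - 3.0 * notice - 2.0 * commute + rng.gauss(0, 0.5)  # toy rule with noise
    data.append(([notice, commute], 1 if signal > 0 else 0))

train, test = split_rows(data, train_fraction=0.7)
model = fit_logit(train)
train_accuracy = accuracy(model, train)
test_accuracy = accuracy(model, test)
scores = [predict_proba(model, features) for features, _ in test]
print(len(train), len(test), train_accuracy, test_accuracy)
```

Comparing `train_accuracy` with `test_accuracy` is one simple way to check whether performance on the 30% hold-out matches performance on the 70% build set; a large gap would suggest rebuilding the model.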

Business Importance:

Data science in talent acquisition enables HRM teams to answer their most critical recruiting questions with confidence and to improve their business impact. Here are a few benefits of this data-driven strategy:

  • Improve quality of hire: discover what makes an effective job candidate, and an ineffective one
  • Predict speed of hire: provide more accurate hiring time estimates to stakeholders and positively impact business productivity
  • Improve the candidate experience: bring the factors that impact candidate experience to the forefront, and quickly measure the effectiveness of each
  • Deliver on recruiting capacity: create data-driven hiring plans that are continuously updated to reflect the most current needs of the company, be it hiring for new-age technologies or traditional job positions


Machine learning applications are changing the landscape of talent acquisition. They benefit talent acquisition teams by enabling recruiters to become more strategic, spending their time on proactive hiring and workplace planning while also enhancing the candidate experience.

