Compensation is one of the key culture-definers for any organization. Every organization strives to set the best compensation for its employees, both during talent acquisition and in internal appraisal cycles. This is crucial: underpaying employees may lead to attrition, while overpaying them may hurt the company's profitability.

But determining the “right” compensation can be tricky, because a number of factors go into setting rates that are both fair and competitive.

Problem Statement

In an age of fierce competition, companies want to benchmark their compensation structures periodically. This has lately been a challenge, and companies mainly rely on external benchmarking agencies such as Mercer, Michael Page, Kelly Services, Glassdoor and PayScale.

Business Need

The key objectives of the solution are to:

  • Build a data crawling engine to extract industry-standard compensation data from open public sources
  • Standardise the data for business consumption
  • Develop a Machine Learning (ML) based compensation estimation model and compensation cuts across Experience, Skills, Domain and Education
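
To make "compensation cuts" concrete, here is a minimal sketch that reads a cut as the median salary per value of one dimension. The records, the field names (`experience_band`, `education`, `salary`) and the helper `compensation_cut` are all hypothetical, and the median is just one reasonable summary statistic, not necessarily the one the solution uses.

```python
from collections import defaultdict
from statistics import median

# Hypothetical standardised records; field names and values are illustrative only.
records = [
    {"experience_band": "0-2", "education": "Bachelor", "salary": 40000},
    {"experience_band": "0-2", "education": "Master", "salary": 48000},
    {"experience_band": "3-5", "education": "Bachelor", "salary": 60000},
    {"experience_band": "3-5", "education": "Master", "salary": 70000},
    {"experience_band": "3-5", "education": "Master", "salary": 74000},
]


def compensation_cut(rows, dimension):
    """Return the median salary for each distinct value of `dimension`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row["salary"])
    return {value: median(salaries) for value, salaries in groups.items()}


by_experience = compensation_cut(records, "experience_band")
by_education = compensation_cut(records, "education")
```

The same helper works for any other dimension (Skills, Domain) once the standardised data carries a field for it.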

Proposed Solution

Platforms like Glassdoor are crowdsourced and hold a database of more than a million salaries and reviews. Various agencies also publish global and country-specific reports that give a picture of the market compensation structure under prevailing economic conditions. For example, the U.S. Bureau of Labor Statistics and PayScale’s report for India provide comprehensive statistics on salary structure for different sectors.

Instead of relying only on the old indicators of age and tenure to estimate compensation, ML-based algorithms take into account many additional factors, such as recent changes in role, pay level, rates of change in pay and incentive eligibility, to refine the prediction. This allows companies to manage the compensation of their employees more effectively.

Job attributes: Location, Requirement of the job, Occupational group

External data: GDP growth, Inflation, Asset growth, Job growth, Unemployment rate, CSO data


  • Data collection: a Python-based data crawler developed earlier parses the website and gathers the necessary information
  • Data cleaning: posts with missing values are removed and possible conflicts in the data format (e.g. text encoding) are fixed
  • Feature engineering: irrelevant features are discarded and the others are standardized (e.g. converted into numerical features) using domain knowledge
  • Model training and validation: the selected models are trained and cross-validated in order to find the ones that best describe the data and predict the output variable with the highest accuracy
  • Model comparison and selection: each model is compared to the others with respect to accuracy and the best-performing champion model is selected
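
The cleaning, validation and selection steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the solution's actual pipeline: the posts are made up, the two candidates (a mean baseline and a one-feature least-squares line) stand in for the models that were actually tested, and mean absolute error stands in for "accuracy" because the target here is a continuous salary.

```python
from statistics import mean

# Hypothetical scraped posts; None marks a missing value.
raw_posts = [
    {"years": 1, "salary": 32000},
    {"years": 2, "salary": 41000},
    {"years": 3, "salary": None},      # removed during cleaning
    {"years": 4, "salary": 58000},
    {"years": 5, "salary": 71000},
    {"years": 6, "salary": 79000},
    {"years": None, "salary": 65000},  # removed during cleaning
    {"years": 8, "salary": 101000},
]


def clean(posts):
    """Data cleaning: drop posts that have any missing value."""
    return [p for p in posts if all(v is not None for v in p.values())]


def fit_mean(train):
    """Baseline candidate: always predict the mean training salary."""
    average = mean(y for _, y in train)
    return lambda x: average


def fit_linear(train):
    """Candidate: ordinary least squares on a single feature."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def cross_validated_mae(fit, data, k=3):
    """k-fold cross-validation; returns the mean absolute error on held-out rows."""
    errors = []
    for fold in range(k):
        test = data[fold::k]
        train = [row for i, row in enumerate(data) if i % k != fold]
        predict = fit(train)
        errors.extend(abs(predict(x) - y) for x, y in test)
    return mean(errors)


data = [(p["years"], p["salary"]) for p in clean(raw_posts)]
candidates = {"mean_baseline": fit_mean, "linear_regression": fit_linear}
scores = {name: cross_validated_mae(fit, data) for name, fit in candidates.items()}
champion = min(scores, key=scores.get)  # model comparison: lowest error wins
```

On this toy data the linear model becomes the champion, simply because the made-up salaries grow almost linearly with experience. With real data the candidate list and the error metric would be chosen to suit the problem.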

Models tested: K-means clustering, randomized linear regression and logistic regression.

Modelling Methodologies: 

  1. K-means clustering: the model estimates the salary by finding the group of jobs that contains similar profiles. The other models estimate the salary directly from the features used.
  2. Regression models: since salary is a continuous variable, regression models are a natural fit for estimating it. The model is integrated with the HRM system. When an employee is added to the system, the model evaluates the employee's key metrics and returns the estimated compensation.
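
As a rough illustration of the clustering approach, the sketch below groups job profiles with a small hand-written K-means and estimates a new profile's salary as the mean salary of the cluster it falls into. The profiles, the two features (years of experience and an ordinal education level) and the function names are hypothetical. The deterministic seeding is a simplification chosen for reproducibility, and taking the cluster mean is an assumption about how a group of similar jobs is turned into a single figure.

```python
from statistics import mean

# Hypothetical labelled profiles: ((years of experience, education level), salary).
profiles = [
    ((1, 1), 30000), ((2, 1), 34000), ((1, 2), 38000),
    ((8, 2), 90000), ((9, 3), 98000), ((10, 3), 106000),
]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, iterations=20):
    """Plain K-means with evenly spaced initial centroids (a simplification)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(mean(dim) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def estimate_salary(profile, centroids, labelled):
    """Mean salary of the labelled profiles that share the new profile's cluster."""
    target = nearest(profile, centroids)
    peers = [salary for features, salary in labelled
             if nearest(features, centroids) == target]
    return mean(peers)


centroids = k_means([features for features, _ in profiles], k=2)
junior_estimate = estimate_salary((2, 2), centroids, profiles)
senior_estimate = estimate_salary((9, 2), centroids, profiles)
```

A regression model would instead map the features straight to a salary figure; the clustering route trades that precision for an estimate that is easy to explain as "what similar jobs pay".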


Compensation benchmarking is becoming indispensable for every organisation. Whether it is a small firm or a well-established large organisation, benchmarking pay structures allows it to maintain externally competitive and internally equitable pay over time.

HRM teams, who are in charge of recruiting, hiring and retaining talent for the company, know the challenge of competing against other organizations to attract the right employees. By using data-driven compensation benchmarking models, they can protect the interests of current and future employees while supporting the company’s growth.
