Client Background

The client is a prominent Indian digital engagement solutions company that provides brands with multi-channel digital solutions to connect with their customers 24x7. Its analytics-enabled, cross-channel targeting platform lets brands deliver personalized engagement to their customers.


Its digital channels reach an end-customer base of more than 500 million and power more than 200 billion interactions every month.


Business Objective

The majority of customer interactions powered by the client’s digital platform consist of unstructured data. These interactions alone generate nearly 5 TB of data per month and contain a wealth of information about consumers’ lifestyles and preferences. Once discovered, this information could be a real asset to marketing teams as they look to drive increased customer engagement and transactions across brands.


The client had no in-house capability to store and mine such datasets at scale. It therefore wanted a technology infrastructure that could scale with the data and allow unstructured data to be mined for customer preferences and lifestyle information. Such a platform needed to use text mining and natural language processing to understand unstructured text, and machine learning algorithms to predict customer preferences wherever they were missing.


The client did not have an in-house team with data science and big data skills, or any prior experience with such technologies. Valiance was chosen as the technology partner for this initiative to identify the best possible solution and implement it successfully.


Solution

Our team of business analysts, data scientists and data engineers worked closely with the client’s business and product teams for the first three weeks to arrive at the right methodological approach, comprising a text mining framework, a machine learning approach and a technology platform that would achieve the business objectives.


A proof of concept (POC) was carried out first to validate the approach for accuracy and completeness of results.


After a month, we had narrowed the solution down to:


  1. Hadoop (open-source distribution), with HDFS for storage and MapReduce for batch text mining.
  2. Spark for training machine learning algorithms.
  3. HBase as the store for structured customer data.
  4. A text mining and machine learning approach to discover and predict customer attributes.


Over the next 5 months, our team performed the following activities:


  1. The data engineering team created a Hadoop-based infrastructure to process terabytes of unstructured data.
  2. The data science team created text mining rules and NLP algorithms to discover customer attributes including, but not limited to, age, gender, spending pattern, number of credit cards, number of kids, travel frequency and mutual fund information. Only limited discovery was possible this way because the data is sparse. However, it gave us enough samples to build training sets for the machine learning models.
  3. The business analytics and data science teams also recommended additional customer attributes that could be useful from a marketing standpoint and could also act as predictors when identifying missing attributes.
  4. The data science team trained ML algorithms to predict missing customer attributes from the discovered ones. Several algorithms were tried in order to identify the best-performing ones.
  5. Results of the text mining and machine learning predictions were shared regularly with the client’s team, and their feedback was used to improve them.
  6. At the end of 5 months, we arrived at the first version of results with robust and accurate customer data. The roughly 450 customer attributes discovered this way were exposed to the marketing team through open APIs.
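
The two-stage idea in steps 2 and 4 above (rules discover attributes where the text allows it, and the discovered values then act as training data for predicting the missing ones) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the rule patterns, the attribute names, the sample messages and the majority-vote predictor are hypothetical and are not the rules or algorithms used in the project, which ran on Hadoop MapReduce and Spark.

```python
import re
from collections import Counter, defaultdict

# Hypothetical text mining rules: attribute -> (pattern, match-to-value function).
RULES = {
    "gender": (
        re.compile(r"\bDear (Mr|Ms|Mrs)\b", re.IGNORECASE),
        lambda m: "male" if m.group(1).lower() == "mr" else "female",
    ),
    "has_credit_card": (
        re.compile(r"\bcredit card\b", re.IGNORECASE),
        lambda m: True,
    ),
    "frequent_traveller": (
        re.compile(r"\b(flight|boarding pass|PNR)\b", re.IGNORECASE),
        lambda m: True,
    ),
}


def discover_attributes(messages):
    """Apply each rule to each message; keep the first value found per attribute.

    An attribute with no matching rule stays absent (unknown), not False.
    """
    found = {}
    for text in messages:
        for name, (pattern, to_value) in RULES.items():
            if name in found:
                continue
            match = pattern.search(text)
            if match:
                found[name] = to_value(match)
    return found


def predict_missing(profiles, target, feature):
    """Fill a missing `target` with the most common value seen among
    profiles that share the same `feature` value (a toy stand-in for a
    trained classifier)."""
    votes = defaultdict(Counter)
    for profile in profiles:
        if target in profile and feature in profile:
            votes[profile[feature]][profile[target]] += 1

    filled = []
    for profile in profiles:
        completed = dict(profile)
        if target not in completed and completed.get(feature) in votes:
            completed[target] = votes[completed[feature]].most_common(1)[0][0]
        filled.append(completed)
    return filled


# Made-up sample interactions for two customers.
inbox = {
    "c1": ["Dear Mr. Shah, your credit card statement is ready.",
           "Your flight PNR is ABC123."],
    "c2": ["Dear Ms. Rao, your flight is confirmed."],
}

profiles = [discover_attributes(msgs) for msgs in inbox.values()]
completed = predict_missing(profiles, "has_credit_card", "frequent_traveller")
```

Here the rules find no credit card mention for the second customer, so that attribute is left unknown after discovery and is filled in only by the prediction step. In a real pipeline the prediction step would be a properly trained and validated model rather than a majority vote.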


Since the initial deployment, we have remained engaged with the client to improve the accuracy of the ML algorithms and refine the text mining rules.

Outcome

  1. Enabled discovery of customer attributes by mining unstructured data, where no such information was available before. This resulted in intelligent marketing campaigns with a higher ROI than the baseline.
  2. A 10% increase in customer engagement from digital interactions over 2 months of testing with different marketing campaigns.