Case Studies


Industrial

Machine Learning Driven Chip Size Classification and Cutter Optimization

Client Background
Client is a multinational metals and mining corporation producing iron ore, copper, diamonds, gold and uranium.

Context and Business Objective
The client employs operators at different locations to mine diamonds. These operators run cutting machines with parameters such as wheel speed, rotation rate and down speed, and the size of the extracted rock chips varies with the parameters they set. The client therefore wanted a solution that tells the cutter operator which cutter settings should produce the largest chip size.

Current Approach
The client heuristically identifies the best tunable parameters to improve output, but this approach has a couple of drawbacks:
- It does not quantify the contribution of individual parameters to the output.
- It does not account for interactions between parameters, as the heuristics are defined in one dimension, or at most two.

Proposed Approach and Solution
Valiance proposed building a mathematical relationship between the tunable parameters and the output (Good/Bad). The advantages of this approach are:
- The quality of the output can be approximated for any given input parameters without testing them in the field.
- It quantifies the contribution of individual parameters and accounts for interactions between them.
- Once the relationship between tunable parameters and output is established, we can move algorithmically through the search space to find globally optimal parameters.
As trench digging goes deeper into the earth, the soil level and lithology of the rock change, so our machine learning engine optimized the tunable parameters for a given depth, lithology and bit.

Modeling Pipeline
- Developed a classification model by regressing the quality of the chip size on the independent variables mentioned above (see the code sketch at the end of this case study).
- Trained the model on 80% of the samples and validated it on the remaining 20%.
- Assigned 0 to good and 1 to bad, since the optimization was framed as a standard minimization problem.
- Validated four parametric classifiers (Logistic Regression, Linear Discriminant Analysis (LDA), Perceptron and Naive Bayes) and one tree-based classifier (Gradient Boosting Machines).

Optimization Model Performance and ROI
The optimization exercise was able to convert ~60% of the bad samples into good samples. The tree-based model outperformed all other models with respect to both variance and bias, and showed an accuracy of 78%. The conversion rate improved by ~15%.
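For illustration, the sketch below shows the general train-then-optimize pattern described above: fit a gradient boosting classifier on labelled cutter runs, then search a grid of tunable settings for the combination with the lowest predicted probability of a bad chip. The file name, column names, parameter ranges and operating context are assumptions for the sketch, not the client's actual data or model.

```python
# Illustrative sketch only: file, column names and parameter bounds are assumptions.
import pandas as pd
from itertools import product
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("cutter_runs.csv")          # hypothetical export of cutter runs
features = ["wheel_speed", "rotation_rate", "down_speed", "depth", "lithology_code"]
X, y = df[features], df["label"]             # label: 0 = good chip size, 1 = bad

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Grid search over tunable settings for a fixed depth/lithology context,
# minimising the predicted probability of a "bad" outcome (class 1).
context = {"depth": 120.0, "lithology_code": 3}   # assumed operating context
grid = product(range(50, 151, 10),                # wheel_speed candidates
               range(5, 31, 5),                   # rotation_rate candidates
               range(1, 11))                      # down_speed candidates

best = min(
    grid,
    key=lambda s: model.predict_proba(
        pd.DataFrame([{**context,
                       "wheel_speed": s[0],
                       "rotation_rate": s[1],
                       "down_speed": s[2]}])[features])[0, 1],
)
print("suggested settings (wheel_speed, rotation_rate, down_speed):", best)
```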

Financial Services

Agent Recruitment Funnel Optimization for an Insurance Distributor

Client Background
Client is a digital insurance platform that helps corporates and individuals with their insurance needs. The platform allows customers to search, compare and buy various life and general insurance products online.

Business Context
To facilitate customers in the buying process, the client hires Point of Sale (POS) persons as agents. Agent recruits go through the following onboarding process:
- Lead Generation: A lead initiates contact by filling in a form with basic demographic information.
- KYC: The client requires certain documentation to verify the lead.
- Training: Leads are trained to understand end customers' requirements and recommend policies accordingly.
- Certification: The client conducts exams to help leads become certified insurance agents.
- Activation: Any lead that sells at least one insurance policy becomes active.
- Repeat: Any lead that sells more than two insurance policies is a repeat.
A variety of communications is sent by the client to the leads throughout this process to track activity and measure engagement.

Business Problem and Objective
The client was facing a significantly high level of attrition at each stage of the onboarding process. While some attrition was expected, out of the total leads generated:
- Only 5.7% reach the activation stage.
- Only 1.2% reach the repeat stage.
The client wanted to increase the efficiency of this recruitment funnel, leading to decreased training and onboarding costs as well as more policies sold. To achieve this objective, the client wanted to engage a solutions provider that could:
- Help the client stitch together all the data collected throughout the onboarding process.
- Analyse the stitched dataset to generate insights that help optimise the process and filter out unwanted leads.
- Increase the conversion rate of leads towards active and repeat.

Solution
Valiance took a data-driven approach to offer insights and indicators for optimising the recruitment funnel and detecting high-potential recruits early. After a preliminary analysis we decided to use the following datasets:
- Demographic data: age, gender, education, state
- App usage: event name, timestamp of event, page, section
- Training video consumption: video start timestamp, playlist ID
- Call data from the call centre: call duration, call timestamp
- Timestamps of stage transitions: KYC done timestamp, KYC verified timestamp
We created a schema for all the data captured across all stages of the funnel. This allowed us to look at the complete DNA of an individual agent recruit (see the code sketch at the end of this case study).

Insights and Recommendations
The following indicators of progression through the funnel and productivity were discovered during the project:
- Leads aged 25 and above tend to be retained through the funnel better than leads aged below 25.
- There is a direct correlation between usage of the "Risk Meter" feature in the app before the first sale and overall productivity.
- Leads who are called within one day of progressing from one stage to the next have a higher chance of progressing through the funnel.
Based on these insights, we provided the following recommendations to optimise the funnel:
- Target leads aged 25 and above for lead generation.
- Monitor the use of the "Risk Meter" feature by leads during the early stages and help them progress through the funnel quickly. Incentivise the use of this feature early in the funnel and introduce training videos to help leads learn how to use it.
- Monitor stage transitions and prioritise calling leads who have progressed from one stage to the next on the same day.
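As a rough illustration of the data-stitching and funnel measurement described in the Solution section, the sketch below pivots stage-transition records into one row per lead and computes how many leads reach each stage. File names, column names and stage labels are assumptions, not the client's actual schema.

```python
# Minimal stitching/funnel sketch under assumed table and column names.
import pandas as pd

demographics = pd.read_csv("leads_demographics.csv")   # lead_id, age, gender, education, state
transitions = pd.read_csv("stage_transitions.csv")     # lead_id, stage, entered_at
transitions["entered_at"] = pd.to_datetime(transitions["entered_at"])

# One row per lead with the first timestamp at which each stage was reached.
stage_wide = (transitions
              .pivot_table(index="lead_id", columns="stage",
                           values="entered_at", aggfunc="min")
              .reset_index())
lead_dna = demographics.merge(stage_wide, on="lead_id", how="left")
# App usage, training-video views and call logs would join onto lead_dna the
# same way, keyed on lead_id, to complete the per-lead "DNA".

# Funnel conversion: share of generated leads that reached each stage.
stages = ["kyc", "training", "certification", "activation", "repeat"]
total = len(lead_dna)
for stage in stages:
    reached = lead_dna[stage].notna().sum() if stage in lead_dna.columns else 0
    print(f"{stage:>13}: {reached / total:.1%} of leads")
```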

Healthcare

Marketing Spend Optimization For Pharmaceutical Enterprise

Client Background
Client is a US-based, global, research-driven bio-pharmaceutical company committed to developing innovative advanced therapies for some of the world's most complex and critical conditions. It caters to three therapeutic areas: Dermatology, Gastroenterology and Rheumatology.

Business Objective
Physicians, or Health Care Professionals (HCPs), are the most significant element in pharmaceutical sales, as their prescriptions determine which drugs patients will use. The key to pharmaceutical sales for any pharmaceutical company is therefore influencing the physician. To do this, sales reps spend on the following marketing tactics to drive sales: Closed Loop Marketing (CLM) calls, non-CLM calls, internet live seminars and lunch sessions. It was observed that:
- 18% of current sales is driven by CLM and non-CLM calls executed for the three therapeutic areas: Dermatology, Gastroenterology and Rheumatology.
- CLM calls provide better ROI than non-CLM calls, but since they are 1.4x costlier, the majority of investment has been made in non-CLM calls.
The client therefore wanted to optimize the marketing spend for CLM and non-CLM calls so as to increase revenue and NPV (Net Present Value). NPV accounts for the cumulative profitability of spend on a tactic, i.e. NPV = Present Value - Investment.

Solution
Our data science team considered the following inputs for analysis:
- Activity data: promotional sales activity data and other sources to capture (a) the level of engagement (reach and frequency) and (b) the type of interaction (e.g. face-to-face, lunch and learn) with the HCP.
- Demand data: market data from syndicated sources to capture sales that can be tied to promotional activities and compute the impact of the promotional sales activity.
- Financial data: financial data from internal sources to estimate the cost of the sales force activities and enable analysis of ROI.
Multivariate regression was performed on sales and marketing tactics to estimate and forecast the impact on sales, and using Monte Carlo simulation, response curves were built to define the contribution of the investment to sales, i.e. the Return on Investment (ROI) (see the code sketch at the end of this case study). The NPV plot against marketing spend suggested that a marketing spend of 1.85 million Euros was not resulting in optimum sales; ROI would be maximized with an investment of ~0.30 million Euros in this sales tactic.

Outcome
The following optimization opportunities were unearthed from the insights generated by the modeling exercise:
- Decrease investment and hold revenue steady: the marketing mix can be optimized to generate the previous year's revenue while decreasing investment by 44%.
- Hold investment steady and increase revenue: using last year's marketing budget, revenue can be increased by 30 million Euros this year.
- Maximize profitable investment: under the maximized-investment scenario, an additional 45 million Euros in revenue can be captured with an additional investment of 15.4 million Euros.
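The sketch below illustrates, in simplified form, the response-curve and NPV idea described above: draw Monte Carlo samples from an assumed saturating sales-response curve and find the spend that maximizes NPV on a grid. The functional form, coefficients and margin are illustrative assumptions, not the fitted model.

```python
# Hedged response-curve / NPV sketch; all parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def simulated_sales(spend_meur, n_draws=5000):
    """Monte Carlo draws of incremental sales (MEUR) for a given spend (MEUR),
    assuming a saturating response curve with uncertain parameters."""
    a = rng.normal(60.0, 8.0, n_draws)      # assumed saturation level of sales
    b = rng.normal(0.25, 0.05, n_draws)     # assumed half-saturation spend
    return a * spend_meur / (np.clip(b, 0.05, None) + spend_meur)

spend_grid = np.linspace(0.05, 2.0, 40)
npv = [simulated_sales(s).mean() * 0.3 - s for s in spend_grid]   # 30% assumed margin

best = spend_grid[int(np.argmax(npv))]
print(f"NPV-maximising spend on this simulated curve: ~{best:.2f} MEUR")
```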

Others

Data Lake and Data Warehousing for B2B ecommerce on AWS Cloud

Client Background
Client is an Indian e-commerce company that connects Indian manufacturers or suppliers with buyers and provides B2C, B2B and C2C sales services through its web portal.

Business Objective
The client has deployed an on-premise Oracle RDBMS infrastructure to store its data, currently around 500 GB and growing by about 20 GB every month. The Oracle RDBMS infrastructure is used to manipulate data through stored procedures; data is extracted manually through a WEBERP system and used to create business reports manually in MS Excel. The reports were based on monthly data with limited view into daily trends. The client wanted to revamp the data processing and reporting process so that reports could be generated daily or on demand.

Solution
Our data engineering team studied the client's current data setup and future technology roadmap. It was decided to go with AWS Redshift for data warehousing needs. Additionally, a data pipeline and reporting solution was proposed, built using serverless AWS components, namely AWS Glue and AWS QuickSight. We were provided with a set of 50 business views (reports) covering 90% of business queries, and the data warehouse design was also expected to support future ad hoc information requests. The following development steps were followed:
- Built an intermediate staging layer on S3 to capture data from the input source as is, with limited transformation. This meant the entire historical data would be in one place and could be referred to later if needed.
- Created a logical view of the data warehouse model to capture all facts and dimensions provided in the business views. Once approved, the logical model was converted into a physical AWS Redshift model with appropriate partition and sort keys. Although a data warehouse is ideally "write once, read many", the client team requested the ability to make infrequent updates to records that are prone to modification.
- Initial data for the tables (multiple files per table) was provided as CSV files in a pre-agreed S3 location. Each row in a CSV file carried a logical operator indicating whether the record was meant for insertion, update or deletion. The tables also had hierarchical relationships and dependencies among themselves, which meant that any update or deletion logic in the ETL jobs had to handle complex sequential atomic update scenarios (see the code sketch at the end of this case study).
- Glue jobs were created to move data from S3 to the staging layer (AWS Redshift tables acting as temporary storage for daily processing). At this stage, only the columns needed for business reports were moved, with data type changes for some columns.
- Once data was in the staging tables, AWS Redshift queries (run as a cron job deployed on an EC2 instance) moved the data into the warehouse tables in Redshift itself. Aggregations and selections were performed at this stage.
- Job performance metrics and associated logs were stored in a metadata table, and email notifications were sent to the developer team on completion or failure of any step of the data pipeline.

Outcome
We went live with the data warehouse in 6 months with 3 years of historical data. Incremental processes were set up to ingest data on an ongoing basis, with an automated mechanism for handling failures.
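As a hedged illustration of the insert/update/delete handling mentioned above, the sketch below applies a per-row logical operator from a Redshift staging table to a warehouse table inside a single transaction, using a delete-then-insert upsert pattern. Table, column and flag names, as well as the connection details, are placeholders; the real pipeline handled many interdependent tables in sequence.

```python
# Sketch of applying per-row I/U/D flags from staging to a warehouse table in
# Redshift as one atomic transaction. All identifiers here are assumed, not the
# client's actual schema; the real jobs ran as cron-scheduled Redshift queries.
import psycopg2

MERGE_STATEMENTS = [
    # Apply deletions flagged in staging.
    """DELETE FROM warehouse.orders
       USING staging.orders s
       WHERE warehouse.orders.order_id = s.order_id AND s.op_flag = 'D';""",
    # Updates as delete-then-insert (a common Redshift upsert pattern).
    """DELETE FROM warehouse.orders
       USING staging.orders s
       WHERE warehouse.orders.order_id = s.order_id AND s.op_flag = 'U';""",
    """INSERT INTO warehouse.orders (order_id, customer_id, amount, updated_at)
       SELECT order_id, customer_id, amount, updated_at
       FROM staging.orders
       WHERE op_flag IN ('I', 'U');""",
    # Clear staging for the next load (DELETE keeps this transactional).
    """DELETE FROM staging.orders;""",
]

conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                        port=5439, dbname="dw", user="etl_user", password="...")
try:
    with conn:                      # one transaction: commit on success, rollback on error
        with conn.cursor() as cur:
            for stmt in MERGE_STATEMENTS:
                cur.execute(stmt)
finally:
    conn.close()
```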

Industrial

Data Analytics Platform for Power Distribution Utility

Client Background
Client is a state-owned power distribution utility and service provider responsible for supplying electricity to nearly 1.6 million customers.

Business Need
The client required real-time management of the grid infrastructure that was quick, reliable and efficient. The client was using an OLTP (DB2) system with on-premise DB2 RDBMS infrastructure in the backend, and every ad hoc data requirement or analysis required a query against the DB2 database. The existing infrastructure did not support MIS requirements, and especially not ad hoc analysis. Moreover, keeping such a high volume of data in an OLTP system is not recommended, since disk-level data management overhead starts to dominate query processing time, and the only way to store or process more data in an OLTP system is to scale up vertically, which is costly.

Solution: Data Analytics Platform
Valiance proposed developing a scalable data lake on AWS cloud infrastructure that would allow different business units and stakeholders to access insights across multiple sources of information whenever required. This data analytics infrastructure would bring all the data into one place, support ad hoc analysis and be future-ready for more sophisticated predictive analytics workloads, leading to smarter operations. The proposed platform comprised the following key components:
- A data lake (Amazon S3) to ingest and store datasets on an ongoing basis. This would allow the BI and analytics teams to request data as per their needs, and any downstream application could also feed from the data lake.
- Data processing as and when required using AWS EMR (Elastic MapReduce).
- Ad hoc query functionality over the data lake in S3 using AWS Athena / AWS Redshift.
- Dashboards in AWS QuickSight to view reports of trends based on the most recent data.

Technical Architecture (based on AWS)

Key Highlights of the Solution
- The data lake needed to ingest both historical data and the incremental data that DB2 would receive in future. The first step was to extract the historical data from DB2, for which we used Sqoop. Our team had several brainstorming sessions with the client to set up timelines for the Sqoop jobs; they were scheduled after hours (at night), when they would have the least impact on existing applications.
- Once the data was extracted, the next step was to push it to AWS S3. The AWS Snowball service was used to push the one-time historical data into S3.
- To handle the weekly incremental data, the team set up a CDC (Change Data Capture) process using Sqoop and Spark to push the weekly data into S3 using S3 multipart upload. The Sqoop jobs were automated with bash scripts that call Spark scripts (written in Python) to pick up the changed data each week. Both scripts were hosted on an on-premise Linux machine connected to AWS Cloud. Once the CDC process completed, an S3 multipart upload script, written in Python using the official boto3 library, uploaded the data into the S3 data lake.
- After data migration to S3, AWS EMR was used to process the data for insight generation. AWS Lambda scripts were created to spin up the EMR cluster, run the data processing jobs written in PySpark and terminate the cluster when the job finished (see the code sketch at the end of this case study). The output of the EMR jobs was stored in two destinations: frequently queried data was ingested into AWS Redshift for faster and more effective query response, while the rest was kept in AWS S3 for ad hoc querying with AWS Athena.
- The team automated the weekly data manipulation process via Python/PySpark scripts, with the boto3 library used to automate the AWS workflow and the official AWS developer documentation for S3 and Redshift used as a reference. Automation scripts were deployed to AWS Lambda and scheduled to execute at a mutually agreed time.
- AWS QuickSight was used to present the reporting data. With this setup, the reports were populated with data within 10 seconds.
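The sketch below shows one way such a Lambda handler could launch a transient EMR cluster, run a PySpark step and let the cluster terminate itself, as described above. The region, cluster name, instance types, S3 paths and IAM roles are placeholders, not the utility's actual configuration.

```python
# Hedged sketch of a Lambda-triggered transient EMR job; identifiers are placeholders.
import boto3

emr = boto3.client("emr", region_name="ap-south-1")

def lambda_handler(event, context):
    response = emr.run_job_flow(
        Name="weekly-cdc-processing",
        ReleaseLabel="emr-6.10.0",
        Applications=[{"Name": "Spark"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,   # cluster terminates after the step
        },
        Steps=[{
            "Name": "process-weekly-delta",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-bucket/jobs/process_delta.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"cluster_id": response["JobFlowId"]}
```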

Others

Data Warehousing on AWS Redshift for a Telecom Operator

Client Background
Client is an Indonesia-based mobile telecommunications services operator. The operator's coverage includes Java, Bali and Lombok as well as the principal cities in and around Sumatra, Kalimantan and Sulawesi. The client offers data communication, broadband internet, mobile communication and 3G services over GSM 900 and GSM 1800 networks.

Current Scenario and Business Need
The client currently has an on-premise Teradata warehouse storing multiple terabytes of data, and the volume is increasing day by day. The Teradata infrastructure is already at about 70% of capacity, which prompted the client to look for an alternative to scaling the Teradata warehouse vertically. The maintenance of on-premise infrastructure at such a scale is also a concern, which made the client look for a cloud-based data warehousing solution. The existing setup used Ab Initio for ETL, which increased the cost of operations, so the client wanted a low-cost ETL solution for the cloud infrastructure. The client also had an existing Power BI licence that it wanted to use to generate reports for data-driven business decisions.

Solution
Our team met the business stakeholders to understand their current setup and overall business requirements, and proposed a solution. We decided to develop a scalable data warehouse, ETL and reporting solution for the client using AWS Cloud services:
- AWS Redshift for the data warehouse
- AWS Glue for the ETL process
- The existing Power BI for reporting

Solution Workflow
- Data extraction and storage: We extracted the relevant data from the Teradata warehouse and stored it on a virtual machine in CSV, Avro and Parquet formats to test the compatibility of different storage formats with the ETL process (AWS Glue). The files were uploaded from the virtual machine to S3 via a secure channel to ensure the data was not compromised in transit.
- Data preparation: The next step was to prepare the data, i.e. perform the logical operations on it so that it could be stored in the data warehouse and used for reporting with little or no further manipulation. We wrote ETL scripts in PySpark and scheduled them to run at specific times using AWS Glue (see the code sketch at the end of this case study).
- Data processing: AWS Glue processed the data from S3, applied the necessary transformations and stored the result in AWS Redshift. We wrote additional data manipulation queries in SQL to further process the data in Redshift and prepare it for reporting.
- Data visualisation: We used Power BI to present the reporting data. Power BI Desktop was installed on an EC2 instance so that data could be accessed from AWS Redshift with minimal network latency and reports could be generated in seconds.

Result
Migration from the on-premise Teradata environment to AWS cloud resulted in significant gains for the customer: reduced infrastructure and storage costs, lower maintenance costs, improved reliability and much-needed agility for the further analytics roadmap.
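As a rough sketch of the Glue-based data preparation step, the job below reads prepared Parquet files from S3, applies a light transformation and loads the result into Redshift through a Glue connection. The catalog connection, S3 paths, table and column names are assumptions for illustration, not the operator's actual configuration.

```python
# Hedged Glue PySpark job sketch; paths, connection and field names are assumed.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read Parquet files staged in S3.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/prepared/usage/"]},
    format="parquet",
)

# Light transformation before loading (drop a technical column, rename a field).
cleaned = source.drop_fields(["_ingest_ts"]).rename_field("msisdn", "subscriber_id")

# Load into Redshift through a preconfigured Glue connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=cleaned,
    catalog_connection="redshift-dw-connection",
    connection_options={"dbtable": "dw.daily_usage", "database": "telecom_dw"},
    redshift_tmp_dir="s3://example-bucket/glue-temp/",
)
job.commit()
```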

Industrial

Video-Based Defect Identification for a Foam Manufacturing Unit

Client Background
Client is a leading polyurethane (PU) foam manufacturer with manufacturing plants across the country. It manufactures foams for mattresses used in homes, the auto industry and hospitals.

Business Objective
The present quality inspection process for foams is completely manual, with multiple people required to keep constant vigil on the output to identify quality issues, most prominently cuts, discoloration and holes. Despite tight manual oversight, a bad-sample rate of about 3% is still observed, which leads to entire material lots being rejected by customers, reducing profitability and causing customer dissatisfaction. In addition, the manual inspection process limits foam output, with machines running at sub-optimal speeds to enable manual oversight. The client therefore wanted to use AI and computer vision technology to automate quality control.

Solution
Valiance proposed deploying a suite of IoT-enabled video cameras close to the machine edge to observe the output foam and, using AI and computer vision, flag any abnormalities by raising an alarm. To train the models, our ML team collected training data from the manufacturing plant, taking several images of foams at different manufacturing stages and in different lighting conditions with "Ok" and "Not Ok" characteristics. These images were manually labelled as "Ok" or into the different defect categories. The data thus collected was used to train deep learning classification models on the AWS cloud, achieving 95-99% recall of defects across the different defect categories (see the code sketch at the end of this case study). The ML models were exposed through API Gateway, a Kinesis stream and Lambda functions to detect defects in incoming video feeds. The deployed solution included:
- Camera units with local connectivity
- A local internet gateway
- A dashboard to monitor defects identified by the ML algorithms

Outcome
The solution was deployed at one manufacturing unit to begin with and showed promising results within the first month:
- Throughput increased by 5% because machines were able to run for longer periods.
- Incidences of humanly identifiable defects were reduced to less than 1 percent.
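For illustration, the sketch below shows a transfer-learning setup of the kind that could be used for the "Ok" versus defect-category classifier. The image size, class list, directory layout and choice of backbone are assumptions, not the deployed model.

```python
# Illustrative transfer-learning sketch; label set, paths and backbone are assumed.
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (224, 224)
CLASSES = ["ok", "cut", "discoloration", "hole"]    # assumed label set

train_ds = tf.keras.utils.image_dataset_from_directory(
    "foam_images/train", image_size=IMG_SIZE, batch_size=32,
    class_names=CLASSES, label_mode="categorical")

base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False                               # freeze the pretrained backbone

model = models.Sequential([
    layers.Rescaling(1.0 / 127.5, offset=-1),        # scale pixels to [-1, 1] for MobileNetV2
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(len(CLASSES), activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=[tf.keras.metrics.Recall(name="recall")])
model.fit(train_ds, epochs=10)
```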

Others

Customer 360 for a Digital Marketing Company

Client Background
Client is a prominent Indian digital engagement solutions company providing brands with multi-channel digital solutions to connect with their customers 24x7. Its analytics-enabled cross-channel targeting platform enables brands to deliver personalized engagement to their customers. Its digital channels reach more than 500 million end customers and power more than 200 billion interactions per month.

Business Objective
The majority of the customer interactions powered by the client's digital platform consist of unstructured data. These interactions alone generate nearly 5 TB of data per month and contain a wealth of information about consumer lifestyles and preferences. Once discovered, this information could be a real asset to marketing teams as they look to drive increased customer engagement and transactions across brands. The client had no in-house capability to store and mine such datasets at scale. It therefore wanted to create a technology infrastructure that could scale with the data and allow mining of unstructured data to discover customer preference and lifestyle information. The platform needed to leverage text mining and natural language processing to understand unstructured text, and machine learning algorithms to predict customer preferences wherever they were missing. The client did not have an in-house team with data science and big data skills, or any prior experience with such technologies, and chose Valiance as the technology partner to discover the best possible solution and implement it successfully.

Solution
Our team of business analysts, data scientists and data engineers worked closely with the client's business and product teams for the first three weeks to arrive at the right methodological approach, comprising a text mining framework, a machine learning approach and a technology platform that would achieve the business objectives. A POC was done initially to validate the approach for accuracy and completeness of results. After a month we had narrowed down to:
- Hadoop (open source distribution) with HDFS for storage and MapReduce for text mining in batch
- Spark for training machine learning algorithms
- HBase as storage for structured customer data
- A text mining and machine learning approach to discover and predict customer attributes
Over the next 5 months our team performed the following activities:
- The data engineering team created Hadoop-based infrastructure to process terabytes of unstructured data.
- The data science team created text mining rules and NLP algorithms to discover customer attributes including, but not limited to, age, gender, spending pattern, number of credit cards, number of kids, travel frequency and mutual fund information. Discovery was limited due to the sparse nature of the data, but it gave us sufficient samples to create training sets (see the code sketch at the end of this case study).
- The business analytics and data science teams also recommended additional customer attributes that could be useful from a marketing standpoint and act as useful predictors for identifying missing ones.
- The data science team trained ML algorithms to predict missing customer attributes using the discovered attributes. Different algorithms were experimented with to arrive at the winning ones.
- Results of the text mining and machine learning predictions were shared regularly with the client's team to incorporate their feedback.
At the end of 5 months we arrived at the first version of the results with robust and accurate customer data. The ~450 customer attributes thus discovered were exposed to the marketing team through open APIs. Post initial deployment, we remain constantly engaged with the client to improve the accuracy of the ML algorithms and refine the text mining rules.

Outcome
- Enabled discovery of customer attributes through mining of unstructured data, where no such information existed before. This resulted in intelligent marketing campaigns with higher ROI compared to the baseline.
- 10% increase in customer engagement from digital interactions within 2 months of testing different marketing campaigns.
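The sketch below gives a much-simplified flavour of the rule-based attribute discovery step on Spark: extract a few attributes from raw interaction text with regular expressions. The patterns, paths and attribute names are illustrative assumptions; the production rules and NLP pipeline were considerably richer, and ML models were then trained to fill in attributes the rules missed.

```python
# Minimal PySpark sketch of rule-based attribute discovery; patterns and paths are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("attribute-discovery").getOrCreate()

messages = spark.read.text("hdfs:///data/interactions/")   # assumed HDFS location

attributes = messages.select(
    F.regexp_extract("value", r"credit card ending (\d{4})", 1).alias("card_last4"),
    F.regexp_extract("value", r"flight .* on (\d{2}/\d{2}/\d{4})", 1).alias("travel_date"),
    F.when(F.col("value").rlike(r"(?i)mutual fund|SIP"), 1).otherwise(0).alias("mf_interest"),
)

# Per-customer aggregation (joining on a customer key) would follow here, and a
# Spark MLlib classifier can then impute attributes the rules did not discover.
attributes.show(5, truncate=False)
```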

Financial Services

Predictive B2B Customer Acquisition

Client Background
Our client, a Fortune 100 financial services firm, is a leading issuer of credit cards, charge cards and travelers cheques, and a leader in both consumer and corporate cards. A whopping 30% of all credit card transactions in the US are attributable to their credit card business.

Business Objective
The client wanted to build a B2B business intelligence platform comprising:
- A customer acquisition model to find potential buyers of the corporate credit card by exploring corporate client data from historical company data, social media trends and news/RSS feeds. The model should also predict estimated revenues.
- Interactive dashboards with graphical reports of different metrics such as product, revenue, potential sales and customers.
- A database of companies with details such as name, address, contact information, revenue and number of employees.
- A target company list for marketing purposes; these companies can be potential customers.
The current process for accomplishing the above is completely manual and time consuming, with the sales team collecting data in different spreadsheets from third-party websites. Information in the spreadsheets is manually cleaned and enriched before being fed into a common database. The platform was proposed to do away with this manual process.

Solution
Our team proposed a cloud-based application with a modular architecture using open source technologies: AngularJS, Django, MySQL and Python. Microsoft Azure was chosen as the cloud platform for deployment. The application comprised the following modules:
- Data collection: web crawlers built using Python and Selenium.
- Data transformation and aggregation: cleaning and enriching the data (standardizing text fields, creating derived fields) and aggregating the different company datasets into a single view of each company.
- Lookalike models: a set of models to identify, in new datasets, potential companies that look similar to the existing customer base. Companies showing a high degree of similarity are then targeted by the sales team for corporate cards (see the code sketch at the end of this case study).
- Product recommendation model: a ranked list of products to be pitched to potential targets.
- User interface: a web-based application for the sales team and management to view and segment the list of prospects using similarity scores and other attributes, allocate prospects to individual sales team members based on hierarchy and other business rules, export the list to the internal sales CRM via Excel export, and import the list of existing customers to retrain the model for finding similar leads.

Outcome
The application went live for the Switzerland market in 3 months. The manual effort of the sales team was reduced to 50% within the first month of implementation and to nearly zero by the third month. Sales effort focused on a narrowed list of companies (based on similarity scores and recommended products) produced 75% more leads compared to the existing process over 6 months of testing. The tool was further expanded to the Middle East and other European markets.
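As a simplified illustration of the lookalike scoring idea, the sketch below standardizes a few firmographic features and ranks prospects by their mean cosine similarity to the existing customer base. Feature names, file names and the scoring choice are assumptions, not the production model.

```python
# Hedged lookalike scoring sketch; features, files and scoring rule are assumed.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

features = ["revenue_musd", "employee_count", "industry_code", "years_in_business"]

customers = pd.read_csv("existing_customers.csv")    # hypothetical enriched dataset
prospects = pd.read_csv("crawled_companies.csv")

scaler = StandardScaler().fit(customers[features])
cust_vecs = scaler.transform(customers[features])
pros_vecs = scaler.transform(prospects[features])

# Similarity of each prospect to the customer base: mean cosine similarity.
prospects["similarity_score"] = cosine_similarity(pros_vecs, cust_vecs).mean(axis=1)

shortlist = prospects.sort_values("similarity_score", ascending=False).head(100)
print(shortlist[["company_name", "similarity_score"]].head())
```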

Financial Services

Personalized Financial Content Recommendation

Client Background
Client is one of China's premier financial services portals, providing stock market news, personal finance advice and the latest happenings in the financial world across the globe. It attracts millions of Chinese visitors per month to its website, where they engage with its content. Subscribed visitors are also served content through email digests and browser notifications.

Business Objective
As a digital content provider, it is essential to keep visitors engaged by serving them meaningful, relevant content at the right time. There are millions of articles covering various aspects of finance, stock markets and news that need to be analyzed and recommended. Until then, a rule-based approach with text mining had mostly been used for content discovery and for recommending articles similar to a given article. This yielded results to an extent but was not expected to scale. The client wanted to adopt a more sophisticated approach with Natural Language Processing and Machine Learning as the foundation.

Solution
Our team suggested narrowing the scope of the solution to suggesting relevant articles for any page visit. This had to happen in real time, with the context available in the form of the current page visit. Integrating the algorithms into the current platform and scaling them to serve real-time recommendations would be taken care of by the client's technology team. Our approach involved the following steps:
- Convert every article into a feature vector using n-grams.
- Score each article against the other articles to determine a similarity score. Several techniques exist, so multiple were used and the results evaluated.
- Overlay additional business rules on top of the similarity score, covering trending articles, popular articles, date of publication and geography preferences.
- Limit the initial exercise to a sample dataset; once validated by the client, scale the solution in collaboration with the client's technology team.
The client provided a sample article corpus in an Amazon S3 bucket, and we created a Spark cluster on AWS for processing. Every article was converted into a feature vector and scored against every other article for a similarity rating (see the code sketch at the end of this case study). We also created business rules for computing popularity scores and trending articles, factoring in time decay and visits. The working solution was demonstrated through simulated incoming article requests: the feature vector of the current article was looked up and articles with high similarity scores were retrieved, with the result set further segmented by popularity, trending status and time decay, along with other visitor preferences.

Outcome
- Removed the need for manual classification, with automated tagging accurate at the 80% level; accuracy was verified manually by the client's team.
- A controlled pilot on a sample of visitors increased visitor engagement by up to 20%.
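The sketch below shows the n-gram similarity step in miniature on an in-memory placeholder corpus: vectorize articles with TF-IDF over unigrams and bigrams, compute pairwise cosine similarity and rank candidates for the article being viewed. The production version ran on a Spark cluster over the full corpus and layered popularity, trending and recency rules on top of these raw scores.

```python
# Minimal n-gram similarity sketch; the tiny English corpus here is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = [
    "Central bank holds interest rates steady amid inflation worries",
    "Tech stocks rally as earnings beat analyst expectations",
    "How rising interest rates affect your mortgage and savings",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
vectors = vectorizer.fit_transform(articles)          # one feature vector per article

similarity = cosine_similarity(vectors)               # article-to-article scores

current = 0                                           # index of the article being viewed
ranked = similarity[current].argsort()[::-1][1:]      # most similar first, skipping itself
print("recommended next article:", articles[ranked[0]])
```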
