Case Study

Financial Services

Credit risk assessment for Micro loans

Our client
Our client is a global leader in mobility solutions, offering Digital VAS, Mobile Finance, and Customer Management solutions to telecom operators and micro-lending institutions spread across 15 African countries.

Why they came to us
Our client had a customer base of 1-10 million in each of the 15 countries where they were operating and was capturing 2 TB of data every month. Assessing credit risk and assigning credit lines was a huge challenge given the diversity of the loan applicants and the volume of data. The client understood that they needed an advanced solution that could leverage all the data they were collecting and define credit limits accordingly.

The problem
The client's existing risk assessment algorithm was based on simplistic internal rules. It assigned pre-defined credit limits to users based on fixed segments, and there was no way of reassessing a user's creditworthiness using all the data the client was collecting across the user journey.

Our strategy
We decided to build an adaptive, intelligent risk assessment platform using a machine learning model. This involved incorporating data points that were already being collected:
Demographic
Financial
Telecom: CDR (Call Data Records), GPRS, BTS
App activity
SMS

The overall goals were to:
Offer higher credit limits to users who repay their loans on time and lower limits to riskier users
Automate the process of risk evaluation and reduce manual oversight to a large extent
Improve risk assessment of micro-loans, reducing borrower defaults significantly
Leverage the new risk assessment framework to reduce the turnaround time for approval and disbursal of loans
Build a model that scales seamlessly as our high-growth client expands its footprint

The implementation
We built a quantitative framework based on machine learning (ML) and non-linear optimization. As part of the implementation (see the sketch after this case study):
We created an ML-based algorithm to assess credit risk for micro-loans. The algorithm produces a credit risk score along with the probability of default.
We built an optimization model for credit line assignment. This model accounts for factors such as the probability of default and the portfolio funds available, and aims to minimize the Exposure at Default (EAD) while maximizing the Economic Value Added (EVA).

The transformation
We saw a 15% improvement in loan approvals over 6 months, and our framework reduced defaults by 10% over the same period.
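Below is a minimal, illustrative sketch of the two-stage approach described above: a classifier that estimates each borrower's probability of default, followed by a constrained optimization that assigns credit limits against a portfolio budget. The dataset, feature names, margin, budget and per-borrower caps are hypothetical assumptions, not the client's actual model.

```python
# Sketch only: assumed data file, feature names, margin, budget and caps.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from scipy.optimize import minimize

# Assumed features drawn from the demographic, financial, telecom and app data sources
features = ["age", "monthly_income", "avg_call_duration", "gprs_usage_mb",
            "app_sessions_per_week", "sms_count", "prior_loans_repaid"]

df = pd.read_csv("borrower_history.csv")          # assumed historical dataset
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["defaulted"], test_size=0.2, random_state=42)

pd_model = GradientBoostingClassifier().fit(X_train, y_train)

# Score a small batch of applicants to keep the optimization sketch tractable
batch = df.sample(100, random_state=0)
prob_default = pd_model.predict_proba(batch[features])[:, 1]   # PD per borrower

PORTFOLIO_FUNDS = 30_000   # assumed funds available for disbursal
MARGIN = 0.12              # assumed interest margin earned on repaid loans

def negative_eva(limits):
    """Expected value added: margin on non-defaulting exposure minus expected loss (EAD x PD)."""
    expected_profit = np.sum(limits * (1 - prob_default) * MARGIN)
    expected_loss = np.sum(limits * prob_default)   # exposure at default weighted by PD
    return -(expected_profit - expected_loss)

n = len(prob_default)
result = minimize(
    negative_eva,
    x0=np.full(n, PORTFOLIO_FUNDS / n),
    bounds=[(0, 500)] * n,                                   # assumed per-borrower cap
    constraints=[{"type": "ineq",
                  "fun": lambda limits: PORTFOLIO_FUNDS - limits.sum()}],
)
batch["credit_limit"] = result.x
print(batch[["credit_limit"]].describe())
```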

Public Sector

Wildlife species identification from Video feed

Client Background
The client is an internationally acclaimed institution actively engaged in research on biodiversity-related issues. Its mission is to nurture the development of wildlife science and promote its application in conservation, in consonance with the cultural and socioeconomic milieu.

Business Objective
The client wanted to develop an image classification engine to better identify Himalayan animal species captured by motion-sensing cameras. Initially, the species we were asked to identify included the Snow Leopard, Woolly Hare, Marmot, Ibex and Blue Sheep. To meet this objective, the client wanted to engage a service provider who could:
Use the historical image data to build an image classification model. The images provided were taken during day and night and varied in quality with respect to the distance and portion of the animal captured by the camera.
Use the developed model to identify and classify Himalayan wildlife species in newly captured images.

The client also wanted the solution to:
Allow end users to identify the species in a captured image
Allow end users to flag misclassified images and retrain the model from the last checkpoint using the correct classification

Solution
Valiance proposed to build a platform that intelligently identifies Himalayan wildlife species in captured images and allows the backend model to be retrained with newly labelled data.

The client provided a historical image dataset for model training, with all images already tagged with species labels. Below is the approach we followed to develop the platform (a sketch of the transfer-learning step follows this case study):
The tagged image data was uploaded to AWS S3 storage in batches
The tagged images were transformed into a feature matrix using transfer learning on pre-trained deep learning models
A model was trained to predict the image tag (animal species) from the generated feature vectors
Model weights were stored on AWS S3 and made accessible through a web-based application
The web application interfaced with the stored model via a REST API to generate predictions for new images
New image data collected since the last refresh of the model weights could be incorporated into the model for retraining through the app

Outcome
Following a single-image classification approach, the model achieved 89.6% overall accuracy in identifying and predicting species from the given pool.
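The following is a hedged sketch of the transfer-learning step described above, assuming a Keras/TensorFlow stack with ResNet50 as the pre-trained backbone. The image folder layout, species labels, head architecture and hyperparameters are illustrative assumptions, not the platform's actual implementation.

```python
# Sketch only: backbone choice, directory layout and labels are assumptions.
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras import layers, models

IMG_SIZE = (224, 224)
SPECIES = ["snow_leopard", "woolly_hare", "marmot", "ibex", "blue_sheep"]  # assumed labels

# Pre-trained backbone used purely as a frozen feature extractor
backbone = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                    input_shape=IMG_SIZE + (3,))
backbone.trainable = False

# Small classification head trained on top of the extracted feature vectors
model = models.Sequential([
    backbone,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(len(SPECIES), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Assumed directory of labelled camera-trap images, one sub-folder per species
train_ds = tf.keras.utils.image_dataset_from_directory(
    "camera_trap_images/", image_size=IMG_SIZE, batch_size=32)
train_ds = train_ds.map(lambda x, y: (preprocess_input(x), y))

model.fit(train_ds, epochs=5)
model.save("wildlife_classifier.h5")   # saved weights can then be pushed to S3
```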

Industrial

Machine Learning Driven Chip Size Classification and Cutter Optimization

Client Background
The client is a multinational metals and mining corporation producing iron ore, copper, diamonds, gold and uranium.

Context and Business Objective
The client employs operators at different locations to mine diamonds. These operators are responsible for running cutting machines with parameters such as wheel speed, rotation rate and down speed to execute diamond mining. Depending on the parameters set by the operator, the size of the extracted rock chips varies. The client therefore wanted a solution that tells the cutter operator which cutter settings should produce the largest chip size.

Current Approach
The client heuristically identifies the best tunable parameters to get better output, but this approach has a couple of drawbacks:
It does not quantify the contribution of individual parameters to the output
It does not account for interactions between parameters, as all heuristics are defined in one dimension, or at most two

Proposed Approach and Solution
Valiance proposed to build a mathematical relationship between the tunable parameters and the output (Good/Bad). The advantages of this approach are:
We can approximate the quality of the output for any given input parameters without testing it in the field
It quantifies the contribution of individual parameters and accounts for interactions between tunable parameters
Once the relationship between tunable parameters and output is developed, we can move algorithmically within the search space to find globally optimal tunable parameters

As trench digging goes deeper into the earth, the soil level and lithology of the rock change, so our machine learning engine optimized the tunable parameters for a given depth, lithology and bit.

Modeling Pipeline (see the sketch after this case study)
We developed a classification model by regressing the quality of the chip size on the above-mentioned independent variables
Trained the model on 80% of the samples and validated it on the remaining 20%
Assigned 0 to good and 1 to bad, since the optimization was formulated as a standard minimization problem
We validated four parametric classifiers (Logistic Regression, Linear Discriminant Analysis (LDA), Perceptron and Naive Bayes) and one tree-based classifier (Gradient Boosting Machines)

Optimization Model Performance and ROI
Our optimization exercise was able to convert ~60% of the bad samples into good samples. We observed that the tree-based model outperformed all other models with respect to both variance and bias, and it showed an accuracy of 78%. The conversion rate improved by ~15%.
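Below is an illustrative sketch, under assumed feature names, data file and parameter ranges, of the modeling pipeline described above: comparing the four parametric classifiers and the gradient-boosting model, then searching the cutter-parameter space for settings that minimize the predicted probability of a bad chip.

```python
# Sketch only: column names, ranges and the grid search are assumptions.
import numpy as np
import pandas as pd
from itertools import product
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

FEATURES = ["wheel_speed", "rotation_rate", "down_speed", "depth"]  # assumed columns
df = pd.read_csv("cutter_runs.csv")      # assumed field data; target: 0 = good, 1 = bad
X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["bad_chip"], test_size=0.2, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "lda": LinearDiscriminantAnalysis(),
    "perceptron": Perceptron(),
    "naive_bayes": GaussianNB(),
    "gbm": GradientBoostingClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))

# Coarse grid search over the tunable parameters for a fixed depth, picking the
# settings with the lowest predicted probability of a bad chip.
best = min(
    product(np.linspace(100, 300, 20),   # wheel_speed range (assumed)
            np.linspace(10, 60, 20),     # rotation_rate range (assumed)
            np.linspace(0.5, 5.0, 20)),  # down_speed range (assumed)
    key=lambda p: models["gbm"].predict_proba(
        pd.DataFrame([[*p, 120.0]], columns=FEATURES))[0, 1],  # depth fixed (assumed 120)
)
print("recommended cutter settings:", best)
```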

Financial Services

Agent Recruitment Funnel Optimization for an Insurance Distributor

Client Background
The client is a digital insurance platform that helps corporates and individuals with their insurance needs. The platform allows customers to search, compare and buy various life and general insurance products online.

Business Context
To facilitate customers in the buying process, the client hires Point of Sale (POS) persons as agents. The following is the process that agent recruits undergo while coming onboard:
Lead Generation: A lead initiates contact by filling in a form with basic demographic information
KYC: Certain documentation is required by the client to verify the lead
Training: Leads are trained to understand the end customer's requirements and offer policies accordingly
Certification: The client conducts exams to help leads become certified insurance agents
Activation: Any lead that sells at least one insurance policy becomes active
Repeat: Any lead that sells more than two insurance policies is a repeat

A variety of communications is sent by the client to the leads throughout this process to track activity and measure engagement.

Business Problem and Objective
The client was facing a significantly high level of attrition at each stage of the onboarding process. While some attrition was expected, out of the total leads generated:
Only 5.7% reached the activation stage
Only 1.2% reached the repeat stage

The client wanted to increase the efficiency of this recruitment funnel, leading to lower training and onboarding costs as well as more policies sold. To achieve this objective, the client wanted to engage a solutions provider that could:
Help the client stitch together all the data collected throughout the onboarding process
Analyse the stitched dataset to generate insights that can help optimize the process and filter out unwanted leads
Increase the conversion rate of leads towards active and repeat

Solution
Valiance took a data-driven approach to offer insights and indicators for optimizing the recruitment funnel and detecting high-potential recruits early. After a preliminary analysis we decided to use the following datasets for further analysis:
Demographic Data: age, gender, education, state
App Usage: event name, timestamp of event, page, section
Training Video Consumption: video start timestamp, playlist ID
Call Data from Call Centre: call duration, call timestamp
Timestamp of Stage Transitions: KYC done timestamp, KYC verified timestamp

We created a schema for all the data captured across the stages of the funnel. This helped us look at the complete DNA of an individual agent recruit (a funnel-analysis sketch follows this case study).

Insights and Recommendations
The following indicators of progression through the funnel and productivity were discovered during the project:
Leads aged 25 and above tend to be retained through the funnel better than leads aged below 25.
There appears to be a direct correlation between usage of the "Risk Meter" feature in the app before the first sale and overall productivity.
Leads who are called within 1 day of progressing from one stage to the next have a higher chance of progressing through the funnel.

Based on these insights, we provided the following recommendations to the client to optimise the funnel:
Target leads aged 25 and above for lead generation.
Monitor the use of the "Risk Meter" feature by leads during the early stages and help them progress through the funnel quickly. Incentivise the use of this feature during the early stages of the funnel. Introduce training videos to help leads learn how to use this feature.
Monitor stage transitions and prioritise calling leads who have progressed from one stage to the next on the same day.
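Below is a small pandas sketch of the kind of funnel analysis described above: computing stage-to-stage conversion rates and time spent in each stage from the stitched stage-transition timestamps. The file name, stage names and column layout are assumptions for illustration.

```python
# Sketch only: one row per lead, one timestamp column per stage (NaT if never reached).
import pandas as pd

STAGES = ["lead_generated", "kyc_done", "training_done",
          "certified", "activated", "repeat"]

leads = pd.read_csv("lead_stage_timestamps.csv", parse_dates=STAGES)  # assumed dataset

# Stage-to-stage and overall conversion rates
reached = leads[STAGES].notna().sum()
conversion = pd.DataFrame({
    "reached": reached,
    "pct_of_leads": (reached / reached["lead_generated"] * 100).round(1),
    "pct_of_previous_stage": (reached / reached.shift(1) * 100).round(1),
})
print(conversion)

# Median days spent in each stage, a proxy for where leads stall
for prev, nxt in zip(STAGES, STAGES[1:]):
    days = (leads[nxt] - leads[prev]).dt.days
    print(f"{prev} -> {nxt}: median {days.median()} days")
```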

Healthcare

Marketing Spend Optimization For Pharmaceutical Enterprise

Client Background
The client is a US-based, global, research-driven bio-pharmaceutical company committed to developing innovative advanced therapies for some of the world's most complex and critical conditions. It caters to three therapeutic areas: Dermatology, Gastroenterology, and Rheumatology.

Business Objective
Physicians or Health Care Professionals (HCPs) are the most significant element in pharmaceutical sales, as their prescriptions determine which drugs will be used by patients. Therefore, the key to pharmaceutical sales for any pharmaceutical company is influencing the physician. To do this, sales reps spend on the following marketing tactics to drive sales: Closed Loop Marketing (CLM) calls, Non-CLM calls, internet live seminars and lunch sessions. It was observed that:
18% of current sales is driven by CLM and Non-CLM calls executed for the three therapeutic areas: Dermatology, Gastroenterology, and Rheumatology
CLM calls provide better ROI than Non-CLM calls, but since they are 1.4x costlier, the majority of investment had been made in Non-CLM calls

The client therefore wanted to optimize the marketing spend for CLM and Non-CLM calls so as to increase revenue and NPV (Net Present Value). NPV accounts for the cumulative profitability of spend in a tactic, i.e. NPV = Present Value - Investment.

Solution
Our data science team considered the following inputs for analysis:
Activity Data: Promotional sales activity data and other sources to capture (a) the level of engagement (reach and frequency) and (b) the type of interaction (e.g. face-to-face, lunch and learn) with the HCP
Demand Data: Market data from syndicated sources, to capture sales that can be tied to promotional activities and compute the impact of the promotional sales activity
Financial Data: Financial data from internal sources to estimate the cost of sales force activities and enable ROI analysis

Multivariate regression was performed on sales and marketing tactics to estimate and forecast the impact on sales, and using Monte Carlo simulation, response curves were built to define the contribution of investment to sales, i.e. the Return on Investment (ROI). A sketch of this approach follows this case study. Plotting NPV against marketing spend suggested that the existing spend of 1.85 million Euros in this sales tactic was not producing optimum sales; ROI would be maximized with an investment of ~0.30 million Euros.

Outcome
The following optimization opportunities were unearthed from the insights generated by the modeling exercise:
Decrease Investment & Hold Revenue Steady: The marketing mix can be optimized to generate the previous year's revenue while decreasing investment by 44%
Hold Investment Steady & Increase Revenue: Using last year's marketing budget, revenue can be increased by 30 million Euros this year
Maximize Profitable Investment: Under the maximized investment scenario, an additional 45 million Euros in revenue can be captured with an additional investment of 15.4 million Euros
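The following sketch illustrates the response-curve idea under stated assumptions: a multivariate regression of sales on tactic spend, Monte Carlo draws of the fitted coefficients to simulate sales under different spend scenarios, and an NPV comparison across spend levels. The dataset, column names, profit margin and spend grid are hypothetical, not the client's actual figures.

```python
# Sketch only: data file, margin and spend grid are assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("tactic_spend_sales.csv")   # assumed history: spend per tactic vs. sales
X = sm.add_constant(df[["clm_spend", "non_clm_spend", "seminar_spend"]])
model = sm.OLS(df["sales"], X).fit()

# Monte Carlo: draw coefficient vectors from the fitted multivariate normal
rng = np.random.default_rng(7)
draws = rng.multivariate_normal(model.params, model.cov_params(), size=5_000)

spend_grid = np.linspace(0.0, 2.0, 50)       # CLM spend scenarios, in million EUR
MARGIN = 0.25                                # assumed profit margin on incremental sales

npv_mean = []
for s in spend_grid:
    # Hold the other tactics at their historical averages, vary only CLM spend
    scenario = np.array([1.0, s, df["non_clm_spend"].mean(), df["seminar_spend"].mean()])
    sales_draws = draws @ scenario           # simulated sales for each coefficient draw
    npv = MARGIN * sales_draws - s           # NPV = present value of profit - investment
    npv_mean.append(npv.mean())

best = spend_grid[int(np.argmax(npv_mean))]
print(f"CLM spend with highest expected NPV: ~{best:.2f}M EUR")
```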

Others

Data Lake and Data Warehousing for B2B ecommerce on AWS Cloud

Client Background
The client is an Indian e-commerce company that connects Indian manufacturers and suppliers with buyers and provides B2C, B2B and C2C sales services through its web portal.

Business Objective
The client had deployed an on-premise Oracle RDBMS infrastructure to store its data, currently around 500 GB and growing by 20 GB every month. This Oracle infrastructure was used to manipulate data through stored procedures; data was extracted manually through a WEBERP system and used to create business reports manually in MS Excel. The reports were based on monthly data with limited view into daily trends. The client wanted to revamp the data processing and reporting process so that reports could be generated daily or on demand.

Solution
Our data engineering team studied the client's current data setup and future technology roadmap. It was decided to go with AWS Redshift for data warehousing. Additionally, a data pipeline and reporting solution was proposed using serverless AWS components, namely AWS Glue and AWS QuickSight. We were provided with a set of 50 business views (reports) covering 90% of business queries. The data warehouse design was also expected to support future ad hoc information requests.

The following development steps were followed (a sketch of the staged merge logic appears after this case study):
We built an intermediate staging layer to capture data from the input source as-is, with limited transformation. This meant the entire historical data would be in one place and could be referred to later if needed. This data layer was provided by S3.
We created a logical view of the data warehouse model to capture all facts and dimensions provided in the business views. Once approved, the logical model was converted into a physical AWS Redshift model with appropriate partition and sort keys. Although a data warehouse is supposed to be "write once, read many", the client team asked to be able to make infrequent updates to records that are prone to modification.
Initial data for the tables (multiple files per table) was provided as CSV files in a pre-agreed S3 location. Each row in a CSV file had a logical operator indicating whether the record was meant for insertion, update or deletion. The tables also had hierarchical relationships and dependencies between themselves, which meant that any update or deletion logic in the ETL jobs had to handle complex sequential atomic update scenarios.
Glue jobs were created to move data from S3 to the staging layer (AWS Redshift tables acting as temporary storage for daily processing). At this stage, only the columns needed for business reports were moved, with data type changes for some columns.
Once the data was in the staging tables, AWS Redshift queries (run as a cron job deployed on an EC2 instance) were used to move data into the warehouse tables in Redshift itself. At this stage, aggregations and selections were performed.
Job performance metrics and associated logs were stored in a metadata table, and notifications were sent via email to the developer team on completion or failure of any step of the data pipeline.

Outcome
We went live with the data warehouse in 6 months with 3 years of historical data. Incremental processes were set up to ingest data on an ongoing basis, with an automated mechanism for handling failures.
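Below is a minimal sketch, with assumed table names, columns and connection details, of how the insertion/update/deletion flags could be applied when moving records from the staging layer to the warehouse tables in Redshift from a scheduled Python job. It is an illustration of the pattern, not the client's actual ETL code.

```python
# Sketch only: schema, column names, flag values (I/U/D) and endpoint are assumptions.
import psycopg2

MERGE_SQL = """
-- Apply deletions first, then updates, then inserts, so dependent child
-- tables can be processed in sequence after their parents.
DELETE FROM warehouse.orders
USING staging.orders s
WHERE warehouse.orders.order_id = s.order_id AND s.op = 'D';

UPDATE warehouse.orders
SET status = s.status, amount = s.amount, updated_at = s.updated_at
FROM staging.orders s
WHERE warehouse.orders.order_id = s.order_id AND s.op = 'U';

INSERT INTO warehouse.orders (order_id, status, amount, updated_at)
SELECT order_id, status, amount, updated_at
FROM staging.orders
WHERE op = 'I';

TRUNCATE TABLE staging.orders;
"""

def run_merge():
    # psycopg2 wraps the statements in a transaction committed at the end
    conn = psycopg2.connect(host="redshift-cluster.example.internal",  # assumed endpoint
                            dbname="analytics", user="etl_user",
                            password="***", port=5439)
    try:
        with conn.cursor() as cur:
            cur.execute(MERGE_SQL)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    run_merge()
```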

Industrial

Data Analytics Platform for Power Distribution Utility

Client Background
The client is a state-owned power distribution utility and service provider responsible for supplying electricity to nearly 1.6 million customers.

Business Need
The client required real-time management of the grid infrastructure that was quick, reliable and efficient. The client was using an OLTP (DB2) system with on-premise DB2 RDBMS infrastructure in the backend. A query to the DB2 database was required every time data had to be extracted for an ad hoc requirement or for analysis. The existing infrastructure did not support MIS requirements, and especially not ad hoc analysis. Moreover, keeping such a high volume of data in an OLTP system is not recommended, since the cost of data management on disk comes to dominate query processing time. The only way to store and process more data in an OLTP system is to scale up vertically, which is a costly affair.

Solution: Data Analytics Platform
Valiance proposed to develop a scalable Data Lake on top of AWS cloud infrastructure that would allow different business units and stakeholders to access insights across multiple sources of information whenever required. This data analytics infrastructure would bring all the data into one place, support ad hoc analysis, and be future-ready for more sophisticated predictive analytics workloads leading to smarter operations. The proposed platform comprised the following key components:
Data Lake (Amazon S3) to ingest and store datasets on an ongoing basis. This would allow BI and analytics teams to request data as per their need, and any downstream applications can also feed from the Data Lake.
Process the data as and when required using AWS EMR (Elastic MapReduce)
Enable ad hoc query functionality over the Data Lake in S3 using AWS Athena/AWS Redshift
Create dashboards in AWS QuickSight to view reports of trends based on the most recent data

Technical Architecture (based on AWS)

Key highlights of the solution (a sketch of the EMR orchestration step follows this case study):
The data lake needed to ingest both historical data and the incremental data that DB2 would receive in future. The first step was to extract the historical data from DB2; we used Sqoop for this purpose. Our team had several brainstorming sessions with the client to set up timelines for executing the Sqoop jobs. These jobs were scheduled after hours (at night), when they would have the least impact on the existing applications.
Once the data was extracted, the next step was to push it to AWS S3. The AWS Snowball service was used to push the one-time historical data into S3.
The next step was to handle the weekly incremental data. The team set up a CDC (Change Data Capture) process using Sqoop and Spark to push the weekly data into AWS S3 using S3 multipart upload. The Sqoop jobs were automated using bash scripts, which called Spark scripts (written in Python) to get the changed data weekly. Both scripts were hosted on an on-premise Linux machine connected to AWS Cloud. Once the CDC process was complete, an S3 multipart upload script, written in Python using the official boto3 library, was called to upload the data into the S3 data lake.
After the data migration into S3, AWS EMR was used to process the data for insight generation. AWS Lambda scripts were created to spin up the EMR cluster, run the data processing jobs written in PySpark, and terminate the cluster when the jobs finished. The output of the EMR jobs was stored in two different destinations: frequently queried data was ingested into AWS Redshift for faster and more effective query response, while the remaining data was kept in AWS S3 for ad hoc querying using AWS Athena.
The team automated the weekly data manipulation process via Python/PySpark scripts. The boto3 library was used to automate the AWS Cloud processes, with the official AWS developer documentation used as a reference for each component (AWS S3 and AWS Redshift). The automation scripts were deployed in AWS Lambda and scheduled to execute at a mutually agreed time.
AWS QuickSight was used to present the reporting data. With this setup, the reports were populated with data within 10 seconds.
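The following is a hedged sketch of the Lambda-driven EMR orchestration step described above: launching a transient EMR cluster, submitting a PySpark step and letting the cluster terminate itself when the step completes. The script path, bucket names, instance types, IAM roles and EMR release label are assumptions for illustration.

```python
# Sketch only: region, bucket, roles, instance types and release label are assumptions.
import boto3

emr = boto3.client("emr", region_name="ap-south-1")  # assumed region

def lambda_handler(event, context):
    response = emr.run_job_flow(
        Name="weekly-datalake-processing",
        ReleaseLabel="emr-6.10.0",                       # assumed EMR release
        Applications=[{"Name": "Spark"}],
        LogUri="s3://utility-datalake/emr-logs/",        # assumed bucket
        Instances={
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Transient cluster: shut down once all steps have finished
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "weekly-insight-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit",
                         "s3://utility-datalake/jobs/weekly_processing.py"],  # assumed script
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"cluster_id": response["JobFlowId"]}
```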

Others

Data Warehousing on AWS Redshift for Telecom Operator

Client Background
The client is an Indonesia-based mobile telecommunications services operator. The operator's coverage includes Java, Bali and Lombok, as well as the principal cities in and around Sumatra, Kalimantan and Sulawesi. The client offers data communication, broadband internet, mobile communication and 3G services over GSM 900 and GSM 1800 networks.

Current Scenario & Business Need
The client had an on-premise Teradata warehouse storing multiple terabytes of data, with volumes increasing day by day. About 70% of the Teradata infrastructure's capacity had already been exhausted, which prompted the client to look for an alternative to scaling the Teradata warehouse vertically. Maintaining on-premise infrastructure at such a scale was also a concern, which made the client look for a cloud-based data warehousing solution. The existing setup used Ab Initio for ETL, which increased the cost of operations; the client wanted a low-cost ETL solution for the cloud infrastructure. The client also had an existing PowerBI licence which it wanted to utilize to generate reports for data-driven business decisions.

Solution
Our team met business stakeholders to understand their current setup and overall business requirements, and to propose a solution. We decided to develop a scalable data warehouse, ETL and reporting solution for the client using AWS cloud services:
AWS Redshift to create the data warehouse
AWS Glue for the ETL process
Existing PowerBI for reporting

Solution workflow (a Glue job sketch follows this case study):
Data Extraction & Storage: We extracted the relevant data from the Teradata warehouse and stored it on a virtual machine in CSV, Avro and Parquet formats to test the compatibility of different storage formats with the ETL process (AWS Glue). The files were uploaded to S3 from the virtual machine via a secure channel to ensure that the data was not compromised in transit.
Data Preparation: Our next step was to prepare the data, i.e. perform the logical operations on the data so that it could be stored in the data warehouse and used for reporting with little or no further manipulation. We wrote ETL scripts using PySpark and scheduled them to execute at specific times using AWS Glue.
Data Processing: AWS Glue processed the data from S3, made the necessary transformations and stored the resulting data in AWS Redshift. We wrote additional data manipulation queries in SQL to further process the data stored in Redshift and prepare it for reporting.
Data Visualisation: We used PowerBI to present the reporting data. We installed PowerBI Desktop on an EC2 instance so that data could be accessed from AWS Redshift with minimal network latency and reports could be generated in seconds.

Result
Migration from the on-premise Teradata environment to AWS cloud resulted in significant gains for the customer: reduced infrastructure and storage costs, lower maintenance costs, improved reliability, and the much-needed agility for the further analytics roadmap.
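Below is an illustrative AWS Glue job sketch for the S3-to-Redshift processing step described above. The Glue catalog database, table names, column mappings, connection name and temporary S3 path are assumptions; the real jobs depend on the client's schema.

```python
# Sketch only: catalog names, mappings and connection are assumptions.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw usage records landed in S3, via the Glue Data Catalog
raw = glue_context.create_dynamic_frame.from_catalog(
    database="telecom_raw",           # assumed catalog database
    table_name="subscriber_usage")    # assumed catalog table

# Keep and rename only the columns needed downstream, casting types as required
mapped = ApplyMapping.apply(
    frame=raw,
    mappings=[("msisdn", "string", "msisdn", "string"),
              ("usage_mb", "double", "usage_mb", "double"),
              ("event_date", "string", "event_date", "date")])

# Write into Redshift through a pre-configured Glue connection
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-conn",                    # assumed connection
    connection_options={"dbtable": "dw.subscriber_usage",
                        "database": "analytics"},
    redshift_tmp_dir="s3://telecom-etl-temp/redshift/")    # assumed temp bucket

job.commit()
```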

Industrial

Video-based defect identification for Foam manufacturing unit

Client Background
The client is a leading polyurethane (PU) foam manufacturer with manufacturing plants across the country. It manufactures foams for mattresses used in homes, the auto industry and hospitals.

Business Objective
The present quality inspection process for foams is completely manual, with multiple people required to keep constant vigil on the output to identify quality issues, most prominently cuts, discoloration and holes. Despite tight manual oversight, a 3% bad-sample rate is still observed, which leads to entire material lots being rejected by customers, reducing profitability and ultimately causing customer dissatisfaction. In addition, the manual inspection process limits foam output, with machines running at sub-optimal speeds to allow manual oversight. The client therefore wanted to use AI and computer vision technology to automate quality control.

Solution
Valiance proposed to deploy a suite of IoT-enabled video cameras close to the machine edge to observe the output foam and, using AI and computer vision, flag abnormalities by raising an alarm. To train the models, our ML team collected training data from the manufacturing plant by taking images of foams at different manufacturing stages, in different lighting conditions, with "OK" and "Not OK" characteristics. These images were manually labelled as "OK" or into the different defect categories. The data thus collected was used to train deep learning classification models on AWS cloud, resulting in 95-99% recall of defects across defect categories. The ML models were exposed through API Gateway, a Kinesis stream and Lambda functions to detect defects in incoming video feeds (a sketch follows this case study). The deployed solution included:
Camera units with local connectivity
Local internet gateway
Dashboard to monitor defects identified by the ML algorithms

Outcome
The solution was deployed at one manufacturing unit to begin with and showed promising results within the first month:
Throughput increased by 5% because machines were able to run for longer periods.
Incidences of humanly identifiable defects were reduced to less than 1 percent.
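The following is a hedged sketch of the Lambda piece of such a pipeline: consuming frame references from the Kinesis stream, scoring each frame against a deployed classification model and raising an alarm when a defect is detected. The SageMaker endpoint, SNS topic, payload format and threshold are assumptions; the case study only states that API Gateway, Kinesis and Lambda were used.

```python
# Sketch only: endpoint name, topic ARN, message format and threshold are assumptions.
import base64
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
sns = boto3.client("sns")

ENDPOINT = "foam-defect-classifier"                                  # assumed model endpoint
ALARM_TOPIC = "arn:aws:sns:ap-south-1:123456789012:defect-alarms"    # assumed SNS topic
DEFECT_THRESHOLD = 0.8

def lambda_handler(event, context):
    alarms = 0
    for record in event["Records"]:
        # Kinesis delivers data base64-encoded; assume each record is a JSON
        # message containing the S3 location of a captured frame.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT,
            ContentType="application/json",
            Body=json.dumps({"frame_s3_uri": payload["frame_s3_uri"]}),
        )
        scores = json.loads(response["Body"].read())   # e.g. {"cut": 0.91, "hole": 0.02}

        defects = {k: v for k, v in scores.items() if v >= DEFECT_THRESHOLD}
        if defects:
            alarms += 1
            sns.publish(TopicArn=ALARM_TOPIC,
                        Message=json.dumps({"frame": payload["frame_s3_uri"],
                                            "defects": defects}))
    return {"frames_processed": len(event["Records"]), "alarms_raised": alarms}
```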

Others

Customer 360 for Digital marketing company

Client Background
The client is a prominent Indian digital engagement solutions company providing brands with multi-channel digital solutions to connect with their customers 24x7. Its analytics-enabled cross-channel targeting platform enables brands to deliver personalized engagement to their customers. Its digital channels reach a base of more than 500 million end customers, powering over 200 billion interactions every month.

Business Objective
The majority of customer interactions powered by the client's digital platform consist of unstructured data. These interactions alone generate nearly 5 TB of data per month and contain a wealth of information about consumer lifestyle and preferences. Once discovered, this information could be a true asset to marketing teams as they look to drive increased customer engagement and transactions across brands. The client had no in-house capability to store and mine such datasets at scale. It therefore wanted to create a technology infrastructure that could scale with the data and allow mining of unstructured data to discover customer preferences and lifestyle information. Such a platform needed to leverage text mining and natural language processing to understand unstructured text, and machine learning algorithms to predict customer preferences wherever they were missing. The client did not have an in-house team with data science and big data skills, or any prior experience with such technologies. Valiance was chosen as the technology partner for this initiative to discover the best possible solution and implement it successfully.

Solution
Our team of business analysts, data scientists and data engineers worked closely with the client's business and product teams for the first three weeks to arrive at the right methodological approach, comprising a text mining framework, a machine learning approach and a technology platform that would achieve the business objectives. A POC was done initially to validate the approach for accuracy and completeness of results. After a month we had narrowed down to:
Hadoop (open source distribution) with HDFS for storage and MapReduce for text mining in batch
Spark for training machine learning algorithms
HBase as storage for structured customer data
A text mining and machine learning approach to discover and predict customer attributes

In the next 5 months our team performed the following activities (a Spark sketch follows this case study):
The data engineering team created Hadoop-based infrastructure to process terabytes of unstructured data.
The data science team created text mining rules and NLP algorithms to discover customer attributes including, but not limited to, age, gender, spending pattern, number of credit cards, number of kids, travel frequency and mutual fund information. We were only able to do limited discovery this way due to the sparse nature of the data, but it gave us sufficient samples to create training sets.
The business analytics and data science teams also recommended additional customer attributes that could be useful from a marketing standpoint, and that could act as useful predictors for identifying missing ones.
The data science team trained ML algorithms to predict missing customer attributes using the discovered attributes. Different algorithms were experimented with to arrive at the winning algorithms.
Results of the text mining and machine learning predictions were shared with the client's team regularly to incorporate their feedback.

At the end of 5 months, we arrived at the first version of the results with robust and accurate customer data. The ~450 customer attributes thus discovered were exposed to the marketing team through open APIs. Post initial deployment, we remain engaged with the client to improve the accuracy of the ML algorithms and refine the text mining rules.

Outcome
Enabled discovery of customer attributes through mining of unstructured data, where no such information existed before. This resulted in intelligent marketing campaigns with higher ROI compared to the baseline, and a 10% increase in customer engagement from digital interactions within 2 months of testing different marketing campaigns.
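Below is a small PySpark sketch of the attribute-prediction idea described above: training a classifier on customers whose attribute was discovered by text mining, then predicting it for customers where it is missing. The attribute chosen (gender), the data path, column names and model are illustrative assumptions, not the actual winning algorithm.

```python
# Sketch only: paths, columns and the chosen model are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("customer-attribute-prediction").getOrCreate()

# Assumed table of per-customer attributes produced by the text-mining stage
customers = spark.read.parquet("hdfs:///customer360/attributes.parquet")

features = ["age", "monthly_spend", "num_credit_cards", "travel_frequency"]
labeled = customers.filter(customers.gender.isNotNull())     # attribute discovered by text mining
unlabeled = customers.filter(customers.gender.isNull())      # attribute to be predicted

pipeline = Pipeline(stages=[
    StringIndexer(inputCol="gender", outputCol="label", handleInvalid="keep"),
    VectorAssembler(inputCols=features, outputCol="features", handleInvalid="skip"),
    RandomForestClassifier(labelCol="label", featuresCol="features"),
])

model = pipeline.fit(labeled)

# Predict the missing attribute; the placeholder label is ignored at prediction time
predictions = model.transform(unlabeled.fillna({"gender": "unknown"}))
predictions.select("customer_id", "prediction").write.parquet(
    "hdfs:///customer360/predicted_gender.parquet")
```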
