Research Papers


An NLP-Driven Intelligent Video Query System for Interactive Video Retrieval

Title: An NLP-Driven Intelligent Video Query System for Interactive Video Retrieval Authors: Shailendra Singh Kathait (Co-Founder & Chief Data Scientist, Valiance Solutions), Ashish Kumar (Principal Data Scientist, Valiance Solutions), Samay Sawal (Intern Data Scientist, Valiance Solutions) Summary: Urban traffic surveillance produces vast quantities of continuous video data, but extracting actionable insights in real time remains a persistent challenge. This paper presents a novel Video Query System (VQS) that unifies deep-learning–based violation detection with natural language understanding to enable intuitive, efficient retrieval of relevant video segments from continuous CCTV feeds. Unlike traditional pipelines that operate in isolated silos, VQS consolidates multiple violation detectors — covering helmet non-compliance, illegal parking, over-speeding, wrong-way driving, and pedestrian tracking — into a single multi-attribute indexed repository. A large language model (LLM) translates free-form user queries (e.g., “show red motorcycles speeding above 60 km/h between 6–8 AM”) into structured, filterable query specifications, enabling precise retrieval and ranking of matching video frames with annotated metadata. The system’s interactive dashboard presents results as ranked frame sequences with bounding boxes and metadata, significantly accelerating investigative workflows for traffic management and enforcement authorities. Evaluation on real-world urban footage demonstrates strong detection precision (over 90%) and robust query interpretation, validating that natural-language–driven video querying bridges the gap between complex analytics and non-technical operators. The paper also discusses system design, user interaction, scalability, limitations, and future extensions for broader smart-city applications. Download Research Paper
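The paper's LLM emits a structured, filterable query specification from free-form text. As a rough illustration of what such a specification might look like, the sketch below uses a toy rule-based parser in place of the LLM; the field names, vocabularies, and patterns are assumptions for illustration, not the paper's actual schema:

```python
import re

def parse_query(query: str) -> dict:
    """Toy rule-based stand-in for the paper's LLM: maps a free-form
    query to a structured filter spec. Field names are hypothetical."""
    spec = {"vehicle_type": None, "color": None,
            "min_speed_kmh": None, "time_range": None}
    q = query.lower()
    # Match against small assumed vocabularies of attributes.
    for vehicle in ("motorcycle", "car", "truck", "bus"):
        if vehicle in q:
            spec["vehicle_type"] = vehicle
            break
    for color in ("red", "blue", "white", "black"):
        if color in q:
            spec["color"] = color
            break
    # Speed threshold, e.g. "above 60 km/h".
    m = re.search(r"above\s+(\d+)\s*km/h", q)
    if m:
        spec["min_speed_kmh"] = int(m.group(1))
    # Time window, e.g. "between 6–8 am".
    m = re.search(r"between\s+(\d+)\s*[–-]\s*(\d+)\s*(am|pm)", q)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        if m.group(3) == "pm":
            start, end = start + 12, end + 12
        spec["time_range"] = (start, end)
    return spec

spec = parse_query("show red motorcycles speeding above 60 km/h between 6–8 AM")
```

A structured spec like this can then be applied as filters over the multi-attribute indexed repository to rank matching frames.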

Vehicle Tracking and Re-identification: A Smart Approach to Security and Civic Monitoring

Title: Vehicle Tracking and Re-identification: A Smart Approach to Security and Civic Monitoring Authors: Shailendra Singh Kathait, Ashish Kumar, Samay Sawal, Arjun Dhavse, Kimaya Pundir Summary: This research paper presents a modular, computer vision–driven system for real-time vehicle tracking, re-identification, and overspeeding violation detection using monocular traffic surveillance cameras. The proposed framework integrates two complementary components: a deep feature–based Vehicle Re-Identification (ReID) module and a YOLOv8-powered speed estimation pipeline. The ReID module leverages a Vision Transformer backbone to extract 512-dimensional embeddings from vehicle images captured at temporal intervals, enabling robust identity association across time despite changes in viewpoint, illumination, and partial occlusions. In parallel, the speed estimation module detects vehicles at fixed sampling intervals, tracks centroid displacement within a defined region of interest, and converts pixel-level motion into real-world speed using calibrated scaling and Kalman filtering for noise reduction. Vehicles exceeding a predefined speed threshold are automatically flagged and visually annotated, while detailed identity-matching logs are generated for audit and enforcement purposes. Experimental evaluation on real-world CCTV footage demonstrates stable detection, accurate identity continuity, and reliable speed violation reporting under realistic urban traffic conditions. The system’s modular architecture supports scalability, adaptability, and future enhancements such as automated camera calibration, cross-camera tracking, and edge deployment, making it well suited for smart city traffic monitoring and civic security applications. Download Research Paper
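The speed-estimation step converts centroid displacement between sampled frames into a real-world speed via a calibrated scale. A minimal sketch of that conversion, assuming an example calibration constant and threshold (the paper's Kalman filtering for noise reduction is omitted here):

```python
def estimate_speed_kmh(c1, c2, meters_per_pixel, frame_gap, fps):
    """Convert centroid displacement (in pixels) between two sampled
    frames into km/h. `meters_per_pixel` is an assumed calibration
    value; in practice it comes from camera calibration."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    pixels = (dx * dx + dy * dy) ** 0.5      # Euclidean displacement
    meters = pixels * meters_per_pixel       # pixel -> real-world scale
    seconds = frame_gap / fps                # frames -> elapsed time
    return meters / seconds * 3.6            # m/s -> km/h

def is_overspeeding(speed_kmh, limit_kmh=60.0):
    """Flag vehicles exceeding a predefined (here assumed) threshold."""
    return speed_kmh > limit_kmh

# A centroid moving 100 px over 25 frames at 25 fps with 0.2 m/px
# covers 20 m in 1 s, i.e. 72 km/h.
speed = estimate_speed_kmh((0, 0), (100, 0), 0.2, 25, 25)
```

Flagged vehicles would then be annotated and logged alongside their ReID embeddings for audit.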


Deep Learning-based Person Tracking using Facial Recognition: A Smart Approach to Security and Civic Monitoring

Title: Deep Learning-based Person Tracking using Facial Recognition: A Smart Approach to Security and Civic Monitoring Authors: Shailendra Singh Kathait, Ashish Kumar, Samay Sawal Summary: Person tracking using facial recognition has emerged as a crucial technology in surveillance, security, and human-computer interaction applications. This paper presents a comprehensive framework that integrates advanced facial detection, feature extraction, and tracking methodologies to robustly identify and monitor individuals in video streams. The approach in this paper combines state-of-the-art computer vision techniques with deep learning-based facial recognition to achieve real-time performance while maintaining high accuracy. The system integrates YOLO for object detection and DeepFace for facial recognition, offering an efficient solution for real-time person tracking. Additionally, the framework extends beyond individual tracking by incorporating intelligent analysis for detecting traffic violations, monitoring criminal activities, and identifying civic issues such as unauthorized encroachments or safety hazards. By leveraging existing surveillance infrastructure, this system enhances preventive policing and response times, making urban spaces safer and more efficient. The system is built using widely available open-source libraries and is designed for scalability across various camera setups. Experimental results demonstrate that this framework provides effective tracking and identification even under challenging conditions such as occlusions, varied lighting, and rapid movements. Download Research Paper


Deep Learning-based Person Tracking: A Smart Approach to Security and Civic Monitoring

Title: Deep Learning-based Person Tracking: A Smart Approach to Security and Civic Monitoring Authors: Shailendra Singh Kathait – Co-Founder & Chief Data Scientist, Ashish Kumar – Principal Data Scientist, Samay Sawal – Intern Data Scientist, Ram Patidar – Data Scientist, Khushi Agrawal – Intern Data Scientist [all Valiance Solutions Noida, India] Summary: This paper introduces a deep learning-based framework designed for real-time detection and surveillance of individuals violating designated restricted zones, such as vehicle-only areas. Utilizing advanced object detection algorithms, specifically YOLOv8, the system focuses on head detection and spatial reasoning to accurately track individuals entering these zones. A centroid-based tracking mechanism ensures each individual is flagged only once per frame, enhancing detection precision. To further improve accuracy, the framework incorporates modifications to bounding boxes and employs region-specific polygonal filtering, allowing for more precise violation detection. Visual feedback is provided through overlaying boundary boxes and labels on detected individuals, while cumulative violation counts are recorded for monitoring purposes. The proposed system demonstrates stable performance under varying conditions, making it suitable for applications in crowd management, security, and surveillance. Its flexible architecture allows for the integration of additional capabilities, such as movement direction and speed analysis, to provide more context-aware violation assessments. By leveraging existing surveillance infrastructure, this approach offers a cost-effective solution for enhancing urban safety and monitoring. Download Research Paper
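The region-specific polygonal filtering described above reduces to a point-in-polygon test on each detected centroid. A minimal sketch using the standard ray-casting algorithm, with a hypothetical zone and a flagged-ID set so each tracked individual is counted only once:

```python
def point_in_polygon(point, polygon):
    """Ray-casting point-in-polygon test: does a detection centroid
    fall inside the restricted-zone polygon?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge crosses the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical vehicle-only zone in image coordinates.
zone = [(100, 100), (400, 100), (400, 300), (100, 300)]
flagged_ids = set()

def register_violation(track_id, centroid):
    """Flag each tracked individual at most once, as in the paper's
    centroid-based tracking; returns True on a new violation."""
    if track_id not in flagged_ids and point_in_polygon(centroid, zone):
        flagged_ids.add(track_id)
        return True
    return False
```

Cumulative counts then follow directly from `len(flagged_ids)`.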


Deep Learning-based Approach for Detecting Traffic Violations Involving No Helmet Use and Wrong Cycle Lane Usage

Title: Deep Learning-based Approach for Detecting Traffic Violations Involving No Helmet Use and Wrong Cycle Lane Usage Authors: Shailendra Singh Kathait – Co-Founder & Chief Data Scientist, Ashish Kumar – Principal Data Scientist, Samay Sawal – Intern Data Scientist, Ram Patidar – Data Scientist, Khushi Agrawal – Intern Data Scientist [all Valiance Solutions Noida, India] Summary: Urban road safety is significantly compromised by traffic violations such as motorcyclists riding without helmets and unauthorized use of cycle lanes. This study introduces a deep learning-based framework designed for the automated, real-time detection of these specific infractions. By leveraging advanced object detection and tracking algorithms, notably the YOLO (You Only Look Once) architecture, combined with spatial reasoning techniques, the system effectively identifies motorcyclists without helmets and detects bicycles operating outside designated lanes. Enhancements like bounding box adjustments, centroid-based relationships, and region-specific filtering are employed to improve detection accuracy. Additional analyses, including speed and direction assessments, provide contextual understanding of the violations. The system offers visual feedback and maintains cumulative violation counts, demonstrating robust performance across diverse urban traffic scenarios. Its scalable architecture allows for extension to detect a broader range of traffic violations, aiming to reduce reliance on manual monitoring and bolster road safety enforcement. Download Research Paper
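The centroid-based relationships between detections can be sketched as follows: a detected head is attributed to a motorcycle when its centroid falls inside the (slightly enlarged) motorcycle box, and a no-helmet violation is raised when attributed heads outnumber helmets in that region. This is an illustrative simplification, not the paper's exact rule; the box margin and the counting heuristic are assumptions:

```python
def centroid(box):
    """Centre of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def expand(box, margin):
    """Bounding-box adjustment: grow the box by a fixed margin so a
    rider's head just above the motorcycle is still associated."""
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)

def rider_without_helmet(motorcycle_box, head_boxes, helmet_boxes, margin=20):
    """Illustrative no-helmet check: associate heads and helmets with a
    motorcycle via centroid containment, then compare counts."""
    mx1, my1, mx2, my2 = expand(motorcycle_box, margin)

    def inside(pt):
        return mx1 <= pt[0] <= mx2 and my1 <= pt[1] <= my2

    riders = [b for b in head_boxes if inside(centroid(b))]
    helmets = [b for b in helmet_boxes if inside(centroid(b))]
    return len(riders) > len(helmets)  # more heads than helmets -> violation
```

The wrong-cycle-lane check follows the same pattern, testing bicycle centroids against the designated-lane polygon instead.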

Computer Vision And Deep Learning Based Approach For Violations Due To Illegal Parking Detection

Title: Computer Vision and Deep Learning based Approach for Violations due to Illegal Parking Detection Authors: Shailendra Singh Kathait – Co-Founder & Chief Data Scientist, Ashish Kumar – Principal Data Scientist, Ram Patidar – Data Scientist, Samay Sawal – Intern Data Scientist, Khushi Agrawal – Intern Data Scientist [all Valiance Solutions] Download Research Paper


Computer Vision and Deep Learning Based Approach for Traffic Violations Due To Over-Speeding and Wrong Direction Detection

Title: Computer Vision and Deep Learning Based Approach for Traffic Violations Due To Over-Speeding and Wrong Direction Detection Authors: Shailendra Singh Kathait: Co-Founder and Chief Data Scientist, Ashish Kumar: Principal Data Scientist, Samay Sawal: Intern Data Scientist, Ram Patidar: Data Scientist, Khushi Agrawal: Intern Data Scientist [all Valiance Solutions Noida, India] Summary: The research paper titled “Computer Vision and Deep Learning based Approach for Traffic Violations due to Over-speeding and Wrong Direction Detection” presents a cost-effective and scalable method for monitoring traffic violations using non-specialized public cameras. Traditional traffic enforcement relies on expensive infrastructure like Automatic Number Plate Recognition (ANPR) cameras and radar guns, which are often limited to specific locations due to their high costs. In contrast, this study leverages widely available public surveillance cameras, repurposing them for traffic monitoring without significant additional investments. The proposed system integrates state-of-the-art deep learning object detection models, specifically YOLO (You Only Look Once) architectures, with advanced computer vision techniques to accurately estimate vehicle speed and detect direction in real-time. By analyzing video feeds, the system identifies vehicles, tracks their movements, calculates speeds, and determines travel directions. This enables the detection of critical traffic violations such as over-speeding and wrong-direction driving. Experimental results demonstrate the robustness, accuracy, and real-time capabilities of the approach, highlighting its potential for practical deployment in urban traffic surveillance. The modular design and reliance on general-purpose cameras facilitate widespread and affordable implementation, offering a viable solution for enhancing traffic law enforcement and road safety in rapidly urbanizing areas. Download Research Paper
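Wrong-direction driving can be determined from the sign of the dot product between a track's net displacement and the lane's allowed direction vector. The sketch below illustrates that direction-determination step; the allowed-direction vector is an assumed input, and the smoothing a production system would apply is omitted:

```python
def travel_direction(track):
    """Net displacement vector of a tracked centroid sequence
    (first to last observed position, in image coordinates)."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    return (x1 - x0, y1 - y0)

def is_wrong_way(track, allowed_direction):
    """Flag a track whose net motion opposes the lane's allowed
    direction: a negative dot product means the vehicle moved
    against the permitted flow."""
    dx, dy = travel_direction(track)
    ax, ay = allowed_direction
    return dx * ax + dy * ay < 0

# Lane flows downward in image coordinates (0, 1); this track moves up.
going_up = [(100, 300), (100, 250), (100, 200)]
```

Combining this with the speed estimate from per-frame centroid displacement yields both violation types from the same tracking output.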

MobileNetV2: Transfer Learning for Elephant Detection

Title: MobileNetV2: Transfer Learning for Elephant Detection Authors: Samay Sawal – Intern Data Scientist, Shailendra Singh Kathait – Co-Founder and Chief Data Scientist [both Valiance Solutions, Noida, India] Summary: Wildlife conservation and ecological monitoring rely heavily on accurate species classification. This research presents a deep learning-based approach for elephant detection using MobileNetV2 and transfer learning techniques. Traditional classification methods are labor-intensive and prone to human errors, making automated solutions essential for improving efficiency and accuracy. The study utilizes images captured from a specific reserved region, structured into two categories: “elephants” and “others.” Data augmentation techniques, including rotation, shifting, zooming, and flipping, enhance model robustness. MobileNetV2, a lightweight and efficient convolutional neural network, is employed as the feature extractor, leveraging pre-trained ImageNet weights. Custom layers such as Global Average Pooling, Fully Connected Layers, and Dropout were integrated to optimize performance. Comparative analysis with CNN and VGG16 models demonstrated that MobileNetV2 achieved superior classification performance, with a test accuracy of 98.31% and significantly lower computational costs. Transfer learning expedited model training and improved generalization across diverse environmental conditions. This research highlights the effectiveness of MobileNetV2 in wildlife monitoring and conservation. Future work includes expanding the dataset, deploying real-time monitoring systems on edge devices, and implementing individual elephant identification for enhanced conservation efforts. The proposed model serves as a scalable solution for automated wildlife classification tasks. Download Research Paper

Artificial Intelligence for Human-Animal Conflict Mitigation: Image Classification and Human Tracking in Tadoba Andhari Tiger Reserve

Title: Artificial Intelligence for Human-Animal Conflict Mitigation: Image Classification and Human Tracking in Tadoba Andhari Tiger Reserve Authors: Mothukuri Sujith, Shailendra Singh Kathait, Piyush Dhuliya [all Valiance Analytics Private Limited] Summary: This study presents an advanced AI-driven approach to mitigating human-animal conflicts within the Tadoba Andhari Tiger Reserve (TATR), located in the Chandrapur region. This area faces significant conflict pressure, as it harbors a diverse population of flora and fauna, including tigers, leopards, and bears, which frequently come into contact with surrounding communities. The Human-Animal Conflict Mitigation System (HACMS) developed for TATR utilizes edge AI cameras, deep learning-based image classification, and human tracking systems to predict and prevent potential conflict scenarios. Central to this approach are daytime-specific deep learning models that detect and classify animals in real time, leveraging the YOLO v5 architecture. Three distinct models comprise this system: a custom detection model trained on species-specific data, a pre-trained model based on YOLO for public datasets, and a segmentation model to resolve specific challenges in detecting animals like bears and bison that often appear similar in images. Each model serves a specific function within the detection pipeline, achieving robust accuracy in species identification and human recognition. To build the models, a custom dataset of 7,959 images from TATR was utilized, with 73% allocated for training, 16% for validation, and 11% for testing. Data augmentation techniques such as rotation, brightness adjustment, and image preprocessing were applied to increase model generalization, enabling it to handle varied lighting and forest conditions. The YOLO v5 architecture’s use of anchor-free detection and mini-batch normalization significantly boosted efficiency and precision, allowing the model to adapt to various object shapes and sizes.
Through this setup, the system achieved an overall test accuracy of 94.82%, with a near-perfect ~100% accuracy for critical species like tigers, leopards, and bears, meeting forest authorities’ requirements for reliable animal identification and alerting. For human detection, the system integrates the Nanotrack algorithm from OpenCV, which provides lightweight, real-time tracking of human movement within forest areas. When the AI-enabled cameras detect human presence, this tracking mechanism initiates and follows the individual’s movement using bounding boxes across frames. This process aids in monitoring human entry into restricted zones, alerting authorities if a person is close to potentially dangerous wildlife. Additionally, adjustments were made to the pre-trained model by replacing common vehicle classes with a ‘Human’ class, improving detection accuracy by focusing on forest-relevant categories. This paper emphasizes that effective conflict mitigation relies not only on accurate animal classification but also on tracking human activities to preemptively raise alerts and deter risky encounters. By harnessing edge analytics, the HACMS operates with limited dependence on cloud computing, making it well-suited to remote areas where connectivity may be sporadic. The system’s design is both scalable and adaptive, offering a template for future implementations in other high-conflict zones. Ultimately, this research demonstrates the transformative potential of AI and deep learning in human-animal conflict management, combining real-time image analysis with proactive alerting to create a safer environment for both humans and animals. The solution offers a promising step toward sustainable coexistence, supporting local communities, wildlife authorities, and conservation efforts by leveraging innovative technology to address the complex dynamics of shared ecosystems. Download Research Paper
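The 73/16/11 train/validation/test split over the 7,959-image dataset can be reproduced with a simple shuffled partition; the seed and the rounding convention (remainder to the test set) are assumptions for illustration:

```python
import random

def split_dataset(items, train=0.73, val=0.16, seed=42):
    """Shuffle and partition a dataset into train/val/test using the
    paper's 73/16/11 ratios; whatever remains after the train and
    validation slices becomes the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Applied to 7,959 image IDs this yields 5810 / 1273 / 876 images.
train_set, val_set, test_set = split_dataset(range(7959))
```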

Smart Screening: Non-Invasive Detection of Severe Neonatal Jaundice using Computer Vision and Deep Learning

Title: Smart Screening: Non-Invasive Detection of Severe Neonatal Jaundice using Computer Vision and Deep Learning Authors: Kartikya Gupta, Vaibhav Sharma, Shailendra Singh Kathait [all Valiance Solutions] Summary: The research paper titled “Smart Screening: Non-Invasive Detection of Severe Neonatal Jaundice using Computer Vision and Deep Learning” presents a novel approach to detecting severe neonatal jaundice through non-invasive techniques, using advanced computer vision and deep learning algorithms. Neonatal jaundice is a common condition among newborns, and early detection is critical in preventing severe complications, such as kernicterus, a form of brain damage. Traditionally, detection methods involve blood tests, which are invasive, time-consuming, and expensive. This study proposes an innovative solution that could address these limitations. The research focuses on utilizing image processing techniques to analyze visual data of newborns’ skin to classify jaundice severity. The authors developed a custom convolutional neural network (CNN) model and compared its performance against several state-of-the-art transfer learning models, including MobileNet, EfficientNet, and Vision Transformer. These models were trained using a dataset of medical images specifically aimed at diagnosing jaundice. The deep learning models successfully identified the degree of jaundice with high accuracy, particularly in detecting severe cases that require medical attention. One of the key advantages of this system is its non-contact, affordable nature, which makes it an ideal solution for resource-limited healthcare settings. The proposed model could easily be deployed in remote or underdeveloped areas, where access to traditional diagnostic tools may be restricted.
By leveraging smartphone cameras or other imaging devices, healthcare professionals and caregivers can screen infants in a timely and efficient manner, enabling earlier intervention and reducing the risk of complications. The paper also discusses the potential scalability of the system, as well as its possible integration into telemedicine platforms. The findings indicate that the solution could significantly enhance the early detection of jaundice while minimizing the need for invasive procedures. Additionally, the cost-effectiveness and ease of use of the system suggest its potential as a widespread tool in neonatal care. In conclusion, this study highlights the promising role of computer vision and deep learning in healthcare, specifically in providing a non-invasive, affordable, and accessible solution for the early detection of severe neonatal jaundice. It represents a step forward in improving neonatal care, especially in areas with limited medical resources. Download Research Paper
