A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection

Key Findings

Methodology

This paper proposes a two-stage object-centric deep learning framework for exam cheating detection. First, the YOLOv8n model is used to localize students in exam-room images. Each detected region is cropped and preprocessed, then classified by a fine-tuned RexNet-150 model as either normal or cheating behavior. The system is trained on a dataset compiled from 10 independent sources with a total of 273,897 samples, achieving 0.95 accuracy, 0.94 recall, 0.96 precision, and 0.95 F1-score.

Key Results

Result 1: The system was trained on 273,897 samples, achieving 0.95 accuracy, 0.94 recall, 0.96 precision, and 0.95 F1-score. The accuracy is 13 percentage points above a baseline accuracy of 0.82 in video-based cheating detection.
Result 2: With an average inference time of 13.9 ms per sample, the proposed approach demonstrates robustness and scalability for deployment in large-scale environments.
Result 3: Ablation studies confirmed the significant improvement in detection accuracy of the two-stage method over the full-frame approach.
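
The gap between the reported 0.95 accuracy and the 0.82 baseline can be read either as an absolute or as a relative gain, and the 13.9 ms latency implies a rough throughput. The short Python check below works only from the figures quoted above; the single-worker, no-batching throughput estimate is an assumption for illustration, not a measurement from the paper.

```python
# Figures quoted in the summary; nothing here is re-measured.
accuracy = 0.95
baseline_accuracy = 0.82
latency_ms = 13.9  # average inference time per sample

absolute_gain = accuracy - baseline_accuracy        # in accuracy units
relative_gain = absolute_gain / baseline_accuracy   # fraction of the baseline
throughput = 1000.0 / latency_ms                    # samples/s, assuming one worker and no batching

print(f"absolute gain: {absolute_gain * 100:.0f} percentage points")
print(f"relative gain: {relative_gain * 100:.1f}%")
print(f"throughput:    {throughput:.1f} samples/s")
```

Read this way, the improvement is 13 percentage points in absolute terms, or roughly 16% relative to the baseline.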

Significance

This study is significant for both academia and industry. It not only improves the accuracy of exam cheating detection but also addresses transparency and complexity issues in traditional methods. By leveraging the combination of YOLOv8n and RexNet-150, the framework provides an efficient and scalable solution that can operate in real-time in resource-limited environments. Additionally, the system addresses ethical concerns by ensuring that results are delivered privately to students, avoiding public shaming.

Technical Contribution

Technical contributions include: 1) Proposing an object-centric two-stage framework that significantly improves detection accuracy; 2) Creating a large-scale standardized dataset that serves as a benchmark for future models; 3) Conducting detailed ablation studies and model comparisons, demonstrating the framework's superior performance over traditional approaches and establishing a new state-of-the-art.

Novelty

This study is the first to combine YOLOv8n and RexNet-150 for exam cheating detection, proposing an object-centric detection approach that overcomes the background noise interference issues of full-frame methods. Compared to existing methods, this framework simplifies the architecture and enhances detection efficiency and accuracy.

Limitations

Limitation 1: The current method relies on static frames, lacking temporal continuity, which may fail to distinguish between brief innocent actions and prolonged cheating behavior.
Limitation 2: By focusing only on faces and upper bodies, it may miss evidence of cheating on the desk, such as phones or notes.
Limitation 3: Inconsistencies in dataset annotations may affect the model's robustness.

Future Work

Future research directions include: 1) Expanding Region of Interest (ROI) extraction to include more of the upper body, hands, and immediate desk area, capturing a more complete view of potential cheating acts; 2) Exploring multi-class classification to identify specific types of cheating; 3) Improving data quality and annotation strategies to enhance system robustness and accuracy.

AI Executive Summary

Exam cheating detection is a critical component of academic integrity, and with the proliferation of remote and hybrid learning, ensuring fairness and transparency in assessments has become increasingly important. Traditional invigilation relies on human observation, which is inefficient and prone to errors. While some AI-powered monitoring systems have been deployed, many lack transparency or require complex multi-layered architectures.

This paper proposes an improved two-stage framework that integrates object detection and behavioral analysis. First, the YOLOv8n model is used to localize students in exam-room images. Each detected region is cropped and preprocessed, then classified by a fine-tuned RexNet-150 model as either normal or cheating behavior. The system is trained on a dataset compiled from 10 independent sources with a total of 273,897 samples, achieving 0.95 accuracy, 0.94 recall, 0.96 precision, and 0.95 F1-score.

The core technical principle of this framework is to eliminate background noise through object detection and focus on the behavioral analysis of each examinee. This allows the system to more accurately identify cheating behaviors while reducing false positives. Experimental results demonstrate a significant improvement in detection accuracy over traditional full-frame methods.

This study not only achieves technical breakthroughs but also addresses ethical concerns by ensuring that results are delivered privately to students, avoiding public shaming. Furthermore, the system's efficiency and scalability make it suitable for deployment in large-scale environments.

Despite significant progress, the current method has limitations, such as lacking temporal continuity and missing evidence on the desk. Future research will focus on expanding ROI extraction and on improving data quality and annotation strategies to further enhance system robustness and accuracy.

Deep Analysis

Background

Exam cheating detection is a critical component of academic integrity. With the proliferation of remote and hybrid learning, ensuring fairness and transparency in assessments has become increasingly important. Traditional invigilation relies on human observation, which is inefficient and prone to errors. While some AI-powered monitoring systems have been deployed, many lack transparency or require complex multi-layered architectures. Recent advances in deep learning have provided new possibilities for automated cheating detection. In particular, the development of object detection and behavioral analysis technologies has made it possible to detect cheating behaviors in complex multi-person exam environments. However, existing methods still face challenges in handling background noise and data scarcity.

Core Problem

Exam cheating not only seriously undermines the value of learning outcomes but also poses risks to the credibility of educational institutions. Consequently, there is a pressing need for robust, scalable solutions to support proctors in monitoring exams. Existing AI-based proctoring systems face significant hurdles in handling background noise and differentiating individual behaviors. Additionally, the field is hampered by the fragmented and scarce high-quality publicly available datasets, which impedes the development and fair evaluation of generalizable models.

Innovation

This paper proposes a novel two-stage object-centric framework for cheating detection.
• First, the YOLOv8n model is used to localize students in exam-room images, eliminating background noise.
• Then, each detected region is cropped and preprocessed, and a fine-tuned RexNet-150 model classifies the behavior as normal or cheating.
• By decoupling the complex task of scene understanding into two distinct and manageable sub-problems, the framework significantly improves detection accuracy.
• Additionally, a large-scale standardized dataset is created, serving as a benchmark for future models.

Methodology

• Use the YOLOv8n model to detect human-like objects in exam-room images and generate bounding boxes.
• Apply cropping and preprocessing steps to extract robust Regions of Interest (ROIs).
• Forward these ROIs to the RexNet-150 model, which distinguishes between cheating and non-cheating behaviors.
• Finally, draw the predicted labels and bounding boxes back onto the original image.
• The dataset was collected from 10 open sources, cleaned, standardized, and split into training, validation, and test sets.
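
The detect, crop, classify flow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is a toy nested list, and `fake_detector` and `fake_classifier` are hypothetical stand-ins for YOLOv8n and the fine-tuned RexNet-150, so the sketch runs without model weights.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]           # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), right/bottom edge exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the region inside `box`, clamped to the image bounds."""
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def detect_cheating(
    image: Image,
    detector: Callable[[Image], Sequence[Box]],
    classifier: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 localizes students; stage 2 labels each crop independently."""
    results = []
    for box in detector(image):
        roi = crop(image, box)
        if not roi or not roi[0]:
            continue  # box fell entirely outside the frame
        results.append((box, classifier(roi)))
    return results


# Hypothetical stand-ins (assumptions, not real models).
def fake_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 6, 2)]  # second box overshoots the 4-pixel width


def fake_classifier(roi: Image) -> str:
    mean = sum(map(sum, roi)) / (len(roi) * len(roi[0]))
    return "cheating" if mean > 100 else "normal"


frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
annotations = detect_cheating(frame, fake_detector, fake_classifier)
print(annotations)
```

Passing the detector and classifier in as callables keeps the two stages decoupled, which mirrors the design argument: the classifier only ever sees one student's crop, never the full frame.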

Experiments

Experiments were conducted within a Kaggle Notebook environment using a single NVIDIA RTX 3080 GPU. The software stack was built on PyTorch version 2.1. The RexNet-150 model was trained for 10 epochs using the Adam optimizer with a learning rate of 0.0003. The dataset was split into training, validation, and test sets, comprising 80%, 10%, and 10%, respectively. Ablation studies confirmed the significant improvement in detection accuracy of the two-stage method over the full-frame approach.
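
For illustration, an 80/10/10 split of the kind described can be sketched as a seeded shuffle followed by slicing. The summary does not say whether the split was stratified by class or by source, so the plain random split, the `split_dataset` helper, and the seed below are all assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T],
    train_frac: float = 0.8,
    val_frac: float = 0.1,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then slice into train/val/test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test


# Sample indices stand in for the 273,897 images.
train, val, test = split_dataset(range(273_897))
print(len(train), len(val), len(test))
```

Giving the remainder to the test set avoids losing samples to rounding when the dataset size is not divisible by ten.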

Results

Experimental results show that the system, trained on 273,897 samples, achieves 0.95 accuracy, 0.94 recall, 0.96 precision, and 0.95 F1-score. The accuracy is 13 percentage points above a baseline accuracy of 0.82 in video-based cheating detection. With an average inference time of 13.9 ms per sample, the proposed approach demonstrates robustness and scalability for deployment in large-scale environments. Ablation studies confirmed the significant improvement in detection accuracy of the two-stage method over the full-frame approach.

Applications

The system is applicable to educational institutions requiring large-scale proctoring, especially in remote or hybrid learning environments. Its efficiency and scalability allow it to operate in real-time in resource-limited environments. Additionally, results are delivered privately to each student, which avoids public shaming.

Limitations & Outlook

The current method relies on static frames, lacking temporal continuity, which may fail to distinguish between brief innocent actions and prolonged cheating behavior. Additionally, by focusing only on faces and upper bodies, it may miss evidence of cheating on the desk, such as phones or notes. Inconsistencies in dataset annotations may affect the model's robustness. Future research will focus on expanding ROI extraction and on improving data quality and annotation strategies to further enhance system robustness and accuracy.

Plain Language Accessible to non-experts

Imagine you are in a large classroom taking an exam, and the teacher is at the front invigilating. Traditionally, the teacher needs to observe each student to ensure no one is cheating. This is like being in a large kitchen where the chef needs to watch every pot to make sure nothing burns or boils over. But this is very difficult because there are too many pots to watch. Now, imagine there is a smart assistant that can automatically identify each pot and alert the chef which pot needs attention. This is how the cheating detection system proposed in this paper works. It uses a technology called YOLOv8n to identify each student in the exam room, just like the smart assistant identifies each pot. Then, it uses another technology called RexNet-150 to analyze each student's behavior to determine if they are cheating. This way, the system can help the teacher invigilate more effectively, just like the smart assistant helps the chef manage the kitchen better.

ELI14 Explained like you're 14

Hey there! Have you ever wondered how teachers catch cheating during exams? Traditionally, teachers have to keep an eye on every student to make sure no one is cheating. It's like when you're playing a game and you have to keep track of multiple tasks to make sure everything is going well. But that's hard, right? Now, there's a new technology that can help teachers. It's like having a super assistant in the game that can automatically identify each task and tell you which one needs attention. This system uses a technology called YOLOv8n to identify each student in the exam room, just like the super assistant identifies each task. Then, it uses another technology called RexNet-150 to analyze each student's behavior to determine if they are cheating. This way, teachers can invigilate more easily, just like you have a super assistant in the game. Isn't that cool?

Glossary

YOLOv8n

YOLOv8n is the nano (smallest) variant of the YOLOv8 object detection family, designed to identify objects in images quickly and accurately.

Used in the paper to localize students in exam-room images.

RexNet-150

RexNet-150 is a deep learning model for image classification, known for its efficient feature representation capabilities.

Used in the paper to analyze student behavior and determine if cheating is occurring.

F1-Score

The F1-Score is the harmonic mean of precision and recall, used to measure a model's performance on imbalanced datasets.

Used to evaluate the performance of the cheating detection system.
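
As a worked check, the reported F1-score follows from the reported precision (P = 0.96) and recall (R = 0.94):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.96 \times 0.94}{0.96 + 0.94}
    = \frac{1.8048}{1.90}
    \approx 0.95
```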

Recall

Recall is the proportion of actual positive cases that the model correctly identifies.

Used to evaluate the system's effectiveness in detecting cheating behavior.

Precision

Precision is the proportion of positive predictions that are actually correct.

Used to evaluate the system's reliability in reducing false alarms.
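
To make the two definitions concrete, the sketch below counts true positives, false positives, and false negatives on a small made-up label list. The labels and the `precision_recall` helper are illustrative assumptions and are unrelated to the paper's data.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "cheating"
) -> Tuple[float, float]:
    """Compute precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 4 cheating and 4 normal students.
y_true = ["cheating"] * 4 + ["normal"] * 4
y_pred = ["cheating", "cheating", "cheating", "normal",
          "cheating", "cheating", "normal", "normal"]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four cheating students are caught (recall 0.75), while two of the five alarms are false (precision 0.60), which shows how the two metrics can diverge.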

Ablation Study

An ablation study is an experimental method that evaluates the impact of removing or replacing certain parts of a model on its overall performance.

Used to confirm the superiority of the two-stage method over the full-frame approach.

Object Detection

Object detection is a computer vision technique used to identify target objects in images and mark their locations.

Used to localize students in exam-room images.

Behavioral Analysis

Behavioral analysis involves observing and analyzing individual behavior patterns to identify anomalies or specific behaviors.

Used to determine if student behavior is normal or cheating.

Dataset

A dataset is a collection of data samples used to train and evaluate machine learning models.

The paper uses a dataset compiled from 10 independent sources.

Inference Time

Inference time refers to the time it takes for a model to generate output results from input data.

Used to evaluate the system's performance in real-time applications.
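
An average per-sample latency such as the reported 13.9 ms is typically obtained by timing many forward passes and dividing by the count. The harness below is a generic sketch, not the authors' benchmarking code; `dummy_model` is a hypothetical stand-in for the real pipeline.

```python
import time
from typing import Callable, Iterable


def average_latency_ms(
    fn: Callable[[int], object], inputs: Iterable[int], warmup: int = 3
) -> float:
    """Return the mean wall-clock time per call in milliseconds."""
    items = list(inputs)
    for item in items[:warmup]:
        fn(item)  # warm-up calls are not timed
    start = time.perf_counter()
    for item in items:
        fn(item)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(items)


def dummy_model(x: int) -> int:
    # Hypothetical stand-in that just burns a little CPU time.
    return sum(i * i for i in range(1000)) + x


latency = average_latency_ms(dummy_model, range(200))
print(f"{latency:.3f} ms per sample")
```

When the model runs on a GPU, the timed region also needs to wait for queued work to finish before the clock stops, otherwise the measurement understates the true latency.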

Open Questions Unanswered questions from this research

1. How can Region of Interest (ROI) extraction be expanded to capture more complete cheating behavior without increasing computational complexity? Current methods primarily focus on faces and upper bodies, potentially missing evidence on the desk.
2. How can temporal continuity be integrated without affecting system performance to distinguish between brief innocent actions and prolonged cheating behavior?
3. How can data quality and annotation strategies be improved to enhance system robustness and accuracy? Inconsistencies in existing dataset annotations may affect model robustness.
4. How can multi-class classification be achieved without increasing system complexity to identify specific types of cheating?
5. How can system transparency and interpretability be improved without compromising student privacy?

Applications

Immediate Applications

Remote Exam Monitoring

The system can be used in remote exam environments to help educational institutions monitor student behavior in real-time, ensuring fairness and transparency in assessments.

Hybrid Learning Environments

In hybrid learning environments, the system can be used for large-scale proctoring, reducing the burden of manual invigilation and improving efficiency.

Academic Integrity Maintenance

By detecting and preventing exam cheating, the system helps educational institutions maintain academic integrity and enhance their credibility.

Long-term Vision

Intelligent Education Systems

The system can be part of intelligent education systems, providing real-time behavioral analysis and feedback to help students reflect and improve.

Cross-Domain Applications

The technology can be extended to other domains requiring behavioral monitoring, such as security surveillance and employee behavior analysis, providing broader social value.

Abstract

Academic integrity continues to face the persistent challenge of examination cheating. Traditional invigilation relies on human observation, which is inefficient, costly, and prone to errors at scale. Although some existing AI-powered monitoring systems have been deployed and trusted, many lack transparency or require multi-layered architectures to achieve the desired performance. To overcome these challenges, we propose an improvement over a simple two-stage framework for exam cheating detection that integrates object detection and behavioral analysis using well-known technologies. First, the state-of-the-art YOLOv8n model is used to localize students in exam-room images. Each detected region is cropped and preprocessed, then classified by a fine-tuned RexNet-150 model as either normal or cheating behavior. The system is trained on a dataset compiled from 10 independent sources with a total of 273,897 samples, achieving 0.95 accuracy, 0.94 recall, 0.96 precision, and 0.95 F1-score, a 13% increase over a baseline accuracy of 0.82 in video-based cheating detection. In addition, with an average inference time of 13.9 ms per sample, the proposed approach demonstrates robustness and scalability for deployment in large-scale environments. Beyond the technical contribution, the AI-assisted monitoring system also addresses ethical concerns by ensuring that final outcomes are delivered privately to individual students after the examination, for example, via personal email. This prevents public exposure or shaming and offers students an opportunity to reflect on their behavior. For further improvement, it is possible to incorporate additional factors, such as audio data and consecutive frames, to achieve greater accuracy. This study provides a foundation for developing real-time, scalable, ethical, and open-source solutions.

cs.CV cs.AI

