HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection

TL;DR

HMS-BERT uses hybrid multi-task self-training for multilingual, multi-label cyberbullying detection, achieving a macro F1-score of 0.9847.

cs.CL · 2026-03-13
Zixin Feng Xinying Cui Yifan Sun Zheng Wei Jiachen Yuan Jiazhen Hu Ning Xin Md Maruf Hasan
cyberbullying detection · multilingual · multi-label learning · self-training · BERT model

Key Findings

Methodology

HMS-BERT is a hybrid multi-task self-training framework built on a pretrained multilingual BERT model. It integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled data scarcity in low-resource languages, HMS-BERT introduces an iterative self-training strategy with confidence-based pseudo-labeling to facilitate cross-lingual knowledge transfer.
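The confidence-based pseudo-labeling step can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the threshold value and the top-probability selection rule are assumptions.

```python
# Hypothetical sketch of confidence-based pseudo-label selection.
# The 0.9 threshold and the argmax rule are illustrative assumptions,
# not values reported in the paper.

def select_pseudo_labels(probs, threshold=0.9):
    """Keep unlabeled examples whose top predicted probability clears threshold.

    probs: one list of per-class probabilities per unlabeled example.
    Returns (example_index, predicted_label) pairs for confident predictions.
    """
    selected = []
    for i, p in enumerate(probs):
        best = max(p)
        if best >= threshold:
            selected.append((i, p.index(best)))
    return selected

# Only the first prediction clears the confidence bar.
probs = [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]]
print(select_pseudo_labels(probs))  # [(0, 0)]
```

Examples selected this way would be added to the training pool for the next self-training iteration, which is how knowledge transfers into languages with little labeled data.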

Key Results

  • On the multi-label task, HMS-BERT achieves a macro F1-score of up to 0.9847 across four public datasets, significantly outperforming existing methods. This indicates the model's strong capability in handling multilingual and multi-label cyberbullying detection.
  • On the main classification task, HMS-BERT achieves an accuracy of 0.6775 across the three classes and outperforms baseline models, particularly in multilingual scenarios.
  • Ablation studies verify the effectiveness of HMS-BERT's components, particularly the handcrafted features and self-training mechanism, which play crucial roles in enhancing model performance.

Significance

HMS-BERT's introduction holds significant implications for academia and industry. It addresses the limitations of existing methods in multilingual and multi-label scenarios, especially in low-resource languages. By combining multi-task learning and self-training strategies, the framework enhances cross-lingual generalization, providing a novel solution for multilingual cyberbullying detection.

Technical Contribution

HMS-BERT's technical contribution, relative to existing state-of-the-art methods, lies in unifying multi-task learning with iterative self-training in a single framework while enriching contextual understanding with handcrafted linguistic features. This design offers a reusable template for multilingual multi-label learning and opens new engineering possibilities.

Novelty

HMS-BERT is the first to integrate multi-task self-training strategies for multilingual multi-label cyberbullying detection. Compared to related work, the framework presents unique innovations in handling low-resource languages and multi-label classification tasks, particularly in cross-lingual knowledge transfer and pseudo-label generation.

Limitations

  • HMS-BERT may underperform on extremely imbalanced datasets, particularly when certain categories have very few samples, limiting the model's generalization capability.
  • The framework's computational complexity is relatively high, with long training times, which may not be suitable for real-time applications.
  • In specific cultural contexts, handcrafted features may not fully capture the nuances of certain languages.

Future Work

Future research can expand in several directions: 1) further optimize self-training strategies to improve pseudo-label quality; 2) explore more efficient model architectures to reduce computational complexity; 3) extend to more low-resource languages to validate HMS-BERT's broad applicability.

AI Executive Summary

With the rapid rise of online communication, cyberbullying has become a pervasive and pressing social concern. Traditional methods for detecting cyberbullying have primarily focused on monolingual data, employing rule-based approaches and traditional machine learning models. However, these methods are limited in their effectiveness in multilingual and multi-label scenarios, particularly in low-resource languages.

HMS-BERT is an innovative hybrid multi-task self-training framework designed to address the challenges of multilingual and multi-label cyberbullying detection. Built upon a pretrained multilingual BERT model, HMS-BERT integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled data scarcity in low-resource languages, HMS-BERT introduces an iterative self-training strategy with confidence-based pseudo-labeling to facilitate cross-lingual knowledge transfer.

In experiments, HMS-BERT demonstrates strong performance across four public datasets, achieving a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task. Ablation studies further verify the effectiveness of HMS-BERT's components, particularly the handcrafted features and self-training mechanism, which play crucial roles in enhancing model performance.

HMS-BERT's introduction holds significant implications for academia and industry. It addresses the limitations of existing methods in multilingual and multi-label scenarios, especially in low-resource languages. By combining multi-task learning and self-training strategies, the framework enhances cross-lingual generalization, providing a novel solution for multilingual cyberbullying detection.

However, HMS-BERT may underperform on extremely imbalanced datasets, particularly when certain categories have very few samples, limiting the model's generalization capability. Additionally, the framework's computational complexity is relatively high, with long training times, which may not be suitable for real-time applications. Future research can expand in optimizing self-training strategies, exploring more efficient model architectures, and extending to more low-resource languages.

Deep Analysis

Background

Cyberbullying involves the transmission of abusive, threatening, or degrading content through digital channels. Compared to traditional bullying, cyberbullying is not limited by time or location, often occurring anonymously and spreading widely across platforms, causing greater psychological harm to victims. In recent years, the multilingual nature of user-generated content has underscored the need for detection systems that can operate effectively across languages. Additionally, online abuse often involves overlapping forms such as insults, discrimination, and threats, posing significant challenges for binary or single-label classifiers. Early research on cyberbullying detection primarily focused on English monolingual data, employing rule-based approaches and traditional machine learning models. However, these methods have clear limitations in capturing semantic nuances, contextual dependencies, and implicit aggression, especially in cross-domain scenarios.

Core Problem

Current methods for cyberbullying detection face challenges in multilingual and multi-label scenarios. Existing methods commonly assume monolingual or single-task formulations, which restrict their effectiveness in realistic multilingual and multi-label scenarios. Furthermore, the scarcity of labeled data in low-resource languages further limits the model's generalization capabilities. Cyberbullying texts often contain multiple overlapping forms of aggression, making single-label classification inadequate for representing semantic complexity. Therefore, there is a pressing need for a solution that can effectively detect cyberbullying in multilingual and multi-label scenarios.

Innovation

HMS-BERT's core innovations include its hybrid multi-task self-training framework:


  • Combining multi-task learning and self-training: jointly optimizing the fine-grained multi-label abuse classification task and the three-class main classification task enhances cross-lingual generalization.

  • Confidence-based iterative pseudo-labeling: generating high-quality pseudo-labels facilitates cross-lingual knowledge transfer, particularly into low-resource languages.

  • Integrating contextual representations with handcrafted linguistic features: strengthens the model's ability to detect semantic nuances and implicit aggression.
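The joint optimization of the two tasks can be sketched as a weighted sum of a binary cross-entropy loss (multi-label head) and a categorical cross-entropy loss (three-class head). The weighting `alpha` and the helper names are illustrative assumptions, not values from the paper.

```python
import math

def bce(y_true, y_prob, eps=1e-9):
    """Mean binary cross-entropy over independent labels (multi-label head)."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(y_true, y_prob)) / len(y_true)

def ce(true_class, probs, eps=1e-9):
    """Categorical cross-entropy for one example (three-class main head)."""
    return -math.log(probs[true_class] + eps)

def joint_loss(multi_true, multi_prob, main_true, main_prob, alpha=0.5):
    # alpha is a hypothetical task-weighting hyperparameter.
    return alpha * bce(multi_true, multi_prob) + (1 - alpha) * ce(main_true, main_prob)

loss = joint_loss([1, 0, 1], [0.9, 0.1, 0.8], 2, [0.1, 0.2, 0.7])
print(round(loss, 4))  # 0.2506
```

Because both heads share the fused encoder, gradients from each task regularize the other, which is the mechanism behind the claimed cross-lingual generalization gains.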

Methodology

The implementation of HMS-BERT involves the following key steps:


  • Data Preprocessing: Clean and standardize datasets from multiple sources to ensure consistent multi-label annotations.

  • Input Representation: Transform input text into two streams: multilingual BERT contextual encodings and handcrafted lexical features.

  • Semantic Encoding: Obtain sentence-level representations from the 12-layer Transformer encoder of multilingual BERT.

  • Feature Enhancement: Pass handcrafted features through dropout and dense layers before fusing them with the BERT representations.

  • Classification: Apply a sigmoid-activated fully connected layer for multi-label predictions and a softmax-activated fully connected layer for the three-class main task.

  • Self-Training Optimization: Run an iterative self-training loop that generates high-confidence pseudo-labels from unlabeled data, enhancing model robustness and generalization.
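The fusion-and-two-heads design in the steps above can be sketched with placeholder dimensions. All feature sizes, label counts, and weights below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

# Sketch of the hybrid two-head design: a sentence-level mBERT vector is
# concatenated with handcrafted features, then fed to a sigmoid head
# (multi-label abuse types) and a softmax head (three-class main task).

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

bert_vec = rng.normal(size=768)      # stand-in for the mBERT sentence vector
handcrafted = rng.normal(size=16)    # stand-in for lexical/stylistic features
fused = np.concatenate([bert_vec, handcrafted])

W_multi = rng.normal(size=(5, fused.size)) * 0.01   # hypothetical 5 sub-labels
W_main = rng.normal(size=(3, fused.size)) * 0.01    # normal/offensive/hateful

multi_label_probs = sigmoid(W_multi @ fused)   # independent per-label scores
main_probs = softmax(W_main @ fused)           # mutually exclusive classes

assert multi_label_probs.shape == (5,)
assert abs(main_probs.sum() - 1.0) < 1e-9
```

The key design point is the choice of activations: sigmoid lets multiple abuse labels fire on the same text, while softmax forces exactly one main-task class.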

Experiments

The experimental design includes evaluations on four public datasets: HateXplain, Cyberbullying Classification, SCCDUser, and SCCDComment. HateXplain serves as the primary training resource for the multi-label classification task, while the remaining three datasets are used exclusively for pseudo-labeling and cross-lingual evaluation. The experiments report macro F1-score, accuracy, and the Matthews correlation coefficient (MCC), and include ablation studies to verify the contribution of each component.
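Macro F1, the headline metric above, averages per-label F1 scores with equal weight, so rare labels count as much as common ones. A minimal pure-Python version for binary indicator vectors (illustrative, not the paper's evaluation code):

```python
def f1(tp, fp, fn):
    """F1 for one label from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(y_true, y_pred, n_labels):
    """Average per-label F1 over all labels with equal weight."""
    scores = []
    for k in range(n_labels):
        tp = sum(t[k] and p[k] for t, p in zip(y_true, y_pred))
        fp = sum((not t[k]) and p[k] for t, p in zip(y_true, y_pred))
        fn = sum(t[k] and (not p[k]) for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / n_labels

y_true = [[1, 0], [1, 1], [0, 1]]   # gold multi-label indicator vectors
y_pred = [[1, 0], [1, 0], [0, 1]]   # label 1 has one missed positive
print(round(macro_f1(y_true, y_pred, 2), 3))  # 0.833
```

Equal weighting is why macro F1 is the natural choice here: with micro averaging, a model could score well by ignoring the rare abuse categories entirely.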

Results

Experimental results show that HMS-BERT achieves a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task. Compared to baseline models, HMS-BERT excels in multilingual scenarios, particularly in low-resource languages. Ablation studies verify the effectiveness of the handcrafted features and self-training mechanism in enhancing model performance.

Applications

HMS-BERT can be applied to cyberbullying detection on multilingual online platforms, especially in low-resource language environments, though its current computational cost favors batch moderation over strictly real-time use. The framework's design enables it to effectively handle multilingual and multi-label scenarios, providing a comprehensive abuse detection solution for social media platforms.

Limitations & Outlook

Despite HMS-BERT's strong performance in multilingual and multi-label scenarios, it may underperform on extremely imbalanced datasets. Additionally, the framework's computational complexity is relatively high, with long training times, which may not be suitable for real-time applications. Future research can expand in optimizing self-training strategies, exploring more efficient model architectures, and extending to more low-resource languages.

Plain Language Accessible to non-experts

Imagine you're in a large international school with students from different countries speaking various languages. The school wants to ensure every student can learn in a safe environment, so they decide to develop a system to detect any form of bullying. This system needs to understand texts in different languages and identify different types of bullying, such as insults, threats, or discrimination.

HMS-BERT is like the school's super detective, capable of handling multiple languages and identifying various bullying behaviors. It's like a smart teacher who can understand the vocabulary and expressions used by students in different languages. To achieve this, HMS-BERT uses a special training method called self-training. Just like a teacher teaching students in class, HMS-BERT continuously learns and updates its knowledge to improve its capabilities.

The system also uses some special techniques, such as combining contextual information and handcrafted features, just like a teacher noticing the tone and expressions of students. This way, HMS-BERT can more accurately identify bullying behaviors, even in less common languages.

ELI14 Explained like you're 14

Hey there! Imagine you're playing a super cool multiplayer online game with players from all over the world. Everyone's chatting in different languages, but sometimes you encounter some unfriendly players who might use language to attack others. To make the game environment friendlier, the game company decides to develop a super smart system to detect these unfriendly behaviors.

This system is called HMS-BERT, and it's like a superhero in the game, capable of understanding multiple languages and identifying various unfriendly behaviors. It not only understands every word you say but also determines if those words are bullying others. Just like you level up your character in the game, HMS-BERT also improves its abilities through self-training.

HMS-BERT is like a smart game admin, able to recognize unfriendly words spoken in different languages. Even if some players use less common languages, HMS-BERT can identify these behaviors with its super skills. This way, everyone can play in a safer and friendlier game environment!

Glossary

HMS-BERT

A framework for multilingual and multi-label cyberbullying detection, combining multi-task learning and self-training strategies.

HMS-BERT is the core method proposed in this paper for handling multilingual and multi-label scenarios.

Multilingual BERT (mBERT)

A pretrained language model capable of processing text in multiple languages, supporting cross-lingual semantic representation.

mBERT is the foundational model for HMS-BERT, used to generate contextual semantic representations.

Self-Training

A machine learning strategy that improves model generalization by using unlabeled data to generate pseudo-labels.

HMS-BERT uses self-training strategies to facilitate cross-lingual knowledge transfer.

Pseudo-Label

Labels generated by the model for unlabeled data, used for model updates during the self-training process.

HMS-BERT uses pseudo-labels to enhance detection capabilities in low-resource languages.

Multi-Task Learning

A learning strategy that improves overall model performance by optimizing multiple related tasks simultaneously.

HMS-BERT uses multi-task learning to optimize both multi-label and main classification tasks.

Macro F1-Score

A metric for evaluating the performance of multi-label classification models, considering precision and recall for each label.

HMS-BERT achieves a macro F1-score of up to 0.9847 on the multi-label task.

Handcrafted Features

Features manually designed to enhance the model's understanding of specific tasks.

HMS-BERT combines handcrafted features with contextual representations to improve detection accuracy.

Cross-Lingual Knowledge Transfer

Improving model performance in one language by leveraging knowledge learned in another language.

HMS-BERT achieves cross-lingual knowledge transfer through self-training strategies.

Ablation Study

An experimental method that evaluates the impact of removing certain components of a model on overall performance.

Ablation studies verify the effectiveness of HMS-BERT's components.

Main Classification Task

A task in HMS-BERT responsible for classifying text into three categories (normal, offensive, hateful).

The main classification task achieves an accuracy of 0.6775.

Open Questions Unanswered questions from this research

  1. How can HMS-BERT's generalization capability be improved on extremely imbalanced datasets? Current methods may underperform when certain categories have very few samples, so new strategies are needed to enhance model robustness.
  2. How can HMS-BERT's computational cost be reduced for real-time applications? The existing framework has high computational costs and long training times, limiting its use in real-time detection.
  3. How can HMS-BERT's broad applicability be validated in more low-resource languages? The research needs to cover more languages and verify effectiveness across different cultural contexts.
  4. How can pseudo-label quality be improved to further optimize self-training? Pseudo-label accuracy directly affects training, so more reliable generation methods are needed.
  5. How can more effective handcrafted features be designed for specific cultural contexts? Existing features may not fully capture the nuances of certain languages and require targeted refinement.

Applications

Immediate Applications

Social Media Platforms

HMS-BERT can be used for real-time cyberbullying detection on social media platforms, helping platform managers quickly identify and handle inappropriate content, enhancing user experience.

Educational Institutions

Educational institutions can use HMS-BERT to monitor student interactions on online learning platforms, promptly identifying and intervening in potential bullying behaviors to maintain a healthy learning environment.

Online Gaming

Online gaming companies can deploy HMS-BERT to detect inappropriate language in games, ensuring players engage in a friendly and safe environment.

Long-term Vision

Cross-Cultural Communication

HMS-BERT's multilingual capabilities can facilitate cross-cultural communication, helping people from different language backgrounds better understand and communicate, reducing misunderstandings and conflicts.

Global Cybersecurity

HMS-BERT can be part of global cybersecurity efforts, helping governments and organizations detect and prevent cyberbullying and hate speech, maintaining harmony and safety in cyberspace.

Abstract

Cyberbullying on social media is inherently multilingual and multi-faceted, where abusive behaviors often overlap across multiple categories. Existing methods are commonly limited by monolingual assumptions or single-task formulations, which restrict their effectiveness in realistic multilingual and multi-label scenarios. In this paper, we propose HMS-BERT, a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection. Built upon a pretrained multilingual BERT backbone, HMS-BERT integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled data scarcity in low-resource languages, an iterative self-training strategy with confidence-based pseudo-labeling is introduced to facilitate cross-lingual knowledge transfer. Experiments on four public datasets demonstrate that HMS-BERT achieves strong performance, attaining a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task. Ablation studies further verify the effectiveness of the proposed components.

cs.CL stat.ML

References (20)

1. Pardeep Singh, N. Singh, Monika et al. (2023). mBERT-GRU multilingual deep learning framework for hate speech detection in social media. (5 citations)
2. Hajung Sohn, Hyunju Lee (2019). MC-BERT4HATE: Hate Speech Detection using Multi-channel BERT for Different Languages and Translations. (77 citations)
3. Nicole L. Weber, William V. Pelfrey (2014). Cyberbullying: Causes, Consequences, and Coping Strategies. (10 citations)
4. Jacob Devlin, Ming-Wei Chang, Kenton Lee et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. (111519 citations)
5. Y. Kumar, Kuan Huang, Á. Pérez et al. (2024). Bias and Cyberbullying Detection and Data Generation Using Transformer Artificial Intelligence Models and Top Large Language Models. (12 citations)
6. Fangxiaoyu Feng, Yinfei Yang, Daniel Matthew Cer et al. (2020). Language-agnostic BERT Sentence Embedding. (1185 citations)
7. Krishanu Maity, Tanmay Sen, S. Saha et al. (2024). MTBullyGNN: A Graph Neural Network-Based Multitask Framework for Cyberbullying Detection. (16 citations)
8. Victor Sanh, Lysandre Debut, Julien Chaumond et al. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. (9296 citations)
9. Telmo Pires, Eva Schlinger, Dan Garrette (2019). How Multilingual is Multilingual BERT? (1627 citations)
10. Dhiaa Musleh, Atta Rahman, Mohammed Abbas Alkherallah et al. (2024). A Machine Learning Approach to Cyberbullying Detection in Arabic Tweets. (17 citations)
11. Qingpo Yang, Yakai Chen, Zihui Xu et al. (2025). SCCD: A Session-based Dataset for Chinese Cyberbullying Detection. (7 citations)
12. Alexis Conneau, Kartikay Khandelwal, Naman Goyal et al. (2019). Unsupervised Cross-lingual Representation Learning at Scale. (7990 citations)
13. Golnaz Ghiasi, Barret Zoph, E. D. Cubuk et al. (2021). Multi-Task Self-Training for Learning General Representations. (117 citations)
14. Rui Song, Zelong Liu, Xingbing Chen et al. (2022). Label prompt for multi-label text classification. (47 citations)
15. Tanjim Mahmud, Michal Ptaszynski, Fumito Masui (2024). Exhaustive Study into Machine Learning and Deep Learning Methods for Multilingual Cyberbullying Detection in Bangla and Chittagonian Texts. (60 citations)
16. Shivam Kumar, N. Tiwari, Abhishek Bajpai et al. (2025). Hybrid Fake News Detection Model: Bagging and Logistic Regression Approach. (3 citations)
17. Furqan Khan Saddozai, Sahar K. Badri, Daniyal M. Alghazzawi et al. (2025). Multimodal hate speech detection: a novel deep learning framework for multilingual text and images. (5 citations)
18. Mohammed Alhajji, S. Bass, Ting Dai (2019). Cyberbullying, Mental Health, and Violence in Adolescents and Associations With Sex and Race: Data From the 2015 Youth Risk Behavior Survey. (55 citations)
19. S. von Humboldt, Gail Low, Isabel Leal (2025). From Words to Wounds: Cyberbullying and Its Influence on Mental Health Across the Lifespan. (5 citations)
20. Ashish Vaswani, Noam Shazeer, Niki Parmar et al. (2017). Attention Is All You Need. (169377 citations)