MagBERT: A Compact Multi-Dialectal BERT for Low-Resource Maghrebi Arabic

Authors

Amina Laggoun 1,2,3, Chahnez Zakaria 1, Youness Moukafih 3, Ouassim Karrakchou 3 and Kamel Smaili 2

1 Ecole Nationale Superieure d'Informatique (ESI), Algeria
2 Universite de Lorraine, France
3 Universite Internationale de Rabat, Morocco

Abstract

In a landscape dominated by large language models, Maghrebi Arabic dialects, though widely used in everyday communication and informal writing, remain largely underserved by Natural Language Processing (NLP) technologies. Their limited linguistic resources, high variability, and lack of standardized orthography make them particularly challenging to model effectively. To address these issues, this work introduces MagBERT, a lightweight variant of BERT designed specifically for the three major Maghrebi dialects, Algerian, Moroccan, and Tunisian Arabic, in both Arabic and Latin scripts. The model was pre-trained and then fine-tuned on multiple downstream tasks, demonstrating competitive performance against several strong benchmark models. Despite its compact size, MagBERT shows strong potential as an efficient and versatile model for processing under-resourced North African dialects.

Keywords

Maghrebi dialects, NLP, Lightweight BERT, Low-resource.

Volume 16, Number 9