Information Retrieval vs Cache Augmented Generation vs Fine Tuning: A Comparative Study on Urdu Medical Question Answering

Authors

Ahmad Mahmood1, Zainab Ahmad1, Iqra Ameer2 and Grigori Sidorov1, 1Instituto Politécnico Nacional (IPN), Mexico, 2The Pennsylvania State University, USA

Abstract

The development of medical question-answering (QA) systems has predominantly focused on high-resource languages, leaving a significant gap for low-resource languages like Urdu. This study proposes a novel corpus designed to advance medical QA research in Urdu, created by translating the benchmark MedQuAD corpus into Urdu using a generative AI-based translation technique. The proposed corpus is evaluated using three approaches: (i) Information Retrieval (IR), (ii) Cache-Augmented Generation (CAG), and (iii) Fine-Tuning (FT). We conducted two experiments, one on a 500-instance subset and another on the complete 3,152-question corpus, to assess retrieval effectiveness, response accuracy, and computational efficiency. Our results show that JinaAI embeddings outperformed the other IR models, while the fine-tuned OpenAI 4o mini model achieved the highest response accuracy (BERTScore: 70.6%) but was computationally expensive. CAG eliminates retrieval latency but requires substantial resources. The findings suggest that IR is best suited to real-time QA, FT yields the highest accuracy, and CAG balances the two. This research advances Urdu medical AI and helps bridge gaps in healthcare accessibility.
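As a rough illustration of the IR approach named in the abstract, the sketch below ranks stored questions by cosine similarity to a query and returns the best-matching answer. It is not the authors' pipeline: the `embed` function is a toy bag-of-words stand-in for a learned embedding model such as the JinaAI embeddings mentioned above, and the function names, corpus entries, and answers are all hypothetical placeholders.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a learned embedding model (assumption):
    # represents text as bag-of-words token counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, qa_pairs, top_k=1):
    # Score every stored question against the query and keep the top_k.
    q_vec = embed(query)
    scored = [
        (cosine(q_vec, embed(question)), question, answer)
        for question, answer in qa_pairs
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical placeholder corpus; the real corpus holds Urdu QA pairs.
corpus = [
    ("what are the symptoms of diabetes", "Hypothetical answer A"),
    ("how is asthma treated", "Hypothetical answer B"),
]

best = retrieve("symptoms of diabetes", corpus)[0]
print(best[1], "->", best[2])
```

In a real system the count vectors would be replaced by dense embeddings and the linear scan by a vector index, but the ranking step is the same idea.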

Keywords

Information retrieval, retrieval-augmented generation, cache-augmented generation, fine-tuning, Urdu medical question-answering

Volume 15, Number 10