Digital repository of Slovenian research organisations


Title:RAGCare-QA : a benchmark dataset for evaluating retrieval-augmented generation pipelines in theoretical medical knowledge
Authors:Dobreva, Jovana (Author)
Karasmanakis, Ivana, Institut "Jožef Stefan" (Author)
Ivanisevic, Filip, Institut "Jožef Stefan" (Author)
Horvat, Tadej, Institut "Jožef Stefan" (Author)
Gams, Matjaž, Institut "Jožef Stefan" (Author)
Simjanoska Misheva, Monika (Author)
Files:Source URL: https://www.sciencedirect.com/science/article/pii/S2352340925008674
PDF - Presentation file (919.71 KB)
MD5: 095F423B47FF0E0B8227C1229E909D6F
Description: The dataset is available on Hugging Face Hub: https://huggingface.co/datasets/ChatMED-Project/RAGCare-QA.
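A minimal sketch of fetching the dataset from the Hub address given above, assuming the Hugging Face `datasets` library is installed and that the repository loads with its default configuration. The helper name `load_ragcare_qa` is hypothetical, and the split and column names are not stated in this record, so the sketch returns whatever the Hub provides without assuming a schema.

```python
# Repository identifier taken from the Hub URL in the description above.
DATASET_ID = "ChatMED-Project/RAGCare-QA"


def load_ragcare_qa(dataset_id: str = DATASET_ID):
    """Download the dataset from the Hugging Face Hub (needs network access).

    Assumption: the repository loads with its default configuration.
    The import is done lazily so that defining this helper does not
    require the `datasets` package to be installed.
    """
    from datasets import load_dataset

    return load_dataset(dataset_id)
```

Calling `load_ragcare_qa()` and printing the result should list the available splits and columns, which is the safest way to discover the schema before writing evaluation code.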
 
Language:English
Typology:1.03 - Other scientific articles
Organization:IJS - Jožef Stefan Institute
Abstract:The paper introduces RAGCare-QA, a dataset of 420 theoretical medical knowledge questions for assessing Retrieval-Augmented Generation (RAG) pipelines in medical education and evaluation settings. The dataset comprises single-answer multiple-choice questions from six medical specialties (Cardiology, Endocrinology, Gastroenterology, Family Medicine, Oncology, and Neurology) at three levels of complexity (Basic, Intermediate, and Advanced). Each question is labelled with the best-fitting RAG implementation complexity level: Basic RAG (315 questions, 75.0 %), Multi-vector RAG (82 questions, 19.5 %), or Graph-enhanced RAG (23 questions, 5.5 %). The questions emphasize theoretical medical knowledge of fundamental concepts, pathophysiology, diagnostic criteria, and treatment principles that are important in medical education. The dataset supports the assessment of RAG-based medical education systems and allows researchers to fine-tune retrieval methods for different categories of theoretical medical knowledge questions.
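The distribution quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three counts stated above; the variable and function names are illustrative and do not come from the dataset itself.

```python
# Counts per recommended RAG complexity level, copied from the abstract.
RAG_LEVEL_COUNTS = {
    "Basic RAG": 315,
    "Multi-vector RAG": 82,
    "Graph-enhanced RAG": 23,
}


def level_shares(counts):
    """Return each level's share of the total as a percentage (one decimal)."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


total_questions = sum(RAG_LEVEL_COUNTS.values())
shares = level_shares(RAG_LEVEL_COUNTS)

for level, n in RAG_LEVEL_COUNTS.items():
    print(f"{level}: {n} questions ({shares[level]} %)")
print(f"Total: {total_questions}")
```

The three counts sum to 420 and the rounded shares come out as 75.0 %, 19.5 %, and 5.5 %, matching the figures in the abstract.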
Keywords:medical education, retrieval-augmented generation, theoretical knowledge, multiple-choice questions
Publication status:Published
Publication version:Version of Record
Submitted for review:29.06.2025
Article acceptance date:01.10.2025
Publication date:09.10.2025
Publisher:Elsevier
Year of publishing:2025
Number of pages:pp. 1-11
Numbering:Vol. 63, [article no.] 112146
Source:Netherlands
PID:20.500.12556/DiRROS-25043
UDC:004.8
ISSN on article:2352-3409
DOI:10.1016/j.dib.2025.112146
COBISS.SI-ID:263916803
Copyright:© 2025 The Author(s).
Note:Title from title screen; Co-authors from Slovenia: Ivana Karasmanakis, Filip Ivanisevic, Tadej Horvat, Matjaž Gams; Description based on the resource as of 8 January 2026.
Publication date in DiRROS:08.01.2026
Views:237
Downloads:62

Record is a part of a journal

Title:Data in brief
Publisher:Elsevier
ISSN:2352-3409
COBISS.SI-ID:32117977

Document is financed by a project

Funder:EC - European Commission
Project number:101159214
Name:Bridging Research Institutions to Catalyze Generative AI Adoption by the Health Sector in the Widening Countries
Acronym:ChatMED

Licences

License:CC BY 4.0, Creative Commons Attribution 4.0 International
Link:http://creativecommons.org/licenses/by/4.0/
Description:This is the standard Creative Commons license that gives others maximum freedom to do what they want with the work as long as they credit the author.
Licensing start date:09.10.2025
Applies to:VoR

Secondary language

Language:Slovenian
Title:RAGCare-QA: a benchmark dataset for evaluating retrieval-augmented generation pipelines in theoretical medical knowledge
Keywords:medical education, knowledge assessment, theoretical knowledge, multiple-choice questions