Digital repository of Slovenian research organisations


Title: Grammatical error correction of Slovenian school essays using large language models
Authors: Klemen, Matej (Author)
Božič, Martin (Author)
Arhar Holdt, Špela (Author)
Robnik Šikonja, Marko (Author)
Files: Source URL: https://www.sodobna-pedagogika.net/clanki/03-2025_popravljanje-slovnicnih-napak-v-slovenskih-esejih-z-velikimi-jezikovnimi-modeli/
PDF: Presentation file (284.22 KB)
MD5: 8FCBC4808B27CF0F4E3E8D990C1369CB
Language: English
Typology: 1.02 - Review Article
Organization: ZDPDS - Association of Societies of Educational Workers of Slovenia
Abstract: Grammatical error correction (GEC) is the task of automatically detecting and correcting grammatical errors in text. Large language models have enabled the development of accurate automated methods for detecting and correcting certain types of errors. In the educational domain, the aim of GEC is to help teachers correct students' errors. Excessive paraphrasing, which is characteristic of models based on the Generative Pre-trained Transformer (GPT) architecture, is undesirable in the language education context. To avoid it, we develop multiple Slovenian BERT- and T5-type models for correcting errors in spelling, word case (capitalization), word form, and word order. We describe the training data construction and the training process, and we evaluate the models on the Šolar-Eval 1.0 corpus of school essays written by primary and secondary school students. Our quantitative evaluation shows that the developed models achieve reasonably high accuracy, while our qualitative evaluation highlights the strengths and weaknesses of both the models and the evaluation process. The analysis reveals multiple challenges and promising future directions for improving both model development and the evaluation process.
Keywords: large language models, grammatical error correction, educational domain, synthetic data construction
Publication status: Published
Publication version: Version of Record
Publication date: 01.10.2025
Year of publishing: 2025
Pages: 162-176
Numbering: Vol. 76 (142), No. 3
PID: 20.500.12556/DiRROS-24472
UDC: 371.68
ISSN on article: 0038-0474
DOI: 10.63384/sptB53z793a
COBISS.SI-ID: 259208195
Publication date in DiRROS: 01.12.2025

This record is part of a journal

Title: Sodobna pedagogika
Shortened title: Sodob. pedagog.
Publisher: Zveza društev pedagoških delavcev Slovenije
ISSN: 0038-0474
COBISS.SI-ID: 761348

This document is funded by the following projects

Funder: ARIS - Slovenian Research and Innovation Agency
Project number: J7-3159
Name: Empirična podlaga za digitalno podprt razvoj pisne jezikovne zmožnosti (Empirical basis for the digitally supported development of writing competence)

Funder: ARIS - Slovenian Research and Innovation Agency
Project number: GC-0002
Name: Veliki jezikovni modeli za digitalno humanistiko (Large language models for digital humanities)

Funder: ARIS - Slovenian Research and Innovation Agency
Project number: L2-50070
Name: Tehnike vektorskih vložitev za medijske aplikacije (Vector embedding techniques for media applications)

Funder: ARIS - Slovenian Research and Innovation Agency
Project number: P6-0411
Name: Jezikovni viri in tehnologije za slovenski jezik (Language resources and technologies for the Slovenian language)

Funder: EC - European Commission
Funding programme: HE (Horizon Europe)
Project number: 101186647
Name: Centre of Excellence in Artificial Intelligence for Digital Humanities
Acronym: AI4DH

Funder: Other - Other funder or multiple funders
Project number: C3.K8.IB
Acronym: PoVeJMo

Funder: SLING
Project number: S24O01-42

Licences

License: CC BY-SA 4.0, Creative Commons Attribution-ShareAlike 4.0 International
Link: http://creativecommons.org/licenses/by-sa/4.0/
Description: This Creative Commons license is very similar to the regular Attribution license but requires that all derivative works be released under the same license.

Secondary language

Language: Slovenian
Title: Popravljanje slovničnih napak v slovenskih esejih z velikimi jezikovnimi modeli (Grammatical error correction in Slovenian essays using large language models)
Abstract: Automatic grammatical error correction is the task of automatically detecting and correcting grammatical errors in text. In the educational domain, the goal of such methods is to help teachers correct students' errors. Large language models enable the development of accurate automatic methods for detecting and correcting certain types of errors. To avoid the excessive paraphrasing that is characteristic of GPT-type models and undesirable in the context of language teaching, we present several Slovenian BERT- and T5-type models developed for correcting different types of errors. These include spelling errors and errors in capitalization, word forms, and word order. The article describes how the training data were constructed, how the models were trained, and how they were evaluated on the Šolar-Eval 1.0 corpus, which contains school essays by primary and secondary school students. The automatic evaluation shows relatively high accuracy of the developed models, while the manual qualitative evaluation reveals the strengths and weaknesses of both the models and the evaluation procedure. The analysis reveals numerous challenges and promising directions for further improvement in both model development and the evaluation procedure.
Keywords: large language models, grammatical error correction, educational domain, data synthesis


Collection

This document is part of the following collection:
  1. Sodobna pedagogika
