Approaches to analysing historical newspapers using LLMs

Dobranić, Filip; Munda, Tina; Pejić, Oliver; Gorjanc, Vojko; Šmajdek, Uroš; Bordon, David; Lenardič, Jakob; Konovšek, Tjaša; Pančur, Andrej; Pahor de Maiti, Kristina; Bohak, Ciril; Fišer, Darja

Title:	Approaches to analysing historical newspapers using LLMs
Authors:	Dobranić, Filip; Munda, Tina; Pejić, Oliver; Gorjanc, Vojko; Šmajdek, Uroš; Bordon, David; Lenardič, Jakob; Konovšek, Tjaša; Pančur, Andrej; Pahor de Maiti, Kristina; Bohak, Ciril; Fišer, Darja
Files:	PDF - Presentation file (2.83 MB), MD5: E666D6DB694ACC671883F398DA55C8DC
Language:	English
Typology:	1.04 - Professional Article
Organization:	INZ - Institute of Contemporary History
Abstract:	This study presents a computational analysis of the Slovene historical newspapers Slovenec and Slovenski narod from the sPeriodika corpus, combining topic modelling, large language model (LLM)-based aspect-level sentiment analysis, entity-graph visualisation, and qualitative discourse analysis to examine how collective identities, political orientations, and national belonging were represented in public discourse at the turn of the twentieth century. Using BERTopic, we identify major thematic patterns and show both shared concerns and clear ideological differences between the two newspapers, reflecting their conservative-Catholic and liberal-progressive orientations. We further evaluate four instruction-following LLMs for targeted sentiment classification in OCR-degraded historical Slovene and select the Slovene-adapted GaMS3-12B-Instruct model as the most suitable for large-scale application, while also documenting important limitations, particularly its stronger performance on neutral sentiment than on positive or negative sentiment. Applied at dataset scale, the model reveals meaningful variation in the portrayal of collective identities, with some groups appearing predominantly in neutral descriptive contexts and others more often in evaluative or conflict-related discourse. We then create NER graphs to explore the relationships between collective identities and places. We apply a mixed-methods approach to analyse the named entity graphs, combining quantitative network analysis with critical discourse analysis. The investigation focuses on the emergence and development of intertwined historical political and socionomic identities. Overall, the study demonstrates the value of combining scalable computational methods with critical interpretation to support digital humanities research on noisy historical newspaper data.
Publication status:	Published
Publication version:	Version of Record
Publication date:	27.03.2026
Number of pages:	16 pp.
PID:	20.500.12556/DiRROS-29716
ISSN on article:	2331-8422
DOI:	10.48550/arXiv.2603.25051
COBISS.SI-ID:	280325123
Note:	Title from title screen; description of the source as of 3 June 2026.
Publication date in DiRROS:	03.06.2026

Document is financed by a project

Funder:	ARIS - Slovenian Research and Innovation Agency
Funding programme:	Javna agencija za znanstvenoraziskovalno in inovacijsko dejavnost Republike Slovenije
Project number:	P6-0436-2022
Name:	Digital humanities: resources, tools and methods (Digitalna humanistika: viri, orodja in metode)

Funder:	ARIS - Slovenian Research and Innovation Agency
Funding programme:	Javna agencija za znanstvenoraziskovalno in inovacijsko dejavnost Republike Slovenije
Project number:	GC-0002
Name:	Large language models for digital humanities (Veliki jezikovni modeli za digitalno humanistiko)

Licences

License:	CC BY 4.0, Creative Commons Attribution 4.0 International

Link:	http://creativecommons.org/licenses/by/4.0/
Description:	This is the standard Creative Commons license that gives others maximum freedom to do what they want with the work as long as they credit the author.
Licensing start date:	03.06.2026

Secondary language

Language:	Slovenian
Keywords:	časopisi, LLM, jezikoslovje, zgodovina
