Digital Repository of Slovenian Research Organisations

Show document

Title: Approaches to analysing historical newspapers using LLMs
Authors: Dobranić, Filip (Author)
Munda, Tina (Author)
Pejić, Oliver (Author)
Gorjanc, Vojko (Author)
Šmajdek, Uroš (Author)
Bordon, David (Author)
Lenardič, Jakob (Author)
Konovšek, Tjaša (Author)
Pančur, Andrej (Author)
Pahor de Maiti, Kristina (Author)
Bohak, Ciril (Author)
Fišer, Darja (Author)
Files: PDF - Presentation file, download (2.83 MB)
MD5: E666D6DB694ACC671883F398DA55C8DC

Language: English
Typology: 1.04 - Professional article
Organization: INZ - Inštitut za novejšo zgodovino (Institute of Contemporary History)
Abstract: This study presents a computational analysis of the Slovene historical newspapers Slovenec and Slovenski narod from the sPeriodika corpus, combining topic modelling, large language model (LLM)-based aspect-level sentiment analysis, entity-graph visualisation, and qualitative discourse analysis to examine how collective identities, political orientations, and national belonging were represented in public discourse at the turn of the twentieth century. Using BERTopic, we identify major thematic patterns and show both shared concerns and clear ideological differences between the two newspapers, reflecting their conservative-Catholic and liberal-progressive orientations. We further evaluate four instruction-following LLMs for targeted sentiment classification in OCR-degraded historical Slovene and select the Slovene-adapted GaMS3-12B-Instruct model as the most suitable for large-scale application, while also documenting important limitations, particularly its stronger performance on neutral sentiment than on positive or negative sentiment. Applied at dataset scale, the model reveals meaningful variation in the portrayal of collective identities, with some groups appearing predominantly in neutral descriptive contexts and others more often in evaluative or conflict-related discourse. We then build named-entity (NER) graphs to explore the relationships between collective identities and places, and analyse them with a mixed-methods approach that combines quantitative network analysis with critical discourse analysis. The investigation focuses on the emergence and development of intertwined historical political and socio-economic identities. Overall, the study demonstrates the value of combining scalable computational methods with critical interpretation to support digital humanities research on noisy historical newspaper data.
Publication status: Published
Publication version: Version of Record
Publication date: 27.03.2026
Number of pages: 16 pp.
PID: 20.500.12556/DiRROS-29716
ISSN on article: 2331-8422
DOI: 10.48550/arXiv.2603.25051
COBISS.SI-ID: 280325123
Note: Title from title screen; source description as of 3 June 2026;
Date published in DiRROS: 03.06.2026
Views: 137
Downloads: 74
Metadata: XML, DC-XML, DC-RDF

Funded by the following projects

Funder: ARIS - Javna agencija za znanstvenoraziskovalno in inovacijsko dejavnost Republike Slovenije (Slovenian Research and Innovation Agency)
Funding programme: Slovenian Research and Innovation Agency
Project number: P6-0436-2022
Name: Digital humanities: resources, tools and methods (Digitalna humanistika: viri, orodja in metode)

Funder: ARIS - Javna agencija za znanstvenoraziskovalno in inovacijsko dejavnost Republike Slovenije (Slovenian Research and Innovation Agency)
Funding programme: Slovenian Research and Innovation Agency
Project number: GC-0002
Name: Large language models for digital humanities (Veliki jezikovni modeli za digitalno humanistiko)

Licences

License: CC BY 4.0, Creative Commons Attribution 4.0 International
Link: http://creativecommons.org/licenses/by/4.0/deed.sl
Description: This is the standard Creative Commons license that gives users the most freedom to reuse the work, provided they credit the author.
Licensing start date: 03.06.2026

Secondary language

Language: Slovenian
Keywords: newspapers, LLM, linguistics, history

