Digital repository of Slovenian research organisations


Title:Zasnova in oblikovanje Korpusa znanstvenih besedil sodobne slovenščine
Authors:ID Ledinek, Nina (Author)
ID Trojar, Mitja (Author)
Files:Source URL: https://ojs.zrc-sazu.si/jz/article/view/14633
PDF presentation file (1.79 MB)
MD5: 7712723F03DF50B60440634B4F47B32B
Language:Slovenian
Typology:1.01 - Original Scientific Article
Organization:ZRC SAZU - The Research Centre of the Slovenian Academy of Sciences and Arts
Abstract:V prispevku predstavljamo Korpus znanstvenih besedil sodobne slovenščine, specializirani pisni korpus slovenščine, ki obsega 33.604.256 pojavnic iz 884 znanstvenih in strokovnih besedil zlasti s področij družboslovja in humanistike, nastalih predvsem med letoma 2000 in 2023. Osredotočamo se na prikaz besedilnotipske sestave korpusa, tehničnih postopkov predpriprave korpusnih besedil, korpusne anotacije, formatov zapisa korpusnih besedil in dostopnosti korpusa. Predstavljamo tudi motivacijo za izgradnjo korpusa in njegovo aplikativno vrednost, pri čemer skušamo opredeliti specifike in prednosti Korpusa znanstvenih besedil sodobne slovenščine glede na druge slovenske korpuse, ki vključujejo strokovna in znanstvena besedila.
Keywords:korpus znanstvenih besedil, specializirani korpus, korpusno označevanje, CoNLL-U
Publication status:Published
Publication version:Version of Record
Publication date:12.11.2025
Year of publishing:2025
Number of pages:pp. 119-132
Numbering:vol. 31, no. 2
PID:20.500.12556/DiRROS-25065
UDC:811.163.6'322.3
ISSN on article:0354-0448
DOI:10.3986/JZ.31.2.06
COBISS.SI-ID:262990339
Copyright:The authors hold the copyright to their contributions
Publication date in DiRROS:08.01.2026
Views:151
Downloads:56



Record is a part of a journal

Title:Jezikoslovni zapiski : zbornik Inštituta za slovenski jezik Frana Ramovša
Shortened title:Jezikosl. zap.
Publisher:Inštitut za slovenski jezik Frana Ramovša ZRC SAZU
ISSN:0354-0448
COBISS.SI-ID:27991296

Document is financed by a project

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:P6-0038-2015
Name:Slovenski jezik v sinhronem in diahronem razvoju

Funder:Other - Other funder or multiple funders
Funding programme:Ministry of Culture of the Republic of Slovenia
Project number:U9
Name:eSSKJ in korpus – na poti k najsodobnejšim jezikovnim podatkom

Licences

License:CC BY-SA 4.0, Creative Commons Attribution-ShareAlike 4.0 International
Link:http://creativecommons.org/licenses/by-sa/4.0/
Description:This Creative Commons license is very similar to the regular Attribution license, but requires the release of all derivative works under this same license.

Secondary language

Language:English
Title:Design and Construction of the Corpus of Scientific Texts of Contemporary Slovenian
Abstract:This paper presents the Corpus of Scientific Texts of Contemporary Slovenian, a specialized written corpus of Slovenian comprising 33,604,256 tokens from 884 scientific and expert texts, primarily in the fields of social sciences and the humanities, published mainly between 2000 and 2023. We focus on describing the text-type composition of the corpus, the technical procedures used in the preprocessing of corpus texts, corpus annotation, text encoding formats and corpus accessibility. We also discuss the rationale for constructing the corpus and its practical applications, aiming to outline the specific characteristics and advantages of the Corpus of Scientific Texts of Contemporary Slovenian in comparison with other Slovenian corpora that include specialized texts.
Keywords:corpus of scientific texts, specialized corpus, corpus annotation, CoNLL-U

