Digital repository of Slovenian research organisations


Title:Korpusnaja lingvistika, častotnaja i tolkovaja leksikografija : vektory vzaimodejstvija
Authors:Mečkovskaja, Nina Borisovna (Author)
Files:Source URL: https://srl.si/ojs/srl/article/view/4171
PDF presentation file (339.32 KB)
MD5: BB5DD10444649F99247F64B1B6272640
Language:Russian
Typology:1.01 - Original Scientific Article
Organization:ZSSD - Association of Slovenian Slavic Societies
Abstract:Pri obščem roste količestva korpusov, ih obʺemov i raznoobrazija proishodit specializacija korpusov v zavisimosti ot sostava ih targetirovannogo kontenta. Èlektronnye korpusy pervogo pokolenija (obʺemom primerno 100 mln slovoupotreblenij), nazyvaemye ili osoznavaemye kak “nacionalʹnye” ili “gosudarstvennye”, sohranjajut otnositelʹnuju sbalansirovannostʹ podkorpusov i širokuju socialʹno-gumanitarnuju adresaciju. Po mere uveličenija obʺemov bolee pozdnih korpusov proishodit ih specializacija po dvum vektoram: 1) soderžatelʹno orientirovannye monitornye (popolnjaemye) megakorpusy gazetno-žurnalʹnyh tekstov; v celevye gruppy korpusnogo kontenta dannogo klassa vhodjat sociologi i politologi, èkonomisty, demografy, žurnalisty i dr.; 2) tematičeski bezgraničnye (neizbiratelʹnye) korpusy, akkumulirujuščie ocifrovannye teksty (pečatnye i èlektronnye), ispolʹzuemye v informatike kak syrʹe dlja “obrabotki estestvennogo jazyka” (natural language processing): mašinnogo predobučenija nejronnyh setej i sozdanija statističeskih algoritmov samosvjazyvaemosti slov v adekvatnye tekstovye reakcii iskusstvennogo intellekta. Nazvany dve naibolee značitelʹnye novatorskie razrabotki v korpusnoj leksikografii: 1) sintez tolkovogo i častotnogo slovarej v slovarjah Macmillan (2007), pozže Collins, Longman; 2) komponentnyj semantičeskij analiz 100-tysjačnogo slovnika s ispolʹzovaniem v kačestve semantičeskih komponentov 2.500 samyh častyh leksem v Macmillan 2007. Vozmožnosti korpusov v skorom vremeni privedut k krupnym dostiženijam v diahroničeskoj lingvistike.
Publication date:01.01.2025
Year of publishing:2025
Number of pages:pp. 435–449
Numbering:Vol. 73, no. 3
PID:20.500.12556/DiRROS-27341
UDC:81'322:81'374.2
ISSN on article:0350-6894
DOI:10.57589/srl.v73i3.4171
COBISS.SI-ID:264140803
Note:Latin and Cyrillic scripts
Publication date in DiRROS:03.02.2026
Views:138
Downloads:71

Record is a part of a journal

Title:Slavistična revija : časopis za jezikoslovje in literarne vede
Publisher:Slavistično društvo Slovenije
ISSN:0350-6894
COBISS.SI-ID:761092

Secondary language

Language:English
Title:Corpus linguistics, frequency and explanatory dictionaries : interaction vectors
Abstract:With the overall growth in the number, size, and diversity of corpora, corpora are becoming specialized according to their targeted content. First-generation electronic corpora (approximately 100 million word tokens), called or perceived as “national” or “state” corpora, retain a relative balance of subcorpora and address a broad social science and humanities audience. As later corpora grow in size, they specialize along two vectors: 1) content-oriented monitor (continually replenished) megacorpora of newspaper and magazine texts, whose target groups include sociologists and political scientists, economists, demographers, journalists, etc.; 2) thematically unlimited (non-selective) corpora accumulating digitized texts (printed and electronic), used in computer science as raw material for “natural language processing”: the machine pre-training of neural networks and the creation of statistical algorithms for self-linking words into adequate verbal responses of artificial intelligence. Two most significant innovative developments in corpus lexicography are named: 1) the synthesis of explanatory and frequency dictionaries in the Macmillan dictionaries (2007) and later Collins and Longman; 2) component semantic analysis of a 100,000-word dictionary using the 2,500 most frequent lexemes in Macmillan (2007) as semantic components. The capabilities of corpora will soon lead to major achievements in diachronic linguistics.
Keywords:frekvenčni slovarji, sinteza razlagalnih in frekvenčnih slovarjev v slovarjih Macmillan (2007), komponentna pomenska analiza 100.000-besednega slovarja, visokofrekvenčne besede kot semantični multiplikatorji, frequency dictionaries, synthesis of explanatory and frequency dictionaries in the Macmillan dictionary (2007), semantic component analysis of a 100,000-word dictionary, high-frequency words as semantic multipliers


Collection

This document is a part of these collections:
  1. Slavistična revija
