Korpusnaja lingvistika, častotnaja i tolkovaja leksikografija : vektory vzaimodejstvija

Mečkovskaja, Nina Borisovna

Title:	Korpusnaja lingvistika, častotnaja i tolkovaja leksikografija : vektory vzaimodejstvija
Authors:	Mečkovskaja, Nina Borisovna (Author)
Files:	URL - Source URL: https://srl.si/ojs/srl/article/view/4171; PDF - Presentation file (339.32 KB), MD5: BB5DD10444649F99247F64B1B6272640
Language:	Russian
Typology:	1.01 - Original Scientific Article
Organization:	ZSSD - Association of Slovenian Slavic Societies
Abstract:	Pri obščem roste količestva korpusov, ih obʺemov i raznoobrazija proishodit specializacija korpusov v zavisimosti ot sostava ih targetirovannogo kontenta. Èlektronnye korpusy pervogo pokolenija (obʺemom primerno 100 mln slovoupotreblenij), nazyvaemye ili osoznavaemye kak “nacionalʹnye” ili “gosudarstvennye”, sohranjajut otnositelʹnuju sbalansirovannostʹ podkorpusov i širokuju socialʹno-gumanitarnuju adresaciju. Po mere uveličenija obʺemov bolee pozdnih korpusov proishodit ih specializacija po dvum vektoram: 1) soderžatelʹno orientirovannye monitornye (popolnjaemye) megakorpusy gazetno-žurnalʹnyh tekstov; v celevye gruppy korpusnogo kontenta dannogo klassa vhodjat sociologi i politologi, èkonomisty, demografy, žurnalisty i dr.; 2) tematičeski bezgraničnye (neizbiratelʹnye) korpusy, akkumulirujuščie ocifrovannye teksty (pečatnye i èlektronnye), ispolʹzuemye v informatike kak syrʹe dlja “obrabotki estestvennogo jazyka” (natural language processing): mašinnogo predobučenija nejronnyh setej i sozdanija statističeskih algoritmov samosvjazyvaemosti slov v adekvatnye tekstovye reakcii iskusstvennogo intellekta. Nazvany dve naibolee značitelʹnye novatorskie razrabotki v korpusnoj leksikografii: 1) sintez tolkovogo i častotnogo slovarej v slovarjah Macmillan (2007), pozže Collins, Longman; 2) komponentnyj semantičeskij analiz 100-tysjačnogo slovnika s ispolʹzovaniem v kačestve semantičeskih komponentov 2.500 samyh častyh leksem v Macmillan 2007. Vozmožnosti korpusov v skorom vremeni privedut k krupnym dostiženijam v diahroničeskoj lingvistike.
Publication date:	01.01.2025
Year of publishing:	2025
Number of pages:	pp. 435–449
Numbering:	Vol. 73, No. 3
PID:	20.500.12556/DiRROS-27341
UDC:	81'322:81'374.2
ISSN on article:	0350-6894
DOI:	10.57589/srl.v73i3.4171
COBISS.SI-ID:	264140803
Note:	In Latin and Cyrillic script.
Publication date in DiRROS:	03.02.2026
Views:	375
Downloads:	212

Record is a part of a journal

Title:	Slavistična revija : časopis za jezikoslovje in literarne vede
Publisher:	Slavistično društvo Slovenije
ISSN:	0350-6894
COBISS.SI-ID:	761092

Secondary language

Language:	English
Title:	Corpus linguistics, frequency and explanatory dictionaries : interaction vectors
Abstract:	As corpora grow in number, size, and diversity, they are becoming specialized according to their targeted content. First-generation electronic corpora (approximately 100 million word tokens), called or perceived as “national” or “state” corpora, retain a relative balance of subcorpora and address a broad social science and humanities audience. As later corpora grow in size, their purpose specializes along two vectors: 1) content-oriented monitor (continuously replenished) megacorpora of newspaper and magazine texts, whose target groups include sociologists, political scientists, economists, demographers, journalists, etc.; 2) thematically unlimited (non-selective) corpora that accumulate digitized texts (printed and electronic) and are used in computer science as raw material for “natural language processing”: the machine pre-training of neural networks and the creation of statistical algorithms for self-linking words into adequate verbal responses of artificial intelligence. Two of the most significant innovations in corpus lexicography are named: 1) the synthesis of explanatory and frequency dictionaries in the Macmillan dictionaries (2007) and later in Collins and Longman; 2) a component semantic analysis of a 100,000-word dictionary using the 2,500 most frequent lexemes in Macmillan (2007) as semantic components. The capabilities of corpora will soon lead to major achievements in diachronic linguistics.
Keywords:	frekvenčni slovarji, sinteza razlagalnih in frekvenčnih slovarjev v slovarjih Macmillan (2007), komponentna pomenska analiza 100.000-besednega slovarja, visokofrekvenčne besede kot semantični multiplikatorji, frequency dictionaries, synthesis of explanatory and frequency dictionaries in the Macmillan dictionary (2007), semantic component analysis of a 100,000-word dictionary, high-frequency words as semantic multipliers

Collection

This document is a part of these collections:

Slavistična revija
