Digital repository of Slovenian research organisations


Title:Counting trees : a treebank-driven exploration of syntactic variation in speech and writing across languages
Authors:Dobrovoljc, Kaja, Institut "Jožef Stefan" (Author)
Files:URL - Source URL: https://www.degruyterbrill.com/document/doi/10.1515/cllt-2025-0046/html
PDF - Presentation file (1.93 MB)
MD5: 956FA3ECE7A9C6897BC90B5C5D9AC9A7
Language:English
Typology:1.01 - Original Scientific Article
Organization:IJS - Jožef Stefan Institute
Abstract:This paper presents a novel treebank-driven approach to comparing syntactic structures in speech and writing using dependency-parsed corpora. Adopting a fully inductive, bottom-up method, we define syntactic structures as delexicalized dependency (sub)trees and extract them from spoken and written Universal Dependencies (UD) treebanks in two syntactically distinct languages, English and Slovenian. For each corpus, we analyze the size, diversity, and distribution of syntactic inventories, their overlap across modalities, and the structures most characteristic of speech. Results show that, across both languages, spoken corpora contain fewer and less diverse syntactic structures than their written counterparts, with consistent cross-linguistic preferences for certain structural types across modalities. Strikingly, the overlap between spoken and written syntactic inventories is very limited: most structures attested in speech do not occur in writing, pointing to modality-specific preferences in syntactic organization that reflect the distinct demands of real-time interaction and elaborated writing. This contrast is further supported by a keyness analysis of the most frequent speech-specific structures, which highlights patterns associated with interactivity, context-grounding, and economy of expression. We argue that this scalable, language-independent framework offers a useful general method for systematically studying syntactic variation across corpora, laying the groundwork for more comprehensive data-driven theories of grammar in use.
Keywords:register variation, dependency treebanks, syntactic structures, syntactic comparison, keyness analysis, corpus-driven linguistics
Publication status:Published
Publication version:Version of Record
Submitted for review:02.06.2025
Article acceptance date:27.01.2026
Publication date:23.02.2026
Publisher:Mouton de Gruyter
Year of publishing:2026
Number of pages:pp. 2-37
Source:Germany
PID:20.500.12556/DiRROS-28231
UDC:81'32
ISSN on article:1613-7035
DOI:10.1515/cllt-2025-0046
COBISS.SI-ID:271469571
Copyright:© 2026 the author(s), published by De Gruyter.
Note:Title from title screen; source description as of 12 March 2026.
Publication date in DiRROS:12.03.2026
Views:47
Downloads:26



Record is a part of a journal

Title:Corpus linguistics and linguistic theory
Publisher:Mouton de Gruyter
ISSN:1613-7035
COBISS.SI-ID:520104729

Document is financed by a project

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:Z6-4617-2022
Name:A treebank-driven approach to the study of spoken Slovenian

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:P6-0411-2019
Name:Language resources and technologies for the Slovenian language

Licences

License:CC BY 4.0, Creative Commons Attribution 4.0 International
Link:http://creativecommons.org/licenses/by/4.0/
Description:This is the standard Creative Commons license that gives others maximum freedom to do what they want with the work as long as they credit the author.
Licensing start date:26.02.2026
Applies to:VoR

Secondary language

Language:Slovenian
Keywords:registerska variacija, odvisnostni drevesniki, odvisnostno označeni korpusi, skladenjske strukture, primerjava skladnje, analiza ključnosti, korpusno gnano jezikoslovje

