Digital repository of Slovenian research organisations


Title:FuDoBa : fusing document and knowledge graph based representations with Bayesian optimisation
Authors:Koloski, Boshko, Institut "Jožef Stefan" (Author)
Pollak, Senja, Institut "Jožef Stefan" (Author)
Navigli, Roberto (Author)
Škrlj, Blaž, Institut "Jožef Stefan" (Author)
Files:Source URL: https://link.springer.com/article/10.1007/s10994-026-07008-y
PDF - Presentation file (7.33 MB)
MD5: 9BB8D0EAE48F39924BFB52DCFE589268
Language:English
Typology:1.01 - Original Scientific Article
Organization:IJS - Jožef Stefan Institute
Abstract:Building on the success of large language models (LLMs), LLM-based representations have dominated the document representation landscape, achieving strong performance on document embedding benchmarks. However, high-dimensional, computationally expensive LLM embeddings can be too generic or inefficient for domain-specific and resource-scarce applications. To address these limitations, we introduce FuDoBa—a Bayesian optimisation-based representation learning method that integrates LLM embeddings with domain-specific structured knowledge, sourced both locally and from external repositories such as WikiData. This fusion produces low-dimensional, task-relevant representations while reducing training complexity and yielding interpretable early-fusion weights for improved classification performance. We demonstrate the effectiveness of our approach on six datasets across two domains, showing that when paired with robust AutoML-based classifiers, our method performs on par with, or surpasses, proprietary LLM-only embedding baselines, while offering modality-wise interpretability and a smaller dimensional footprint.
Keywords:document classification, Bayesian optimisation, representation learning, knowledge graphs
Publication status:Published
Publication version:Version of Record
Submitted for review:23.04.2025
Article acceptance date:02.02.2026
Publication date:06.03.2026
Publisher:Springer Nature
Year of publishing:2026
Number of pages:pp. 1-39
Numbering:Vol. 115, article no. 61
Source:Switzerland
PID:20.500.12556/DiRROS-28309
UDC:004.8
ISSN on article:1573-0565
DOI:10.1007/s10994-026-07008-y
COBISS.SI-ID:271609091
Copyright:© The Author(s) 2026
Note:Title from title screen; Co-authors from Slovenia: Senja Pollak, Blaž Škrlj; Description of the resource as of 13. 3. 2026;
Publication date in DiRROS:13.03.2026
Views:25
Downloads:22

Record is a part of a journal

Title:Machine learning
Shortened title:Mach. learn.
Publisher:Kluwer
ISSN:1573-0565
COBISS.SI-ID:513211417

Document is financed by a project

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:GC-0001-2024
Name:Artificial intelligence for science

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:GC-0002-2024
Name:Large language models for digital humanities

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:L2-50070-2023
Name:Vector embedding techniques for media applications

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:J5-3102-2021
Name:Hate speech in contemporary conceptualizations of nationalism, racism, gender and migration

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:P2-0103-2022
Name:Knowledge technologies

Funder:ARIS - Slovenian Research and Innovation Agency
Funding programme:Young Researcher Grant
Project number:PR-12394

Funder:Italian Ministry of University and Research (Ministero dell'Università e della Ricerca)
Project number:MIUR_PRIN 2020 2020ZSL9F9
Name:CRoss-modal understanding and gEnerATIon of Visual and tExtual content
Acronym:CREATIVE

Licences

License:CC BY 4.0, Creative Commons Attribution 4.0 International
Link:http://creativecommons.org/licenses/by/4.0/
Description:This is the standard Creative Commons license that gives others maximum freedom to do what they want with the work as long as they credit the author.
Licensing start date:06.03.2026
Applies to:VoR

Secondary language

Language:Slovenian
Title:FuDoBa: fusing document and knowledge graph based representations with Bayesian optimisation
Keywords:document classification, optimisation, knowledge graphs

