| Title: | FuDoBa : fusing document and knowledge graph based representations with Bayesian optimisation |
|---|
| Authors: | Koloski, Boshko, Institut "Jožef Stefan" (Author); Pollak, Senja, Institut "Jožef Stefan" (Author); Navigli, Roberto (Author); Škrlj, Blaž, Institut "Jožef Stefan" (Author) |
| Files: | URL - Source URL: https://link.springer.com/article/10.1007/s10994-026-07008-y
PDF - Presentation file (7.33 MB), MD5: 9BB8D0EAE48F39924BFB52DCFE589268
|
|---|
| Language: | English |
|---|
| Typology: | 1.01 - Original Scientific Article |
|---|
| Organization: | IJS - Jožef Stefan Institute
|
|---|
| Abstract: | Building on the success of large language models (LLMs), LLM-based representations have dominated the document representation landscape, achieving strong performance on document embedding benchmarks. However, high-dimensional, computationally expensive LLM embeddings can be too generic or inefficient for domain-specific and resource-scarce applications. To address these limitations, we introduce FuDoBa—a Bayesian optimisation-based representation learning method that integrates LLM embeddings with domain-specific structured knowledge, sourced both locally and from external repositories such as WikiData. This fusion produces low-dimensional, task-relevant representations while reducing training complexity and yielding interpretable early-fusion weights for improved classification performance. We demonstrate the effectiveness of our approach on six datasets across two domains, showing that when paired with robust AutoML-based classifiers, our method performs on par with, or surpasses, proprietary LLM-only embedding baselines, while offering modality-wise interpretability and a smaller dimensional footprint. |
|---|
| Keywords: | document classification, Bayesian optimisation, representation learning, knowledge graphs |
|---|
| Publication status: | Published |
|---|
| Publication version: | Version of Record |
|---|
| Submitted for review: | 23.04.2025 |
|---|
| Article acceptance date: | 02.02.2026 |
|---|
| Publication date: | 06.03.2026 |
|---|
| Publisher: | Springer Nature |
|---|
| Year of publishing: | 2026 |
|---|
| Number of pages: | pp. 1-39 |
|---|
| Numbering: | Vol. 115, article no. 61 |
|---|
| Source: | Switzerland |
|---|
| PID: | 20.500.12556/DiRROS-28309  |
|---|
| UDC: | 004.8 |
|---|
| ISSN on article: | 1573-0565 |
|---|
| DOI: | 10.1007/s10994-026-07008-y  |
|---|
| COBISS.SI-ID: | 271609091  |
|---|
| Copyright: | © The Author(s) 2026 |
|---|
| Note: | Title from the title screen;
Co-authors from Slovenia: Senja Pollak, Blaž Škrlj;
Description of the source as of 13.03.2026;
|
|---|
| Publication date in DiRROS: | 13.03.2026 |
|---|
| Views: | 25 |
|---|
| Downloads: | 22 |
|---|