Digital Repository of Research Organisations of Slovenia (DiRROS)

Record details

Title: Benchmarking sentence encoders in associating indicators with sustainable development goals and targets
Authors: Gjorgjevikj, Ana, Jožef Stefan Institute (Author)
Mishev, Kostadin (Author)
Trajanov, Dimitar (Author)
Kocarev, Ljupčo (Author)
Files: URL - Source URL, available at https://ieeexplore.ieee.org/document/11113321

PDF - Display file, download (6.64 MB)
MD5: E5E52B5B432E515E622D470E14C40269

Language: English
Typology: 1.01 - Original scientific article
Organisation: IJS - Jožef Stefan Institute
Abstract: The United Nations’ 2030 Agenda for Sustainable Development balances the economic, environmental, and social dimensions of sustainable development in 17 Sustainable Development Goals (SDGs), monitored through a well-defined set of targets and global indicators. Although this monitoring is essential for humanity’s future well-being, it remains challenging due to the variable quality of the statistical data for global indicators compiled at the national level and the diversity of indicators used to monitor sustainable development at the subnational level. Associating indicators other than the global ones with the SDGs/targets may help not only to expand the statistical data but also to better align the efforts toward sustainable development taken at the (sub)national level. This article presents a model-agnostic framework for associating such indicators with the SDGs and targets by comparing their textual descriptions in a common representation space. By removing the dependence on the quantity and quality of the indicators' statistical data, it provides human experts with data-driven suggestions on the complex and not always obvious associations between the indicators and the SDGs/targets. A comprehensive domain-specific benchmarking of a diverse portfolio of sentence encoders was performed first, followed by fine-tuning of the best-performing ones on a newly created dataset. Five sets of indicators used at the (sub)national level of governance (around 800 indicators in total) were used for the evaluation. Finally, the influence of 40 factors on the results was analyzed using explainable artificial intelligence (xAI) methods.
The results show that 1) certain sentence encoders are better suited to the task than others (potentially due to their diverse pre-training datasets), 2) fine-tuning not only improves predictive performance over the baselines but also reduces sensitivity to changes in indicator description length (performance drops by up to 17% for baseline models as length increases, but remains comparable for fine-tuned models), and 3) better-selected training instances have the potential to improve performance even further (taking into account the limited fine-tuning dataset currently used and the insights from the xAI analysis). Most importantly, this article helps fill the existing gap in comprehensive benchmarking of AI models on this problem.
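The association step described in the abstract (comparing textual descriptions of indicators and SDG targets in a common representation space) can be illustrated with a minimal sketch. It assumes the descriptions have already been embedded by some sentence encoder and ranks targets by cosine similarity; the vectors, the target names, and the helpers cosine and rank_targets are hypothetical placeholders, and cosine-similarity ranking is an assumed scoring choice, not a confirmed detail of the article's framework.

```python
import math

# Hypothetical placeholder embeddings. In practice these would be produced by a
# sentence encoder applied to the SDG target descriptions; here they are made-up
# 3-D vectors used only to illustrate the ranking step.
TARGETS = {
    "Target A": [0.9, 0.1, 0.0],
    "Target B": [0.1, 0.8, 0.2],
    "Target C": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_targets(indicator_vec, targets, top_k=2):
    """Score every target against one indicator embedding; return the top_k (name, score) pairs, best first."""
    scored = [(name, cosine(indicator_vec, vec)) for name, vec in targets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical embedding of one indicator description.
    indicator = [0.8, 0.3, 0.1]
    for name, score in rank_targets(indicator, TARGETS):
        print(f"{name}: {score:.3f}")
```

With these placeholder vectors the script ranks Target A first and Target B second. With real embeddings, such a ranked list would serve as suggestions for a human expert to review, in line with the abstract's framing; the benchmarking in the article concerns which encoder produces the most useful representation space for this comparison.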
Keywords: representation learning
Publication status: Published
Publication version: Published version
Submitted for review: 24.06.2025
Accepted: 19.07.2025
Publication date: 05.08.2025
Publisher: IEEE
Year of publication: 2025
Pages: 141434-141460
Numbering: Vol. 13
Origin: USA
PID: 20.500.12556/DiRROS-23358
UDC: 004.8
ISSN (article): 2169-3536
DOI: 10.1109/ACCESS.2025.3595894
COBISS.SI-ID: 246189571
Copyright: © 2025 The Authors.
Note: Title from title screen; resource description dated 21 August 2025.
Date added to DiRROS: 21.08.2025
Views: 325
Downloads: 125
Metadata: XML DC-XML DC-RDF



The item is part of the journal

Title: IEEE Access
Publisher: Institute of Electrical and Electronics Engineers
ISSN: 2169-3536
COBISS.SI-ID: 519839513

The item was funded by the following projects

Funder: ARIS - Slovenian Research and Innovation Agency
Project number: P2-0098
Title: Computer Structures and Systems

Funder: ARIS - Slovenian Research and Innovation Agency
Project number: GC-0001
Title: Artificial Intelligence for Science

Funder: EC - European Commission
Project number: 101211695
Title: Framework for Robust and Explainable Automated Large Language Model Selection
Acronym: AutoLLMSelect

Licences

Licence: CC BY 4.0, Creative Commons Attribution 4.0 International
Link: http://creativecommons.org/licenses/by/4.0/deed.sl
Description: This is the standard Creative Commons licence, which gives users the broadest possible freedom to reuse the work, provided they credit the author.
Licence start date: 05.08.2025
Applies to: VoR (Version of Record)
