1. Digitalisation of inorganic chemistry with LLMs
Boshko Koloski, Senja Pollak, Sašo Džeroski, Aleksandar Kondinski, 2026, original scientific article
Abstract: Over the past few years, large language models have become technologically ubiquitous and now offer a powerful route to accelerate discoveries in chemistry. In this article, we highlight current impactful applications of large language models in inorganic chemistry, from smart text mining of the inorganic literature through the proposal and discovery of new materials to real-time experimentation. We also discuss ongoing developments and their potential future impact on the field.
Published in DiRROS: 19.03.2026; Views: 221; Downloads: 123
Full text (561,79 KB)
2. |
3. |
4. Large language models in food and nutrition science : opportunities, challenges, and the case of FoodyLLM
Ana Gjorgjevikj, Matej Martinc, Gjorgjina Cenikj, Jan Drole, Nives Ogrinc, Sašo Džeroski, Barbara Koroušić-Seljak, Tome Eftimov, 2026, original scientific article
Abstract: Background: Reliable nutrient profiling and semantic interoperability are essential for scalable dietary assessment, food labeling (e.g., traffic-light schemes), and FAIR integration of food composition and consumption data. However, general-purpose large language models (LLMs) are not systematically exposed to structured recipe–nutrition mappings and food ontologies, limiting their accuracy and trustworthiness in food and nutrition tasks. Scope and approach: We review recent LLM advances in life sciences and healthcare and analyze the gap in food and nutrition applications. To address this gap, we introduce FoodyLLM, a domain-specialized LLM fine-tuned on 225k task-aligned QA pairs for (i) recipe nutrient estimation, (ii) traffic-light classification, and (iii) ontology-based entity linking to support FAIR food data interoperability. We benchmark FoodyLLM against strong general-purpose baselines (e.g., Llama 3 8B, Gemini 2.0) under zero-/few-shot prompting across five evaluation folds. Key findings: Across all tasks, FoodyLLM substantially outperforms general-purpose LLMs: for nutrient estimation across all macronutrients (fat, protein, salt, saturates, sugar), accuracy increases from 0.43–0.63 to 0.91–0.97; for traffic-light classification across all nutrients and color categories, macro F1 improves from 0.46–0.80 to 0.86–0.97; and for ontology-based food entity linking across FoodOn, SNOMED-CT, and Hansard, macro F1 increases from 0.33–0.44 (best general-purpose baseline) to 0.93–0.98 on artificial NEL data, and from 0.24–0.51 to 0.67–0.84 on real corpora (CafeteriaSA and CafeteriaFCD). Overall, our results demonstrate the practical value of domain-specialized LLMs in food and nutrition research. They enable automated dietary assessment, large-scale nutritional monitoring, and FAIR data integration, while opening new pathways toward sustainable and personalized nutrition.
Keywords: FoodyLLM, nutrient estimation, data interoperability
Published in DiRROS: 04.03.2026; Views: 293; Downloads: 156
Full text (6,37 MB)
5. Variational oblique predictive clustering trees
Viktor Andonovikj, Sašo Džeroski, Biljana Mileva Boshkoska, Pavle Boškoski, 2026, original scientific article
Abstract: Oblique predictive clustering trees (SPYCTs) are semi-supervised multi-target prediction models mainly used for structured output prediction (SOP) problems. They are computationally efficient and, when combined in ensembles, achieve state-of-the-art results. However, one major issue is that it is challenging to interpret an ensemble of SPYCTs without the use of a model-agnostic method. We propose variational oblique predictive clustering trees, which address this challenge. The parameters of each split node are treated as random variables, described with a probability distribution, and they are learned through the Variational Bayes method. We evaluate the model on several benchmark datasets of different sizes. The experimental analyses show that a single variational oblique predictive clustering tree (VSPYCT) achieves competitive, and sometimes better, predictive performance than the ensemble of standard SPYCTs. We also present a method for extracting feature importance scores from the model. Finally, we present a method to visually interpret the model’s decision-making process through analysis of the relative feature importance in each split node.
Keywords: machine learning, predictive clustering, interpretable models, structured output prediction, uncertainty quantification
Published in DiRROS: 17.02.2026; Views: 289; Downloads: 139
Full text (2,20 MB)
6. Predictions of failed satellite retrieval of air quality using machine learning
Edward Malina, Jure Brence, Jennifer Adams, Jovan Tanevski, Sašo Džeroski, Valentin Kantchev, Kevin W. Bowman, 2025, original scientific article
Abstract: The growing fleet of Earth observation (EO) satellites is capturing unprecedented quantities of information about the concentration and distribution of trace gases in the Earth's atmosphere. Depending on the instrument and algorithm, the yield of good remote soundings can be a few percent owing to interferences such as clouds, non-linearities in the retrieval algorithm, and systematic errors in the radiative transfer algorithm, leading to inefficient use of computational resources. In this study, we investigate machine learning (ML) techniques to predict failures in the trace gas retrieval process based upon the input satellite radiances alone, allowing for efficient production of good-quality data. We apply this technique to ozone and other retrievals using measurements from multiple satellites: the Suomi National Polar-orbiting Partnership Cross-Track Infrared Sounder (Suomi NPP CrIS) and joint retrievals from the Atmospheric Infrared Sounder (AIRS) and the Ozone Monitoring Instrument (OMI). Retrievals are performed using the MUlti-SpEctra, MUlti-SpEcies, Multi-SEnsors (MUSES) algorithm. With this tool, we can identify 80 % of ozone retrieval failures using the MUSES algorithm at a cost of 20 % false positives from CrIS. For AIRS-OMI, 98 % of ozone retrieval failures are identified at a cost of 2 % false positives. The ML tool is simple to generate and takes <0.1 s to assess each measured spectrum. The results suggest that this tool can be applied to data from many EO satellites and can reduce the processing load for current and future instruments.
Keywords: trace gases, failure prediction
Published in DiRROS: 20.01.2026; Views: 252; Downloads: 203
Full text (13,63 MB)
7. Optimizing foamed glass production with machine learning
Uroš Hribar, Sintija Stevanoska, Christian Leonardo Camacho Villalón, Matjaž Spreitzer, Jakob Koenig, Sašo Džeroski, 2025, original scientific article
Abstract: Foamed glass is a lightweight material commonly used for insulation. However, optimizing its properties remains a challenge due to the large number of synthesis parameters involved in its production. While previous studies have investigated synthesis conditions, a comprehensive study applying machine learning approaches is lacking in the literature. In this paper, we apply machine learning methods, i.e., random forests of predictive clustering trees and a multilayer perceptron, training them on 124 experimental data points to accurately predict the apparent density and closed porosity of foamed glass. We then apply a multiobjective optimization algorithm together with the multilayer perceptron to find optimal values for the process parameters used in foamed glass production. Our results show that the combination of machine learning and multiobjective optimization is an effective proxy for the development of novel foamed glass materials.
Keywords: process optimization, machine learning, foamed glass
Published in DiRROS: 18.11.2025; Views: 506; Downloads: 219
Full text (1,51 MB)
8. FoodSEM : large language model specialized in food named-entity linking
Ana Gjorgjevikj, Matej Martinc, Gjorgjina Cenikj, Sašo Džeroski, Barbara Koroušić-Seljak, Tome Eftimov, 2026, published scientific conference contribution
Keywords: large language models, food ontology, food data, named-entity linking
Published in DiRROS: 02.10.2025; Views: 557; Downloads: 0
Full text (60,01 KB)
9. |
10. Discovery of exact equations for integer sequences
Boštjan Gec, Sašo Džeroski, Ljupčo Todorovski, 2024, original scientific article
Abstract: Equation discovery, also known as symbolic regression, is the field of machine learning that studies algorithms for discovering quantitative laws, expressed as closed-form equations or formulas, in collections of observed data. The latter is expected to come from measurements of physical systems and is, therefore, noisy, moving the focus of equation discovery algorithms towards discovering approximate equations. These loosely match the noisy observed data, rendering them inappropriate for applications in mathematics. In this article, we introduce Diofantos, an algorithm for discovering equations in the ring of integers that exactly match the training data. Diofantos is based on a reformulation of the equation discovery task into the task of solving linear Diophantine equations. We empirically evaluate the performance of Diofantos on reconstructing known equations for more than 27,000 sequences from the online encyclopedia of integer sequences, OEIS. Diofantos successfully reconstructs more than 90% of these equations and clearly outperforms SINDy, a state-of-the-art method for discovering approximate equations, which achieves a reconstruction rate of less than 70%.
Keywords: symbolic regression, equation discovery, online encyclopedia of integer sequences
Published in DiRROS: 27.03.2025; Views: 824; Downloads: 549
Full text (425,51 KB)