MsGEN : measuring generalization of nutrient value prediction across different recipe datasets

Ispirova, Gordana; Eftimov, Tome; Džeroski, Sašo; Koroušić-Seljak, Barbara


Title:	MsGEN : measuring generalization of nutrient value prediction across different recipe datasets
Authors:	Ispirova, Gordana, Institut Jožef Stefan (Author); Eftimov, Tome, Institut Jožef Stefan (Author); Džeroski, Sašo, Institut Jožef Stefan (Author); Koroušić-Seljak, Barbara, Institut Jožef Stefan (Author)
Files:	URL - Source URL: https://www.sciencedirect.com/science/article/pii/S0957417423020092?via%3Dihub; PDF - Presentation file (3.27 MB), MD5: 590CD5C276E2F45AADC6BCF60EF204BB
Language:	English
Typology:	1.01 - Original Scientific Article
Organization:	IJS - Jožef Stefan Institute
Abstract:	In this study, we estimate how well previously proposed predictive models for nutrient value prediction generalize across different recipe datasets. For this purpose, we introduce a quantitative indicator of how well a developed predictive model generalizes to new, unseen data not present in the training process. On a predefined corpus of recipe embeddings from six publicly available recipe datasets (i.e., projected into the same meta-feature vector space), we train predictive models on one of the six recipe datasets and test them on the remaining datasets. In parallel, we define and calculate generalizability indexes, numbers that indicate how generalizable a predictive model is, i.e., how well a predictive model learned on one dataset will perform on another dataset not involved in the training. The evaluation results confirm the validity of these indexes, that is, their relation to the accuracy of the predictions. Further, we define three sampling techniques for selecting representative data instances that uniformly cover all parts of the feature space (involving data from all datasets) and thereby improve the generalization of a predictive model. We train predictive models on these generalized datasets and test them on instances from the six recipe datasets that were not selected for the generalized datasets. These predictive models show improved results compared with predictive models trained on one recipe dataset and tested on each of the others separately.
Keywords:	ML pipeline, predictive modeling, nutrient prediction, recipe datasets
Publication status:	Published
Publication version:	Version of Record
Submitted for review:	18.08.2023
Article acceptance date:	06.09.2023
Publication date:	16.09.2023
Publisher:	Elsevier
Year of publishing:	2023
Number of pages:	pp. 1-40
Numbering:	Vol. , article no. 121507
Source:	Netherlands
PID:	20.500.12556/DiRROS-17082
UDC:	004.8
ISSN on article:	1873-6793
DOI:	10.1016/j.eswa.2023.121507
COBISS.SI-ID:	165116419
Copyright:	© 2023 The Authors. Published by Elsevier Ltd.
Note:	Title from title screen; Co-authors: Tome Eftimov, Sašo Džeroski, Barbara Koroušić Seljak; Description of the resource as of 20.09.2023;
Publication date in DiRROS:	25.09.2023
Views:	1497
Downloads:	913
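
The cross-dataset protocol described in the abstract (train on one recipe dataset, test on each of the others) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic data from `make_toy_datasets`, the closed-form ridge regressor used as a stand-in model, and the mean absolute error as the score. It does not reproduce the authors' recipe embeddings, predictive models, generalizability indexes, or sampling techniques.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an intercept (stand-in model, an assumption)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the intercept
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict(w, X):
    """Apply the fitted weights (including intercept) to new features."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def cross_dataset_mae(datasets, alpha=1.0):
    """Train on each dataset in turn and report MAE on every other dataset."""
    scores = {}
    for train_name, (X_tr, y_tr) in datasets.items():
        w = fit_ridge(X_tr, y_tr, alpha)
        for test_name, (X_te, y_te) in datasets.items():
            if test_name == train_name:
                continue  # only score datasets not used for training
            err = np.abs(predict(w, X_te) - y_te)
            scores[(train_name, test_name)] = float(err.mean())
    return scores


def make_toy_datasets(n_datasets=6, n_samples=200, n_features=8, seed=0):
    """Hypothetical synthetic data: six datasets in one shared feature space."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=n_features)
    datasets = {}
    for i in range(n_datasets):
        shift = rng.normal(scale=0.5, size=n_features)  # per-dataset feature shift
        X = rng.normal(size=(n_samples, n_features)) + shift
        y = X @ true_w + rng.normal(scale=0.1, size=n_samples)
        datasets[f"dataset_{i}"] = (X, y)
    return datasets


if __name__ == "__main__":
    for (train_name, test_name), mae in sorted(cross_dataset_mae(make_toy_datasets()).items()):
        print(f"train={train_name} test={test_name} MAE={mae:.3f}")
```

With six datasets the sketch produces 30 train/test pairs, one score per ordered pair of distinct datasets.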

Record is a part of a journal

Title:	Expert systems with applications
Publisher:	Elsevier
ISSN:	1873-6793
COBISS.SI-ID:	23001861

Document is financed by a project

Funder:	ARRS - Slovenian Research Agency
Project number:	P2-0098
Name:	Computer Structures and Systems (Računalniške strukture in sistemi)

Funder:	ARRS - Slovenian Research Agency
Project number:	P2-0103
Name:	Knowledge Technologies (Tehnologije znanja)

Funder:	EC - European Commission
Funding programme:	H2020
Project number:	101005259
Name:	Communities on Food Consumer Science
Acronym:	COMFOCUS

Licences

License:	CC BY-NC-ND 4.0, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

Link:	http://creativecommons.org/licenses/by-nc-nd/4.0/
Description:	The most restrictive Creative Commons license. Others may download and share the work with attribution, but may not change it or use it commercially.
Licensing start date:	16.09.2023

Secondary language

Language:	Slovenian
Title:	MsGEN: measuring generalization of nutrient value prediction across different recipe datasets
Keywords:	strojno učenje (machine learning)
