Dataset for the article "Cytogenotoxic effects of polycyclic aromatic hydrocarbons complex mixture in human peripheral blood, lung A549 and liver HepG2 Cells: Translation of a real-scenario exposure to in vitro"

This (ReadMe) file was created on 2026-02-13 by Matjaž Novak and Luka Kazensky

-------------------
GENERAL INFORMATION
-------------------

Author/leading researcher details

Name and surname: Luka Kazensky
ORCID: 0009-0007-9685-4279
Institution: Division of Toxicology, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: lkazensky@imi.hr
Role in this dataset: data collector

Name and surname: Marija Jelena Lovrić Štefiček
ORCID: 0000-0002-3440-9168
Institution: Division of Environmental Hygiene, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: mlovric@imi.hr
Role in this dataset: data collector

Name and surname: Vilena Kašuba
ORCID: 0000-0002-2151-0400
Institution: Division of Toxicology, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: vkasuba@imi.hr
Role in this dataset: data collector

Name and surname: Matjaž Novak
ORCID: 0000-0002-5713-2552
SICRIS ID: 34200
Institution: Department of Genetic Toxicology and Cancer Biology, National Institute of Biology
Address: 1000 Ljubljana, Slovenia
Email: matjaz.novak@nib.si
Role in this dataset: data collector

Name and surname: Karolina Belingar
ORCID: 0009-0001-2120-4438
SICRIS ID: 58563
Institution: Department of Genetic Toxicology and Cancer Biology, National Institute of Biology
Address: 1000 Ljubljana, Slovenia
Email: karolina.belingar@nib.si
Role in this dataset: data collector

Name and surname: Katarina Matković
ORCID: 0000-0001-5341-8855
Institution: Division of Toxicology, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: kmatkovic@imi.hr
Role in this dataset: data collector

Name and surname: Marko Gerić
ORCID: 0000-0002-5886-4106
Institution: Division of Toxicology, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: mgeric@imi.hr
Role in this dataset: data collector

Name and surname: Jasmina Rinkovec
ORCID: 0000-0002-0378-4774
Institution: Division of Environmental Hygiene, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: jrinkovec@imi.hr
Role in this dataset: data collector

Name and surname: Ivana Jakovljević
ORCID: 0000-0002-0556-0088
Institution: Division of Environmental Hygiene, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: ijakovljevic@imi.hr
Role in this dataset: data collector

Name and surname: Katarina Baralić
ORCID: 0000-0003-3290-2204
Institution: Department of Toxicology “Akademik Danilo Soldatović”, Faculty of Pharmacy, University of Belgrade
Address: 11000 Belgrade, Serbia
Email: katarina.baralic@pharmacy.bg.ac.rs
Role in this dataset: data collector

Name and surname: Danijela Đukić-Ćosić
ORCID: 0000-0003-1618-9154
Institution: Department of Toxicology “Akademik Danilo Soldatović”, Faculty of Pharmacy, University of Belgrade
Address: 11000 Belgrade, Serbia
Email: danijela.djukic.cosic@pharmacy.bg.ac.rs
Role in this dataset: data collector

Name and surname: Mirta Milić
ORCID: 0000-0002-9837-7185
Institution: Division of Toxicology, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: mmilic@imi.hr
Role in this dataset: data collector

Name and surname: Želimir Jelčić
ORCID: 0000-0001-8973-9124
Institution: TAPI R&D, PLIVA Croatia Ltd.
Address: 10000 Zagreb, Croatia
Email: zelimir.jelcic@zg.ht.hr
Role in this dataset: data collector

Name and surname: Gordana Pehnec
ORCID: 0000-0001-5155-1847
Institution: Division of Environmental Hygiene, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: gpehnec@imi.hr
Role in this dataset: data collector

Name and surname: Bojana Žegura
ORCID: 0000-0002-5731-0785
SICRIS ID: 20767
Institution: Department of Genetic Toxicology and Cancer Biology, National Institute of Biology, Biotechnical Faculty, University of Ljubljana
Address: 1000 Ljubljana, Slovenia
Email: bojana.zegura@nib.si
Role in this dataset: principal investigator

Name and surname: Goran Gajski
ORCID: 0000-0002-1886-1453
Institution: Division of Toxicology, Institute for Medical Research and Occupational Health
Address: 10000 Zagreb, Croatia
Email: ggajski@imi.hr
Role in this dataset: principal investigator

Date of data collection: 2023-03-01 to 2025-12-11

Geographical location of data collection: Zagreb, Croatia (field work, laboratory, and computational analysis), Belgrade, Serbia (computational analysis), Ljubljana, Slovenia (laboratory)

Information on the funders/programmes/projects that made the data collection possible: European Union’s Horizon Europe research and innovation program (EDIAQI project #101057497), the Croatian Science Foundation (HUMNap project #1192), the Foundation of the Croatian Academy for Science and Arts, the European Regional Development Fund project KK.01.1.1.01.0007 "Research and Education Centre of Environmental Health and Radiation Protection," the European Union – Next Generation EU (#533-03-23-0006, BioMolTox and EnvironPollutHealth), the Ministry of Science, Technological Development and Innovation, Republic of Serbia (#451-03-136/2025-03/200161), the Slovenian Research Agency (#P1-0245), and the bilateral collaborations between the Republic of Croatia and the Republic of Slovenia (MULTIap project #BI-HR/25-27-014).

-----------------------------
SHARING/ACCESSING INFORMATION
-----------------------------

Data licences/restrictions: CC BY-NC 4.0

Links to publications that cite or use the data: NA (publication in review)

Links to other publicly available data sites: http://ctdbase.org - Comparative Toxicogenomics Database (CTD) was used to identify gene/protein biomarkers and explore how multiple chemicals simultaneously influence gene expression and protein activity.

----------------------
VIEWING DATA AND FILES
----------------------

File list:
- Supplementary_Table_1.doc
  - Description: Concentrations of individual PAHs (ng/mL) in stock solutions used to model low (1 hour), medium (8 hours), and high (16 hours) indoor exposure scenarios for in vitro cytogenotoxicity testing
- Supplementary_Table_2.xls
  - Description: Results of in silico toxicogenomic data analysis
- Supplementary_data_11PAHs.pdf
  - Description: Quantitative comet assay image analysis

Additional related data collected that were not included in this dataset: NA

Versioning history of data: v1.0

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Supplementary_Table_1.doc - The low and medium exposure concentrations corresponded to normal activity, while the high exposure concentration represented a combination of resting and normal activity conditions. Exposure periods were calculated based on minute ventilation values (L/min), derived from tidal volume (L/breath) and respiratory rate (breaths/min) under rest and normal activity conditions (Pleil et al., 2021). These calculations provided estimates of the total volume of air exchanged by the lungs during inhalation and exhalation over 1-, 8-, and 16-hour intervals, with minute-to-hour conversions applied (Table 1).

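The inhaled-volume arithmetic for Supplementary_Table_1.doc can be illustrated with a minimal Python sketch. The minute ventilation values used here (16 L/min for normal activity, 6 L/min at rest) are assumptions: they are not stated in this ReadMe and were back-calculated from the total volumes reported for the three scenarios (960 L, 7680 L, and 10,560 L). The air concentration of 1.0 ng/m3 is a hypothetical placeholder, not a measured value, and the function names are illustrative only.

```python
# Minute ventilation in L/min.
# ASSUMPTION: back-calculated from the total volumes given in this ReadMe,
# not taken from the article or from Pleil et al. (2021).
NORMAL_L_PER_MIN = 16.0
REST_L_PER_MIN = 6.0


def inhaled_volume_l(hours_normal, hours_rest=0.0):
    """Total inhaled air volume in litres for a given activity pattern."""
    litres_per_hour_pattern = (
        hours_normal * NORMAL_L_PER_MIN + hours_rest * REST_L_PER_MIN
    )
    return 60.0 * litres_per_hour_pattern  # 60 minutes per hour


def inhaled_mass_ng(conc_ng_per_m3, volume_l):
    """Mass of one PAH inhaled, in ng (1 m3 of air = 1000 L)."""
    return conc_ng_per_m3 * volume_l / 1000.0


scenarios = {
    "low": inhaled_volume_l(hours_normal=1),
    "medium": inhaled_volume_l(hours_normal=8),
    "high": inhaled_volume_l(hours_normal=8, hours_rest=8),
}

for name, volume in scenarios.items():
    # 1.0 ng/m3 is a hypothetical air concentration used only for illustration.
    print(name, volume, inhaled_mass_ng(1.0, volume))
```

Under these assumptions the sketch reproduces the three reported volumes. Converting the inhaled mass into a stock concentration in ng/mL additionally requires the stock volume, which is not given here.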
The resulting total inhaled volumes were 960 L for the low (1-hour normal activity), 7680 L for the medium (8-hour normal activity), and 10,560 L for the high exposure scenario, the latter accounting for 8 hours at rest and 8 hours of normal activity. Since PAH concentrations are expressed in nanograms per cubic meter (ng/m3), the volumes were converted from cubic meters to liters, and subsequently to milliliters, to enable accurate preparation of the stock solutions.

Supplementary_Table_2.xls - All extracted genes associated with the analyzed PAHs. GeneMANIA (https://genemania.org) was used to characterize the relationships within the obtained gene set. The ToppGene Suite (https://toppgene.cchmc.org), particularly its ToppFun tool, was employed for functional enrichment analysis. The investigated PAHs were analyzed through several steps: extracting gene–PAH interaction data from CTD; identifying shared genes using the CTD MyVenn tool (PAHs with fewer than five associated genes were excluded from defining the common set but were included later in the cumulative analysis); assessing gene–gene interactions with GeneMANIA; and performing functional enrichment with the ToppGene Suite to identify key molecular functions, biological processes, pathways, and disease associations. After defining the common gene set, all genes interacting with the PAHs were compiled into a cumulative set with duplicates removed. Evaluating both the shared and cumulative genes enabled identification of common toxicity mechanisms, potentially reflecting additive or synergistic effects, as well as additional PAH-specific pathways relevant to their broader molecular impact. GeneMANIA network analysis could not be performed for the full cumulative set due to the GeneMANIA 3000-gene input limit.

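The distinction between the shared and cumulative gene sets can be sketched in Python as follows. This is an illustration of the set logic only, not the CTD MyVenn tool itself. The function name, PAH labels, and gene identifiers are hypothetical placeholders; only the five-gene threshold and the 3000-gene GeneMANIA limit come from the description above.

```python
MIN_GENES = 5           # PAHs with fewer genes do not define the common set
GENEMANIA_LIMIT = 3000  # GeneMANIA input limit mentioned in this ReadMe


def shared_and_cumulative(genes_by_pah, min_genes=MIN_GENES):
    """Return (shared, cumulative) gene sets from a PAH -> gene list mapping."""
    gene_sets = [set(genes) for genes in genes_by_pah.values()]
    # Only PAHs with at least `min_genes` genes take part in the intersection.
    eligible = [genes for genes in gene_sets if len(genes) >= min_genes]
    shared = set.intersection(*eligible) if eligible else set()
    # Every PAH contributes to the cumulative set; a set removes duplicates.
    cumulative = set().union(*gene_sets)
    return shared, cumulative


# Hypothetical placeholder data, not taken from Supplementary_Table_2.xls.
example = {
    "PAH_1": ["G1", "G2", "G3", "G4", "G5", "G6"],
    "PAH_2": ["G2", "G3", "G4", "G5", "G6", "G7"],
    "PAH_3": ["G3", "G8"],  # fewer than five genes: cumulative set only
}

shared, cumulative = shared_and_cumulative(example)
print(sorted(shared))
print(len(cumulative), len(cumulative) <= GENEMANIA_LIMIT)
```

In this toy example PAH_3 does not restrict the shared set, yet its gene G8 still appears in the cumulative set, mirroring the two-stage treatment described for low-count PAHs.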
Supplementary_data_11PAHs.pdf - Comet assay images derived from measurements in PBCs were analyzed in ImageJ (NIH) using the FracLac plugin and custom Mathematica scripts to extract morphometric descriptors. Fractal and multifractal spectra were computed by the box-counting method across Q = −10 to 10 (step 0.1), yielding generalized dimensions DQ and singularity spectra f(α). Shape parameters (Area, Perimeter, Circularity, Eccentricity) and gray-level co-occurrence matrix (GLCM) features (Entropy, ASM, IDM, Contrast) were derived for each image. Data were aggregated per treatment (low, medium, high) and time point (4 and 24 hours). This exploratory analysis complemented standard comet endpoints and was not used for statistical decision-making. As an exploratory approach, comet images were also represented using a graph-based framework, where comet pixels or defined components are treated as connected nodes. This allows the use of simple topological descriptors and concepts from persistent homology to examine overall organization and structure within the comet image.

Methods used to collect/obtain the data: Included in methodological information

Data processing methods:

Software information: NA