<?xml version="1.0"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Fake news detection through LLM-driven text augmentation across media and languages</dc:title><dc:creator>Sittar,	Abdul	(Author)
	</dc:creator><dc:creator>Smiljanić,	Mateja	(Author)
	</dc:creator><dc:creator>Guček,	Alenka	(Author)
	</dc:creator><dc:creator>Grobelnik,	Marko	(Author)
	</dc:creator><dc:subject>fake news detection</dc:subject><dc:subject>low-resource languages</dc:subject><dc:subject>data imbalance</dc:subject><dc:subject>synthetic data generation</dc:subject><dc:subject>prompt engineering</dc:subject><dc:subject>style-based features</dc:subject><dc:subject>semantic features</dc:subject><dc:description>The proliferation of fake news across social media, headlines, and news articles poses major challenges for automated detection, particularly in multilingual and cross-media settings affected by data imbalance. We propose a fake news detection framework based on LLM-driven, feature-guided text augmentation. The method generates realistic synthetic samples across languages, media types, and text granularities while preserving meaning and stylistic coherence. Experiments with classical and transformer-based models (Random Forest, Logistic Regression, BERT, XLM-R) across social media, headlines, and multilingual news datasets show consistent improvements in performance. For inherently balanced datasets (e.g., social media), synthetic augmentation yields negligible but stable performance changes. Across imbalanced scenarios, synthetic augmentation substantially improves minority-class recall and F1-score (e.g., fake news recall from 0.57 to 0.86), while preserving majority-class performance, leading to more balanced and reliable classifiers, whereas oversampling significantly degrades results due to overfitting on duplicated language patterns.
Overall, a hybrid semantic- and style-based model proves to be the most robust strategy, outperforming oversampling and matching or exceeding baseline performance across datasets.</dc:description><dc:publisher>MDPI</dc:publisher><dc:date>2026</dc:date><dc:date>2026-04-28 13:42:06</dc:date><dc:type>Unknown</dc:type><dc:identifier>29227</dc:identifier><dc:identifier>UDK: 004.8</dc:identifier><dc:identifier>ISSN of article: 2504-4990</dc:identifier><dc:identifier>DOI: 10.3390/make8040103</dc:identifier><dc:identifier>COBISS_ID: 276627715</dc:identifier><dc:source>Switzerland</dc:source><dc:language>sl</dc:language><dc:rights>© 2026 by the authors.</dc:rights></metadata>
