Fake news detection through LLM-driven text augmentation across media and languages

Sittar, Abdul (Author); Smiljanić, Mateja (Author); Guček, Alenka (Author); Grobelnik, Marko (Author)

Keywords: fake news detection; low-resource languages; data imbalance; synthetic data generation; prompt engineering; style-based features; semantic features

The proliferation of fake news across social media, headlines, and news articles poses major challenges for automated detection, particularly in multilingual and cross-media settings affected by data imbalance. We propose a fake news detection framework based on LLM-driven, feature-guided text augmentation. The method generates realistic synthetic samples across languages, media types, and text granularities while preserving meaning and stylistic coherence. Experiments with classical and transformer-based models (Random Forest, Logistic Regression, BERT, XLM-R) on social media, headline, and multilingual news datasets show consistent performance improvements. For inherently balanced datasets (e.g., social media), synthetic augmentation yields negligible but stable performance changes. In imbalanced scenarios, synthetic augmentation substantially improves minority-class recall and F1-score (e.g., fake news recall rises from 0.57 to 0.86) while preserving majority-class performance, leading to more balanced and reliable classifiers. Oversampling, by contrast, significantly degrades results due to overfitting on duplicated language patterns. Overall, a hybrid semantic- and style-based model proves to be the most robust strategy, outperforming oversampling and matching or exceeding baseline performance across datasets.

MDPI, 2026
2026-04-28 13:42:06
Unknown
29227
UDC: 004.8
ISSN (article): 2504-4990
DOI: 10.3390/make8040103
COBISS_ID: 276627715
Switzerland
sl
© 2026 by the authors.
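The abstract contrasts LLM-driven augmentation with plain oversampling and reports its gains as minority-class recall and F1. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It illustrates two points. First, random oversampling only duplicates existing minority-class texts, so it adds no new wording. Second, per-class precision, recall and F1 are computed for a single class such as "fake". The helpers `random_oversample` and `per_class_scores` and the toy data are hypothetical, and no LLM augmentation is implemented here.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: balance classes by duplicating random examples.

    Every added row is a copy of an existing text of the same class,
    so the set of distinct texts per class never grows.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))  # a duplicate, not a new sample
            out_labels.append(cls)
    return out_texts, out_labels


def per_class_scores(y_true, y_pred, positive):
    """Hypothetical helper: precision, recall and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up data: 4 "real" rows and 2 "fake" rows.
    texts = ["real a", "real b", "real c", "real d", "fake x", "fake y"]
    labels = ["real"] * 4 + ["fake"] * 2
    bal_texts, bal_labels = random_oversample(texts, labels)
    fake_rows = [t for t, y in zip(bal_texts, bal_labels) if y == "fake"]
    # Four "fake" rows after balancing, but still only two distinct texts.
    print(len(fake_rows), len(set(fake_rows)))
    print(per_class_scores(["fake", "fake", "real", "real"],
                           ["fake", "real", "real", "fake"], "fake"))
```

In practice one would more likely reach for scikit-learn's `precision_recall_fscore_support` and imbalanced-learn's `RandomOverSampler` than for hand-written helpers. They are written out here only to make the duplication effect explicit.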