Optimization Based Undersampling for Imbalanced Classes (Turkish title: Dengesiz Sınıflamada Optimizasyona Dayalı Azörnekleme)


Creative Commons License

SAĞLAM F., Sözen M., Cengiz M. A.

Adiyaman University Journal of Science, vol.11, no.2, pp.385-409, 2021 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 11 Issue: 2
  • Publication Date: 2021
  • DOI Number: 10.37094/adyujsci.884120
  • Journal Name: Adiyaman University Journal of Science
  • Journal Indexes: Scopus, TR DİZİN (ULAKBİM)
  • Page Numbers: pp.385-409
  • Keywords: Classification, Imbalanced classes, Optimization, Undersampling
  • Affiliated with Ondokuz Mayıs University: Yes

Abstract

Classification methods tend to favor the majority class when the classes contain unequal numbers of observations. The literature offers several remedies for this problem, including resampling methods. Undersampling, one such method, balances the classes by removing observations from the majority class. This study compares different optimization methods for determining the most suitable observations to take from the majority class during undersampling. First, a simple simulation study was conducted, and graphs were used to analyze the differences between the resampled datasets. Then, different classifier models were built for different imbalanced datasets. In these models, random undersampling and undersampling with a genetic algorithm, a differential evolution algorithm, an artificial bee colony, and particle swarm optimization were compared. The results were ranked for each classifier and dataset, and an overall mean rank was obtained. Overall, undersampling with the artificial bee colony performed better than the other optimization methods.
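As a rough illustration of the baseline the abstract mentions, random undersampling can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function name `random_undersample`, its parameters, and the toy data are all assumptions made for illustration, and it assumes a two-class problem in which the majority class is at least as large as the minority class.

```python
import random


def random_undersample(rows, labels, majority_label, seed=None):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper for illustration only; assumes two classes and that
    the majority class is at least as large as the minority class.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, lab in enumerate(labels) if lab == majority_label]
    minority_idx = [i for i, lab in enumerate(labels) if lab != majority_label]
    # Sample, without replacement, as many majority rows as there are minority rows.
    kept_majority = rng.sample(majority_idx, k=len(minority_idx))
    # Restore the original row order among the retained observations.
    keep = sorted(kept_majority + minority_idx)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data (made up): 90 majority observations (label 0), 10 minority (label 1).
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [0] * 90 + [1] * 10
rows_bal, labels_bal = random_undersample(rows, labels, majority_label=0, seed=0)
```

An optimization-based variant, as compared in the study, would presumably replace the random draw with a search over candidate majority-class subsets scored by some classifier performance measure; the scoring function and search settings are not given in the abstract, so they are left out here.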