Local resampling for locally weighted Naive Bayes in imbalanced data


SAĞLAM F., Cengiz M. A.

COMPUTING, vol.106, no.1, pp.185-200, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 106 Issue: 1
  • Publication Date: 2024
  • DOI Number: 10.1007/s00607-023-01219-0
  • Journal Name: COMPUTING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Pages: pp.185-200
  • Keywords: Class imbalance, Locally weighted learning, Oversampling, Resampling, SMOTE, Undersampling
  • Ondokuz Mayıs University Affiliated: Yes

Abstract

The Locally Weighted Naive Bayes (LWNB) method builds a separate weighted Naive Bayes model in the neighborhood of each query point. Like other classification methods, LWNB is affected by class imbalance. Class imbalance arises when the class variable has a skewed distribution, which biases classification algorithms towards the majority class. Resampling approaches such as undersampling and oversampling can overcome this problem. However, resampling the whole data set may not carry over correctly to local regions, since each region is assumed to be independent of the data outside it. Local regions should therefore be treated without outside interference. In this study, we propose a novel resampling approach that is applicable to both undersampling and oversampling. We examined how the imbalance of the data set should be reflected in each local region and aimed to prevent the imbalance problem by resampling the data in each local region separately. The method calculates the appropriate resampling rate and number of neighbors for each local region from the imbalance rate of the data and a resampling rate chosen by the researcher. The proposed approach was compared with classical resampling approaches on 25 datasets that are frequently used in the literature and achieved promising results.
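The idea of resampling inside a local region, rather than over the whole data set, can be illustrated with a minimal sketch. It is not the procedure from the paper: the abstract does not give the formulas for the local resampling rate or the number of neighbors, so the helper names (`k_nearest`, `local_undersample`), the `target_ratio` parameter, the random-drop rule, and the toy data are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def k_nearest(X, y, query, k):
    """Return the k training points closest to `query` (Euclidean), with labels."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))
    return [(X[i], y[i]) for i in order[:k]]


def local_undersample(neighbors, target_ratio=1.0, seed=0):
    """Randomly drop majority-class neighbors until
    majority_count <= target_ratio * minority_count.

    Hypothetical rule for illustration; not the rate formula from the paper.
    """
    counts = Counter(label for _, label in neighbors)
    if len(counts) < 2:
        return list(neighbors)  # only one class present locally; nothing to balance
    ranked = counts.most_common()
    maj_label, maj_n = ranked[0]
    _, min_n = ranked[-1]
    keep_n = min(maj_n, max(1, int(target_ratio * min_n)))
    rng = random.Random(seed)
    majority = [p for p in neighbors if p[1] == maj_label]
    others = [p for p in neighbors if p[1] != maj_label]
    return others + rng.sample(majority, keep_n)


# Toy data (made up): 8 majority points (label 0), 2 minority points (label 1).
X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.0, 0.2), (0.3, 0.3),
     (0.2, 0.3), (0.9, 1.1), (0.8, 0.8), (1.0, 1.0), (1.1, 0.9)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

global_counts = Counter(y)                       # imbalance of the full data set
neighbors = k_nearest(X, y, query=(1.0, 1.0), k=5)
local_counts = Counter(label for _, label in neighbors)
balanced = local_undersample(neighbors, target_ratio=1.0)
balanced_counts = Counter(label for _, label in balanced)

print(global_counts, local_counts, balanced_counts)
```

In this toy case the neighborhood of the query is less imbalanced than the full data set, which is the kind of mismatch between global and local imbalance that motivates resampling each region separately. A local Naive Bayes model would then be fitted on `balanced` instead of on `neighbors`.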