Feature Selection in Text Classification


Şahin D. Ö., Ateş N., Kılıç E.

24th Signal Processing and Communication Application Conference (SIU), Zonguldak, Türkiye, 16 - 19 Mayıs 2016, ss.1777-1780 identifier identifier

  • Yayın Türü: Bildiri / Tam Metin Bildiri
  • Doi Numarası: 10.1109/siu.2016.7496105
  • Basıldığı Şehir: Zonguldak
  • Basıldığı Ülke: Türkiye
  • Sayfa Sayıları: ss.1777-1780
  • Anahtar Kelimeler: text classification, feature selection, term weighting
  • Ondokuz Mayıs Üniversitesi Adresli: Evet

Özet

In recent years, text classification have been widely used. Dimension of text data has increased more and more. Working of almost all classification algorithms is directly related to dimension. In high dimension data set, working of classification algorithms both takes time and occurs over fitting problem. So feature selection is crucial for machine learning techniques. In this study, frequently used feature selection metrics Chi Square (CHI), Information Gain (IG) and Odds Ratio (OR) have been applied. At the same time the method Relevancy Frequency (RF) proposed as term weighting method has been used as feature selection method in this study. It is used for tf.idf term as weighting method, Sequential Minimal Optimization (SMO) and Naive Bayes (NB) in the classification algorithm. Experimental results show that RF gives successful results.