Traditional Machine Learning-Based Classification of Cashew Kernels Using Colour Features


Baitu G. P., Gadalla O. A. A., Öztekin Y. B.

JOURNAL OF TEKIRDAG AGRICULTURE FACULTY-TEKIRDAG ZIRAAT FAKULTESI DERGISI, vol.20, no.1, pp.115-124, 2023 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 20 Issue: 1
  • Publication Date: 2023
  • DOI: 10.33462/jotaf.1100782
  • Journal Name: JOURNAL OF TEKIRDAG AGRICULTURE FACULTY-TEKIRDAG ZIRAAT FAKULTESI DERGISI
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, TR DİZİN (ULAKBİM)
  • Pages: pp.115-124
  • Keywords: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbour, Cashews
  • Affiliated with Ondokuz Mayıs University: Yes

Abstract

Cashew is one of Tanzania's major commercial commodities and contributes foreign revenue to the national economy. Yet cashew processing is still carried out locally and relies largely on manual labour. When processed under ideal conditions, cashew kernels are expected to be white. However, factors such as prolonged roasting in the steam chambers or over-drying give some kernels a slight brown colour; these are referred to as scorched cashews. Although scorched kernels share the characteristics of white kernels, including nutritional quality, they are supposed to be graded differently. In many places around the world, and particularly in Tanzania, cashew kernels are sorted and graded by hand. Grading is very important in international trade, so more effective and consistent methods are needed at this stage of production to increase product quality. The objective of this study was to evaluate traditional Machine Learning techniques for classifying cashew kernels as white or scorched using colour features. Various colour features were extracted from the images: the mean (mu), standard deviation (sigma), and skewness (gamma) of each channel in the RGB and HSV colour spaces. The features relevant to this classification problem were selected with a wrapper approach using the Boruta library in Python, and the irrelevant ones were removed. Five models were studied and their performance analysed: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine and K-Nearest Neighbour. The Decision Tree model recorded the lowest accuracy, 98.4%. The highest accuracy, 99.8%, was obtained with the Random Forest model with 100 trees. Because of its simplicity of application and high accuracy, Random Forest is recommended as the best model from this study.
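
The colour features named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes pixels arrive as (r, g, b) tuples scaled to [0, 1], uses the population standard deviation, and takes skewness as the standardised third central moment, since the abstract does not say which definition was used. The names moments and colour_features are hypothetical, and the standard-library colorsys module stands in for whatever image library the study relied on.

```python
import colorsys
import math


def moments(values):
    """Return (mean, population standard deviation, skewness) of a sequence."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    if sigma < 1e-12:
        # Flat channel: skewness is undefined, so 0.0 is used here (assumption).
        return mu, 0.0, 0.0
    # Assumed definition: third central moment divided by sigma cubed.
    gamma = sum((v - mu) ** 3 for v in values) / n / sigma ** 3
    return mu, sigma, gamma


def colour_features(pixels):
    """Compute mu, sigma and gamma for each channel of RGB and HSV.

    pixels: iterable of (r, g, b) tuples with components in [0, 1].
    Returns a dict with 18 entries such as "mu_R", "sigma_H", "gamma_V".
    """
    rgb = list(pixels)
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in rgb]
    features = {}
    for space, data in (("RGB", rgb), ("HSV", hsv)):
        for idx, channel in enumerate(space):
            mu, sigma, gamma = moments([p[idx] for p in data])
            features[f"mu_{channel}"] = mu
            features[f"sigma_{channel}"] = sigma
            features[f"gamma_{channel}"] = gamma
    return features
```

Calling colour_features on the pixels of one kernel image gives an 18-value feature vector (3 moments for each of 6 channels). Vectors of this kind could then go through feature selection and into a classifier such as a Random Forest with 100 trees.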