A Review of Machine Learning Applications in Cacao Post-Harvest Management

Cacao beans are the key raw material for chocolate production, which makes cacao a highly relevant crop. The post-harvest stage of cacao beans, which covers classification, quality assessment, and fermentation, is especially important. As demand for marketable cacao beans grows, so does the need for reliable, accurate, and fast technologies. This study reviews the machine learning techniques applied to the post-harvest stage of cacao beans from 2016 to the present. It analyzes 36 studies across three application areas: classification, quality assessment, and fermentation. The proposed framework covers the application domain, learning algorithms, performance metrics, and reported impacts. The most commonly used algorithms are ANN, SVM, and CNN. The top results by area are as follows:

- Cacao classification: a chi-square distance classifier on GLCM features achieved the highest accuracy (99.61%).
- Quality assessment: ANFIS achieved the highest accuracy (99.715%).
- Fermentation: k-NN was the most accurate algorithm (99.4%).

This review is intended as a resource for researchers in the cacao bean sector seeking an overview of machine learning advancements.


Introduction
Cacao (Theobroma cacao) is one of the most widely cultivated and renowned food crops in the world. Best known as the main ingredient of chocolate, cacao yields a range of well-known products such as tableya, cocoa powder, cocoa liquor, cocoa mass, and other confectionery products. Over time, cacao has also found a place in the cosmetics and pharmaceutical industries, and demand continues to grow in both domestic and global markets.
This paper offers a comprehensive examination of machine learning (ML) applications in cacao post-harvest management, focusing on fruit classification, quality assessment, and fermentation. It explores research findings, trends, and the performance of various ML techniques and algorithms. The paper aims to:

(1) provide a practical assessment of current trends and developments in the use of machine learning in cacao post-harvest management;
(2) investigate and evaluate the effectiveness of a wide range of machine learning algorithms in this domain; and
(3) determine the most effective and efficient machine learning algorithm for each of cacao's post-harvest management fields: fruit classification, quality assessment, and fermentation.

Methodology
This review systematically assesses the use of machine learning in cacao post-harvest management. To ensure the validity of the findings, the researchers searched academic databases and journals relevant to the field, such as IEEE Xplore, Elsevier, PubMed, ResearchGate, Google Scholar, and other agricultural and technological journals. Related keywords such as "cacao post-harvest", "machine learning", "cacao bean quality assessment", and "cacao fermentation" were combined with search operators to refine the results. Every search result was reviewed against the inclusion and exclusion criteria so that only relevant papers were evaluated. The researchers selected peer-reviewed articles, conference papers, and reviews published between 2016 and the present that apply machine learning (ML) to cacao post-harvest management. The chosen studies were grouped into three post-harvest fields of cacao production: fruit classification, quality assessment, and fermentation. Data were then extracted manually and systematically from the final list of included papers. The extracted fields were the title, authors, methodology, ML algorithms used, and performance metric results. In total, this review covers thirty-six papers drawn from various databases.

Individual Summary of selected publications by application area
This section summarizes the 36 selected studies under the three ML application fields in cacao post-harvest management. Table 1 lists the applications and the corresponding studies for each.

Classification of cacao pods
Accurate selection of cocoa beans is crucial for product quality [1]. However, manual classification during harvest can lead to pod contamination, contributing to a decline in cacao production [2]. Cross-pollination also makes it difficult to identify cocoa varieties, which affects quality control and pricing [3]. Pod colour serves as a ripening indicator linked to cocoa bean quality [4]. Adopting advanced technology can address these challenges and improve the efficiency of cocoa production.
[1] proposed a CNN-based model for classifying healthy and diseased cacao pods and plants. The study used a labelled dataset of 312 images of healthy, Monilia-damaged, and Phytophthora-damaged cacao. A YOLOv5m model was used to distinguish healthy from diseased cacao in images captured with mobile phones.
[2] employed an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) to distinguish healthy from unhealthy cacao beans without harvesting. The study used colour histogram (CH) and local binary pattern (LBP) features, and the ANN classifier proved more accurate than the SVM. Combining CH and LBP features with image processing techniques and a large dataset improved the efficiency and reliability of bean segregation.
[5] proposed a technique for determining the maturity of cacao pods from the sound they produce when tapped at different locations. The sounds were transformed into Mel spectrograms and classified as ripe or unripe by a trained CNN model. The CNN outperformed other machine learning algorithms such as SVM, KNN, and decision trees.
[6] focused on the UF-18 cacao variety, known for its long lifespan and distinctive orange-yellow colour when ripe. They classified maturity from the pods' physical characteristics, such as colour, shape, and texture. They used Matlab and algorithms such as Otsu thresholding, K-Means, and SVM, with SVM showing excellent reliability. The study also incorporated sound analysis, tapping each fruit four times to determine its maturity.
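To make the local binary pattern (LBP) texture descriptor used in [2] more concrete, the sketch below computes basic 8-neighbour LBP codes and a normalised 256-bin histogram. The histogram can serve as a texture feature vector for an ANN or SVM classifier. This is a minimal textbook variant, not the pipeline of [2]. It assumes an 8-bit grayscale image stored as a list of rows and skips border pixels. The function names are illustrative.

```python
from typing import List

# Neighbour offsets (row, col), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """8-bit LBP code of interior pixel (r, c): bit i is set if neighbour i >= centre."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Normalised 256-bin histogram of LBP codes over all interior (non-border) pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256
```

For example, `lbp_code([[9, 0, 0], [0, 5, 0], [0, 0, 0]], 1, 1)` returns 1, because only the top-left neighbour is at least as bright as the centre. Practical systems often use rotation-invariant or "uniform" LBP variants to shorten the histogram.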
[7] presented a method for detecting and segmenting cacao pod infection from images. A K-means model clustered pixels by their color in the Lab color space. An SVM model with four clusters and a quadratic kernel then separated healthy from infected fruits. This method can improve the inspection process for farmers and agricultural technicians.
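The colour-clustering step described for [7] can be illustrated with Lloyd's k-means on per-pixel colour triples. This is a minimal sketch, not the authors' implementation. It assumes the pixels have already been converted to L*a*b* values (the conversion is omitted). It also assumes the first k pixels are a usable initialisation; real implementations typically use k-means++ or several random restarts.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # assumed (L*, a*, b*) values of one pixel


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; centroids are initialised from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned pixels.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's old centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: assignments and centroids no longer change
        labels, centroids = new_labels, new_centroids
    return centroids, labels
```

Each resulting cluster can then be treated as a candidate region, for example healthy tissue versus lesion, before a classifier such as an SVM labels it.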
[3] developed a non-invasive cacao pod classifier using a Residual Network (ResNet), a Convolutional Neural Network (CNN) architecture. The study aimed to identify cacao varieties from pod images: Criollo, W-10, BR-25, and unknown. The ResNet model, trained with ImageAI, outperformed other CNN models, including VGG16, InceptionV3, and DenseNet.
[8] investigated cacao pod maturity classification using the CocoaMFDB database and various tools under different lighting conditions. Preprocessing techniques and color space analysis were applied, and features were extracted with GLCM and MobileNet. Classification by chi-square distance achieved the highest accuracy, showing how strongly the choice of similarity measure affects maturity determination.
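A similarity-based classifier of the kind reported in [8] can be sketched as nearest-reference matching under the chi-square distance. The sketch uses one common symmetric form, 0.5 · Σ (a_i − b_i)² / (a_i + b_i), which assumes non-negative feature vectors such as normalised histograms or GLCM-derived descriptors. How [8] built its reference vectors is not described here, so the labels and values below are hypothetical.

```python
from typing import Dict, Sequence


def chi_square_distance(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Symmetric chi-square distance 0.5 * sum((a_i - b_i)^2 / (a_i + b_i)) for non-negative vectors."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


def classify_by_reference(features: Sequence[float], references: Dict[str, Sequence[float]]) -> str:
    """Return the label whose reference vector is closest in chi-square distance."""
    return min(references, key=lambda label: chi_square_distance(features, references[label]))


# Hypothetical reference feature vectors, one per maturity class.
REFERENCES = {"ripe": [0.8, 0.2], "unripe": [0.1, 0.9]}
```

With these assumed references, `classify_by_reference([0.7, 0.3], REFERENCES)` returns `"ripe"`. The chi-square distance weights differences in low-valued bins more heavily than the Euclidean distance does, which is one reason it is popular for histogram comparison.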
[4] demonstrated the usefulness of ANNs for identifying cacao pod maturity. The model relied on image processing. Its input consisted of RGB images of Criollo cocoa pods with an amelonado shape, whose colour changes from green to yellow as the fruit ripens. The process consisted of four steps: colour enhancement, segmentation, feature extraction, and ANN classification. However, the proposed model yielded much lower accuracy than similar previous studies.
[9] employed deep learning classifiers to quickly identify healthy cacao pods and those affected by pests, enabling prompt responses to prevent losses. The study categorized cacao pod images into three conditions: healthy, pest attack, and black pod disease attack. Models were trained with k-fold cross-validation. The proposed CNN model achieved higher test-set accuracy than MobileNetV3Small and NASNetMobile.
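The k-fold cross-validation used for model training in [9] splits the data into k folds. Each fold serves once as the test set while the remaining folds are used for training. The sketch below generates the index splits only. It assumes the samples are already shuffled, because the folds are contiguous, and it is not stratified by class. A library routine such as scikit-learn's `KFold` or `StratifiedKFold` would normally be used instead.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds; return (train, test) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample so all samples are used.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits
```

A model is trained once per `(train, test)` pair, and the k test accuracies are averaged. This gives a less optimistic estimate than a single train/test split on a small image dataset.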
Another study, [10], used AI and deep learning, specifically a CNN model, to address the spread of infection in cacao pods. The severity of pod infection was classified into three categories using transfer learning from a pretrained MobileNetV3Small architecture. The model was compared with EfficientNetB0 and NASNetMobile, and the proposed CNN model outperformed both.

Quality Assessment
The four main physical characteristics of cocoa beans are size, shape, colour, and texture. Cacao quality evaluation has traditionally relied on manual inspection of these characteristics. This approach is subjective, labor-intensive, prone to human error, and unsuitable for large quantities of beans [11]. Machine vision and automation technologies supported by artificial intelligence are increasingly used to overcome these constraints in assessing the quality of agricultural and food commodities.
[12] proposed MELS-SVM, an SVM model improved with an ensemble method. The model uses the least-squares approach to obtain the separating hyperplanes and the One-Against-All strategy for multiclass classification. It works on morphological parameters extracted from cocoa bean images.
[13] developed an artificial neural network (ANN) to classify cocoa beans as good, fair, or bad. The classification was based on attributes such as slate, mould, free fatty acid, foreign matter, and bean count. The Multilayer Perceptron (MLP) model was trained with backpropagation, and WEKA was used to implement the algorithm on the cocoa bean dataset.
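To show what training an MLP with backpropagation, as in [13], involves, here is a minimal from-scratch sketch. It is not WEKA's MultilayerPerceptron. The following choices are all assumptions for illustration:

- one hidden layer of sigmoid units and a single sigmoid output;
- squared-error loss;
- full-batch gradient descent;
- toy two-feature data, where 1 stands for a hypothetical "good" bean and 0 for "bad".

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One sigmoid hidden layer and one sigmoid output, trained by backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, xs: Sequence[Sequence[float]], ys: Sequence[float]) -> float:
        """Mean of 0.5 * (output - target)^2 over the batch."""
        return sum(0.5 * (self.forward(x)[1] - t) ** 2 for x, t in zip(xs, ys)) / len(xs)

    def train_step(self, xs: Sequence[Sequence[float]], ys: Sequence[float], lr: float) -> None:
        """One full-batch gradient-descent step; gradients are averaged over the batch."""
        n = len(xs)
        g_w1 = [[0.0] * len(row) for row in self.w1]
        g_b1 = [0.0] * len(self.b1)
        g_w2 = [0.0] * len(self.w2)
        g_b2 = 0.0
        for x, t in zip(xs, ys):
            h, y = self.forward(x)
            d_out = (y - t) * y * (1 - y)  # error signal at the output pre-activation
            g_b2 += d_out
            for j, hj in enumerate(h):
                g_w2[j] += d_out * hj
                d_hid = d_out * self.w2[j] * hj * (1 - hj)  # error backpropagated to hidden unit j
                g_b1[j] += d_hid
                for i, xi in enumerate(x):
                    g_w1[j][i] += d_hid * xi
        # Apply the averaged gradients only after all samples are processed.
        self.b2 -= lr * g_b2 / n
        for j in range(len(self.w2)):
            self.w2[j] -= lr * g_w2[j] / n
            self.b1[j] -= lr * g_b1[j] / n
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * g_w1[j][i] / n
```

In use, `train_step` is called repeatedly on the labelled feature vectors until the loss stops decreasing. Real cocoa-bean attributes such as mould or free fatty acid content would be scaled to comparable ranges first.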
[14] proposed a classifier system for cocoa bean quality assessment using image processing and a backpropagation neural network. It uses PCA for feature extraction and backpropagation training for classification based on texture and colour. The system aims to help farmers and traders reduce losses caused by manual selection.
[15] employed a supervised machine learning method to classify cocoa beans into four categories: slate, purple, violet, and good. Supervised machine learning learns from labelled data and makes predictions based on the learned patterns.
[16] developed a method to classify cocoa beans into seven classes using GLCM and CNN feature extraction with various classifiers. SVM and XGBoost classifiers applied to the extracted features achieved higher accuracy than AdaBoost. With both classifiers, GLCM features proved more accurate and reliable than CNN features.
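The GLCM features that performed well in [16] are statistics of a grey-level co-occurrence matrix. The matrix counts how often pairs of grey levels occur at a fixed pixel offset. The sketch below builds a symmetric, normalised GLCM for one offset and derives three classic texture statistics. It assumes the image is already quantised to integer grey levels in `range(levels)`. It uses a single horizontal offset, whereas practical pipelines usually average several offsets and angles. The function names are illustrative.

```python
from typing import Dict, List


def glcm(img: List[List[int]], levels: int, dr: int = 0, dc: int = 1) -> List[List[float]]:
    """Symmetric, normalised grey-level co-occurrence matrix for pixel offset (dr, dc)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                m[i][j] += 1
                m[j][i] += 1  # count each pair in both directions (symmetric GLCM)
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m] if total else m


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    """Contrast, homogeneity and energy of a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells) ** 0.5,
    }
```

A uniform patch gives zero contrast and maximal homogeneity and energy. A fine checkerboard gives high contrast. Vectors of such statistics are what a downstream SVM or XGBoost classifier would receive.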
[17] implemented an automated method for grading cacao beans using image processing and an adaptive neuro-fuzzy inference system (ANFIS). The method first identifies and classifies defects using CNN techniques. It then determines the grade level with ANFIS, which combines artificial neural networks and fuzzy logic.
[18] proposed a method for classifying cocoa beans by their pulp using computer vision techniques. These included k-means clustering, morphological operations, a bag of visual words, and a support vector machine classifier. The method helps remove the pulp and efficiently estimates cocoa bean quality, dividing the beans into excellent- and poor-quality classes.
[19] used a decision tree, a supervised machine learning approach, to classify cocoa beans into four quality categories: good, slaty, mouldy, and weevilly. They captured digital images of cut cocoa beans, processed and segmented them, and extracted colour, texture, and statistical features into a feature database. Classification used the Classification and Regression Tree (CART) algorithm.
[11] developed four machine learning algorithms (KNN, SVM, Decision Tree, and Random Forest) to categorize cocoa beans by size into four groups: large, medium, small, and rejected. Image processing techniques were used to extract features, and redundant features were eliminated.
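At each node, CART, the algorithm used in [19], chooses the split that most reduces class impurity. The sketch below shows this core step for a single numeric feature using Gini impurity. It is a simplified illustration and not the full tree-growing, stopping, or pruning procedure. The feature values and the "good"/"mouldy" labels in the usage example are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values: Sequence[float], labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Best 'value <= t' split on one feature, minimising the weighted Gini impurity of the children.

    Returns (threshold, impurity); threshold is None if no split beats the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, gini(labels)
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical feature values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = (pairs[k - 1][0] + pairs[k][0]) / 2, score
    return best_t, best_score
```

For example, `best_threshold([0.1, 0.2, 0.8, 0.9], ["good", "good", "mouldy", "mouldy"])` places the threshold at 0.5 with zero impurity. A full tree repeats this search over all features and recurses on each child node.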
[20] suggested using CNN and PCA to reduce the dimensionality and complexity of image data. The CNN generated feature maps, and PCA transformed the variables into uncorrelated components. An SVM then classified the cocoa beans, with different kernels and parameters tested.
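The PCA step described for [20] projects features onto the directions of greatest variance. These directions are the eigenvectors of the covariance matrix. As a minimal sketch, the code below finds only the first principal component by power iteration on the sample covariance matrix. It then projects the data onto that component. It assumes a dense list-of-rows feature matrix, and its starting vector of ones must not be orthogonal to the leading eigenvector. A library implementation, typically SVD-based, would compute several components at once.

```python
import math
from typing import List, Sequence


def _column_means(data: Sequence[Sequence[float]]) -> List[float]:
    n, d = len(data), len(data[0])
    return [sum(row[j] for row in data) / n for j in range(d)]


def first_principal_component(data: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Unit-length leading eigenvector of the sample covariance matrix (power iteration)."""
    n, d = len(data), len(data[0])
    mean = _column_means(data)
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # starting vector (assumed not orthogonal to the leading eigenvector)
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero-variance data: no meaningful direction
        v = [x / norm for x in w]
    return v


def project(data: Sequence[Sequence[float]], component: Sequence[float]) -> List[float]:
    """Score of each (mean-centred) row along the given component."""
    mean = _column_means(data)
    return [sum((row[j] - m) * c for j, (m, c) in enumerate(zip(mean, component))) for row in data]
```

Applied to CNN feature maps flattened into vectors, the resulting low-dimensional scores would be the SVM's input instead of the raw features.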
[21] proposed a cacao bean classifier based on a computer vision system that sorts beans by external physical quality: good, broken, clumped, flat, or mouldy. They used image processing and a pre-trained CNN to extract and analyse features from the dataset. The method consists of image acquisition, preprocessing, feature extraction, and classification.
[22] developed a computer vision system to classify cocoa beans as healthy or diseased using a dataset from Ecuador. They used YOLOv5 for image identification and preprocessing, isolating the beans with interval analysis and mathematical morphology techniques.
[23] proposed two methods for applying machine learning in intelligent farming. The first assesses agricultural product quality with a feature classification framework, comparing methods such as GLCM. The second uses temporal data classification to monitor farmers' activities. It employs a quaternion representation and the Hierarchical Temporal Memory (HTM) algorithm to predict and detect accidents or abnormal situations.
[24] developed a machine learning model that uses enhanced image feature extraction and a regularized ANN to classify cocoa beans into 14 grades. They used cut testing to visually inspect interior characteristics such as colour, compartmentalization, and defects. They then combined colour histograms with grey-level co-occurrence matrix (GLCM) features to represent the beans' colour and texture information.
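Combining colour histograms with texture features, as in [24], amounts to concatenating two feature vectors per bean image. The sketch below computes a per-channel RGB histogram, normalised per channel, and appends an externally supplied texture vector, for example GLCM statistics. The bin count, the 8-bit RGB input, and the function names are assumptions for illustration, not the configuration used in [24].

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]  # assumed 8-bit red, green, blue values


def colour_histogram(pixels: Sequence[RGB], bins: int = 8) -> List[float]:
    """Concatenated R, G, B histograms; each channel's bins sum to 1."""
    width = 256 // bins  # assumes bins divides 256 evenly
    features: List[float] = []
    for ch in range(3):
        hist = [0] * bins
        for p in pixels:
            hist[min(p[ch] // width, bins - 1)] += 1
        features.extend(count / len(pixels) for count in hist)
    return features


def combined_features(pixels: Sequence[RGB], texture: Sequence[float], bins: int = 8) -> List[float]:
    """Colour histogram followed by a texture vector (e.g. GLCM statistics computed elsewhere)."""
    return colour_histogram(pixels, bins) + list(texture)
```

Because the two parts have different scales and lengths, they are usually standardised before being fed to a regularized ANN. Otherwise one group of features can dominate the other.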
[25] implemented convolutional neural networks (CNNs) to automate the cut test and predict cocoa bean quality scores from colour and surface characteristics. A 152-layer ResNet V2 model and data augmentation techniques were used to improve the CNN models' performance.

Fermentation
Fermentation is a crucial stage of cacao post-harvest management. It shapes the beans' flavour [26], aroma, and processing properties [27], which in turn influence quality and price [28]. The standard method for classifying the degree of fermentation relies on manual estimation and demands specific expertise, so its predictions are often inconsistent [29]. In addition, some analytical equipment and methods are too expensive for farmer groups. The need for fast and precise measurement of fermentation level has therefore motivated many researchers to develop predictive models [28].
[30] used hyperspectral images and spectral classification techniques to study cocoa bean fermentation stages, identifying three categories: slightly, correctly, and highly fermented. A Support Vector Machine (SVM) accurately classified the beans based on spectral signatures extracted with a superpixel segmentation algorithm. The study also analyzed bean uniformity and the effect of the number of training samples on classification performance.
[29] conducted an initial investigation into a computer vision and artificial intelligence system for measuring the fermentation level of cocoa beans. Using image analysis and color features, a Multilayer Perceptron artificial neural network achieved high accuracy in classifying fermented and unfermented beans. The method provides a rapid and precise alternative to the traditional cut test for assessing cocoa bean quality.
[27] used an electronic nose with nine gas sensors to identify the fermentation level of cocoa beans. Six machine learning methods were used to process the sensor responses and classify fermentation time. The bootstrap forest algorithm achieved the highest accuracy with the lowest misclassification rate. Half of the algorithms successfully separated the cocoa beans by fermentation level, while the other half did not.
[28] focused on precisely measuring the cacao bean fermentation rate by forecasting pH and the fermentation index. Color features combined with a partial least squares regression (PLSR) model predicted both pH and the fermentation index with high accuracy. This indicates that color features are effective visual indicators of cocoa bean fermentation.
[31] used machine vision and multiclass SVM classifiers to evaluate cocoa bean fermentation levels for quality control. Colour features were extracted from the RGB, HSV, and YCbCr colour spaces to characterize fermentation. The proposed method achieved higher accuracy than the conventional cut test, classifying cocoa beans by fermentation level more efficiently and reliably.
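Colour features of the kind described in [31] can be sketched as channel means over a bean region in three colour spaces. This is a simplified illustration, not the feature set of [31]. The HSV values come from Python's standard `colorsys.rgb_to_hsv`, which expects inputs in [0, 1]. For YCbCr the sketch assumes the full-range (JPEG/JFIF-style) BT.601 formulas. Averaging hue directly ignores that hue is circular, so this is a further simplification.

```python
import colorsys
from typing import List, Sequence, Tuple


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Full-range BT.601 (JPEG/JFIF-style) conversion for 8-bit R, G, B values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def mean_colour_features(pixels: Sequence[Tuple[int, int, int]]) -> List[float]:
    """Mean R, G, B, H, S, V, Y, Cb, Cr over the region's pixels (9 values)."""
    sums = [0.0] * 9
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # each in [0, 1]
        for i, value in enumerate((r, g, b, h, s, v, *rgb_to_ycbcr(r, g, b))):
            sums[i] += value
    return [total / len(pixels) for total in sums]
```

The resulting 9-value vector per bean is the kind of input a multiclass SVM, or the k-NN classifier of [32], could use to separate fermentation levels.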
[32] used the k-Nearest Neighbors (k-NN) algorithm to classify cacao beans into three fermentation levels based on colour features from photographs. Accuracy was assessed through functional tests and statistical analysis, including a confusion matrix. The research highlights the potential of such devices and machine learning to enhance cacao quality and value while reducing risks for farmers and traders. The authors recommend further work to improve the device, such as incorporating additional features and datasets and exploring alternative classification algorithms.
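The confusion-matrix evaluation mentioned for [32] can be sketched as follows. Rows are true classes, columns are predicted classes, and the diagonal holds the correct predictions. Overall accuracy is the diagonal sum divided by the total. Per-class recall shows whether any single fermentation level is being missed. The three class labels below are hypothetical placeholders.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> List[List[int]]:
    """Counts m[i][j] of samples with true class labels[i] predicted as labels[j]."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def accuracy(m: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


def per_class_recall(m: List[List[int]]) -> List[float]:
    """Recall of each true class (row); 0.0 for classes with no samples."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(m)]


LABELS = ["under", "well", "over"]  # hypothetical fermentation-level names
```

Reporting per-class recall next to accuracy matters when classes are unevenly represented. A classifier can score high overall accuracy while failing entirely on a rare class.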
[33] used various computer vision methods to gauge the fermentation level of cocoa beans. Four methods (vector quantization, K-means, fuzzy clustering, and mean shift) were compared and evaluated for training and classification. The dataset consisted of images taken during the fermentation process. A reference pattern was developed using inputs from sensory profiles and cocoa testers.
[34] explored categorizing the degree of roasting of cocoa beans using smartphone camera images and an extreme learning machine (ELM) algorithm. Color features from the RGB and CIELAB color spaces were combined with texture features from the grey-level co-occurrence matrix (GLCM). Together they form a 22-dimensional feature vector for each region of interest (ROI) in the images.
[35] proposed a method that combines unsupervised (K-means clustering) and supervised (SVM) machine learning to classify cocoa beans as well-fermented or over-fermented based on their spectral signatures. This approach streamlines quality assessment, improves efficiency, and reduces costs without destroying samples.
[36] demonstrated continuous monitoring of volatile compounds during cocoa refining using a gas sensor system. Trained Kernel Distribution Models (KDMs) characterized the volatile profiles and identified deviations in processing variables. The compounds tracked by the electronic nose (e-nose) were validated against gas chromatography-mass spectrometry (GC-MS).
[26] achieved high classification rates for cocoa bean quality using a colorimetric sensor e-nose and a near-infrared chemo-intermediary dye spectra technique. SVM and ELM algorithms yielded the highest classification rates, demonstrating the system's potential for nondestructive, in situ classification of cocoa bean fermentation levels.
Across the reviewed studies, some algorithms were used more often than others:

- Artificial Neural Networks (ANN) appeared in five studies, reflecting their versatility.
- Support Vector Machines (SVM) were applied in four studies.
- Convolutional Neural Networks (CNN) were used in three studies, mainly for image-related tasks.
- Random Forest appeared in two studies.
- k-Nearest Neighbors (k-NN) appeared in one study.

Machine Learning Algorithms
The pie chart in Figure 1 shows the distribution of algorithms. It highlights both the variety of approaches and the higher prevalence of certain algorithms such as SVM, ANN, CNN, Random Forest, and k-NN. Although researchers explore a diverse set of ML algorithms, certain methods have clearly gained popularity and are adopted more often. The Artificial Neural Network (ANN) stands out as the most commonly used algorithm, appearing in five studies. Its popularity stems from its capacity to model complex relationships and its versatility with diverse data types, which make it effective for classification tasks.

Figure 1. Distribution of ML algorithms used in each study
The Support Vector Machine (SVM) is the second most frequently used algorithm, featured in four studies. It handles both linear and non-linear classification tasks and works well on high-dimensional data. These strengths make it a preferred choice for classification and image processing tasks.
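The SVM's ability to handle non-linear tasks comes from its kernel function. A trained SVM scores a new sample as f(x) = Σ α_i y_i K(s_i, x) + b over its support vectors s_i and predicts the sign of f(x). The sketch below shows three standard kernels, including the quadratic (degree-2 polynomial) kernel mentioned for [7], and the decision function. The support vectors, signed coefficients α_i y_i, and bias are assumed to come from an already-trained model. The training (dual optimisation) step is not shown.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Vector, y: Vector, degree: int = 2, c: float = 1.0) -> float:
    """(x . y + c) ** degree; degree=2 is the quadratic kernel."""
    return (linear_kernel(x, y) + c) ** degree


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_function(x: Vector, support_vectors: Sequence[Vector], signed_coefs: Sequence[float],
                      bias: float, kernel: Callable[[Vector, Vector], float]) -> float:
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; the predicted class is the sign of f(x)."""
    return sum(a * kernel(s, x) for s, a in zip(support_vectors, signed_coefs)) + bias
```

Swapping `linear_kernel` for `polynomial_kernel` or `rbf_kernel` changes the shape of the decision boundary without changing the decision rule. This is why SVMs can be tuned for linear and non-linear data alike.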
The Convolutional Neural Network (CNN) is the third most commonly used algorithm, appearing in three studies. Its architecture is designed for grid-like data such as images, and it has become the go-to choice for image classification. Its ability to capture local spatial dependencies within images has contributed to its success in many computer vision applications.
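The "local spatial dependencies" a CNN captures come from its convolutional layers. These slide a small learned kernel over the image, so each output value depends only on a small neighbourhood of pixels. The sketch below shows the core operation as a "valid" 2D cross-correlation, which deep learning frameworks call convolution, followed by a ReLU activation. The hand-picked edge kernel is an illustrative assumption. In a CNN, the kernel weights are learned during training.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(img: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation: each output depends only on a local image patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(kernel[u][v] * img[r + u][c + v] for u in range(kh) for v in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise max(0, x), the usual non-linearity after a convolution."""
    return [[max(0.0, x) for x in row] for row in feature_map]


# Hypothetical horizontal-gradient kernel that responds to vertical edges.
EDGE_KERNEL = [[-1.0, 1.0]]
```

Applying `EDGE_KERNEL` to an image that changes from dark to bright produces a feature map that is non-zero only at the boundary. Stacking many learned kernels and layers builds up from such edges to the pod and bean patterns used in the reviewed studies.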
The inclusion of these three algorithms in Table 2 underscores their prominence and wide adoption in the research community. ANN, SVM, and CNN have consistently demonstrated their efficacy and versatility, making them go-to solutions for diverse machine learning problems.
As shown in Table 5, the k-NN algorithm is the most accurate for the fermentation application, achieving an accuracy of 99.4% on a balanced dataset. k-NN is a simple and effective instance-based learning algorithm that classifies a new instance by its similarity to the training data. In the balanced dataset, each class contains approximately the same number of instances. This balance contributed to the high accuracy of k-NN, since it helps avoid bias towards a majority class.
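The instance-based classification described above can be sketched in a few lines. A new sample receives the majority label among its k nearest training samples, measured by Euclidean distance. This minimal version assumes numeric feature vectors on comparable scales, such as mean colour values, and an odd k to reduce ties. Remaining ties go to the label first encountered among the neighbours. The training data and labels below are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                x: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training samples nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical mean-RGB features of beans at two fermentation levels.
TRAIN_X = [(200, 120, 180), (205, 125, 175), (110, 70, 50), (115, 72, 48)]
TRAIN_Y = ["under", "under", "well", "well"]
```

Because every class contributes neighbours at a similar rate in a balanced dataset, no class gains votes merely by being more numerous. This is consistent with the role of balance noted above.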

Performance Metrics
Using a variety of machine learning algorithms and techniques is important for addressing the distinct requirements of different cacao processing stages. The success of these models could improve efficiency and accuracy in cacao cultivation and processing, benefiting the chocolate industry. However, the choice of algorithm should take into account the specific problem, the characteristics of the data, and other factors.

Conclusion
This study offers a comprehensive overview of the application of machine learning techniques in the post-harvest stage of cacao beans. Machine learning has greatly facilitated the identification and classification of cacao beans, providing efficient and accurate solutions. The review emphasizes the performance metrics associated with each machine learning algorithm identified in the collected studies.

Figure 1. Distribution of ML algorithms used in each study
Figure 1 illustrates the distribution of machine learning algorithms across the 36 analyzed studies. A total of 36 different algorithms were used, several of them in only one study, which shows the diversity of approaches taken. These include CART, ResNet V2, Backpropagation Neural Network, ANFIS, Multi-Layer Perceptron, MELS-SVM, KDM, ELM, Vector Quantization, YOLOv2, Bootstrap Forest, MobileNet V3, EfficientNetB0, GLCM, Chi-Square distance+GLCM, and ResNet Model.

Table 1
Overview of ML applications in cacao's various post-harvest fields

Table 2
Top 3 Most-used ML algorithms

Table 2 presents the three most frequently used machine learning algorithms, which stand out from the remaining 33. These algorithms are widely applied across many domains. They are the Artificial Neural Network (ANN) for classification, Support Vector Machines (SVM) for classification and image processing, and the Convolutional Neural Network (CNN) for classification.

Table 3
List of Machine Learning Algorithms Utilized in Cacao Post-Harvest Classification

Table 4
List of Machine Learning Algorithms Utilized in Cacao Post-Harvest Quality Assessment

Table 3 outlines the learning algorithms used to classify cacao by healthiness, maturity, ripeness, and variety. The methods include object detection with YOLOv5m and machine learning methods such as ANN, SVM, CNNs (EfficientNetB0, MobileNetV3), GLCM, and ResNet. Notably, the chi-square distance metric, applied with GLCM as the feature extractor, achieved the highest accuracy in cacao classification. The corresponding study on cacao pod maturity used automation tools based on similarity measures, specifically the chi-square distance with GLCM features. These tools significantly improved the efficiency and accuracy of maturity determination.
Table 4 presents the performance of various machine learning algorithms for the quality assessment of cacao beans based on morphological features. The ANFIS model achieved the highest accuracy, 99.715%, followed by the MCE model with 99.705%. The MCE model combines multiple machine learning algorithms, which can improve accuracy and robustness. ANFIS combines the capabilities of neural networks and fuzzy logic. This combination likely explains its superior performance on the morphological features of cacao beans, since it can capture the intricate patterns and relationships in the data.
Among the 36 machine learning algorithms used in cacao pod classification, quality assessment, and fermentation, Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN) are the most commonly employed. Gray Level Co-occurrence Matrix (GLCM) features combined with the chi-square distance achieved the highest accuracy in cacao pod classification. The Adaptive Neuro-Fuzzy Inference System (ANFIS) reached 99.715% accuracy in quality assessment. For fermentation, the k-Nearest Neighbors (k-NN) algorithm reached 99.4% accuracy. This review provides insights into recent machine learning applications in the post-harvest stage of cacao beans. It highlights widely used algorithms, performance metrics, and the need for further advancements in the field. It is intended as a resource for researchers in the cacao bean industry seeking efficient and accurate solutions.