Identification of arabica coffee post-harvest processing using a convolutional neural network

Indonesia's economy is greatly boosted by coffee, one of its flagship commodities. The post-harvest processing of coffee involves various processes, and the different methods have a crucial connection to the subsequent stages. Digital image analysis using Convolutional Neural Network (CNN) methods can be utilized to improve the identification of coffee beans. This study uses CNN with the ResNet-18 and MobileNetV2 architectures for image analysis. The research results show that the MobileNetV2 architecture produces the best accuracy of 98.89% at a data proportion of 70:20:10, and the ResNet-18 architecture produces the best accuracy of 99.56% at a data proportion of 50:25:25. This shows that both of them can handle differences in data proportions well in identifying the post-harvest process of Arabica coffee. The choice between the two can be considered based on available computational resources, desired model weight size, and relevant data proportion requirements for the desired application.


Introduction
Coffee is one of Indonesia's leading commodities and contributes significantly to the country's economy. Based on data from BPS-Statistics Indonesia, Indonesia's coffee production in 2021 reached 786.2 thousand tons [1]. Apart from that, Indonesia is also known as one of the best coffee producers in the world, with arabica and robusta coffee varieties that are famous in various international markets. One of the coffee-producing cities in Indonesia is Batu City, located in East Java Province. Various types of coffee thrive in the Batu City area, including Arabica coffee, which has good quality.
Batu City, one of the cities in East Java, located in the highlands, is known as a producer of Arabica coffee. The coffee is processed post-harvest using full wash, honey, and natural processes. The type of post-harvest process has a significant relationship with the coffee roasting process: beans processed using natural, full wash, and honey processes receive different treatments that determine how they are roasted. Apart from that, different types of post-harvest coffee processes also lead to different prices on the market. Therefore, it is necessary to identify coffee beans using better methods so buyers do not feel disadvantaged. One solution is digital image analysis using the convolutional neural network (CNN) method. In this research, the architectures used for image analysis are ResNet-18 and MobileNetV2, and the accuracy of the two architectures is then compared.
ResNet (Residual Network) and MobileNet are types of CNN architecture with their respective advantages. ResNet uses residual blocks to overcome the vanishing gradient problem in neural networks and can increase object recognition accuracy in vision tasks. Meanwhile, MobileNet is specifically designed for mobile applications or limited-power devices, so it uses lighter and more computationally efficient convolution operations. In this research, the CNN method with the ResNet-18 and MobileNetV2 algorithms was used to identify the types and post-harvest processes of coffee in Batu City. These two algorithms are expected to provide accurate and efficient results in identifying post-harvest coffee processes.

Materials and methods
During the research, several tools were used, categorized into three parts: tools for image acquisition, tools for designing convolutional neural network architectures, and tools for labeling post-harvest processes. Details of the equipment used for image acquisition and for designing the network architecture are presented in Table 1 and Table 2. The equipment for sample labeling is a digital scale and plastic clips. The material used in this research is Arabica coffee obtained from the KWT Srikandi farmer group in Batu City, East Java Province. The primary method used in this research is the convolutional neural network, which is applied to identify the post-harvest process of Arabica coffee. The research involves several process stages, including sample preparation, post-harvest process labeling, image acquisition, system design, and results analysis. The research concludes with the network architecture that provides the best performance in identifying the post-harvest process of Arabica coffee.
The sample in this study was Arabica coffee with a known post-harvest process. Samples are divided into three groups based on the type of post-harvest process, namely Arabica coffee with natural, full wash, and honey processing. Each sample consists of 10 grams of coffee. Thirty samples are taken from each group, giving 90 samples across the three groups.

Image acquisition
Image acquisition is the process of capturing and collecting images. Images are taken in a closed box measuring 30 cm x 30 cm x 30 cm. Illumination uses a 12-watt/120-volt white LED strip lamp. The image acquisition process begins by placing Arabica coffee, which has been labeled for post-harvest processing, on white HVS paper in the box. The camera lens is positioned perpendicular to the Arabica coffee at a distance of 28.5 cm. The camera is used in manual mode to standardize the images. The camera settings were set as in Table 3; then, images were taken ten times for each sample so that 900 images were obtained from 90 samples. Before each following image is taken, the position of the coffee is changed, for example by moving Arabica coffee that was initially on the left to the right. The purpose of changing positions is to train the model to recognize the coffee from various positions. The images that have been taken are transferred to a laptop and put into a folder according to the type of post-harvest process for each image.

Design of arabica coffee post-harvest process identification system
The system was designed using Google Colaboratory with the Python language. Google Colaboratory was chosen for its ease of use and its ability to run Python commands with processing that can be done in parallel. Google Colaboratory also provides a kernel that allows control of input and output between the computer's CPU, memory, and file system, and it collects all commands, code, charts, and comments in one shareable file. The Python language was chosen for its ease of understanding and usability, and because it can be applied on various operating systems [2].
The Arabica coffee post-harvest process identification system is designed through the following steps: preparing the dataset, splitting the dataset, normalization and augmentation, importing the architecture, training preparation, training, prediction, and measuring architectural performance. The dataset prepared is 900 RGB images of Arabica coffee with a 1:1 aspect ratio. The dataset is separated into three post-harvest process classification folders: natural, full wash, and honey processes. The dataset is uploaded to Google Drive to be prepared for processing in Google Colaboratory. Then, the dataset is split into three parts, namely training data, testing data, and prediction data, in three variations of ratio, namely 50:25:25, 60:25:15, and 70:20:10. Data augmentation is then carried out before the primary process to increase image variation. Augmentation is only carried out on training data. The augmentations are random vertical flip, random horizontal flip, random resized crop, and random rotation. Augmentation is not carried out on test data and prediction data because these sets are used to measure accuracy on the original images. Then, the architectures used in the coffee identification system, MobileNetV2 and ResNet-18, are imported. Training preparation is carried out with a learning rate of 0.001. Training then runs for a maximum of 100 epochs. The model stops iterating when its accuracy has not improved within five epochs. During training, accuracy is calculated on each batch to monitor model performance. This calculation must be done because accuracy varies and depends on model parameters such as learning rate, number of epochs, and momentum [3].
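The stopping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the training code used in the study: the function name `run_with_early_stopping`, its parameters, and the accuracy values in the usage example are hypothetical, and it assumes one monitored accuracy value per epoch (the text does not state which data split is monitored).

```python
def run_with_early_stopping(accuracy_per_epoch, max_epochs=100, patience=5):
    """Return (best_epoch, best_accuracy, epochs_run).

    accuracy_per_epoch: iterable yielding one accuracy value per epoch
    (hypothetical stand-in for a real train-then-evaluate step).
    Training stops after `patience` consecutive epochs without a strict
    improvement, or after `max_epochs` epochs, whichever comes first.
    """
    best_acc = float("-inf")
    best_epoch = 0
    epochs_without_gain = 0
    epochs_run = 0
    for epoch, acc in enumerate(accuracy_per_epoch, start=1):
        if epoch > max_epochs:
            break  # hard cap on the number of epochs
        epochs_run = epoch
        if acc > best_acc:
            # Strict improvement: remember it and reset the counter.
            best_acc, best_epoch = acc, epoch
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break  # no improvement within `patience` epochs
    return best_epoch, best_acc, epochs_run


# Hypothetical accuracies, for illustration only (not results from the study).
demo = [0.60, 0.70, 0.75, 0.74, 0.75, 0.73, 0.72, 0.71, 0.70, 0.90]
print(run_with_early_stopping(demo))  # (3, 0.75, 8)
```

In this example the best value is reached in epoch 3, and the run stops after epoch 8 because five consecutive epochs brought no improvement; the late value of 0.90 is never reached. This is the usual trade-off of such a rule: a small patience saves computation but can stop before a late improvement.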

System performance measurement
Application system performance is evaluated by calculating accuracy and error values. A confusion matrix is used, which describes the level of classification errors. The confusion matrix provides information about true positives, false positives, and false negatives, representing true and false classifications [4]. To evaluate the performance of the application system, testing was carried out using new coffee images. The accuracy value is calculated as the number of correctly classified images divided by the total number of images used. In addition, the error is calculated by dividing the number of incorrectly classified test images by the total number of test images. The higher the accuracy value and the lower the error value (closer to zero), the better and more accurately the classification system is working. On the other hand, if the accuracy value is low and the error value is high, the classification system cannot perform classification well and needs to be evaluated and improved.
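The accuracy and error calculations described above can be illustrated with a small Python sketch. The confusion matrix below is a hypothetical example for the three classes, not a result from this study, and the function names are chosen for illustration only.

```python
CLASSES = ["natural", "full wash", "honey"]


def accuracy_and_error(confusion):
    """confusion[i][j] = number of images of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    # accuracy = correct / total, error = incorrect / total
    return correct / total, (total - correct) / total


def per_class_counts(confusion, k):
    """Return (true positives, false positives, false negatives) for class k."""
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                 # class k images predicted as another class
    fp = sum(row[k] for row in confusion) - tp  # other classes predicted as class k
    return tp, fp, fn


# Hypothetical confusion matrix with 30 images per class (illustration only).
example = [
    [29, 1, 0],   # true natural
    [0, 30, 0],   # true full wash
    [2, 0, 28],   # true honey
]

acc, err = accuracy_and_error(example)
print(f"accuracy = {acc:.2%}, error = {err:.2%}")  # accuracy = 96.67%, error = 3.33%
for k, name in enumerate(CLASSES):
    print(name, per_class_counts(example, k))
```

Accuracy and error always sum to one, so reporting both is mainly a convenience; the per-class counts are what show which post-harvest processes are confused with each other.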

Image acquisition results
In this research, 300 images were acquired from each post-harvest coffee process, so the three post-harvest processes yield 900 images. In the acquired images, coffee beans from the full wash post-harvest process have clean and bright characteristics. This happens because the fermentation process helps remove the mucus layer covering the coffee beans, resulting in balanced, clean, and bright characteristics [5]. Coffee beans from the natural post-harvest process have surface characteristics that tend to be rough and wrinkled. The color of natural coffee beans is generally darker, with more striking color variations such as dark brown or black. The surface of natural coffee beans can also have stains or marks from dried fruit skin. Coffee beans from the honey post-harvest process have varying color characteristics, ranging from bright yellow to red or black. Honey process coffee beans have a sticky and slippery surface due to the retained layer of mucilage.
The captured images were split into three different proportion ratios for training data, validation data, and test data for training, validation, and testing [6]. The proportions of data are selected by trial and error to find the best architectural performance, namely with ratio proportions of 50:25:25, 60:25:15, and 70:20:10. The image pixel size is normalized by changing the original image size, namely 3024 x 3024 pixels, to 224 x 224 pixels. This is done so that the computing process becomes much lighter.
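As a rough illustration of what the three ratios mean in image counts, the following Python sketch computes the size of each part for 900 images and for the 300 images of a single class, and shows one possible way to split a list of items at random. The function names, the fixed seed, and the per-class split are assumptions made for this sketch; the text does not state how the images were assigned to each part.

```python
import random


def split_counts(n_images, ratio):
    """Return the sizes of the three parts for a ratio such as (70, 20, 10)."""
    total = sum(ratio)
    n_second = n_images * ratio[1] // total
    n_third = n_images * ratio[2] // total
    # Any rounding remainder is given to the first (training) part.
    return n_images - n_second - n_third, n_second, n_third


def split_items(items, ratio, seed=0):
    """Shuffle items reproducibly and cut them into three parts (assumed procedure)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_first, n_second, _ = split_counts(len(shuffled), ratio)
    return (
        shuffled[:n_first],
        shuffled[n_first:n_first + n_second],
        shuffled[n_first + n_second:],
    )


for ratio in [(50, 25, 25), (60, 25, 15), (70, 20, 10)]:
    print(ratio, "all images:", split_counts(900, ratio),
          "per class:", split_counts(300, ratio))
```

For 900 images this gives 450/225/225, 540/225/135, and 630/180/90 images, so the third part contains 225, 135, and 90 images for the three ratios respectively.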

MobileNetV2 architecture accuracy
The accuracy of the training and test data using the MobileNetV2 architecture can be seen in Figure 1. The prediction accuracy results using the MobileNetV2 architecture are presented as a confusion matrix, as shown in Figure 2. With a prediction data proportion of 25%, 203 data points were predicted correctly, and 22 data points were mispredicted, resulting in an accuracy of 90.22%. With a prediction data proportion of 15%, 124 data points were predicted correctly, and 11 data points were mispredicted, resulting in an accuracy of 91.85%. With a prediction data proportion of 10%, 89 data points were predicted correctly, and 1 data point was mispredicted, resulting in an accuracy of 98.89%. Based on the resulting prediction accuracy, it can be concluded that the best accuracy in predicting the post-harvest process of Arabica coffee using the MobileNetV2 architecture is at a prediction data proportion of 10%.

ResNet-18 architecture accuracy
The accuracy of the training data and test data using the ResNet-18 architecture can be seen in Figure 3. The prediction accuracy results using the ResNet-18 architecture are presented as a confusion matrix, as shown in Figure 4. With a prediction data proportion of 25%, 224 data points were predicted correctly, and 1 data point was mispredicted, resulting in an accuracy of 99.56%. With a prediction data proportion of 15%, 127 data points were predicted correctly, and 8 data points were mispredicted, resulting in an accuracy of 94.07%. With a prediction data proportion of 10%, 89 data points were predicted correctly, and 1 data point was mispredicted, resulting in an accuracy of 98.89%. Based on these results, the best accuracy in predicting the post-harvest process of Arabica coffee using the ResNet-18 architecture is at a prediction data proportion of 25%.
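The prediction accuracies reported for both architectures follow directly from the counts of correct and incorrect predictions. The short Python check below recomputes them; the counts and percentages are taken from the text, while the variable and function names are illustrative.

```python
# (architecture, prediction share in %, correct, mispredicted, stated accuracy in %)
REPORTED = [
    ("MobileNetV2", 25, 203, 22, 90.22),
    ("MobileNetV2", 15, 124, 11, 91.85),
    ("MobileNetV2", 10, 89, 1, 98.89),
    ("ResNet-18", 25, 224, 1, 99.56),
    ("ResNet-18", 15, 127, 8, 94.07),
    ("ResNet-18", 10, 89, 1, 98.89),
]


def accuracy_percent(correct, wrong):
    """Accuracy in percent, rounded to two decimals."""
    return round(100 * correct / (correct + wrong), 2)


for arch, share, correct, wrong, stated in REPORTED:
    total = correct + wrong
    print(f"{arch}, {share}% prediction data: {correct}/{total} = "
          f"{accuracy_percent(correct, wrong)}% (stated {stated}%)")
```

Each recomputed value matches the stated one, and the totals (225, 135, and 90 images) correspond to 25%, 15%, and 10% of the 900 images.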

Best architectural selection
Convolutional neural network models can automatically extract relevant features for a given task [7]. The best architecture can be identified from a high level of prediction accuracy across the variations in data proportion that were tested. The performance results for each architecture at the three data proportions can be seen in Table 4. In the MobileNetV2 architecture using a proportion of 50:25:25, test data accuracy reaches 89.33%, and prediction data accuracy reaches 90.22%. This shows that the model is good at predicting data that has never been seen before. At 60:25:15, where the proportion of training data is increased to 60%, the accuracy of the test data and prediction data increases to 91.56% and 91.85%. This shows that the more training data is used, the better the model predicts new data. At the proportion of 70:20:10, where the test data and prediction data only use proportions of 20% and 10%, a significant increase in the accuracy of the test data and the accuracy of the prediction data is seen, both reaching 98.89%. This shows that the model is very good at predicting new data and has high generalization ability. The generalization of the model using the MobileNetV2 architecture is quite optimal because the average accuracy on prediction data shows good performance. This can be seen from the increase in accuracy as the proportion of prediction data used decreases, in line with [8][9], which use the MobileNetV2 architecture to detect mask use. Based on this, the MobileNetV2 architecture can be recommended for identifying the post-harvest process of Arabica coffee because it can achieve an accuracy of up to 98.89% at a data proportion of 70:20:10.
In the ResNet-18 architecture, at the proportion of 50:25:25, test data accuracy reaches 100%, and prediction data accuracy reaches 99.56%. The slightly lower accuracy on the prediction data indicates that the model tends to overfit the test data slightly and generalizes somewhat less well to data that has never been seen before. At the proportion of 60:25:15, the accuracy of the training data decreases to 73.89%. This decrease may be due to the relatively lower complexity of the model in capturing patterns on larger training data. However, the accuracy of the test data is still high, namely 98.67%, and the accuracy of the prediction data is 94.07%. This shows that the model can still generalize well on data not used during training, but some unique patterns in the training data may be difficult for the model to capture. At the proportion of 70:20:10, the accuracy of the training data increased slightly to 78.25%, the accuracy of the test data remained high at 98.33%, and the accuracy of the prediction data reached 98.89%. These results show that the ResNet-18 architecture can also produce a model with fairly good generalization on the dataset used.
The MobileNetV2 architecture tends to increase the accuracy of the training data as the amount of training data increases. This aligns with the theory that the more training data, the higher the system accuracy [10]. However, in the ResNet-18 architecture, there is a decrease in the accuracy of the training data when the proportion of training data is increased. Similar results were also found in research on wayang image classification using CNN [11]. That research varied the number of epochs, but this made no significant difference: the data proportion of 70:30 always produced higher accuracy than the data proportion of 80:20, which also departs from the theory above. According to [12], this can happen because variation in images within the same class may increase as the training data increases. In cases like this, accuracy depends on the model's ability to handle high image variations within a class. The size of the dataset used also influences the level of model accuracy; the more data used, the better the resulting model [13]. However, in the case of identifying the post-harvest process of Arabica coffee, both MobileNetV2 and ResNet-18 architectures can perform classification well, as indicated by prediction accuracies above 90% at every data proportion, with the highest average prediction accuracy at the largest training proportion (70:20:10).
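The accuracies quoted in the text for the two architectures can be collected into one small structure to make the comparison easier to follow. The Python sketch below does this and derives, for each architecture, the proportion with the best prediction accuracy and the mean prediction accuracy over the three proportions. The numbers are those stated in the text; the data layout and function names are illustrative, and the mean is a derived summary rather than a figure reported by the study.

```python
# accuracy in % as (training, test, prediction) for each data proportion
RESULTS = {
    "MobileNetV2": {
        "50:25:25": (78.44, 89.33, 90.22),
        "60:25:15": (78.89, 91.56, 91.85),
        "70:20:10": (85.71, 98.89, 98.89),
    },
    "ResNet-18": {
        "50:25:25": (89.56, 100.0, 99.56),
        "60:25:15": (73.89, 98.67, 94.07),
        "70:20:10": (78.25, 98.33, 98.89),
    },
}


def best_proportion(arch):
    """Return (proportion, prediction accuracy) with the highest prediction accuracy."""
    ratio = max(RESULTS[arch], key=lambda r: RESULTS[arch][r][2])
    return ratio, RESULTS[arch][ratio][2]


def mean_prediction_accuracy(arch):
    """Mean prediction accuracy over the three proportions, rounded to two decimals."""
    values = [acc[2] for acc in RESULTS[arch].values()]
    return round(sum(values) / len(values), 2)


for arch in RESULTS:
    print(arch, best_proportion(arch), mean_prediction_accuracy(arch))
# MobileNetV2 ('70:20:10', 98.89) 93.65
# ResNet-18 ('50:25:25', 99.56) 97.51
```

Laid out this way, the two architectures reach the same prediction accuracy at 70:20:10 and differ mainly at the smaller training proportions.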

Conclusion
In this research, a performance comparison was carried out between the ResNet-18 and MobileNetV2 architectures in identifying the post-harvest process of Arabica coffee (Coffea arabica) using the convolutional neural network method. Apart from that, this research also compares the effect of three different data proportions, namely 50:25:25, 60:25:15, and 70:20:10. The research results show that the MobileNetV2 architecture produces the best accuracy of 98.89% at a data proportion of 70:20:10. The ResNet-18 architecture produces the best accuracy of 99.56% at a data proportion of 50:25:25. This shows that both of them can handle differences in data proportions well in identifying the post-harvest process of Arabica coffee. Our findings show that the ResNet-18 and MobileNetV2 architectures demonstrate almost identical levels of performance in identifying the post-harvest process of Arabica coffee. The choice between the two can be based on factors such as the available computing resources, the desired size of the model weights, and the data proportion requirements relevant to the desired application.

Fig. 1. MobileNetV2 accuracy chart for data proportion 50:25 (a), 60:25 (b), and 70:20 (c).
At a data proportion of 50:25, it was found that the best results were achieved in the 31st epoch, with training data accuracy of 78.44% and test data accuracy of 89.33%. At a data proportion of 60:25, the best results were achieved in the 25th epoch, with training data accuracy of 78.89% and test data accuracy of 91.56%. At a data proportion of 70:20, the best results were achieved in the 42nd epoch, with training data accuracy of 85.71% and test data accuracy of 98.89%. Based on the resulting accuracy, the data proportion of 70:20 has the best accuracy among the other proportions.

Fig. 3. ResNet-18 accuracy chart for data proportion 50:25 (a), 60:25 (b), and 70:20 (c).
At a data proportion of 50:25, it was found that the best results were achieved in the 33rd epoch, with a training data accuracy of 89.56% and a test data accuracy of 100%. At a data proportion of 60:25, the best results were achieved in the 15th epoch, with training data accuracy of 73.89% and test data accuracy of 98.67%. At a data proportion of 70:20, the best results were achieved in the 16th epoch, with training data accuracy of 78.25% and test data accuracy of 98.33%. Based on these results, the data proportion 50:25 has the best accuracy among other proportions.

Table 1. Image acquisition equipment.