Synergizing Smart Agriculture with Hybrid Deep Learning: Predicting Crop Yields Using IoT

Agriculture can be defined as the systematic and intentional practice of cultivating and managing plants and animals to produce food, fiber, and other agricultural products. Agricultural practices in India hold the second position globally and encompass approximately 61.1% of the total land area in the country. The Indian economy primarily relies on agriculture and agro-industrial products. Various factors, such as soil composition (including elements like nitrogen, phosphorus, and potassium), crop rotation practices, soil moisture content, ambient temperatures, and precipitation patterns, can significantly influence crop productivity. Smart Agriculture (SA) implementation has recently yielded significant practical benefits. Using environmental information, including wind velocity, temperature, and moisture, in outdoor plantations facilitates the strategic management and regulation of farming operations, enhancing crop yield and quality. Accurately predicting crop yield trends is challenging because sensing data are complex, nonlinear, and multivariate. This study proposes a Hybrid Deep Learning model for Predicting Crop Yields (HDL-PCY) using the Internet of Things (IoT). The HDL-PCY system uses the Empirical Mode Decomposition (EMD) technique to break down the crop yield information into distinct component groups with varying frequency attributes. Subsequently, a Long Short-Term Memory (LSTM) network is trained for each group to serve as a sub-predictor. Finally, the predictions generated by the LSTM networks are combined to produce the overall prediction result. The obtained results demonstrate that the proposed HDL-PCY achieves accuracies of 97.32%, 98.03%, 98.74%, and 95.92% for precipitation, temperature, pH, and moisture content, respectively, thereby catering to the requirements of SA.


Introduction
Agriculture is one of the prominent industries observed in the country. Various agricultural activities contribute significantly to the nation's financial growth. Hence, it is commonly regarded as the most comprehensive income-generating approach. In India, approximately 61.34% of the total land area is allocated for agricultural purposes. This satisfies the needs of approximately 1.18 billion individuals.
A significant scope of agribusiness modernization characterizes the contemporary era. Therefore, farmers increasingly adopt strategies to maximize their advantages and attain higher profits while minimizing costs [1]. Data Science techniques, specifically Data Analytics (DA), facilitate the analysis of datasets. This enables the extraction of meaningful insights from the data with the assistance of specialized software and frameworks.
Historically, the anticipated yield was determined by a farmer's knowledge of a particular piece of land and its potential for crop production [2]. As conditions have gradually evolved, farmers increasingly prioritize cultivating various crops. Given the present circumstances, it is evident that a substantial number of farmers require additional data about the latest crop yields. These farmers lack awareness of their financial gains after cultivation [3]. Similarly, the financial viability of a farm can be enhanced by a thorough understanding and accurate assessment of crop performance under natural circumstances. The soil constituents, namely Nitrogen, Phosphorus, and Potassium, are obtained from the region. The handling section of the study incorporates two additional datasets, namely the crop and feature datasets. These datasets were obtained from the online platform kaggle.com and contain unique information.
Given the needs of a growing world population and the ongoing challenges of global warming, the agricultural sector is experiencing a significant shift as it incorporates advanced technologies [4]. An example of this paradigm shift can be observed in the convergence of SA and HDL, which presents unparalleled prospects for improving crop yield prediction and optimizing farming techniques. This paper examines the interdependent connection between these domains, proposing an innovative methodology for forecasting agricultural output by leveraging the IoT and HDL models.
The increasing need for food security compels the examination of novel approaches to enhance agricultural productivity. Although robust, conventional agricultural practices are frequently limited in their ability to adjust to swiftly evolving environmental circumstances. In the present context, integrating IoT technologies in agriculture has emerged as a significant and influential force with the potential to bring about transformative changes [5]. Implementing sensors and actuators within the agricultural ecosystem facilitates real-time monitoring of essential parameters, including soil moisture, temperature, and nutrient concentrations. When effectively utilized, this abundance of data has the potential to provide valuable insights into the complex dynamics of crop growth, thereby enabling precision agriculture and optimizing resource allocation [6].
The emergence of HDL models, which combine the advantageous aspects of neural networks and conventional Machine Learning (ML) algorithms, has facilitated advancements in predictive analytics [7]. These models demonstrate exceptional performance in learning complex relationships and trends within extensive datasets, rendering them highly effective tools for complicated tasks such as PCY. HDL models have the potential to enhance the precision and interpretability of PCY by combining the capabilities of deep neural networks and classical ML algorithms.
This study aims to establish a connection between SA and HDL by presenting a comprehensive framework for predicting crop yields. Incorporating data generated by the IoT into HDL models facilitates a thorough understanding of the various factors that impact crop growth, thereby enhancing the precision and sophistication of predictions. The proposed framework emphasizes the technical complexities of model development and explores practical consequences and advantages for farmers and stakeholders within the agricultural value chain [8].
This paper envisions a future in which the integration of SA and HDL facilitates the adoption of data-driven and environmentally friendly farming methods, thereby addressing the intricate challenges of modern agriculture. The following sections will examine the methodology, implementation, and results, offering a comprehensive analysis of the effectiveness of the proposed framework and its potential to transform PCY in the field of SA.

050 (2024) BIO Web of Conferences MSNBAS2023 https://doi.org/10.1051/bioconf/2024820500909 82

Related works
The convergence of modern agricultural methods and advanced technologies in the present era has brought about a paradigm shift in traditional farming practices, leading to the emergence of precision farming. Within this context, the literature survey delves into the ever-evolving realm of SA, specifically emphasizing the integration of HDL methodologies and the IoT. Integrating these technologies presents significant opportunities for tackling crucial agricultural challenges, such as the accurate prediction of crop yields. Li et al. (2023) used transfer learning and multiple data sources to detect rice diseases accurately and interpretably [9]. Implementation involves integrating data sets and using transfer learning algorithms. The output is a resilient rice disease detection model with high accuracy. The approach is interpretable and accurate, providing valuable insights for disease management. However, drawbacks like the need for large computational resources and data integration difficulties must be considered.
Yoosefzadeh Najafabadi et al. (2023) explored using ML techniques to improve plant breeding programs in modern times. The methodology integrates ML techniques into plant breeding processes [10]. Implementation involves applying these methods to modern breeding programs. The results show that this process improves plant breeding using more advanced and effective methods. Better breeding efficiency and faster genetic enhancement are benefits of this method. However, applying these methods to many plant species and breeding goals may be difficult. Fan et al. (2023) examined the implications of using DL and AI for sustainability. The methodology includes a comprehensive literature review of the SDGs, clean energy, and environmental health [11]. Implementation integrates and consolidates research findings from scholarly literature. The resulting review article provides insights into the importance of DL and AI in sustainability. A comprehensive understanding of the topic is one of the benefits of this approach, but it may depend on the quality and quantity of available scholarly resources. Kanna et al. (2023) proposed advanced DL methods for detecting diseases in cauliflower plants. The method uses advanced DL techniques to predict diseases [18]. The model is trained using relevant datasets during implementation. This study created a sophisticated predictive model to detect cauliflower plant diseases early. The model's output values show improved disease prediction. This method detects diseases early, allowing for prompt treatment. A potential obstacle is the need for large and diverse training datasets.
Wang et al. (2023) explored the impact of AI and cyber-physical-social systems on global food safety and sustainability. The methodology includes theoretical research into incorporating these technologies [13]. The implementation phase involves deciding on the use of AI and cyber-physical systems. The discussion addresses how these technologies could improve global food security. One benefit of this approach is its forward-thinking perspective. However, real-world application and acceptance of these technologies may have drawbacks. Kong et al. (2023) proposed LCA-Net, a neural network aggregating information across stages to identify and classify crop pests and diseases accurately. The method develops a neural network architecture. Training the LCA-Net with multiple datasets is required for implementation [14]. This study created a neural network for precise crop pest and disease identification and classification. This neural network performs better in accuracy and efficiency. The network's lightweight design makes deployment easy. However, the network may struggle to handle unexpected pest and disease variations.
Khan and Shahriyar (2023) proposed a framework for improving onion crop management using IoT sensors and cloud technology [15]. The methodology integrates IoT sensors and cloud technology for onion crop management. Implementation involves using these technologies in agriculture. This study refined onion crop management, improving efficiency as shown by improved result values. The use of precision agriculture and resource optimization is beneficial. However, setup costs and the need for reliable internet connectivity may prevent these practices from being implemented.
Esmail et al. (2023) developed a smart irrigation system using IoT and ML techniques. The methodology combines IoT devices and ML algorithms to optimize irrigation processes [16]. Implementation involves installing and integrating irrigation system sensors and algorithms. The resulting intelligent irrigation system uses machine learning forecasts to improve water use. The advantages include water conservation and increased agricultural productivity, while the disadvantages include the need for regular maintenance and dependence on sensor reliability.
In summary, the literature review has revealed diverse scholarly investigations at the juncture of SA and HDL, particularly in forecasting agricultural productivity through the integration of the IoT [17]. Integrating real-time sensor data with advanced deep learning algorithms has exhibited encouraging outcomes, providing a glimpse into the prospective advancements in precision farming. The integration of SA and HDL is expected to significantly impact the development of sustainable and efficient farming practices as the agricultural landscape undergoes continuous transformation [12].

Hybrid Deep Learning model for Predicting Crop Yields (HDL-PCY) using the Internet of Things (IoT)
The predictor under consideration has a hybrid architecture, in which the EMD technique is employed to mitigate the nonlinear complexity. Additionally, the Intrinsic Mode Functions (IMFs) are partitioned into three groups using a Convolutional Neural Network (CNN). The modeling and prediction tasks for each group use the DL network known as the LSTM. Ultimately, the predictions generated by the LSTMs are aggregated to derive the final prediction outcome.

Empirical Mode Decomposition (EMD)
The EMD technique decomposes intricate signals into a limited set of IMFs by considering their frequency characteristics. Each IMF must satisfy two conditions: (1) the absolute value of the difference between the number of zero crossings and the number of extreme points is either 0 or 1, and (2) the average value of the envelopes formed by the local maxima and minima must be zero at all points. The EMD is a data processing or mining technique designed to adapt to changing patterns in time series data. It functions primarily as a smoothing mechanism for such data. Algorithm 1 shows the Empirical Mode Decomposition (EMD) procedure. It is worth noting that the number of IMF components varies between prediction windows, so the number of trained forecasting sub-models would vary as well. Hence, it is imperative to aggregate individual IMFs into a predetermined number of groups based on their frequency characteristics. This study categorized the IMFs into three groups based on their frequency characteristics. This categorization involved labeling, grouping, and aggregating the decomposition components that exhibited similar frequency features. Subsequently, a separate model was trained for each group. Consequently, the number of models within each prediction window remains constant.
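The grouping step described above can be sketched as follows. This is only an illustration: it ranks IMFs by their zero-crossing count as a simple frequency proxy, which stands in for the CNN-based labeling used in this study, and the names `zero_crossings` and `group_imfs` are hypothetical.

```python
def zero_crossings(series):
    """Count sign changes, a rough proxy for how fast a component oscillates."""
    return sum(1 for a, b in zip(series, series[1:]) if a * b < 0)


def group_imfs(imfs, n_groups=3):
    """Rank IMFs by zero-crossing count and sum them into n_groups series.

    Assumption: ranking by zero crossings replaces the paper's CNN labeling,
    and the even split of ranks across groups is purely illustrative.
    """
    ranked = sorted(imfs, key=zero_crossings, reverse=True)
    length = len(ranked[0])
    groups = [[0.0] * length for _ in range(n_groups)]
    for rank, imf in enumerate(ranked):
        # Map the rank to a group index; group 0 holds the fastest components.
        g = min(rank * n_groups // len(ranked), n_groups - 1)
        groups[g] = [acc + v for acc, v in zip(groups[g], imf)]
    return groups
```

Because each IMF is added to exactly one group, the three aggregated series still sum to the original signal, so the number of sub-predictors stays fixed at three regardless of how many IMFs a window produces.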

LSTM network
The LSTM is trained using the stochastic gradient descent algorithm, using the available input and output data to obtain the optimal weights. The LSTM network was trained on the cumulative IMF sequences within each group. The LSTM network comprises multiple LSTM cells arranged in 2 hidden layers. As depicted in Fig. 1, the input of the LSTM network is denoted as $x_t$, where $t = 1, 2, \ldots$. Similarly, the output is denoted as $y_t$, where $t = 1, 2, \ldots$.
LSTMs have a fundamental structure comparable to that of RNNs. Still, they possess a more intricate configuration for their hidden layer components, as depicted in Fig. 1. The components of a neuron include an Input Gate (IG), an Output Gate (OG), a Forget Gate (FG), and a storage cell. The problem of vanishing gradients is mitigated by these gates, which selectively determine the passage of data and its inclusion in the cell. The IG determines the data allowed to enter the cell. The OG determines the data that is permitted to leave the cell. Lastly, the FG determines the data that is discarded from the cell. The opening and closing of these three gates are determined through network connectivity. The LSTM network is characterized by a set of equations that describe its operation and allow the parameters of its components to be adjusted.
Forget Gate ($f_t$): The extent to which the previous cell state is preserved is determined by $f_t$. The function produces a value ranging from 0 to 1, which sets how much of the previous cell state is retained and how much is forgotten. It takes the current input ($x_t$) and the previous hidden state ($h_{t-1}$) as input parameters. The FG equation is given by:

$$f_t = \sigma\left(M_f [h_{t-1}, x_t] + b_f\right)$$

Cell State ($C_t$): The cell state is a crucial component of the LSTM architecture, as it serves as the internal storage mechanism. The previous cell state ($C_{t-1}$), the FG, and the IG together determine its next state. The cell state is given by:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

Candidate Cell State ($\tilde{C}_t$): The candidate cell state is the new content that may be added to the existing cell state. It is calculated from the previous hidden state ($h_{t-1}$) and the current input ($x_t$):

$$\tilde{C}_t = \tanh\left(M_C [h_{t-1}, x_t] + b_C\right)$$
Output Gate ($o_t$): The OG regulates the extent to which the current cell state is exposed as the output. The function produces a value ranging from 0 to 1, which represents the amount of data to be output based on the current input ($x_t$) and the previous hidden state ($h_{t-1}$). The OG equation is given by:

$$o_t = \sigma\left(M_o [h_{t-1}, x_t] + b_o\right)$$

Hidden State ($h_t$): The hidden state is the output of the LSTM at each time step. It is computed from the current cell state, scaled by the OG:

$$h_t = o_t \odot \tanh(C_t)$$

The variables "M" and "b" in the given equations represent the weights and biases, respectively, of the LSTM network. The hyperbolic tangent function, commonly denoted as tanh, maps input values to a range between -1 and 1. In contrast, the sigmoid function, often represented as $\sigma$, maps the input values to a range between 0 and 1. Through these equations, the LSTM network is able to capture and preserve long-term dependencies within data streams. This characteristic makes it a favorable option for predicting crop yield in SA.
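The gate descriptions in this section can be illustrated with a single scalar LSTM step. The sketch below assumes the standard LSTM formulation and one-dimensional states. The function name `lstm_step` and the layout of the `M` and `b` dictionaries are hypothetical, chosen only to mirror the weight and bias notation of the text.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, M, b):
    """One scalar LSTM step (standard formulation, assumed here).

    M[g] is a pair (weight on h_prev, weight on x_t) and b[g] a bias,
    for g in {"f", "i", "c", "o"}.
    """
    def pre(g):
        return M[g][0] * h_prev + M[g][1] * x_t + b[g]

    f_t = sigmoid(pre("f"))        # forget gate: share of old cell state kept
    i_t = sigmoid(pre("i"))        # input gate: share of candidate admitted
    c_tilde = math.tanh(pre("c"))  # candidate cell state
    o_t = sigmoid(pre("o"))        # output gate: share of cell state exposed
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the step simply halves the previous cell state. This makes the role of each gate easy to check by hand.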

HDL-PCY using the IoT for SA
Fig. 2 shows the HDL-PCY using the IoT for SA. The number of groups is predetermined as three. The model encompasses two fundamental processes, namely training and prediction. The initial step involves training the CNN and LSTM using the IMFs obtained through EMD. The data is decomposed into IMFs using the EMD technique. Subsequently, labels are assigned to each IMF, and these IMFs are further categorized into three distinct groups (Groups 1-3) based on their frequency characteristics. The CNN is trained using IMFs and corresponding labels, after which the sequences are assigned to their respective groups. Finally, an LSTM model is trained for each group, yielding three distinct LSTM sub-predictors. The prediction process uses the trained networks to forecast future climate data trends. This process is executed by aggregating the group predictions of the LSTM models, which are based on the IMF groups.
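The aggregation at the end of the prediction process can be sketched as below. The sub-predictor here is a deliberately simple drift extrapolation that stands in for a trained LSTM. Both `drift_forecast` and `predict_from_groups` are hypothetical names, and the sketch only illustrates that the final forecast is the sum of the per-group forecasts.

```python
def drift_forecast(series, horizon):
    """Hypothetical stand-in for a trained LSTM sub-predictor.

    Extends the series by its average step-to-step change.
    """
    step = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + step * (k + 1) for k in range(horizon)]


def predict_from_groups(groups, horizon, sub_predictor=drift_forecast):
    """Forecast each IMF group separately, then add the forecasts."""
    per_group = [sub_predictor(g, horizon) for g in groups]
    return [sum(vals) for vals in zip(*per_group)]
```

Passing a different callable as `sub_predictor` would swap in another model without changing the aggregation logic, which mirrors the separation between grouping and prediction in the framework.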

Results and discussion
The model is trained using a large dataset of agricultural parameters that represents a wide range of crops. An additional dataset is used as the feature dataset. The datasets were obtained from the website kaggle.com. The crop dataset is 7908 kilobytes in size. The dataset encompasses various prediction parameters: precipitation, temperature, pH, moisture content, and terrain. The dataset includes various crops such as wheat, millet, sugarcane, and green gram. Multiple values are available for each prediction parameter for a single crop.
The performance of the proposed HDL for PCY has been compared with that of a Recurrent Neural Network (RNN), LSTM, and Gated Recurrent Unit (GRU). The Root Mean Square Error (RMSE) has been employed as a metric to quantify the disparity between the predicted values and the data gathered. Fig. 3 shows the RMSE values for PCY using various DL-based predictors. The RMSE values serve as a measure of predictive precision, where smaller values correspond to superior performance. Across the environmental factors that impact crop yield, the HDL model consistently exhibits superior predictive capabilities compared to the RNN, LSTM, and GRU models. The HDL model consistently demonstrates the lowest RMSE values for precipitation, temperature, pH, and moisture content, which are 2.68, 1.97, 1.26, and 4.08, respectively. These findings emphasize the HDL method's efficacy in improving the accuracy of crop yield predictions under various environmental conditions and highlight its potential to advance agricultural forecasting and decision-making procedures in SA.

In terms of accuracy, the HDL model achieves 97.32%, 98.03%, 98.74%, and 95.92% for precipitation, temperature, pH, and moisture content, respectively. This suggests that the approach has the potential to make a substantial contribution to SA by offering dependable and resilient forecasts under various environmental circumstances.
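For reference, RMSE can be computed as in the short sketch below. The function name `rmse` is illustrative, and the dictionaries simply restate the figures reported above. The text does not state how accuracy was derived, but as an arithmetic observation each reported accuracy equals 100 minus the corresponding RMSE, which the final loop checks.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Values reported for the HDL model in this section.
reported_rmse = {"precipitation": 2.68, "temperature": 1.97,
                 "pH": 1.26, "moisture content": 4.08}
reported_accuracy = {"precipitation": 97.32, "temperature": 98.03,
                     "pH": 98.74, "moisture content": 95.92}

# Observation only: accuracy (%) coincides with 100 - RMSE for each parameter.
for name, err in reported_rmse.items():
    assert round(100 - err, 2) == reported_accuracy[name]
```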

Conclusion
The present study introduces a novel approach, namely the Hybrid Deep Learning model for Predicting Crop Yields (HDL-PCY), which leverages the capabilities of the Internet of Things (IoT). The HDL-PCY system employs the EMD technique to partition the crop yield data into discrete components with different frequency characteristics. Following this, an LSTM network is trained for each group to function as a sub-predictor. Ultimately, the LSTM networks' predictions are aggregated to yield a comprehensive prediction outcome. The results obtained in this study indicate that the proposed HDL-PCY attains high levels of accuracy for various parameters. Specifically, the accuracy rates for precipitation, temperature, pH, and moisture content were 97.32%, 98.03%, 98.74%, and 95.92%, respectively. These findings suggest that the HDL-PCY model is well-suited to meet the specific needs of SA.

Algorithm 1. Empirical Mode Decomposition (EMD)
Input: Signal x[n]
Output: Set of Intrinsic Mode Functions (IMFs)
Procedure:
1. Initialize the signal as the current working signal: c[n] = x[n].
2. Initialize an empty set to store the IMFs: imfs = [].
3. Repeat the following steps until c[n] becomes a monotonic function or the number of iterations reaches a predefined limit:
   a. Find all local extrema (maxima and minima) of c[n].
   b. Interpolate the upper and lower envelopes connecting the maxima and minima, respectively.
   c. Compute the mean of the upper and lower envelopes: m[n] = (upper[n] + lower[n]) / 2.
   d. Extract the first IMF as h[n] = c[n] - m[n].
   e. Set the current working signal as the residual: c[n] = c[n] - h[n].
   f. Store the IMF in the set of IMFs: imfs.append(h[n]).
4. The remaining residual c[n] is considered as the last IMF: imfs.append(c[n]).
5. The set of IMFs contains the decomposed components of the original signal x[n].
Example usage: x = [your input signal]; imfs = empirical_mode_decomposition(x)
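A minimal runnable sketch of Algorithm 1 is given below, using only the Python standard library. Several choices are assumptions rather than part of the algorithm as stated: the envelopes use piecewise-linear interpolation (cubic splines are more common in practice), each IMF is refined with a fixed number of sifting passes (`sift_iters=1` corresponds to the single pass in Algorithm 1), and the helper names and the parameters `max_imfs` and `sift_iters` are hypothetical.

```python
import math


def local_extrema(c):
    """Return indices of strict interior local maxima and minima."""
    inner = range(1, len(c) - 1)
    maxima = [i for i in inner if c[i - 1] < c[i] > c[i + 1]]
    minima = [i for i in inner if c[i - 1] > c[i] < c[i + 1]]
    return maxima, minima


def envelope(c, idx):
    """Piecewise-linear envelope through c at idx, held flat at both ends."""
    xs = [0] + idx + [len(c) - 1]
    ys = [c[idx[0]]] + [c[i] for i in idx] + [c[idx[-1]]]
    env, k = [], 0
    for n in range(len(c)):
        while xs[k + 1] < n:  # advance to the segment containing n
            k += 1
        t = (n - xs[k]) / (xs[k + 1] - xs[k])
        env.append(ys[k] + t * (ys[k + 1] - ys[k]))
    return env


def empirical_mode_decomposition(x, max_imfs=8, sift_iters=10):
    """Decompose x into IMFs plus a final residual (stored last)."""
    c = [float(v) for v in x]
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(c)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema: treat c as the (near-monotonic) trend
        h = c[:]
        for _ in range(sift_iters):
            maxima, minima = local_extrema(h)
            if not maxima or not minima:
                break
            upper, lower = envelope(h, maxima), envelope(h, minima)
            # Subtract the mean envelope m[n] = (upper[n] + lower[n]) / 2.
            h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        c = [cv - hv for cv, hv in zip(c, h)]  # residual for the next round
    imfs.append(c)
    return imfs
```

By construction, the returned components add back up to the input signal, which is a convenient sanity check before the IMFs are grouped and passed to the sub-predictors.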

Fig. 1. LSTM network structure

Input Gate ($i_t$): The quantity of newly acquired data that is incorporated into the cell state is determined by $i_t$. It produces a value ranging from 0 to 1, which indicates the degree of relevance of the newly acquired data. It takes as input the current input ($x_t$) and the previous hidden state ($h_{t-1}$). The IG equation is given by:

$$i_t = \sigma\left(M_i [h_{t-1}, x_t] + b_i\right)$$

Fig. 3. RMSE values for PCY using various DL-based predictors

Fig. 4. Accuracy values (%) for PCY using various DL-based predictors

Fig. 4 depicts the accuracy values (%) for PCY using various DL-based predictors. Greater accuracy values indicate superior performance in capturing the inherent patterns of crop yield influenced by environmental factors. The proposed HDL consistently performs better than the RNN, LSTM, and GRU models across multiple parameters, including precipitation, temperature, pH, and moisture content. It is worth mentioning that the HDL model demonstrates high levels of accuracy in predicting various environmental factors.