Estimating Surface Nitrate Concentrations in the Coastal Areas Around the Savu Sea and Southern Sumba Island Based on Remote Sensing Data

Nitrate is an essential nutrient for phytoplankton photosynthesis, and phytoplankton also use nitrate for growth and reproduction. Nitrate abundance on the coast affects primary productivity and biogeochemical cycles. Nitrate observation data are scarce, especially around the Savu Sea coast. In this study, nitrate was estimated in the coastal area of the southern part of Sumba Island and the eastern part of Savu Island using the Generalized Additive Model (GAM). Seventy-one nitrate observations were used to build the GAM, and remote sensing data were used as input for nitrate estimation. Sea Surface Temperature (SST) and chlorophyll-a data were obtained from Aqua-MODIS. Sea Surface Salinity (SSS) and Sea Surface Windspeed (SSW) data were obtained from the Microwave Imaging Radiometer with Aperture Synthesis (MIRAS) on the Soil Moisture and Ocean Salinity (SMOS) mission and from the Advanced Scatterometer (ASCAT), respectively. The GAM approach was used to predict the distribution of nitrate concentrations and to determine the main driving factors associated with nitrate. The results show that temperature is the dominant factor in nitrate estimation, while chlorophyll-a has a relatively small influence. The best model for predicting nitrate distribution uses four parameters: SST, SSS, SSW, and chlorophyll-a. In validation, the predicted and observed nitrate values fall within the same range of 0-0.35 but differ in how the values are distributed. The comparison yields an R² value of 0.357.


Introduction
As a principal nitrogenous substance, nitrate is an essential nutrient supporting photosynthesis by ocean primary producers [1]. Phytoplankton rely on nitrate to fuel their growth and photosynthesis, and they in turn provide food for many marine organisms, including zooplankton, shellfish, and fish. Furthermore, nitrate is essential for the process of nitrification in the ocean. Nitrate, the most oxidized form of inorganic nitrogen, is necessary for nitrogen availability based on nutrient exchange fluxes [2]. Nitrate concentration in a water body is very dynamic, and nitrate is an essential compound for water quality assessment. Nitrate distribution can be estimated from satellite image measurements, but this approach has received little attention.
Marine nitrate variability is influenced by three dominant kinds of process: physical, biological, and chemical [3]. Physical processes transfer nitrate-rich water to the ocean surface, while biological and chemical processes increase the formation of available nitrogen and the availability of nitrate in the ocean [4,5]. The combined effects of physical and biogeochemical processes are seen in the spatial and temporal distribution of sea surface nitrate (SSN). Upwelling and seawater mixing are the main physical processes that drive water from the thermocline layer to the upper layers, thereby increasing nitrate concentration [5]. Moreover, the horizontal mixing of different water masses, characterized by sea surface salinity (SSS) and sea surface temperature (SST), also shapes the spatial dynamics of SSN [3]. Nutrient enrichment by upwelling in coastal areas increases phytoplankton growth and primary production [3,6-8]. Because upwelled waters are usually cold, the upwelling process is also characterized by SST and sea surface wind speed (SSW) [3,9,10].
In oceanography, remote sensing makes it easier for researchers to obtain data on the marine environment. Various satellites and sensors provide daily data that can be used for monitoring and assessing oceans and coastal areas. The availability of these data makes it easier to conduct studies in locations where observation data are limited [11,12]. Remote sensing data have already been used to study nitrate content in coastal areas. Studies based on satellite image data have found an inverse relationship between nitrate concentration and SST. This is because nitrate influx into the euphotic zone occurs when cold, nutrient-rich deep waters are brought to the surface by upwelling. The relationship between SST and nitrate concentration is strongly influenced by regional hydrodynamic and biogeochemical conditions and therefore shows high spatial and temporal variability [12-15].
Remote sensing data have also been used in SSN estimation studies. Empirically estimating SSN is complex and requires the relationship of several parameters, including SST, SSS, SSW, and chlorophyll-a [1,16-18]. The Generalized Additive Model (GAM) approach can be used to estimate and determine the relationship between nitrate abundance and the dynamics of oceanographic environmental parameters. GAM is a semi-parametric generalization of multiple linear regression that does not require the data to be normally distributed [19-21]. The GAM method can objectively predict the abundance of a parameter based on environmental parameters over a large geographic area and is one of the appropriate methods for modelling nitrate distribution in coastal areas [19,22-24]. The model generated by the GAM method makes use of the effects of environmental parameters and their contribution to nitrate variability [21,22,25,26].
In this study, SSN estimation was conducted in the coastal areas of the southern part of Sumba Island and the eastern part of Savu Island. The data used are a combination of in-situ data and satellite image data. Both are used to create and validate a model that can estimate the distribution of SSN concentrations. Although the model is simple, it helps estimate SSN concentrations in both space and time, including in coastal areas.

In-situ Data
In-situ data were obtained in two surveys (27-28 July 2023 around the coastal area of the Savu Sea and 28-29 August 2023 in the southern part of Sumba Island). The measured parameters included nitrate, SST, SSS, SSW, and chlorophyll-a. Data were collected at 24 stations: 9 stations (S1-S9) around the coastal area of the Savu Sea and 15 stations (S10-S24) in the southern part of Sumba Island. Sampling locations are shown in Figure 1, where stations S1-S9 are marked in cyan and stations S10-S24 are marked in blue.

Remote Sensing Data
This research uses several remote sensing data sets as supporting data. They cover four oceanographic parameters, SST, SSS, SSW, and chlorophyll-a, which are used to study the relationship between nitrate and the oceanographic environment (Table 1). The remote sensing data were extracted to match the in-situ data in station location, date, and time.
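The spatial part of this match-up step can be sketched as a nearest-valid-pixel lookup. This is a minimal sketch under stated assumptions, not the procedure used in the study: the function names, the 25 km search radius, and the small grid are hypothetical, and the satellite field is assumed to have already been selected for the station's date and time.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    earth_radius_km = 6371.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * np.arcsin(np.sqrt(a))

def match_station(st_lat, st_lon, grid_lat, grid_lon, grid_val, max_km=25.0):
    """Value of the nearest valid grid cell to a station, or NaN if none lies within max_km."""
    lat2d, lon2d = np.meshgrid(grid_lat, grid_lon, indexing="ij")
    dist = haversine_km(st_lat, st_lon, lat2d, lon2d)
    # Cells masked as NaN (cloud, land) are excluded from the search.
    dist = np.where(np.isnan(grid_val), np.inf, dist)
    idx = np.unravel_index(np.argmin(dist), dist.shape)
    return float(grid_val[idx]) if dist[idx] <= max_km else float("nan")

# Hypothetical 0.5-degree SST grid (illustrative values, not the study's data).
grid_lat = np.array([-10.5, -10.0, -9.5])
grid_lon = np.array([119.0, 119.5, 120.0])
sst_grid = np.array([
    [28.0, 28.2, 28.4],
    [28.1, 28.3, 28.5],
    [28.6, np.nan, 28.9],
])

matched = match_station(-10.05, 119.45, grid_lat, grid_lon, sst_grid)
unmatched = match_station(-9.5, 119.5, grid_lat, grid_lon, sst_grid)
```

A search radius keeps a station from being paired with a distant pixel when the nearest cells are masked, which is a common situation close to the coast.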

Generalized Additive Model (GAM)
GAM provides a structure that generalizes the generalized linear model by allowing additive non-linear functions of the variables. The advantage of GAM is that it minimizes the error in predicting the dependent variable Y, which may follow various distributions, by estimating unspecified functions that are related to the dependent variable through a link function. GAM provides a flexible response by defining the model in terms of smooth functions rather than detailed parametric relationships with the covariates [27,28].
All data from direct observations were used to create the model, and oceanographic data obtained from remote sensing were used as input data in the GAM model. The model examines the estimated distribution of nitrate from remote sensing data. The result of the model defines the value of the predicted nitrate distribution supported by oceanographic parameters. The oceanographic data incorporated into the GAM model are remote sensing data, including SST, SSS, SSW, and chlorophyll-a.
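For reference, the additive structure described above is commonly written in the following textbook form. The symbols are generic notation assumed here rather than taken from the source: g is the link function, \beta_0 the intercept, f_j an unspecified smooth function of the j-th covariate x_j, and p the number of covariates.

```latex
g\bigl(\mathbb{E}[Y]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```

With an identity link and linear f_j, this reduces to ordinary multiple linear regression.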
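The idea of fitting one function per predictor and summing the contributions can be illustrated with a deliberately simplified stand-in. The sketch below fits an unpenalized cubic polynomial for each covariate by least squares; a real GAM would use penalized smooths instead. All names and the synthetic data are hypothetical and are not the study's data or code.

```python
import numpy as np

def additive_design(Z, degree=3):
    """Intercept plus powers 1..degree of each column of Z (no interaction terms)."""
    Z = np.asarray(Z, dtype=float)
    cols = [np.ones(Z.shape[0])]
    for j in range(Z.shape[1]):
        for d in range(1, degree + 1):
            cols.append(Z[:, j] ** d)
    return np.column_stack(cols)

def fit_additive(X, y, degree=3):
    """Least-squares fit of an additive polynomial model on standardized predictors."""
    X = np.asarray(X, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    B = additive_design((X - mu) / sd, degree)
    coef, *_ = np.linalg.lstsq(B, np.asarray(y, dtype=float), rcond=None)
    return {"coef": coef, "mu": mu, "sd": sd, "degree": degree}

def predict_additive(model, X):
    """Predict the response for new predictor rows using a fitted model."""
    X = np.asarray(X, dtype=float)
    B = additive_design((X - model["mu"]) / model["sd"], model["degree"])
    return B @ model["coef"]

# Synthetic predictors in plausible ranges (illustrative only).
rng = np.random.default_rng(0)
n = 71
X = np.column_stack([
    rng.uniform(25, 32, n),   # SST, degrees C
    rng.uniform(28, 38, n),   # SSS
    rng.uniform(2, 6, n),     # SSW, m/s
    rng.uniform(0, 0.8, n),   # chlorophyll-a
])
# Synthetic, purely additive response with no noise.
y = 0.2 - 0.01 * (X[:, 0] - 28) ** 2 + 0.005 * (X[:, 1] - 33) + 0.1 * X[:, 3] ** 3

model = fit_additive(X, y)
pred = predict_additive(model, X)
```

Because the synthetic response is itself an additive cubic polynomial, this fit recovers it essentially exactly; real nitrate data would leave residual scatter, and the choice of smoother and penalty then matters.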

Correlation analysis
Correlation analysis determines the relationship between two or more variables that may be related so that the degree of correlation between variables can be measured [29]. The most common coefficient used in calculating the correlation between variables is the product-moment correlation (Pearson's r), which measures the magnitude and direction of the linear relationship [30]. The coefficient of determination (r²) describes the proportion of variance in the outcome variable that can be explained by the variance of the predictor variable.
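As a small worked example of these two quantities, the sketch below computes Pearson's r from its definition and squares it to obtain r². The function name and the numbers are illustrative assumptions, not the study's data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Illustrative values only.
sst = np.array([26.0, 27.0, 28.0, 29.0, 30.0])
nitrate = np.array([0.30, 0.22, 0.20, 0.12, 0.06])

r = pearson_r(sst, nitrate)   # sign gives direction, magnitude gives strength
r_squared = r ** 2            # share of variance explained by a linear fit
```

The sign of r carries the direction of the relationship, whereas r² discards it, so the two should be reported as distinct quantities.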

Figure 2. Heatmap of correlation coefficients
Figure 2 shows the correlation between nitrate, SST, SSS, SSW, and chlorophyll-a. The correlation analysis shows that, compared with the other parameters, nitrate has its highest correlation values with SST (r = 0.24) and chlorophyll-a (r = 0.076). This is because these two parameters are linked through the upwelling process, in which water masses rise to the surface; nitrate concentration increases with depth [15,22,31]. The lowest correlation value was between nitrate and SSW (r = -0.27). This is because surface wind speed does not have much impact on the distribution of nitrate; it has more effect on the distribution of chlorophyll-a and salinity. In this case, wind speed does not dominantly affect the distribution of nitrate in the surface layer because of strong stratification, and the effect of wind is also largely weakened by the halocline [32,33]. The leading cause of the increase in nitrate flux is not the increased mixing caused by wind but rather the stratification of the water mass in certain seasons.
In-situ and remote sensing data were compared to obtain the coefficient of determination (R²) for each parameter. The results of the comparison are used as estimators in the GAM and are also valuable for identifying the parameters that most influence the distribution of sea surface nitrate. The parameters compared are SST, SSS, SSW, and chlorophyll-a. A high agreement between in-situ measurements and remote sensing data is quite difficult to expect, because satellite measurements are less reliable in coastal areas, the primary factor being the different spatial resolution of each satellite product. Our goal is to compare the values of the parameters affecting sea surface nitrate distribution obtained from in-situ measurements with the values of the same parameters obtained from satellite data.
SST from in-situ measurements lies within a range of 26-32 °C, while SST from satellite imagery lies within a range of 25-32 °C. The two do not differ considerably, as indicated by a coefficient of determination (R²) of 0.781 (Figure 3a). The remaining difference may arise because satellite imagery is less than optimal in coastal areas owing to the proximity to land, so interpolation is required when processing the satellite data [34].
For chlorophyll-a (Figure 3b), both in-situ and satellite values range from 0 to 0.8 mg/l, and the coefficient of determination (R²) is 0.985, which indicates that the in-situ and satellite values do not differ significantly. In this case, the in-situ and satellite chlorophyll-a values have an almost perfect linear relationship because R² is close to 1 [28-30].
For SSS (Figure 3c), both in-situ and satellite values range between 28 and 38 ppt, and the coefficient of determination (R²) is 0.971. As with chlorophyll-a, this R² value indicates that the in-situ and satellite values do not differ significantly.
For SSW (Figure 3d), in-situ values range from 2 to 6 m/s and satellite values range from 2 to 5.5 m/s, with a coefficient of determination (R²) of 0.874. As with SSS and chlorophyll-a, this R² value indicates that the in-situ and satellite values do not differ significantly. Compared with SSS, SSW, and chlorophyll-a, the R² value for SST is lower, which indicates that the in-situ and satellite data are less closely related for SST.
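A comparison of this kind can be sketched as a simple linear regression of the satellite values on the in-situ values, reporting R² together with bias and RMSE. The function name and the sample values are assumptions for illustration, and the paper does not state which software was used for this step.

```python
import numpy as np

def compare_insitu_satellite(insitu, satellite):
    """Regress satellite values on in-situ values and summarize their agreement."""
    insitu = np.asarray(insitu, dtype=float)
    satellite = np.asarray(satellite, dtype=float)
    slope, intercept = np.polyfit(insitu, satellite, 1)  # highest power first
    fitted = slope * insitu + intercept
    ss_res = np.sum((satellite - fitted) ** 2)
    ss_tot = np.sum((satellite - satellite.mean()) ** 2)
    diff = satellite - insitu
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r2": float(1.0 - ss_res / ss_tot),
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
    }

# Illustrative case: the satellite reads a constant 0.5 degrees C low.
insitu_sst = np.array([26.0, 27.5, 29.0, 30.5, 32.0])
satellite_sst = insitu_sst - 0.5
summary = compare_insitu_satellite(insitu_sst, satellite_sst)
```

In this constructed case R² equals 1 even though every satellite value is offset, which is why bias and RMSE are worth reporting alongside R² when judging agreement.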

Model specification and selection
The data set is divided into training and testing sections using the following scheme: remote sensing data are used for model building, whereas in-situ data are held out and used for model testing. This scheme is used to evaluate the model's ability to predict sea surface nitrate distribution and to represent the model's temporal behaviour. Of the models built, the four best are listed in Table 2. The best models were determined from several fit statistics, including D², R-adj, GCV score, and AIC [22]. The best model includes SST, SSS, SSW, and chlorophyll-a, with corresponding statistical values of D² = 70.4%, R-adj = 0.617, GCV score = 0.0055, and AIC = -170.63. The model uses 4 df for the intercept and, for the smooth terms, 4.54 df for SST, 7.58 df for SSS, 1 df for SSW, and 2.77 df for chlorophyll-a, for a total of 17.9 df.
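To make the four selection criteria concrete, the sketch below computes them for a Gaussian model from the observed values, the fitted values, and the effective degrees of freedom (edf, counting the intercept). These are the standard Gaussian-case definitions and are an assumption here: the fitting software may count parameters or estimate the scale differently, so the sketch is not expected to reproduce the values in Table 2. The function name and the numbers are hypothetical.

```python
import numpy as np

def gaussian_fit_statistics(y, fitted, edf):
    """Deviance explained, adjusted R^2, GCV score, and AIC for a Gaussian fit."""
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = y.size
    rss = float(np.sum((y - fitted) ** 2))    # Gaussian deviance
    tss = float(np.sum((y - y.mean()) ** 2))  # null deviance
    return {
        "deviance_explained": 1.0 - rss / tss,
        "r2_adj": 1.0 - (rss / (n - edf)) / (tss / (n - 1)),
        "gcv": n * rss / (n - edf) ** 2,
        # -2 log-likelihood at the ML variance estimate, plus 2 * (edf + 1);
        # the extra 1 accounts for the estimated variance.
        "aic": n * np.log(2.0 * np.pi * rss / n) + n + 2.0 * (edf + 1.0),
    }

# Illustrative numbers only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
stats = gaussian_fit_statistics(observed, fitted, edf=2.0)
```

Higher deviance explained and adjusted R² indicate a better fit, whereas lower GCV and AIC are preferred; the latter two penalize the degrees of freedom a model spends.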

An Expanded Model With SST, SSS, SSW, and Chlorophyll-a
Including oceanographic parameters in a model can produce functional relationships that define the predicted SSN values while improving fit and predictive ability. Based on the evaluation of 11 models, Table 2 lists the four best GAM models. All of the best GAM models use SST to predict SSN. This first model therefore includes four parameters, SST, SSS, SSW, and chlorophyll-a, and the interaction of the four parameters. The best model is given by Equation 1.
$\mathrm{SSN} = \beta_0 + f_1(\mathrm{SST}) + f_2(\mathrm{SSS}) + f_3(\mathrm{SSW}) + f_4(\mathrm{Chl})$ (1)
This model uses a total of 17.9 df (4 for the intercept, 4.54 for the SST term, 7.58 for the SSS term, 1.00 for the SSW term, and 2.37 for the Chl term), with P-values for SST (< 0), SSS (< 0.1), SSW (< 1), and Chl (< 0.001). The fit statistics for this model are the best among all models, with D² = 70.4%, R-adj = 0.617, GCV score = 0.0055, and AIC = -170.63 (Table 2). SST is the strongest factor in SSN concentrations, so SST is used in every model. Figure 4 shows a significant difference between the directly observed nitrate and the predicted nitrate. The nitrate values obtained from direct observation range between 0 and 0.35 ppm, while the nitrate values predicted by model 1 range between 0 and 0.25 ppm. The agreement between directly observed nitrate and the nitrate predicted by model 1 is modest, with R² = 0.357. Model 1 has a higher R² value than the other three models because it includes four parameters.

An Expanded Model With SST, SSS, and SSW
The previous model used all the oceanographic parameters, and its quality was judged from several fit statistics. The next experiment drops one oceanographic parameter in the hope of improving the quality and predictive value of the SSN estimate. This model includes three oceanographic parameters: SST, SSS, and SSW. The second of the four best models is represented by equation 2.
The fit statistics for this model were among the best of the models, with D² = 69%, R-adj = 0.609, GCV score = 0.0054, and AIC = -170.24 (Table 2). Compared with the previous model, the statistics used to judge model quality are still slightly lower. However, this model performs reasonably well and is almost the same as the previous one. The difference between this model and the preceding model is minimal, but the predicted SSN values differ.

An Expanded Model With SST, SSW, and Chlorophyll-a
All models were built from the results of the correlation analysis. Because the results of model 1 and model 2 are not too bad, this model also uses three oceanographic parameters in the hope of improving on the previous models and finding the model with the best SSN prediction. This model uses parameters whose correlation values do not differ much: SST, SSW, and chlorophyll-a. The model is given by equation 3. The fit statistics for this model are also among the best of the models created, with D² = 67.8%, R-adj = 0.614, GCV score = 0.0051, and AIC = -173.29 (Table 2). The model used a total of 13.68 df (3 for the intercept, 3.46 for the SST term, 1.00 for the SSW term, and 7.22 for the chlorophyll-a term), with P-values for SST and SSW (< 0.1) and chlorophyll-a (< 0). Of the four selected models, this model has the smallest total df, 13.68 (Table 2). Its R-adj value is also close to that of model 1, but in terms of D² it ranks third among the selected models, with a value of 67.8%. Figure 6 compares the observed and predicted nitrate concentrations; the range of values is the same as in the previous results. Model 3 is similar to model 2 in that it uses three parameters, but the parameters used differ. The comparison between observed nitrate and the nitrate predicted by model 3 gives R² = 0.105. This indicates that the agreement between the two is very weak, because the value of R² is very small.

An Expanded Model With SST, SSS, and Chlorophyll-a
This model is a development of model 2 and model 3. It is almost the same as the previous models and uses SST, SSS, and chlorophyll-a. It differs from model 2 and model 3 in combining SSS and chlorophyll-a. This model is given by equation 4.
The fit statistics for this model were also among the best of the models built, with D² = 67.3%, R-adj = 0.605, GCV score = 0.0053, and AIC = -171.19 (Table 2). The model used a total of 13.68 df (3 for the intercept, 3.96 for the SST term, 1.00 for the SSS term, and 7.19 for the chlorophyll-a term), with P-values for SST (< 0.01), SSS (< 1), and chlorophyll-a (< 0). This model has the lowest D² and R-adj values of the four models. Even so, its performance is almost the same as that of the other three models.

Model Evaluation
In all four models, the predicted nitrate concentrations fall below the observed maximum when the spread of values is considered. Based on Figures 4-7, model 1 had the highest R² and model 4 had the lowest R². The analysis and evaluation of the models show that none of them performs optimally, because the amount of data used to build the models is too small. All models, whether they use four or three parameters, give unsatisfactory results: none of the four models has an R² value exceeding 0.5.

Conclusion
This research explores the potential of the Generalized Additive Model (GAM) to model and predict sea surface nitrate concentrations from several parameters, namely SST, SSS, SSW, and chlorophyll-a. GAM was used to predict sea surface nitrate concentrations in the coastal waters around the Savu Sea and the southern part of Sumba Island. The model's results still need to be improved, as the spread of nitrate concentrations produced by the model remains below the observed maximum. This is because the amount of data available for building the model is small, which significantly affects the model's outcome. The models were assessed by comparing directly observed nitrate with the predicted nitrate values using R², and the highest R² value was only 0.357. The models produced are therefore still less than optimal in predicting sea surface nitrate values.

Figure 3. Comparison between in-situ and remote sensing data using linear regression: (a) in-situ SST and Aqua-MODIS SST, (b) in-situ chlorophyll-a and Aqua-MODIS chlorophyll-a, (c) in-situ SSS and MIRAS SMOS SSS, (d) in-situ SSW and ASCAT SSW

Figure 4. Scatterplot of observed and predicted nitrate for model 1

Figure 5. Scatterplot of observed and predicted nitrate for model 2
Figure 5 shows that the distributions of directly observed and predicted nitrate differ. The nitrate values, both directly observed and predicted from model 2, have the same range as the results of model 1 (Figure 4). Model 2 differs from model 1 in that it uses three parameters, whereas model 1 uses four. The two models also differ in the coefficient of determination (R²): the R² value from model 2 is smaller than that from model 1, with model 2 having R² = 0.29.

Figure 6. Scatterplot of observed and predicted nitrate for model 3

Figure 7. Scatterplot of observed and predicted nitrate for model 4
The nitrate concentrations obtained from model 4 have the same range of values as those of the previous models; the difference lies in the spread of the values. Like model 2 and model 3, model 4 has three parameters, but the spread of the results differs from one model to another. This is reflected in its R² value, which is the smallest compared with the three previous models. This is influenced by the parameters used in these models, which are less correlated. The value of R² for model 4 is 0.105.

Table 2. Final GAM model specifications, estimated terms, and fit statistics reported by mgcv
The model used a total of 13.68 df (3 for intercept, 3.96 for SST term, 1.00 for SSS term, 7.19 for chlorophyll-a term), with P-