Comparison between top and bottom of atmosphere Sentinel-2 image for mangrove mapping in Balikpapan Bay, East

Sentinel-2 is a high-resolution multispectral imaging mission of the European Space Agency; Sentinel-2A was launched on June 23, 2015 and Sentinel-2B on March 7, 2017. The two satellites were launched to support land monitoring studies, including vegetation, soil, and water cover, as well as the observation of inland waterways and coastal areas. Since 2018, Sentinel-2 has provided bottom-of-atmosphere (L2A) imagery derived from top-of-atmosphere (L1C) imagery that has been atmospherically corrected using the Sen2Cor algorithm. However, L2A data can show an overcorrection effect due to inaccuracies in the digital elevation model, over-detection of clouds over bright targets, and misclassification of topographic shadows. This research explores the application of Sentinel-2 imagery for mangrove mapping by comparing two data levels, L1C and L2A. L2A is further divided into data atmospherically corrected with the Sen2Cor method (L2A_Sen2Cor) and data corrected with the dark object subtraction method (L2A_DOS). The classification scheme was built from in-situ data covering seven objects (water, clouds, built-up, cloud shadows, bare land, mangroves, and land vegetation) and applied using random forest classification. The data levels are compared on the basis of their spectral signatures and an accuracy assessment using a confusion matrix. The results show differences in the spectral signatures of L1C and L2A data because of atmospheric effects. L2A outperforms L1C, as shown by the higher coefficient of determination (R²). The overall accuracy ranges from 93.7% to 95.4%, with the best accuracy obtained by L2A_Sen2Cor.


Introduction
Remote sensing is one of the technologies that can be used to detect, monitor, and analyze objects and conditions on the Earth's surface [1][2][3]. This technology provides significant benefits by offering spatial and temporal information [4][5][6][7]. The ease of accessing data from sources such as MODIS, Himawari, Landsat, and Sentinel, along with advancements in computer technology and cloud-based computing, accelerates the processing of remote sensing data [8][9].
Sentinel-2 is a high-resolution multispectral imaging mission launched by the European Space Agency. Its mission is to monitor variability on the Earth's surface [10]. To support this mission, Sentinel-2 operates as a twin-satellite system in the same orbit. Sentinel-2A was launched on June 23, 2015, and Sentinel-2B on March 7, 2017 [11]. Together, these satellites provide a high-frequency revisit time of five days at the equator, ensuring continuous observation and enabling long-term monitoring and improved change detection. Sentinel-2 carries the Multi-Spectral Instrument (MSI) with 13 spectral bands, including four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution [10]. These specifications make Sentinel-2 imagery an alternative choice for mangrove mapping [12][13][14][15].
Since 2018, Level-2A products have been released for Sentinel-2: atmospherically corrected Surface Reflectance (SR) images derived from the associated Level-1C products [10]. The L2A product is processed using the Sen2Cor algorithm, which corrects atmospheric, terrain, and cirrus effects on Top of Atmosphere (TOA) data [10]. This data is suitable for mangrove mapping because it has undergone radiometric correction [16][17]. However, the L2A data provided by ESA still shows overcorrection effects, especially on slopes, due to inaccuracies in the digital elevation model [18].
The identification of mangrove objects in satellite imagery relies on the reflectance pattern at each wavelength that is reflected by the objects and received by the satellite [12][13][14][15][19]. Mangroves are plants that grow in coastal areas with brackish or saline water conditions [20][21]. Mangroves provide abundant benefits, including habitat for other living organisms, coastal protection, carbon absorption and storage, tourism, and a source of fuel and building materials for local communities [22][23][24]. Because mangroves grow in areas with high soil moisture, their spectral response differs from that of terrestrial plants, which makes remote sensing data suitable for mangrove detection. Various studies have used Sentinel-2 for mangrove mapping, including mangrove detection [25][26][9][27], mangrove density [28][29], mangrove health [30][19], and mangrove species [31][32].
Balikpapan Bay is a highly productive bay whose mangrove ecosystem serves as habitat, livelihood source, and ecotourism destination [33][34][35]. Additionally, massive developments such as the construction of the national capital can directly impact the ecosystem in Balikpapan Bay. The bay also has hilly topography [36], which could be affected by overcorrection in the Sentinel-2 L2A data provided by ESA. Furthermore, research on the use of Sentinel-2 for mangrove mapping is still limited in Balikpapan Bay. Therefore, it is crucial to have accurate mangrove maps in this region. The objective of this study is to map mangroves at different data levels, namely top of atmosphere (L1C) and bottom of atmosphere (L2A), using a machine learning method (random forest) in Balikpapan Bay, East Kalimantan. This research is expected to produce an accurate map of mangroves in the study area.

Data Collection
Sentinel-2A imagery dated February 26, 2022, at two data levels, L1C and L2A, was obtained from sentinelhub.com. L1C represents data that has undergone geometric and radiometric correction, including orthorectification and spatial registration on a global reference system [10]. This study separates L2A data into two categories: L2A corrected with the Sen2Cor method (L2A_Sen2Cor) and L2A corrected with the dark object subtraction method (L2A_DOS). L2A_DOS was produced from the corrected L1C data using the Semi-Automatic Classification Plugin in QGIS. Figure 2 uses a composite of the SWIR, NIR, and green bands to show how mangroves appear at each data level; mangroves are represented by a dark brown color. L2A_Sen2Cor data is L1C data that has undergone scene classification and atmospheric correction using the Sen2Cor algorithm [10]. According to ESA's data quality report from February 2022, L2A_Sen2Cor data has several limitations: blocky patterns on the scene classification mask that lead to local over-detection of clouds; differences in the overlap area between adjacent tiles; a bluish color in color composites and inaccurate surface reflectance due to inaccuracies in the digital elevation model; corrupted pixels caused by missing or degraded instrument source packets, which can affect the surface reflectance of other spectral bands; and visible discontinuities in the terrain correction over very flat areas [18]. These limitations can lead to misclassification.
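The idea behind dark object subtraction can be illustrated with a short sketch. This is not the implementation used by the Semi-Automatic Classification Plugin; it is a minimal illustration that assumes each band is a flat list of TOA reflectance values and that the darkest pixel in a band approximates the additive atmospheric contribution. The function name `dark_object_subtraction` and the sample values are hypothetical.

```python
def dark_object_subtraction(band, floor=0.0):
    """Subtract the darkest pixel value of a band from every pixel.

    Assumption: the darkest pixel should have near-zero reflectance, so its
    value approximates the additive atmospheric (haze) contribution.
    """
    dark_value = min(band)
    # Clamp at `floor` so the corrected reflectance never goes negative.
    return [max(value - dark_value, floor) for value in band]


# Hypothetical TOA reflectance samples for a blue band and a NIR band.
toa = {
    "blue": [0.12, 0.15, 0.10, 0.30],
    "nir": [0.05, 0.40, 0.02, 0.35],
}
corrected = {name: dark_object_subtraction(values) for name, values in toa.items()}
```

In this simplified form the correction is a per-band offset, so the shape of a spectral signature across bands can change while the relative differences between pixels within one band are preserved.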
Sentinel-2 has thirteen bands with spatial resolutions ranging from 10 m to 60 m. The imagery has a temporal resolution of 10 days for a single satellite, which is reduced to 5 days when both satellites of the constellation are combined. In this study, 10 bands were used to match the number of bands in the L2A_Sen2Cor and L2A_DOS data, allowing for a fair comparison (Table 1). The in-situ data consist of information obtained through field observations on August 21-25, 2023, and from Sentinel-2 data. Field observations were conducted using an Olympus Tough-6 camera equipped with a Global Positioning System. There are 1400 in-situ data points, with 200 points per object. These data were used to build the model and to validate the mangrove maps obtained from all three datasets.

Image Processing
The data levels were analyzed by examining the spectral signatures of objects. Spectral signatures show how each object responds to different wavelengths. The classification method employed in this study is random forest. Random forest classification requires two types of data: a training set and a test set. Both were drawn from the in-situ data, which was divided into two parts, with 60% used for modeling (training data) and 40% for validation (test data). The classification scheme in this study is based on land cover and consists of seven classes: water, clouds, built-up, cloud shadows, bare land, mangroves, and land vegetation. The classification results are grouped into three primary classes: water, non-mangrove, and mangrove [41], and accuracy assessment is performed using a confusion matrix. The processing scheme used to accomplish the research goal of analyzing input data for mangrove mapping in Balikpapan based on Sentinel-2A data levels is illustrated in Figure 3.
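The 60/40 division of the in-situ points and the grouping of seven classes into three can be sketched as follows. This is an illustrative sketch only: the text does not state whether the split was purely random or stratified, so a simple random split is assumed, and the mapping of every class other than water and mangroves to non-mangrove is likewise an assumption. The names `MERGE` and `random_split` are hypothetical.

```python
import random

# Seven land-cover classes merged into the three reporting classes.
# Assumption: everything other than water and mangroves is non-mangrove.
MERGE = {
    "water": "water",
    "mangroves": "mangrove",
    "clouds": "non-mangrove",
    "built-up": "non-mangrove",
    "cloud shadows": "non-mangrove",
    "bare land": "non-mangrove",
    "land vegetation": "non-mangrove",
}


def random_split(n_items, train_fraction=0.6, seed=0):
    """Return (train_idx, test_idx) from a shuffled range of indices."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = round(n_items * train_fraction)
    return indices[:cut], indices[cut:]


# 200 placeholder points per class, 1400 in total, as in the in-situ data.
labels = [name for name in MERGE for _ in range(200)]
train_idx, test_idx = random_split(len(labels))

# Group the test labels into the three primary classes.
merged_test = [MERGE[labels[i]] for i in test_idx]
```

With 1400 points this yields 840 training and 560 test points. A stratified split, which would keep exactly 120 training and 80 test points per class, is an alternative design choice.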

Results and Discussion
The spectral signatures at each data level are used to examine how the data levels differ (Figure 4). These signatures result from the interaction between wavelengths and objects on the Earth's surface. The spectral signatures of the three datasets share the same pattern, but the digital values differ, especially for L2A_DOS. L1C and L2A_Sen2Cor have the same digital values in Bands 5 through 12, but atmospheric influence causes differences in the blue, green, and red bands. Based on the interaction of wavelengths with mangroves, the low values in the blue and red bands are thought to be caused by absorption by chlorophyll for photosynthesis, whereas the green and NIR bands are reflected [42][43].
The mangrove maps based on random forest classification for each data level are displayed in Figure 5. Spatially, the three datasets do not exhibit significant differences, but misclassifications are still evident in all three. The two main causes of misclassification are land vegetation and cloud shadows being classified as mangrove. More misclassifications are found in the L2A data, indicating that atmospheric correction increases the degree of uncertainty in the mangrove map and leads to higher levels of misclassification for non-mangrove objects [44]. The spatial distribution suggests that mangrove mapping on L1C data using the random forest technique produces more representative results than on L2A data. When remote sensing data is used for mapping, misclassifications are expected, but they can be minimized by eliminating objects based on knowledge of the objects and the study location [1][45][2]. In this case study, mangroves found outside coastal areas can be eliminated to obtain a more accurate mangrove map. Remote sensing technology is considered effective for mapping the extent of mangroves because it can reduce costs and time as well as lower the risk, especially in study areas like Balikpapan Bay that are inhabited by crocodiles.
In this study, random forest classification was employed to generate a mangrove map of Balikpapan Bay. Based on the accuracy assessment using the confusion matrix shown in Table 2, random forest classification on the three datasets achieved an overall accuracy of 93.7% to 95.4% and a kappa coefficient of 92.7% to 94.7%. The accuracy ranking from highest to lowest is L2A_Sen2Cor, L2A_DOS, and L1C. Based on accuracy, L2A is superior to L1C. Research on the use of Sentinel-2 for mangrove mapping is still limited in Balikpapan Bay, so a direct comparison cannot be made. However, the accuracy obtained in this study is comparable to that of earlier studies conducted in other locations using the same satellite and methods [31][26].
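For reference, overall accuracy and the kappa coefficient can be derived from a confusion matrix as sketched below. The matrix values are hypothetical and do not reproduce Table 2; rows are assumed to hold the mapped classes and columns the reference classes, and the function names are illustrative.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: sum over classes of (row share * column share).
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical 3-class matrix (rows: map, columns: reference).
matrix = [
    [48, 2, 0],
    [1, 45, 4],
    [1, 3, 46],
]
oa = overall_accuracy(matrix)
kappa = cohen_kappa(matrix)
```

Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is consistent with the kappa range reported above sitting slightly below the overall accuracy range.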
Producer and user accuracies (PAs and UAs) are needed to examine the accuracy of each object. The overall accuracy of the L1C data is 93.7%, which is lower than that of the L2A data. The PAs are 94.18%, 87.01%, 93.02%, 91.76%, 91.95%, 98.86%, and 98.83% for the water, cloud, built-up, cloud shadow, bare land, mangrove, and land vegetation classes, respectively. The UAs are 95.29%, 98.52%, 88.89%, 95.12%, 88.89%, 97.75%, and 93.40% for the classes in the same order. For L2A_DOS, the overall accuracy is 94.6%. The PAs are 97.67%, 90.90%, 88.37%, 92.94%, 93.10%, 98.86%, and 100% for the water, cloud, built-up, cloud shadow, bare land, mangrove, and land vegetation classes, respectively. Meanwhile, the UAs are 95.45%, 97.22%, 87.35%, 98.75%, 90.00%, 98.86%, and 95.55% for the classes in the same order. L2A_Sen2Cor has the highest overall accuracy of the three datasets (95.4%). The PAs are 96.51%, 90.90%, 97.67%, 91.76%, 90.80%, 100%, and 100% for the water, cloud, built-up, cloud shadow, bare land, mangrove, and land vegetation classes, respectively. The UAs are 95.40%, 100%, 87.50%, 97.50%, 94.04%, 97.77%, and 97.72% for the classes in the same order. In comparison to the other two datasets, L2A_Sen2Cor offers superior producer and user accuracy for individual objects, particularly mangroves.
We used producer and user accuracy to identify the misclassification shown in Figure 6. Producer accuracy (PA) indicates the percentage of field sample data that is correctly classified in the image, while user accuracy (UA) indicates the probability that map users will find the correct information in the field [46]. If the producer accuracy of a class is smaller than its user accuracy, that class tends to be overestimated; otherwise it tends to be underestimated. All three datasets agree that built-up and land vegetation are underestimated, while clouds and cloud shadows are overestimated. However, for mangroves, L2A_DOS has comparable PA and UA values, while L1C and L2A_Sen2Cor are underestimated.
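The per-class producer and user accuracies can be computed from a confusion matrix as in the sketch below. The matrix is hypothetical and is not taken from Table 2; rows are assumed to hold the mapped classes and columns the reference (field) classes, and the function name is illustrative.

```python
def producer_user_accuracy(matrix, class_names):
    """Return {class: (PA, UA)} for a square confusion matrix.

    PA divides the correctly classified count by the reference (column)
    total; UA divides it by the mapped (row) total.
    """
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    result = {}
    for i, name in enumerate(class_names):
        pa = matrix[i][i] / col_totals[i]
        ua = matrix[i][i] / row_totals[i]
        result[name] = (pa, ua)
    return result


# Hypothetical 3-class matrix (rows: map, columns: reference).
classes = ["water", "non-mangrove", "mangrove"]
matrix = [
    [45, 5, 0],
    [3, 40, 2],
    [2, 5, 48],
]
accuracy = producer_user_accuracy(matrix, classes)
```

Because PA and UA share the same numerator, any gap between them for a class comes entirely from the difference between its reference total and its mapped total.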
Based on spatial distribution, L1C data shows more representative results than L2A data. L2A data shows more misclassifications than L1C because of atmospheric correction, which raises the level of uncertainty in the mangrove map [44]. However, L2A_Sen2Cor shows better accuracy than the other two datasets when both overall accuracy and per-object accuracy are taken into account. No misclassifications caused by the limitations of the Sentinel-2 DEM in hilly areas, as reported in the ESA data quality report [18], were observed.

Conclusion
The overall accuracy of the mangrove map of Balikpapan Bay using the three datasets is 93.7% to 95.4%, with the highest accuracy obtained by L2A_Sen2Cor, which has better producer and user accuracy for each object than the other data levels. However, based on the spatial distribution of mangroves, atmospheric correction leads to increased misclassification in L2A data. Misclassifications due to the limitations of the Sentinel-2 DEM are not evident in this study. For a specific case like this research, if the main target is a mangrove map, then L1C data is more relevant for understanding how accurately mangrove locations are identified in the images. However, if the goal is to obtain information about how accurately mangroves are classified, then L2A_Sen2Cor data can be used.

Figure 2. Composite of the SWIR, NIR, and green bands showing mangroves in Balikpapan Bay, which appear dark brown, at each data level: a) L1C, b) L2A_DOS, and c) L2A_Sen2Cor.

Figure 4. Spectral signatures of mangroves. Colors represent data levels: yellow for L1C, blue for L2A_DOS, and green for L2A_Sen2Cor.

Figure 6. Comparison between producer and user accuracy. Colors represent data levels: yellow for L1C, blue for L2A_DOS, and green for L2A_Sen2Cor.