Strategy to minimize phenotyping in the selection of new table grape varieties



Introduction
The development and selection of new grapevine varieties is a long process that requires maintaining thousands of plants in the field throughout the juvenile period and then for some further years while their morphological characters are evaluated. This morphological evaluation carries a high economic and environmental cost: it requires, on the one hand, skilled labor capable of characterizing various descriptors in a large number of individuals that ripen simultaneously and, on the other hand, a considerable consumption of resources to keep the plants in the field for an initial evaluation. Marker-assisted selection (MAS) offers the possibility of accelerating the process, with the consequent saving of resources.
Some of the main characters sought in table grape breeding are the absence of seeds, a diversity of bright colors and berry shapes, distinctive flavors such as muscat or cotton candy, crisp flesh texture and large berry size. Several studies have addressed markers linked to the major QTLs for the most important traits, such as berry skin color [1], seedlessness [2][3][4][5] and muscat flavor [6], and specific alleles have been proposed for selecting individuals in breeding programs. Other, minor QTLs have been described for traits such as berry texture [7,8], berry weight [2,3] and phenology [9], but applying these QTLs in MAS has not yet given good results.
It should be noted that applying MAS in different progenies can lead to different results, because the parents carry distinct allelic combinations. It is therefore necessary to verify beforehand the usefulness of each marker in the specific progeny, especially for minor QTLs. For this purpose, core collections could be of great help. Core collections have frequently been used in research for diversity evaluation and association studies in large germplasm collections [10,11], but they could also be a useful tool for evaluating large progenies.
The technique of Genomic Selection (GS), introduced by Meuwissen et al. [12], consists of estimating genetic values (genomic estimated breeding values, GEBVs) for quantitative traits from genome-wide marker data. It uses a training collection of individuals that are both genotyped and phenotyped and a candidate collection of individuals that are only genotyped, to which the genomic values estimated from the training collection are assigned. Fodor et al. [13] carried out a GS simulation study in grapevine, obtaining a precision of 90% for simple traits and 50% for complex traits, and confirmed that using core collections as training collections leads to higher precision. That simulation used large core collections (1000); here we go a step further and apply the method to a real progeny while minimizing the size of the training collection by means of nested collections [10].
The general aim of this work is to propose a methodology that minimizes the phenotyping work required for the thousands of individuals handled in breeding programs. To this end, we took the concept of the training collection from Genomic Selection and combined it with traditional MAS, using SSR markers linked to diverse characters, in order to estimate the morphological traits of each individual from its genotype at molecular markers linked to the traits of interest.

Plant material
The plant material is a progeny of 2016 individuals established at IMIDRA's Finca de La Isla (Arganda del Rey, Madrid, Spain), grown under organic farming and trained as a double cordon. The progeny was generated at the Finca de El Encín from a series of controlled crosses made in 2012 between the varieties Flame Seedless (male parent) and Muscat of Alexandria (female parent).

Nested core collections and morphological descriptions
A general outline of the approach is given in Figure 1. We propose starting with a random selection of 15-20% of the progeny (in our case, 392 individuals). This initial collection, drawn without any selection, was evaluated with both genotypic and morphological data (27 markers and 12 phenological and berry/bunch traits). Morphological descriptions were made in 2017. From the data obtained on this initial collection, a highly representative Training Collection should be derived. In this work, a Training Collection of 27 hybrids was built using the Maximization strategy [14], as implemented in MStrat 4.1 [15]. Twenty core collections were generated using 500 iterations, and the one with the highest target score and Shannon index was selected.
As a final step, we propose evaluating the phenotype estimates by means of a new collection, called the "Evaluation Collection". This collection was built from the phenotype estimates obtained for the complete progeny, requiring that each estimated phenotypic state of the main traits (color, seedlessness and muscat flavor) be represented at least 5 times. The parameters used to build it could be those proposed by Vargas et al. [16] for maximizing minority classes in core collections with the MStrat software. Finally, although it is not part of the proposed methodology, a quick screening of berry skin color, seedlessness, muscat flavor and berry size was performed in 2020 in the complete progeny (1802 individuals) in order to verify the results obtained on the Evaluation Collection.
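The maximization idea behind the Training Collection can be illustrated with a minimal Python sketch. It is not the MStrat implementation: it assumes that diversity is scored simply as the number of distinct marker alleles captured, it uses a greedy selection rather than MStrat's iterative procedure, and the function names (`allele_richness`, `greedy_core`) and the toy genotypes are hypothetical.

```python
def allele_richness(individuals, genotypes):
    """Count distinct (marker, allele) pairs carried by the given individuals."""
    captured = set()
    for ind in individuals:
        for marker, alleles in genotypes[ind].items():
            captured.update((marker, allele) for allele in alleles)
    return len(captured)


def greedy_core(genotypes, size):
    """Greedily pick `size` individuals, each time adding the one that
    increases the number of captured alleles the most (ties: first by name)."""
    core = []
    remaining = sorted(genotypes)
    while remaining and len(core) < size:
        best = max(remaining,
                   key=lambda ind: allele_richness(core + [ind], genotypes))
        core.append(best)
        remaining.remove(best)
    return core


# Toy data: hypothetical allele sizes (bp) at two SSR markers in four hybrids.
toy = {
    "H1": {"M1": (100, 102), "M2": (200, 200)},
    "H2": {"M1": (100, 102), "M2": (200, 204)},
    "H3": {"M1": (104, 106), "M2": (208, 210)},
    "H4": {"M1": (100, 104), "M2": (200, 208)},
}
core = greedy_core(toy, 2)
total = allele_richness(list(toy), toy)
print(core, allele_richness(core, toy), "of", total, "alleles captured")
```

In this toy case two hybrids are enough to capture every allele, which mirrors, on a very small scale, the notion of representativeness used for the Training Collection.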

Marker selection
Twenty-seven markers linked to quality traits were selected (Table 1): two associated with berry skin color, six with muscat flavor, two with seedlessness, three with berry weight, twelve with berry texture, two with fertility and five with phenological characters. Fourteen of them co-localize with 9 previously published QTLs [1-3, 5-9, 17-19], and the rest were selected from previous results obtained by the Laboratory of Plant Genotyping at IMIDRA (in prep.).
Information about segregation in the progeny was obtained from Zarouri et al. [20], who analyzed 270 SSRs in a diversity panel, of which 127 segregated in this progeny.

DNA extraction and genotyping
The complete progeny (2016 individuals) was genotyped for 27 SSR markers. DNA was extracted from young leaves using the Qiagen DNeasy 96 Plant Kit (Hilden, Germany) with minor modifications. The 27 selected nuclear microsatellites were amplified simultaneously in a multiplex PCR, performed in a final volume of 20 μl containing 1× Multiplex PCR Master Mix (Qiagen, Hilden, Germany), 5 ng of template DNA, and an equimolar amount (0.2 μM) of each primer pair, at the concentration indicated in Table 2. The forward primers were labelled with four different fluorochromes so that all fragments could be displayed together in the capillary electrophoresis (Table 2). Thermocycler conditions were those of Mix B, previously published by Ibáñez et al. [22].
The multiplex PCR products were analyzed in an ABI 3130 Genetic Analyzer, and the fragments were sized with GeneMapper 4.0 using the GeneScan™-500 LIZ™ Size Standard as an internal marker (Applied Biosystems).

Statistical analysis
Correlation between the morphological descriptions of different years was evaluated with the Pearson test for quantitative traits and Kendall's Tau-b for qualitative traits (IBM SPSS Statistics 23). Association between the described traits and the molecular data in the Training Collection was analyzed with the non-parametric Kruskal-Wallis test (IBM SPSS Statistics 23). Morphological descriptions from two years were used, and molecular results were coded both as genotypic and as allelic data, and both codings were tested. Correlation between imputed and observed phenotypes for seedlessness, berry color and muscat flavor was evaluated in the Evaluation Collection with chi-square (χ²) tests on 2×2 contingency tables and with Cramer's V, using R. Correlation between imputed and observed phenotypes for flesh consistency and berry size in the Evaluation Collection was evaluated with the non-parametric Kruskal-Wallis test (IBM SPSS Statistics 23). Correlation between imputed and observed phenotypes for seedlessness, berry color, muscat flavor and berry size in the complete progeny (1,802 individuals) was evaluated with Kendall's Tau-b test (IBM SPSS Statistics 23).
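For readers without access to SPSS, the allelic form of the association test can be sketched in plain Python. The sketch below is only an illustration, not the analysis actually run: it assumes the allelic coding yields exactly two groups (allele present or absent), so that the Kruskal-Wallis statistic has one degree of freedom, and the function names and the toy berry weights are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_two_groups(present, absent):
    """Tie-corrected Kruskal-Wallis H and p-value for two groups (df = 1).

    Raises ZeroDivisionError if every value is identical."""
    values = list(present) + list(absent)
    n = len(values)
    ranks = average_ranks(values)
    r1 = sum(ranks[:len(present)])
    r2 = sum(ranks[len(present):])
    h = (12 / (n * (n + 1))
         * (r1 ** 2 / len(present) + r2 ** 2 / len(absent))
         - 3 * (n + 1))
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Toy example: hypothetical berry weights (g) with and without a given allele.
h, p = kruskal_two_groups([4.1, 4.6, 5.0], [2.9, 3.2, 3.8])
print(round(h, 3), round(p, 4))
```

With genotypic coding there are usually more than two classes, and the p-value then requires the chi-square distribution with the corresponding degrees of freedom, which a statistics package provides.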

Representativeness of the training collection
The Training Collection of 27 individuals obtained with MStrat represented 89% of the total diversity contained in the 392 individuals used. The redundancy of the initial collection showed that 100% representativeness would be reached with 43 individuals.

Phenotypic characterization in the Training Collection
The states of the different traits were reasonably evenly distributed. Only for berry size and weight were the high levels of the OIV descriptor not represented, probably because the parents have small and medium-sized berries. Hard flesh consistency and late ripening date were also poorly represented, which is consistent with the low representation of these states overall. The distribution of the main traits of interest studied in this work is shown in Figure 2. Correlation between years was significant at the 0.01 level, with correlation coefficients (r²) in the range 0.5-0.9 for all traits except ripening date. The reason for this exception could be the exceptional weather conditions of 2021, which led to the freezing and death of a high percentage of the buds.

Molecular characterization in the progeny
The progeny of 2016 individuals was analyzed with 27 SSR markers. Eleven markers were used to check plant sampling and to exclude self-fertilization and cross-contamination in the field. In particular, the marker VVIB19 gave a quick indication of self-pollinations, because its genotype must be identical in all hybrids; for this reason it was not used in the statistical analyses. The percentage of self-pollination was 10.2% of the 2016 genotyped individuals. In addition, five cross-contaminations and three triploid individuals were detected. After discarding these individuals, a total of 1802 individuals were used for the MAS study.
The marker VVIQ67 presented null alleles.
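The counts reported in this section can be cross-checked with a few lines of Python. The check assumes that self-pollinations, cross-contaminations and triploids are the only reasons for exclusion and that the three categories do not overlap; the number of self-pollinated plants is derived here, not stated in the text.

```python
genotyped = 2016           # individuals genotyped
retained = 1802            # individuals kept for the MAS study
cross_contaminations = 5
triploids = 3

excluded = genotyped - retained
selfings = excluded - cross_contaminations - triploids   # derived, see assumption
selfing_rate = 100 * selfings / genotyped

print(excluded, selfings, round(selfing_rate, 1))   # 214 206 10.2
```

Under these assumptions the derived self-pollination rate agrees with the 10.2% reported.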

Correlation between molecular and phenotypic data in the Training Collection
Molecular data were tested in genotypic and allelic form with the Kruskal-Wallis test. Significant genotypic and allelic associations (at the 0.01 level) in both years were detected between color and the markers VV6718 and VMC7g3, and between seedlessness and the marker p3_VvAGL11. The VV6718 and VMC7g3 markers co-localize on chromosome 2, at a distance of 2 Mbp and very close to the MybA1 gene, the main gene responsible for color [23]. Figure 3 (A-B) shows the presence/absence of the alleles with significant associations against skin color. The 182 base pair allele of VV6718 correlates better with the presence of color than the 119 allele of VMC7g3, for which two white individuals that carry the 119 allele and one red individual that does not were observed.
In the case of seedlessness, two markers located close to the QTL involved were evaluated, but only p3_VvAGL11 seems able to detect this trait in the progeny. A possible association with two alleles of this marker was found. The linkage of this marker with seedlessness has been confirmed in previous works [3,24]. The gene VvAGL11 belongs to the D-lineage of MADS-box genes, which control ovule identity [3], and this marker, designed inside the gene, is so far the most efficient tool for MAS. The marker Chr18b, located 7 Mbp from p3_VvAGL11 and for which significant associations had previously been obtained in a diversity panel of more than 400 varieties (data not shown), is probably not useful for MAS in this progeny. Figure 3 (C-D) shows the distribution of the trait against the presence/absence of the allele associated with seedlessness (197 base pairs) and of the allele associated with the presence of seeds (187 base pairs).
Because stenospermocarpic seedlessness is complex, with rudimentary seeds varying in size and number, fresh seed weight was also evaluated; it showed a significant association with p3_VvAGL11, again at the 0.01 level in both years, as can be seen in the box-and-whisker plot (Fig. 4 A). Regarding muscat flavor, the markers VvZAG79 (allele 255) and FAM12 (allele 361) showed significant associations at the 0.01 level for 2020 and at the 0.05 level for 2021. Both markers co-localize on chromosome 5, next to the QTL previously published for muscat flavor [6], approximately 2 Mbp from the gene VvDXS. The presence of these alleles is associated with both neutral and muscat flavor, but their absence exclusively implies neutral flavor (Fig. 3 E-F). These markers could therefore be used in MAS to discard individuals with neutral flavor. Emanuelli et al. [25] confirmed the role of VvDXS in muscat flavor in grapevine and identified a SNP as probably responsible for the flavor in most muscat varieties, as well as other nucleotide variations in the coding region of this gene in three muscat-like aromatic mutants. Based on this SNP, different assays for targeting muscat-flavored grapevine genotypes, such as HRM (high-resolution melting) or digital PCR (dPCR), have been developed [26,27]. For future work with SSRs, new markers should be designed close to the described mutation.
Other significant associations were detected at the 0.05 level. One was detected between firmness of flesh (2020 and 2021) and the 210 base pair allele of the marker VChr18a-151R (Fig. 4 B), an association previously obtained in a diversity panel of 400 varieties (results not shown). Carreño et al. [7] and Correa et al. [8] detected a QTL for berry firmness in a region of chromosome 18 where other QTLs had been detected for berry size and seedlessness. The marker VChr18a-151R lies 17 Mbp from that locus and 2 Mbp from another QTL for berry weight [5], so it could point to a new candidate QTL for the study of berry firmness, given the relationship between berry firmness and weight. For the marker VMCNG2H2.2, the association with the QTL for berry firmness detected by Correa et al. [8] and Wang et al. [28] on chromosome 8 was not found in this progeny, although it had been detected in a previous project on a diversity panel (unpublished).
Several associations were detected between berry width, length and weight and the markers VVIS58-378R (Chr7) and VVIP33 (Chr15) (Fig. 4 C-F), the latter of which has been previously published [2,3]. VVIS58-378R was associated with berry width and weight in 2020, and VVIP33 with berry width and length in 2020 and 2021 and with berry weight in 2020.
Muscat flavor showed an association in both years (2020 and 2021) with the marker UDV-026, located on chromosome 8, and in 2020 with the markers UDV-095 and VVIN70, located on chromosome 14 approximately 4 Mbp from each other (Fig. 4 G-H). The effect of these markers is similar to that observed on chromosome 5: the absence of the allele is associated with neutral flavor. Adding the effect detected at these possible new QTLs to that of the gene VvDXS might explain a higher percentage of the variability and improve the use of MAS for this trait.
Ripening date was associated in 2021 with the markers VVIP33 and VMC4D9-2, located on chromosome 15 about 2 Mbp apart. This QTL was detected by Grzeskowiak et al. [9] for budburst and the beginning of veraison, and VVIP33 showed an association with ripening in a previous work of our group on a diversity panel (unpublished). The exceptional weather conditions of 2021 could have resulted in a shorter ripening interval for all individuals compared with previous years. This tendency is expected for the coming decades as a consequence of climate change, so confirming a QTL for ripening date that is influenced by environmental effects could be of interest for selecting individuals better adapted to climate change.

Phenotype estimation in the progeny from the molecular data
Phenotypes were assigned in the complete progeny for the traits that showed significant associations at the 0.01 level (color, seedlessness and flavor). For color, all individuals of the progeny carrying allele 182 at the marker VV6718 were estimated as "red", and those without it as "white". For seedlessness, all individuals carrying allele 197 at the marker p3_VvAGL11 were estimated as seedless, and those without it as seeded. Muscat flavor showed an identical association with the markers FAM12 (allele 361) and VvZAG79 (allele 255) in the Training Collection, but the linkage between these alleles was not complete in the 1,802 individuals of the progeny: 146 individuals with allele 255 of VvZAG79 did not carry allele 361 of FAM12, and 105 individuals with allele 361 of FAM12 did not carry allele 255 of VvZAG79. Consequently, both markers were combined to estimate phenotypes, and only individuals with both associated alleles were classified as "muscat flavor".
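The assignment rules just described amount to a small decision function, sketched here in Python. The marker names and diagnostic allele sizes come from the text; the function name, the data layout (a dictionary from marker to allele sizes), the treatment of a missing marker as "allele absent", the "neutral" label for non-muscat individuals and the non-diagnostic allele sizes in the example are assumptions made for illustration.

```python
def estimate_phenotypes(alleles):
    """Estimate color, seedlessness and flavor from SSR allele sizes (bp).

    `alleles` maps a marker name to the allele sizes observed in one
    individual, e.g. {"VV6718": (182, 190), ...}. A marker missing from
    the mapping is treated as not carrying the diagnostic allele
    (an assumption; real data would need explicit handling of failures).
    """
    def has(marker, size):
        return size in alleles.get(marker, ())

    return {
        "color": "red" if has("VV6718", 182) else "white",
        "seeds": "seedless" if has("p3_VvAGL11", 197) else "seeded",
        # Both alleles are required, because their linkage was incomplete.
        "flavor": ("muscat"
                   if has("FAM12", 361) and has("VvZAG79", 255)
                   else "neutral"),
    }


# Hypothetical individual: sizes 190, 187, 365, 249 and 251 are placeholders.
example = {
    "VV6718": (182, 190),
    "p3_VvAGL11": (187, 187),
    "FAM12": (361, 365),
    "VvZAG79": (249, 251),
}
print(estimate_phenotypes(example))
```

The example individual carries the FAM12 allele but not the VvZAG79 allele, so it is classified as neutral, as were the 105 such individuals in the progeny.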

Correlation between estimated and observed phenotypes in the evaluation collection
To evaluate whether phenotypes were correctly estimated from molecular data, a new collection of individuals was built and morphologically described for the traits of interest. The estimated traits of the 26 hybrids were distributed as 11 red and 15 white, 18 muscat and 8 neutral, and 18 seedless and 8 seeded.
A significant correlation at the 0.01 level was obtained between estimated and observed phenotypes in 2021 for color and seedlessness. Cramer's V was 0.856 for color and 0.916 for seedlessness. For color, all estimated phenotypes coincided with the observed phenotypes except for two individuals estimated as white and observed as red (Fig. 5).
In the case of seedlessness, only one individual estimated as seedless was observed as seeded (Fig. 5), which corresponds to 3.84% false positives, of the same order as the figure obtained by Bergamini et al. [24] using the same marker (1.68% false positives).
Regarding muscat flavor, the chi-square test did not show significant results. Nevertheless, all but one of the individuals observed as "muscat flavor" had been estimated as "muscat flavor" from molecular data. The alleles 361 (FAM12) and 255 (VvZAG79) were present in individuals with and without muscat flavor, but according to this study their absence implies an 87% probability of absence of muscat flavor. Consequently, both markers could be used to discard individuals in a progeny, since around 50% of them would not present muscat flavor.
Based on the estimated characters, 50.3% of the hybrids in the tested progeny are considered to have colored grape skin, at different intensities, compared with 49.7% with green-yellow skin. A total of 867 hybrids (48.1%) have been estimated as seedless, with a reliability of around 90%. The data also indicate that 806 individuals should have muscat flavor with a probability of 39%, so the number of individuals with muscat flavor in the progeny is probably around 314. These data indicate that, depending on the desired character, the number of individuals that can be discarded from large progenies at an early stage of cultivation may be high, amounting, for example for seedlessness, to half of the progeny.
Finally, flesh firmness and berry size were also characterized in the Evaluation Collection. The correlation between observed and estimated values was not significant for flesh firmness, but for berry size a significant association at the 0.05 level was observed with allele 276 of the marker VVIS58-378R, located on chromosome 7 (p = 0.046). Presence of this allele correlates with larger berry size (Fig. 6). These results suggest that 27 individuals may not be enough to estimate quantitative traits and that the number of individuals in the training and evaluation collections should be slightly increased.
Because the small size of the Evaluation Collection could lead to false negative or false positive results, the main traits were also described in the complete progeny to confirm the reliability of the results. In this case, all correlations between estimated and observed data were significant at the 0.01 level. The degree of correlation was high for skin color (correlation coefficient of 0.8), medium for seedlessness (0.6) and low for muscat flavor (0.2) and berry size (0.1). The fact that muscat flavor (not significant in the Evaluation Collection) and berry size (significant at the 0.05 level) reached significance at the 0.01 level when the complete progeny was analyzed points to a probably insufficient number of individuals for studying complex traits. Nevertheless, the number is sufficient for traits linked to major-effect QTLs. On the basis of these results, if this technique is used, traits showing associations at the 0.05 level would require a slightly larger Training Collection for higher reliability.
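The Cramer's V values for color and seedlessness can be recomputed from the 2×2 tables implied by the counts given in this section. The sketch below is a plain-Python illustration rather than the R code actually used; the cell counts are inferred from the text (11 red and 15 white estimated, two estimated white observed as red; 18 seedless and 8 seeded estimated, one estimated seedless observed as seeded) and assume that all 26 hybrids were scored for both traits. No continuity correction is applied.

```python
import math


def cramers_v_2x2(a, b, c, d):
    """Cramer's V for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.sqrt(chi2 / n)


# Rows: estimated class; columns: observed class (counts inferred from the text).
v_color = cramers_v_2x2(11, 0,    # estimated red:      11 observed red, 0 white
                        2, 13)    # estimated white:     2 observed red, 13 white
v_seedless = cramers_v_2x2(17, 1,  # estimated seedless: 17 seedless, 1 seeded
                           0, 8)   # estimated seeded:    0 seedless, 8 seeded
print(round(v_color, 3), round(v_seedless, 3))
```

With these inferred tables the sketch returns 0.856 and 0.916, in agreement with the coefficients reported for color and seedlessness.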
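The progeny-level figures quoted above follow from simple arithmetic, reproduced here as a short Python check. The variable names are hypothetical, and the 39% figure is taken as the probability that an individual estimated as muscat actually presents muscat flavor, which is our reading of the text.

```python
progeny = 1802                 # individuals used for the MAS study
seedless_estimated = 867       # individuals estimated as seedless
muscat_estimated = 806         # individuals estimated as muscat flavor
p_muscat = 0.39                # assumed reading: P(muscat | estimated muscat)

seedless_share = 100 * seedless_estimated / progeny
expected_muscat = muscat_estimated * p_muscat

print(round(seedless_share, 1), round(expected_muscat))   # 48.1 314
```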

Conclusions
In this work, an approach to reduce phenotyping work in large table grape progenies is proposed. To this end, we took the concept of the training collection from Genomic Selection and combined it with traditional MAS in order to estimate the morphological characters of each individual in a given progeny. The reduction in phenotyping work is achieved through the construction of core collections. Establishing the minimum size of the core collections and the most suitable methodology for selecting the individuals is an objective to be evaluated in the coming years, especially with the aim of capturing minor-effect QTLs for quantitative characters.
The effectiveness of this approach depends on prior information about markers linked to the traits of interest in each progeny. In this work, the traits most commonly used in the selection of new table grape varieties were included, but any other trait could be analyzed.

Figure 1. General outline of the proposed approach.
Nine traits were described in the Training and Evaluation Collections and in the parents according to the OIV descriptors: berry weight (OIV code 503), berry width (220), berry length (221), particularity of flavor (236), formation of seeds (241), weight of seeds (243), firmness of flesh (235), berry skin color (225) and time of physiological stage of full maturity of the berry (304). They were described for two years (2020 and 2021) in the Training Collection and for one year (2021) in the Evaluation Collection. Bunches were harvested when they reached 19-22 °Brix.

Figure 3. Allelic associations detected for skin color (A-B), seedlessness (C-D) and muscat flavor (E-F). The trait is represented against the presence/absence of one specific allele of one marker. Only the year 2020 is shown.

Figure 4. Different allelic associations detected in the Training Collection. Each panel represents one evaluated character against the presence or absence of an allele of a specific marker. A) Fresh seed weight-Marker p3_AGL11. B) Flesh firmness-Marker VChr18a-151R. C and D) Berry width and weight-Marker VVIP33. E and F) Berry width and weight-Marker VVIS58-378R. G) Muscat flavor-Marker UDV-026. H) Muscat flavor-Marker UDV-095. I) Ripening date-Marker VVIP33 (2020). J) Ripening date-Marker VVIP33 (2021).

Figure 6. Allelic association detected for berry size with Kruskal-Wallis (p = 0.046). The observed data of the trait are represented against the presence/absence of the 276 base pair allele of the marker VVIS58-378R.

Table 1. Markers selected and references of the QTLs involved. "Mbp" indicates the marker position on the chromosome according to the 12X.2 version of the grapevine reference genome sequence from The French-Italian Public Consortium (PN40024) [21].
*Unpublished results obtained in previous projects.

Table 2. Primer concentrations and fluorochrome labels in the multiplex PCR.