Genetic structure and diversity of Santa Inês sheep flocks in Brazilian Mid-North

Objective
The genetic structure and diversity of Santa Inês sheep flocks from the Mid-North sub-region of Brazil were assessed using microsatellite markers.


Methods
A total of 257 DNA samples from animals raised in six farms were genotyped using a panel of 20 microsatellite loci. Different programs were used to assess the influence of null alleles on genetic differentiation estimates and the probability of loci being under selection, and to calculate parameters describing the genetic variability and the power of the markers to determine the kinship among flocks. Deviations from the Hardy-Weinberg equilibrium and linkage disequilibrium were tested using the R package genepop. A Bayesian clustering analysis was performed using the STRUCTURE program. The R packages poppr, adegenet, ape, polysat, and ggplot2 were used to construct a dendrogram and perform the principal coordinates analysis (PCoA). The Wilcoxon sign-rank test was conducted to detect population bottlenecks. Network graphics were constructed to assess the bidirectional distribution of the gene flow among flocks.


Results
The average values obtained for the number of alleles per locus, expected heterozygosity (He), polymorphism information content (PIC), discriminatory capacity, the combined probability of identity, and the probability of exclusion for the markers were 15.4, 0.886, 0.877, 0.954, 0.025, and 0.920, respectively. The lowest degree of genetic variability was observed in Farm 6, i.e. He (0.700), PIC (0.653), and allelic richness (Ar) (3.760), whereas Farm 1 had the highest values of He (0.890), PIC (0.882), and Ar (4.690). Signals of genetic bottleneck and moderate genetic differentiation were observed in all flocks. The migration rates in all flocks were high, with a trend towards Farm 1.


Conclusion
There is moderate structuring and high genetic diversity in the flocks evaluated. The signals of bottleneck and genetic erosion indicate that the management strategies need to be reviewed.


INTRODUCTION
calculating the following parameters: observed heterozygosity (Ho); expected heterozygosity (He); and the number of alleles per locus (A). The allelic richness (Ar) and the private allelic richness (PvAr) were obtained using the program Hp-Rare [12]. The polymorphism information content (PIC) was estimated using the Cervus software [13].

The probability of exclusion (PE), the probability of identity (PI), and the combined probability of identity (CPI) were obtained using the Genalex program [14] in order to investigate the power of markers to assess the kinship among flocks. The discriminatory capacity was estimated using the FORSTAT program [15]. Deviations from the Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium were tested using the R package genepop [16]. Subsequently, the Bonferroni correction was applied to prevent erroneously significant estimates (p < 0.05).

For the analysis of genetic differentiation among flocks, the parameters FST, RST, and Dest were evaluated using the SPAGeDi software [17] and the R package diveRsity [11]. RST is a population genetic distance that is analogous to FST but includes the information of allele size. Dest is a parameter of real differentiation based on the proportion of unique alleles of each subpopulation [18].

Also using the SPAGeDi software [17], we performed the allele size permutation test to verify the extent to which the markers fit the stepwise mutation model (SMM) (RST > pRST, in which pRST represents the mean of RST after 10,000 permutations).
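As an illustration of the diversity and identity parameters named above, the following minimal sketch computes He, PIC, PI, and CPI from allele frequencies using their usual textbook forms. It is not the code of Cervus or Genalex, whose estimators may include sample-size corrections; the function names and the example genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import prod


def allele_frequencies(genotypes):
    """Return {allele: frequency} for one locus from diploid genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    cross = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - sum(x * x for x in p) - cross


def probability_of_identity(freqs):
    """PI = sum(p_i^4) + sum_{i<j} (2 * p_i * p_j)^2 for unrelated individuals."""
    p = list(freqs.values())
    homozygous = sum(x ** 4 for x in p)
    heterozygous = sum((2 * a * b) ** 2 for a, b in combinations(p, 2))
    return homozygous + heterozygous


def combined_probability_of_identity(per_locus_pi):
    """CPI is the product of the per-locus PI values (loci assumed independent)."""
    return prod(per_locus_pi)


# Hypothetical locus with two equally frequent alleles (sizes in base pairs).
example_locus = [(100, 102), (100, 102), (100, 100), (102, 102)]
example_freqs = allele_frequencies(example_locus)
```

For the hypothetical locus above, both alleles have a frequency of 0.5, so He is 0.5 and both PIC and PI are 0.375; multiplying PI over several such loci shows how quickly CPI approaches zero as markers are added.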

A Bayesian clustering analysis applying Markov Chain Monte Carlo estimation was performed to assess the relationship among flocks. For this, we adopted the admixture model implemented in the STRUCTURE software v2.3.2 [19]. In each of the five runs performed for each optimal group (K) (20 simulations each), the first 5 × [...] used. The ΔK method [20] was used to determine the value of K that best fit the data using the STRUCTURE HARVESTER program v.0.6.1 [21]. The R package pophelper v.2.2.9 [22] was used to calculate the mean of the five replicates of the best K and to generate a final bar plot illustrating the arrangement of the genotypes more clearly.
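The ΔK criterion mentioned above can be sketched as follows. This is one common reading of the statistic, in which the second-order difference of the mean log-likelihood L(K) is divided by the standard deviation of L(K) across replicate runs; other implementations average per-run second differences instead. The function name and the log-likelihood values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Compute deltaK for each K that has both neighbours K-1 and K+1.

    ln_prob maps K -> list of L(K) = ln P(X|K), one value per replicate run.
    deltaK(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(values) for k, values in ln_prob.items()}
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # the second difference is undefined at the ends
        spread = stdev(ln_prob[k])
        if spread == 0:
            continue  # avoid division by zero when all runs agree exactly
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result


# Hypothetical log-likelihoods for K = 1..4, three replicate runs each.
example_runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -895, -885],
    4: [-889, -880, -898],
}
example_delta_k = delta_k(example_runs)
best_k = max(example_delta_k, key=example_delta_k.get)
```

With these hypothetical values the likelihood rises sharply from K = 1 to K = 2 and then plateaus, so ΔK peaks at K = 2. The statistic cannot be evaluated for the smallest and largest K tested.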

A dendrogram using the Neighbor-Joining algorithm based on Nei's genetic distance was constructed using the R packages poppr, adegenet, ape, polysat, and ggplot2. The principal coordinates analysis (PCoA) based on a shared allele distance matrix was performed using the same packages in order to assess the distribution profile of the genotypes in a 2D plane.
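To make the input of the PCoA concrete, the sketch below builds a shared allele distance matrix for diploid multilocus genotypes. It assumes one common definition (one minus the proportion of alleles shared, averaged over the loci typed in both individuals); the R packages cited above may handle missing data or scaling differently. All names and genotypes are hypothetical.

```python
def shared_allele_distance(ind_a, ind_b):
    """Return 1 - (shared alleles / alleles compared) for two individuals.

    Each individual is a list with one (allele, allele) tuple per locus.
    A locus set to None in either individual is treated as missing and skipped.
    """
    shared = 0
    compared = 0
    for g1, g2 in zip(ind_a, ind_b):
        if g1 is None or g2 is None:
            continue
        remaining = list(g2)
        for allele in g1:
            if allele in remaining:
                remaining.remove(allele)  # each allele copy can match only once
                shared += 1
        compared += 2
    if compared == 0:
        raise ValueError("no loci typed in both individuals")
    return 1.0 - shared / compared


def distance_matrix(individuals):
    """Symmetric matrix of pairwise shared allele distances."""
    n = len(individuals)
    return [
        [shared_allele_distance(individuals[i], individuals[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical animals genotyped at two loci (allele sizes in base pairs).
animal_a = [(100, 102), (200, 200)]
animal_b = [(100, 100), (200, 204)]
animal_c = [(104, 106), (210, 212)]
example_matrix = distance_matrix([animal_a, animal_b, animal_c])
```

A matrix of this kind is what an ordination such as PCoA takes as input: animals sharing many alleles lie close together in the 2D plane, whereas animals with no alleles in common sit at the maximum distance of 1.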

The Wilcoxon sign-rank test was conducted to detect population bottlenecks using the Bottleneck program v.2.1.02 [23]. Generally, the genetic bottleneck effect makes the HWE heterozygosity higher than the heterozygosity under mutation-drift equilibrium. Thus, for these analyses, we assumed that the markers followed mutational models that aim to determine the expected number of alleles based on the observed heterozygosity. The following three models were used: the stepwise mutation model (SMM) [24], which assumes that a mutation in a locus will result in the gain or loss of a repeat unit; the infinite allele model (IAM) [25], in which a new mutation always generates a new allele; and the two-phase mutation model (TPM) [26], which assumes that mutations may occur gradually or stepwise. The following parameters were adopted for the TPM: 95% single-step mutations and 5% multiple-step mutations (with variance among multiple steps of 12).
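The comparison behind the bottleneck test can be sketched as an exact one-tailed Wilcoxon test for heterozygosity excess. The sketch assumes that the per-locus equilibrium heterozygosity has already been obtained under one of the mutation models (the Bottleneck program derives it by simulation, which is not reproduced here); the function name and the input values are hypothetical.

```python
def wilcoxon_excess_p(hwe_he, equilibrium_he):
    """Exact one-tailed p-value that HWE heterozygosity exceeds the equilibrium value.

    Both arguments are per-locus lists of equal length. Loci with a zero
    difference are dropped, and tied absolute differences share an average rank.
    """
    diffs = [o - e for o, e in zip(hwe_he, equilibrium_he) if o != e]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences; ranks are doubled so averages stay integers.
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        doubled_average = (i + 1) + (j + 1)  # twice the mean of ranks i+1..j+1
        for k in range(i, j + 1):
            ranks2[order[k]] = doubled_average
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # Under the null hypothesis each rank is positive with probability 1/2:
    # count the sign assignments that produce each (doubled) positive-rank sum.
    total2 = sum(ranks2)
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[w_plus2:]) / 2 ** n


# Hypothetical per-locus values for five loci.
example_hwe = [0.80, 0.75, 0.90, 0.85, 0.70]
example_eq = [0.70, 0.70, 0.78, 0.81, 0.69]
example_p = wilcoxon_excess_p(example_hwe, example_eq)
```

In this hypothetical case all five loci show an excess, which is the most extreme of the 32 possible sign assignments, so the one-tailed probability is 1/32 (about 0.031).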

Network graphics were constructed using the function divMigrate of the R package

Accepted Article

Performance of the microsatellite markers
Possible null alleles were detected for nine of the 20 microsatellite loci used. However, only one locus was discarded from the analyses because its null allele frequency was greater than 0.20. As null alleles with a frequency lower than 0.20 do not compromise population evaluations [27], these alleles were kept in the current study. No significant differences (p > 0.05) were detected between the parameters of global genetic differentiation before (FST = 0.052) and after (FST = 0.201049) the ENA correction (Table 3). Half of the loci were neutral and the other half showed evidence of selection ranging from strong (0.91 to 0.97) to decisive (0.99 to 1.00) (Table 3). When all the sampled animals were considered in the evaluation, the markers showed high performance, high polymorphism, and high discriminatory capacity between individuals, with mean PIC and DC values of 0.877 ± 0.05 and 0.954 ± 0.02, respectively (Table 3).

When the 257 animals were considered to evaluate the capacity of the 20 markers to determine the kinship, the PI values ranged from 0.009 (CSRD247) to [...] (Table 3).

The minimum level of genetic diversity (He) of the markers was 0.760 [...] (Table 4).

The allele size permutation test detected the influence of the stepwise mutation model on the markers used in the current study. The RST value (0.0946) was significantly higher than the pRST (0.04838) (p-value = 0.005) (Table 3). This suggests that mutation is influencing the genetic differentiation among some flocks. Nevertheless, when each locus was considered separately, most of the loci did not show the condition mentioned above. This indicates that the microsatellite markers used in this study could also be fitting multiple-step [...]

In the present study, the high level of genetic diversity of the flocks was probably due in part to the miscegenation that occurred in their origin and the system of non-random mating adopted by breeders. In this system, they prioritize the use of breeding

1) The details about the flocks were shown in Table 1.