Genetic characteristics of Korean Jeju Black cattle with high density single nucleotide polymorphisms

Objective Conservation and genetic improvement of cattle breeds require information about genetic diversity and population structure of the cattle. In this study, we investigated the genetic diversity and population structure of the three cattle breeds in the Korean peninsula. Methods Jeju Black, Hanwoo, Holstein cattle in Korea, together with six foreign breeds were examined. Genetic diversity within the cattle breeds was analyzed with minor allele frequency (MAF), observed and expected heterozygosity (HO and HE), inbreeding coefficient (FIS) and past effective population size. Molecular variance and population structure between the nine breeds were analyzed using a model-based clustering method. Genetic distances between breeds were evaluated with Nei’s genetic distance and Weir and Cockerham’s FST. Results Our results revealed that Jeju Black cattle had lowest level of heterozygosity (HE = 0.21) among the studied taurine breeds, and an average MAF of 0.16. The level of inbreeding was −0.076 for Jeju Black, while −0.018 to −0.118 for the other breeds. Principle component analysis and neighbor-joining tree showed a clear separation of Jeju Black cattle from other local (Hanwoo and Japanese cattle) and taurine/indicine cattle breeds in evolutionary process, and a distinct pattern of admixture of Jeju Black cattle having no clustering with other studied populations. The FST value between Jeju Black cattle and Hanwoo was 0.106, which was lowest across the pair of breeds ranging from 0.161 to 0.274, indicating some degree of genetic closeness of Jeju Black cattle with Hanwoo. The past effective population size of Jeju Black cattle was very small, i.e. 38 in 13 generation ago, whereas 209 for Hanwoo. Conclusion This study indicates genetic uniqueness of Jeju Black cattle. However, a small effective population size of Jeju Black cattle indicates the requirement for an implementation of a sustainable breeding policy to increase the population for genetic improvement and future conservation.


INTRODUCTION
Cattle are an integral part of animal agriculture since 8000 BC, when it is thought that they become domesticated in different parts of the world such as India, Middle East and North Africa [1]. Different cattle breeds have been domesticated and adapted throughout the world due to variable geographical and climatic conditions. Jeju Black cattle (JJBC; Jeju Heugu) is one of the indigenous cattle breeds in the Jeju Island, south of Korean peninsula. The JJBC are thought to be originated from the native cattle in Korea main land according to island model of speciation [2]. The evidence of ancient cattle bones from the archaeological sites in Jeju Island suggests the existence of the breed approximately 1,100 to 2,000 years ago. DNA analysis of bones recovered from Gonaeri and Gwakji-ri in Aewaleup, Jeju city, indicates that ancestors of the present JJBC had been raised by humans since prehis-toric times [3]. Also, the historical documents (Annals of the Choseon Dynasty) and the paintings found in the mural (Anak Tomb no. 3, during Goguryeo Dynasty in 357 AD) support cattle existence in Korean peninsula. However, there are several reports about cattle originating in a controversial way [4]. Whatever their origin, JJBC has been categorized as an endangered species due to a substantial shrinkage in population size until 1980s. It is reported that registered JJBC comprise approximately 619 individuals (Korea Seed Stock Database), while other sources indicated that the population size might be 400 to 500 [5,6]. Due to the small size of the JJBC population, it is essential to evaluate and monitor the level of inbreeding, which is an important parameter to assess the genetic diversity of the breed.
JJBC are adapted to the subtropical environment of the Jeju Island. Also, beef of JJBC is rich in oleic acid, linoleic acid, and unsaturated fatty acids, which make it a premium quality for Korean consumers. However, in the past decades, this indigenous breed was paid little attention by beef cattle producers due to their slow growth and thus were was not competitive compared to 'Hanwoo' , a breed in mainland of Korea that has been extensively bred for superior meat quality.
Genomic studies using high throughput whole genome sequencing data have become popular in recent years. Single nucleotide polymorphisms (SNP) are one of the common genetic variants for any organism, and genotyping with a high density SNP microarray chip provides genome information in an efficient and cost effective manner. Many useful genetic parameters such as linkage disequilibrium (LD), effective population size (N e ), inbreeding coefficient, levels of heterozygosity, etc. can be estimated with high density SNP chips [7][8][9], so as to provide powerful tools to study evolutionary and conservation biology.
Development of sustainable breed improvement strategies depends on the precise characterization of animal genetic diversity [10]. Although several studies have investigated the diversity pattern of Korean cattle along with JJBC [2,5,11], it is still controversial whether Jeju Black are a separate breed. Thus, an accurate definition of breed origin for JJBC is necessary for future conservation and improvement programs. Genetic diversity study with microsatellite markers often causes estimated values of greater genetic differentiation than SNP markers [12,13]. Moreover, SNP markers give more accurate estimates in population admixture analysis than pedigree information [14]. In this study, genetic characteristics and population structure of JJBC breed were investigated with high density SNP chips, and were compared with Hanwoo, seven Bos indicus and Bos taurus breeds, by analyzing allelic richness (A R ), level of inbreeding (F IS ), effective population size (N e ), and genetic distances between the breeds.

Animals and genotypes
Animal ethics approval statement was not applicable as the Hanwoo DNA was extracted either from commercial bull semen straws or from tail hair samples obtained from different farmers with the permission of the owners. A total of 373 animals from nine cattle breeds were chosen to study, including two indicine breeds (Brahman, 15; Nelore, 26), seven taurine breeds (Angus, 27; Holstein, 76; Hereford, 25; Hanwoo, 66; Jeju Black, 78; Brown Wagyu, 10; and Black Wagyu, 50). Hanwoo DNAs were extracted either from AI bull semen straws or tail hair samples obtained from different farmers with the permission of the owners, and JJBC DNAs were prepared either from AI bull semen straws or tail hair in Jeju Island, Korea. Holstein DNAs were collected from semen straws provided by Nonghyup Dairy cattle improvement center. All other cattle DNAs were provided by Texas A & M University, USA, except the two Japanese Wagyu breeds. No ethics statement was required for the collection of DNA samples from the Brahman, Nelore, Angus, and Hereford cattle, because DNA samples were provided by the two authors in USA under their rules and regulations. The data of two Japanese Wagyu breeds were provided by the authors in Japan, which was previously used and published. Genomic DNA purification and genotyping was accomplished by DNA Link Inc. (Seoul, Korea), a commercial genome analysis service provider in Korea.

Single nucleotide polymorphism genotyping and assembly of data sets
The samples of all breeds in this study except the two Wagyu breeds were genotyped using a customized Affymetrix 150K SNP Axiom array by DNA Link Inc. (Seoul, Korea), which included the 50K SNPs from the Bovine 50K v.3 Bead Chip (Illumina, San Diego, CA, USA). The samples of the two Wagyu breeds were genotyped with the Illumina 50K SNP chip. All the genotyped data were then merged and the common SNP markers on autosomal chromosomes were selected, resulting in 45,526 SNP markers in the final dataset across the breeds.
Quality control and filtering of single nucleotide polymorphism markers SNP quality control and filtering were performed across the nine cattle breeds to remove SNP markers with less than 95% call rate and animals with less than 95% call rate with PLINK software program [15]. This process resulted in 41,186 SNPs in 372 animals across the breeds. SNP quality filtering was also performed using PLINK version 1.9, and SNP markers with high LD were pruned using the following parameter option; -indep pair wise command--indep-pairwise 50 5 2 (SNP window size, 50; SNP markers shifted per step, 5; r 2 threshold, 2), because pruning of SNP markers in high LD can counter the effect of ascertainments bias and generate meaningful comparison between breeds [16]. After pruning, a total of 18,524 SNP markers were finally selected for analysis.

Estimation of genetic diversity within breed and population differentiation
Genetic diversity within cattle breeds can be estimated by expected heterozygosity (H E ), observed heterozygosity (H O ), and inbreeding coefficient (F IS ), for which the parameters were calculated using R software package divRsity v1.9.9 [17]. Analysis of molecular variance (AMOVA) to determine the partition of genetic diversity was performed with ARLEQUIN v3.5 [18]. Population differentiation was calculated by pairwise F ST estimates according to the approach of Weir and Cockerham' s [19] with R package, divRsity v1.9.9.

Population structure analysis
Population structure of the studied cattle breeds were also carried out using the software ADMIXTURE v1.3. [20], which enables an unsupervised clustering of large numbers of samples and allows the incorporation of each individual cattle breed into a mixture of clusters. In ADMIXURE, a modelbased estimation of individual ancestry was applied for a range of prior values of K defined by the user. To elicit the true number of genetic populations, i.e. K clusters among nine cattle breeds; a cross validation (CV) approach was used to determine the most likely number of populations (K) in the SNP data. The best possible number of ancestral populations (K) was inferred through 3 to 11 pre-assumed populations. For each tested value of K in ADMIXURE, the proportion of each individual's genotype was estimated for clustering, showing a preferable value of K with a low CV error compared with other K values.

Principal component analysis
Principal component analysis (PCA) determines breed relationships that are based directly on allele frequencies by using a multivariate method, in which the information from a large number of alleles and loci is condensed into a few synthetic variables known as principal components (PC) [21]. PCA was carried out to infer relationships between the nine cattle populations by using PLINK v1.9.

Genetic distance
Phylogenic analysis on the basis of SNP data has become an important tool for studying evolutionary history of any organism. In this study, phylogenic tree was constructed by calculating Nei's genetic distances (DA) [22] with Poptree 2 [23] program. Neighbor-Joining (NJ) method was applied for measuring Nei's genetic distance (D A ), which can be defined as 8 history of any organism. In this study, phylogenic tree was constructed by calculating Nei's gene 172 distances (DA) [22] with Poptree 2 [23] program. Neighbor-Joining (NJ) method was applied f 173 measuring Nei's genetic distance (DA), which can be defined as where xij and yij are the frequencies of the i-th allele at the j-th locus in populations X and Y, respective 178 mj is the number of alleles at the j-th locus, and r is the number of loci used.

180
Linkage disequilibrium 181 To measure the extent of LD, Lewontin's D′ [24] and Hill's r 2 [25] are widely used. However, r 2 182 preferred for association studies due to its robustness, simplicity and not sensitivity to changing ge 183 frequency and effective population size [26]. The r 2 estimator represents a squared correlation coefficie 184 (r) between two variables (alleles) at two separate SNP marker loci [27]. PLINK software [15] was us 185 for estimation of r 2 parameter: where, Pab represents the frequency of haplotypes consisting of allele a at the first locus and allele b at t 190 second locus [28]. Ther 2 values were calculated for each chromosome in each breed, as well as acro 191 breeds, so that correlations between all possible SNPs were tested and there was no r 2 threshold, in ord where x ij and y ij are the frequencies of the i-th allele at the j-th locus in populations X and Y, respectively, m j is the number of alleles at the j-th locus, and r is the number of loci used.

Linkage disequilibrium
To measure the extent of LD, Lewontin's D′ [24] and Hill's r 2 [25] are widely used. However, r 2 is preferred for association studies due to its robustness, simplicity and not sensitivity to changing gene frequency and effective population size [26]. The r 2 estimator represents a squared correlation coefficient (r) between two variables (alleles) at two separate SNP marker loci [27]. PLINK software [15] was used for estimation of r 2 parameter: where xij and yij are the frequencies of the i-th allele at the j-th locus in populations X and Y, resp 178 mj is the number of alleles at the j-th locus, and r is the number of loci used.

180
Linkage disequilibrium 181 To measure the extent of LD, Lewontin's D′ [24] and Hill's r 2 [25] are widely used. Howe 182 preferred for association studies due to its robustness, simplicity and not sensitivity to chang 183 frequency and effective population size [26]. The r 2 estimator represents a squared correlation co 184 (r) between two variables (alleles) at two separate SNP marker loci [27]. PLINK software [15] w 185 for estimation of r 2 parameter: where, Pab represents the frequency of haplotypes consisting of allele a at the first locus and allel 190 second locus [28]. where, P ab represents the frequency of haplotypes consisting of allele a at the first locus and allele b at the second locus [28]. Ther2 values were calculated for each chromosome in each breed, as well as across breeds, so that correlations between all possible SNPs were tested and there was no r 2 threshold, in order to encapsulate all possible linkage interaction between SNPs per chromosome. To display the decay of LD, distances of pair-wise SNPs were binned into twenty types of intervals (0 to 1 kb, 10 kb intervals starting from 1kb up to 100 kb and 100 kb interval starting from 100 kb up to 1 Mb). For each chromosome, r 2 values were then sorted by inter-SNP distance, and were averaged across the afore-mentioned intervals to observe possible r 2 patterns with increasing inter-SNP distances.

Effective population size (N e )
N e was estimated using SNeP v1.1 by Barbato et al [29], which was based on LD data with the following formula suggested by Corbin et al [30], Effective population size (Ne) 199 Ne was estimated using SNeP v1.1 by Barbato et al [29], which was based on LD data with the following 200 formula suggested by Corbin et al [30], where N T(t) represents the past effective population size with t generations ago, C t represents the recombination rate, t generations ago in the past for specific physical distance between markers calculated by the SNeP tool, r 2 adj represents LD value adjusted for sample size, and α = (1, 2, 2.2) is the correction factor (constant) for the occurrence of mutation. The recombination rate was calculated using the following equation suggested by Sved [31], The data sets for each sub-population, as well as the merged dataset, were grouped into 20 distance bins of 10 to 100 kb each. N e estimates were subsequently obtained from the r 2 values for the average distance of each distance bin. Table 1  In a structured population, the fixation index, F represents the degree of reduction in heterozygosity relative to Hardy-Weinberg expectation. F IS measures the heterozygosity of individuals (I) relative to the subpopulation (S) represented by non-random mating (inbreeding), so that negative F IS value means less inbreeding. The F IS ranged from -0.018 in Black Wagyu to -0.118 in Brown Wagyu cattle. JJBC had the F IS value of -0.076, while -0.025 in Hanwoo ( Table 1).

Analysis of molecular variance and population differentiation
The average Wright's F-statistics that were estimated with 20,000 bootstraps over loci were of values, F ST = 0.173, F IS = -0.030 and F IT = 0.148 (Table 2). F IT measures the genetic differentiation within individuals of the total population. F ST is the measure of the genetic differentiation between breeds, for which the value of 0.173 indicates population differentia-tion with statistical significance (p<0.01). AMOVA indicated that almost 17% of the variation was estimated for variation among the populations, while 85% of the variation was accounted due to within individual variation (Table 3). Genetic differentiations between the nine cattle breeds that were based on pairwise F ST are displayed in Figure 1. The F ST ranged from 0.085 (Hanwoo and Brown Wagyu) to 0.376 (Nelore and Brown Wagyu). The genetic distance (D A ) between Hereford and Nelore (0.107) was relatively greater than between JJBC and Hanwoo (0.028). JJBC had great distances with Nelore (0.084) and Brahman (0.078), while small distances with Hanwoo (0.028) and Black Wagyu (0.042) cattle breeds.
The Nei's genetic distances (D A ) matrix of the nine cattle breeds were also used to construct phylogenetic trees with NJ method [22]. Our results showed clear separation of taurine and indicine cattle into two groups (branches). After diverging into two branches of Bos Taurus and Bos Indicus, the NJ trees showed two sub-branches within the taurine branch, i.e. one for JJBC and the other for Hanwoo, two Japanese breeds and three European taurine breeds (Figure 2).

Principal component analysis
The first and second PC accounted for 27.9% and 24.9% of the total variation respectively, while the third and fourth PC accounted for 19.0% and 16.7% of the total variation, respectively. Thus, the first five PC accounted almost 100% variation across the breed populations. JJBC were uniquely located, even if Hanwoo breed is more closely positioned to JJBC than the other cattle breeds (Figure 3). Nelore and Brahman formed a distinct cluster due to large variation between taurine and indicine cattle. Hanwoo cattle formed a closer cluster with Brown Wagyu and Black Wagyu than Holstein   or other European taurine breeds (Figure 3).

Population structure analysis between eight cattle breeds
The results of proportion of individuals into each of the nine breeds that were inferred by the ADMIXTURE are presented in Figure 4. The lowest CV error values were expected when K = 9. However, the lowest CV error estimator was found when K = 11 (data not shown). Thus, K = 11 was taken as the most probable number of inferred populations.

Effective population size (N e ) over the past generations
N e is needed to determine the accuracy of genomic selection [28], and the N e estimates in the nine cattle breeds are shown  in Figure 5 and Table 4 at t generation ago. In general, all the nine breeds showed a marked decrease over time as expected.

DISCUSSION
Genetic characterization of breeds or animals based on genomic data has become an attractive method due to easy access of high throughput data derived from microarray SNP chip technology. In genetic diversity analysis, SNP markers have many advantages over microsatellite markers due to higher level of resolution, despite a set of microsatellites being suggested by the FAO to assess genetic diversity of farm animals and endangered species [10,32]. In the most recent years, genomic characterization using SNP markers have been studied in a variety of cattle breeds such as Irish Carry cattle [33], Tyrol Grey [34], Spanish beef cattle breeds [35], Canchim [36], Chinese Yiling yellow cattle [37] and many other indigenous and exotic cattle breeds raised in different countries worldwide [38][39][40][41][42][43][44]. Among Korean cattle breeds, Hanwoo was paid much more attention due to its incorporation into the national breeding program since 1970s [2]. In this study, we emphasized on the genetic characterization of JJBC, because these cattle are raised in Jeju Island in Korea and thus have unique characteristics that are different from the breeds on the mainland of Korea.
JJBC showed lower level of genetic variability (H E = 0.21) than Hanwoo and Holstein in Korea (H E = 0.218). Sharma et al [11,45] demonstrated a different level of heterozygosity in JJBC (H E = 0.39 and 0.25), while Struken et al [2] reported H E = 0.29. Heterozygosity level in our study is close to Sharma et al [11,45]. However, different results might be due to the use of various genotyping platforms, markers, and quality control criteria [2]. The lower level of heterozygosity in JJBC than Hanwoo and Holstein might be due to small population sizes, or few breeding males with more chance for increased inbreeding. However, Makina et al [40] stated that allele frequencies might be a poor estimate of inbreeding.
Inbreeding level (F IS ) in JJBC was estimated to be -0.076, which was lower than Hanwoo (-0.025) and Holstein (-0.026) in Korea. Genetic variability was lowest in Nelore (H E = 0.15) and low in Brahman (H E = 0.17). Indicine breeds might have less genetic variability than taurine breeds as reported by Lin et al [46]. Analysis of molecular variance that enables partitioning of genetic variation into overall fixation indices (F ST ), within population inbreeding (F IS ) and total inbreeding (F IT ), showed that 85% of total genetic variation was due to within populations across nine cattle breeds. This value was lower than the within populations genetic variation observed in South African cattle populations (92%) [40], but higher than those for Iranian cattle (82.9%) [42] and Ethiopian cattle populations (84.0%) [39]. Total inbreeding estimate (F IT ) and estimate of population differentiation (F ST ) was0.148 and 0.173 respectively, with statistical significance (p<0.001) (  Figure 2). PCA analysis also showed that Hanwoo and Wagyu breeds aremuch closer than JJBC ( Figure 3A).
Unsupervised hierarchical clustering of our data implemented with ADMIXTURE analysis revealed that 95% of Hanwoo individuals were assigned to cluster ten, whereas 39%, 38%, 23%, and 1% of JJBC individuals were assigned to four different clusters, i.e. five, one, six, and ten, respectively. This means that JJBC shared its genome only with Hanwoo cattle. Japanese Brown Wagyu (100%) stands alone in cluster three, while Black Wagyu (96%) individuals were assigned in cluster eleven and the rest of the 4%were assigned to cluster three, nine and ten (Table 4). This level of admixture between JJBC and Hanwoo and between Brown Wagyu and Black Wagyu might be due to co-ancestry regarding the origin of these two breeds.
Phylogenic tree analysis also showed a distinct branch for JJBC, while Hanwoo and Wagyu breeds shared the same branch more recently than JJBC (Figure 2), suggesting that Hanwoo and Wagyu breeds descended from a more close common ancestors than JJBC, and that JJBC was evolutionally diverged for a longer time than the breeds in inland Korea and Japan. This result supports that JJBC has been adapted in Jeju island, to have genetically unique characteristics as one cattle breed. PCA results also showed that Hanwoo individuals formed close clusters with those of the two Wagyu breeds than the JJBC individuals ( Figure 3A), supporting the results of phylogenic tree analysis (Figure 2).
JJBC have an effective population size of 38 in the nearest 13 generation ago, whereas Sharma et al [11] reported the N e estimate of 67. Sudrajad et al [5] reported that N e in JJBC was estimated to be 60 until 11 generation ago, and Struken et al [2] reported the N e estimate of 11 at nearest generation. These N e estimates were much smaller than the N e of Hanwoo, i.e. 209. Differences of N e estimates in the various reports might be caused by many factors such as sample size, SNP quality control measures and models used to study LD and N e [5].
The small N e value of JJBC seems to be sufficient for maintaining genetic diversity for short term species management as suggested by Frankham et al [47], but not enough at long term level. This result indicates that careful implementation of breed conservation is needed to keep genetic diversity of JJBC while increasing the population, as well as in a breeding program for genetic improvement of economically important traits in beef cattle.

CONCLUSION
This study supports genetic uniqueness of JJBC breed that has been evolved differently from Hanwoo and Wagyu breeds, as well as from indicine and western taurine breeds, although a small amount of genetic components in terms of allele sharing exists between JJBC and Hanwoo. However, further indepth study with whole genome sequencing and scanning using high density markers with larger samples would help us to accurately measure genetic parameters for establishment of JJBC as a unique breed.

AUTHOR CONTRIBUTIONS
JJK conceived, designed the experiment and revised the manuscript critically for important intellectual content. HJS and YML carried out the experiments, laboratory analyses, statistical analyses and interpretation of the data, YML assisted statistical analysis and contributed as part of the laboratory team. MZA wrote the manuscript, equally contributed to the data analysis with HJS and YML. All authors read and approved the final manuscript. All other authors including LHH, DR, HM, SS and SPP contributed by providing DNA sample, carefully read and approved the final manuscript.

CONFLICT OF INTEREST
We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manu-script.