INTRODUCTION
Chinese indigenous sheep breeds can be classified by their tail morphology into the following categories: fat-tailed, fat-rumped and thin-tailed sheep breeds.
Large-tailed Han sheep are primarily found throughout the hinterland of the North China Plain. This region has a typical temperate continental monsoon climate with four distinct seasons: it is cold and dry in winter and hot and rainy in summer. Large-tailed Han sheep have large, long, fan-shaped fat tails that hang down to their hocks. Their peach-shaped tail tips are upturned and hang near their tail grooves.
Altay sheep are primarily distributed throughout the Fuhai and Fuyun counties in the Altay region of the Xinjiang Uygur Autonomous Region. The central production area of these sheep has a typical continental climate, with an annual average temperature of 4.0°C, an extreme minimum temperature of −42.7°C, an annual snow cover of 200 to 250 days and a snow thickness of 15 to 20 cm. The fat deposited on the buttocks of Altay sheep causes the sheep to have fat, rounded hips, which are wide, straight and large. In the middle of the lower edges of these hips, shallow grooves divide the hips into two symmetrical halves.
Tibetan sheep are native to the Qinghai-Tibet Plateau and are primarily distributed throughout the Tibet Autonomous Region and Qinghai. The central production area of these sheep is located at 26°50′-36°53′ north latitude and 78°25′-99°06′ east longitude, in the southwestern part of the Qinghai-Tibet Plateau, and has an average elevation of over 4,000 m. The climate is characterized by long sunshine duration, strong radiation, low temperatures, large temperature differences, clear and wet sky conditions, long night rains, dry winters and springs, high wind pressure, low air pressure and low oxygen content. The tails of Tibetan sheep are short and are shaped like flat cones.
Worldwide, more than 25% of sheep breeds are fat-tailed or fat-rumped and accumulate a significant amount of fat in their tails [1]. Fat-tailed sheep are the result of breeding by humans and natural selection and appeared approximately 5,000 years ago [2]. The fat in the tails of fat-tailed and fat-rumped sheep has been used by humans as a high-energy food source during periods of drought and famine. In addition, fat-tailed and fat-rumped sheep are breeds that have adapted to extreme environments. Their fat serves as an energy reserve to support their migration and survival during cold winters.
Copy number variations (CNVs) are DNA segments ranging in size from 1 kilobase (kb) to several megabases (Mb) in which duplication or deletion events have occurred [3]. CNVs are a major source of phenotypic diversity and genetic variation [4]. For example, a CNV in the KIT proto-oncogene, receptor tyrosine kinase gene leads to a white coat color in pigs [4]. In addition, white and gray coat phenotypes in goats are influenced by a CNV in the agouti signaling protein gene [5]. Zhu et al [6] detected CNVs in sheep with different tail types, and these CNVs encompassed genes associated with fat deposition.
Artificial selection has played a significant role in the domestication of livestock. Domestication has reshaped the behavior, morphology and genetics of many livestock species. Selection has various effects on the genome itself. Positive selection can increase the frequencies of advantageous alleles and fix them within a population [7]. Consequently, polymorphism at a selected site is reduced in the population. Selection signatures can be detected through variation in allele frequency and the decay of linkage disequilibrium [6]. With the development of high-density single nucleotide polymorphism (SNP) chips and high-throughput genotyping technology, a number of statistical methods have been used to explore selection signatures in genes and the genome. Several selection signatures in sheep are associated with regions showing evidence of introgression from Asian breeds. A comparison of European pig breeds and wild boars showed genomic regions with high levels of differentiation in both animals that were found to harbor genes related to bone formation, growth and fat deposition [8]. F-statistics (FST) [9] are extensively used to identify selection signatures and have been used primarily to detect differences in selection signatures between populations.
The X chromosome undergoes more drift than an autosome, as its effective population size is three-quarters that of an autosome [10]. The X chromosome is more specialized than an autosome and plays an important role in the evolution of humans and animals. The X chromosome has a high gene density and thus may be a good target for detecting CNVs and selection signatures. Several studies have reported selection footprints on the X chromosome in pigs and sheep and found genes relevant to meat quality, reproduction and the immune system in potential selection regions [11,12]. In addition, Zhu et al [6] detected selection signatures on the X chromosome in several sheep genes correlated with reproduction. Rubin et al [13] noted that the X chromosome should be analyzed separately when identifying selection signatures. Furthermore, they suggested that only sows should be used in selection signature studies because the sex chromosomes and autosomes of the two sexes are subjected to different selective pressures and have different effective population sizes.
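The three-quarters ratio mentioned above can be checked with a short calculation. This is a minimal sketch that assumes Wright's classical formulas for a population with a given number of breeding males and females; the function names and the example herd sizes are illustrative and do not come from this study.

```python
from fractions import Fraction


def ne_autosome(n_males, n_females):
    """Effective size for autosomal loci with unequal numbers of each sex."""
    return Fraction(4 * n_males * n_females, n_males + n_females)


def ne_x(n_males, n_females):
    """Effective size for X-linked loci (males carry one X, females two)."""
    return Fraction(9 * n_males * n_females, 4 * n_males + 2 * n_females)


# Equal numbers of breeding rams and ewes (hypothetical herd).
ratio_equal = ne_x(50, 50) / ne_autosome(50, 50)

# Strongly female-biased breeding population (hypothetical herd).
ratio_skewed = ne_x(5, 95) / ne_autosome(5, 95)

print(ratio_equal, float(ratio_skewed))
```

With equal numbers of each sex the ratio is exactly 3/4; when few males breed, as is common in managed flocks, the ratio departs from that value, which is one reason the X chromosome is usually treated separately from the autosomes.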
The purpose of this study was to identify CNVs and selection signatures on the X chromosome in Chinese indigenous sheep breeds with different tail types. PennCNV software and a between-population method (FST) were employed to analyze high-density 600K SNP genotype data. Subsequently, downstream analyses, including gene searches and functional enrichment analysis, were performed to help explain fat deposition in the tail.
MATERIALS AND METHODS
Ethics statement
All animal experiments were approved by Gansu Agricultural University (Lanzhou, China). All procedures for animals were performed in strict accordance with the guidelines proposed by the China Council on Animal Care and the Ministry of Agriculture of the People’s Republic of China.
DNA sample collection
In total, 120 individuals from three breeds were sampled: 40 large-tailed Han sheep (20 rams and 20 ewes), 40 Altay sheep (20 rams and 20 ewes), and 40 Tibetan sheep (20 rams and 20 ewes), collected from Liaocheng in Shandong Province, Altay in the Xinjiang Uygur Autonomous Region and Tianzhu in Gansu Province, respectively. All specimens were randomly selected. After a principal component analysis (PCA) was performed to assess population structure and the relatedness of the animals, 60 ewes (20 large-tailed Han, 20 Altay, and 20 Tibetan sheep) were chosen for detecting CNVs and identifying selection signatures on the X chromosome.
Genomic DNA samples were extracted from blood using the TIANamp Blood DNA Kit (Tiangen Biotech Co. Ltd., Beijing, China). The purity and concentration of genomic DNA were measured using a NanoVue spectrophotometer.
Genotyping and quality control
The genomic DNA of each specimen was genotyped using an Illumina Ovine SNP 600 BeadChip, which contained 604,715 SNPs that spanned the ovine genome with an average distance of 4.28 kb.
PLINK software (v1.07; http://pngu.mgh.harvard.edu/purcell/plink) was used for quality control of the X chromosome genotype data. An individual was removed if its call rate was less than 90% or if the sample was a duplicate. An SNP locus was excluded if i) its call rate was less than 90%; ii) its minor allele frequency was less than 0.05; or iii) it deviated from Hardy-Weinberg equilibrium (p value < 10⁻⁶). After quality control, BEAGLE software [14] was used to impute the missing genotypes and infer haplotypes.
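The three SNP-level exclusion rules can be illustrated with a small sketch. It is not the PLINK implementation: the function name and genotype encoding are hypothetical, and the Hardy-Weinberg check uses a one-degree-of-freedom chi-square approximation as a stand-in for whatever test PLINK applies internally.

```python
import math


def snp_passes_qc(genotypes, min_call_rate=0.90, min_maf=0.05, hwe_p=1e-6):
    """Apply call-rate, MAF and HWE filters to one SNP.

    genotypes: list of B-allele counts (0, 1 or 2) per ewe, None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False

    n = len(called)
    p = sum(called) / (2 * n)          # frequency of the counted allele
    if min(p, 1 - p) < min_maf:
        return False

    # Chi-square goodness-of-fit test against HWE proportions (assumption:
    # an approximation chosen for brevity, not PLINK's own test).
    observed = [called.count(0), called.count(1), called.count(2)]
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail, 1 df
    return p_value >= hwe_p
```

Because only ewes were analyzed, every X-linked genotype is diploid, so the same filters used for autosomal SNPs apply without a hemizygous special case.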
To increase the accuracy of CNV inference, the following stringent quality control criteria were applied: i) an individual call rate greater than 95% and a call frequency greater than 90%; ii) a log R ratio (LRR) standard deviation less than 0.30; iii) a B allele frequency (BAF) drift less than 0.01; and iv) a default waviness factor.
Detection of copy number variations
PennCNV software [15] was employed to detect CNVs on the X chromosome only. The PennCNV algorithm incorporates multiple sources of information, including the total signal intensity (LRR), the BAF, and the population frequency of the B allele (PFB). The LRR and BAF of each SNP for every sample were exported from Illumina GenomeStudio software. The PFB was generated based on the BAF of each SNP marker. Genomic waves were adjusted using the program argument ‘gcmodel’ with a sheep GCmodel file, which was generated by calculating the GC content of the 1-Mb genomic region surrounding each marker (500 kb on each side). Following previous research that used high-density SNP chips to detect CNVs in humans [16,17], CNVs were filtered based on three criteria: i) the CNV must contain 10 or more consecutive SNPs; ii) the CNV must be at least 100 kb in length; and iii) the CNV must be present in at least one animal.
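The filtering criteria can be expressed as a simple predicate over CNV calls. The record layout below is a hypothetical simplification and is not PennCNV's output format; criterion iii is satisfied by any call that exists, so only the SNP-count and length thresholds need explicit checks.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    sample: str        # animal identifier (hypothetical)
    start: int         # first base of the call, bp
    end: int           # last base of the call, bp
    n_snps: int        # number of consecutive SNPs supporting the call
    copy_number: int   # inferred copy number state


def keep_cnv(cnv, min_snps=10, min_length_bp=100_000):
    """Return True if the call meets the SNP-count and length criteria."""
    length = cnv.end - cnv.start + 1
    return cnv.n_snps >= min_snps and length >= min_length_bp


calls = [
    CNV("ewe_01", 1_000_000, 1_150_000, 25, 1),   # passes both criteria
    CNV("ewe_02", 2_000_000, 2_050_000, 30, 3),   # too short
    CNV("ewe_03", 3_000_000, 3_200_000, 6, 1),    # too few SNPs
]
filtered = [c for c in calls if keep_cnv(c)]
```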
Finally, following the method reported by Redon et al [18], CNV regions (CNVRs) were obtained by merging the overlapping CNVs identified across all samples.
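Merging overlapping CNVs into CNVRs amounts to a standard interval merge. The sketch below assumes that any overlap of at least one base pair is enough to join two calls; the function name and coordinates are illustrative only.

```python
def merge_cnvs(intervals):
    """Merge overlapping (start, end) intervals on one chromosome into CNVRs."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the current region: extend its right edge if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


cnvrs = merge_cnvs([(250, 400), (100, 300), (500, 600)])
```

Sorting first means each call only has to be compared with the most recently opened region, so the merge runs in a single pass after the sort.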
Global FST
To better understand the population genetic differentiation among the three breeds studied herein, FST was used to detect signatures of diversifying selection based on genetic polymorphism data. FST was calculated as described by MacEachern et al [19]:
$$F_{ST} = \frac{H_T - H_S}{H_T} \qquad (1)$$
where $H_T$ represents the expected heterozygosity for the overall total population such that
$$H_T = 2pq \qquad (2)$$
where $p$ and $q$ denote the frequencies of alleles A1 and A2 over the total population.
In Equation 1, $H_S$ represents the expected heterozygosity in subpopulations and is calculated as follows:
$$H_S = \frac{\sum_{i} H_{\exp_i}\, n_i}{\sum_{i} n_i} \qquad (3)$$
where $H_{\exp_i}$ denotes the expected heterozygosity and $n_i$ denotes the sample size in subpopulation $i$.
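As an illustration, the per-SNP calculation can be sketched as follows. The sketch assumes the usual form FST = (HT − HS)/HT, with HT = 2pq computed from the sample-size-weighted pooled allele frequency and HS taken as the sample-size-weighted mean of the subpopulation heterozygosities 2p_i q_i; the function name and the example frequencies are hypothetical.

```python
def fst_biallelic(subpops):
    """Compute FST for one biallelic SNP.

    subpops: list of (p_i, n_i) pairs, where p_i is the frequency of allele
    A1 and n_i the sample size in subpopulation i.
    """
    n_total = sum(n for _, n in subpops)
    p_bar = sum(p * n for p, n in subpops) / n_total
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, total
    if h_t == 0:
        return 0.0                         # monomorphic SNP: reported as 0 here
    h_s = sum(2 * p * (1 - p) * n for p, n in subpops) / n_total
    return (h_t - h_s) / h_t


# Three breeds of 20 ewes each with hypothetical allele frequencies.
example = fst_biallelic([(0.2, 20), (0.5, 20), (0.8, 20)])
```

Identical allele frequencies in every breed give a value of 0, and alternative fixed alleles give a value of 1, which bracket the range used to rank SNPs.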
Identifying selection regions on the X chromosome
A boxplot strategy was used to determine the upper and lower threshold values for identifying FST outliers among the SNP loci.
First, the interquartile range (Q) of the empirical FST distribution on the X chromosome was calculated as follows:
$$Q = F_U - F_L$$
where $F_U$ and $F_L$ represent the upper and lower quartiles, respectively. Second, the upper (UL) and lower (LL) threshold values were calculated as follows:
$$UL = F_U + 1.5Q, \qquad LL = F_L - 1.5Q$$
All values greater than the UL threshold value or less than the LL threshold value were defined as outliers.
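The boxplot rule can be sketched in a few lines. The multiplier of 1.5 is the conventional boxplot choice and is an assumption here, exposed as the parameter `k`; quartile definitions also differ between tools, and this sketch uses the inclusive method of Python's `statistics.quantiles`. The function names and example values are illustrative.

```python
import statistics


def fst_outlier_thresholds(fst_values, k=1.5):
    """Return (LL, UL) computed from the quartiles of the FST distribution."""
    f_l, _, f_u = statistics.quantiles(fst_values, n=4, method="inclusive")
    q = f_u - f_l                      # interquartile range
    return f_l - k * q, f_u + k * q


def fst_outliers(fst_values, k=1.5):
    """Return the indices of SNPs whose FST lies outside [LL, UL]."""
    ll, ul = fst_outlier_thresholds(fst_values, k)
    return [i for i, v in enumerate(fst_values) if v > ul or v < ll]


# Hypothetical per-SNP FST values; the last SNP is strongly differentiated.
values = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.90]
outlier_indices = fst_outliers(values)
```

Since FST is bounded below by zero in this formulation, the lower threshold is often negative, so in practice the outliers of interest are those above the upper threshold.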
Gene detection and functional analysis
Genes harbored in the previously identified CNVRs and selection regions were obtained from the Ensembl Genes 64 Database using BioMart software based on the Ovis aries genome assembly (Oar_v3.1). In the selection regions, each outlier or selection footprint was extended approximately 100 kb in the upstream and downstream directions. The Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) was used to perform gene ontology (GO) [20] enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) [21] pathway analysis.
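The window extension around each outlier can be sketched as an interval-overlap lookup. This is not the BioMart query itself: the gene table, gene identifiers and coordinates below are hypothetical placeholders, and a flank of exactly 100 kb is assumed.

```python
def genes_near_outliers(outlier_positions, genes, flank_bp=100_000):
    """List genes overlapping a +/- flank_bp window around each outlier SNP.

    genes: iterable of (gene_id, start, end) on the X chromosome, in bp.
    """
    hits = set()
    for pos in outlier_positions:
        window_start = max(0, pos - flank_bp)
        window_end = pos + flank_bp
        for gene_id, start, end in genes:
            if start <= window_end and end >= window_start:
                hits.add(gene_id)
    return sorted(hits)


# Hypothetical gene coordinates for illustration only.
gene_table = [
    ("GENE_A", 1_050_000, 1_080_000),
    ("GENE_B", 2_000_000, 2_010_000),
    ("GENE_C", 880_000, 905_000),
]
candidates = genes_near_outliers([1_000_000], gene_table)
```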
To better understand the functions of genes within the detected CNVRs and selection regions, the Ovis aries Ensembl gene IDs were converted to human ortholog Ensembl gene IDs using BioMart because annotation of the sheep genome was limited.
DISCUSSION
In livestock, CNV is one of the primary contributors to phenotypic variation [23]. Several algorithms for inferring CNVs and identifying selection regions in the genome based on SNP chip data have been developed. A selection signature is an imprint left on the genome by selection. To date, a series of methods have been developed to identify selection signatures in the genome, including FST and the integrated haplotype score, among others.
In this study, one algorithm was employed to detect CNVs on the X chromosome because a number of studies have indicated that the reliability of PennCNV for detecting CNVs is higher than that of several other algorithms [24]. Although only one algorithm was used, the following two strict criteria were adopted to reduce the risk of false-positive results: an LRR standard deviation of less than 0.30 and a BAF drift of less than 0.01. Jiang et al [25] identified nine CNVRs on the X chromosome in Holsteins using an Illumina High-Density Bovine SNP BeadChip, which contained 777,692 SNPs. In this study, an Illumina Ovine SNP 600 BeadChip was used to identify six, four and 22 CNVRs in fat-tailed, fat-rumped and thin-tailed sheep, respectively. These results could be explained by the higher density of the microarray itself, which might have allowed for the detection of a larger number of CNVRs. It is generally believed that the first wild sheep were thin-tailed and that fat-tailed sheep evolved from them via selection by humans and by nature [26]. Thin-tailed sheep had a larger number of CNVRs on their X chromosomes than either fat-tailed or fat-rumped sheep. Altogether, these results indicate that natural and artificial selection for fat deposition in sheep tails could lead to alterations in CNV.
In this study, FST was used to identify selection signatures on the X chromosome in three different breeds of sheep. Previously, Zhu et al [27] used FST to identify 49, 34, and 55 candidate selection regions in German Mutton, Dorper, and Sunit sheep, respectively; these regions were associated with reproduction, the immune system and biosynthesis-related pathways.
In this study, the BioMart data management system was employed to identify genes overlapping or located within CNVRs, which resulted in the identification of 3,130 genes. Furthermore, DAVID Bioinformatics Resources 6.7 was used for GO and KEGG pathway analyses. The GO and KEGG pathway enrichment analyses showed that the genes in the identified CNVRs were involved in biological processes, molecular functions and cellular components, including regulation of transcription, DNA-templated; cytoplasmic translation; rRNA processing; protein serine/threonine kinase activity; viral nucleocapsid; cytosolic large ribosomal subunit; positive regulation of transcription from RNA polymerase II promoter; Herpes simplex infection; viral carcinogenesis; and the Ras signaling pathway.
Qiu et al [28] used cDNA microarrays to demonstrate that the calcium voltage-gated channel subunit alpha1 F gene was expressed in adipose tissue in a well-characterized rat model of high-fat diet (HFD)-induced obesity. Xu et al [29] reported that the serine/arginine-rich specific kinase 3 gene might be important for skeletal muscle development and provided basic molecular information useful for further studies on its roles in porcine skeletal muscle. Xiong et al [30] generated liver-specific FoxO1/3/4-knockout mice. After being fed a high-fat diet, wild-type mice developed type 2 diabetes, whereas the knockout mice remained euglycemic and insulin-sensitive.