Metagenome Analysis of Protein Domain Collocation within Cellulase Genes of Goat Rumen Microbes
Abstract
In this study, protein domains with cellulase activity in goat rumen microbes were investigated using metagenomic and bioinformatic analyses. After the whole genomic DNA of goat rumen microbes was sequenced using a shotgun method, 217,892,109 read pairs were filtered using METAIDBA, retaining only those with 70% identity, matches of more than 100 bp, and E-values below 10^-10. The filtered sequences were assembled into contigs and annotated using blastN against the NCBI nucleotide database. As a result, a microbial community structure with 1,431 species was analyzed, among which Prevotella ruminicola 23 and Butyrivibrio proteoclasticus B316 were the dominant groups. In parallel, 201 sequences related to cellulase activity (EC 3.2.1.4) were obtained through BLAST searches using the enzyme.dat file provided by the NCBI database. After the nucleotide sequences were translated into protein sequences using InterProScan, 28 protein domains with cellulase activity were identified using the HMMER package with E-value thresholds below 10^-5. Profiling of these protein domains showed that major domains such as lipase GDSL, cellulase, and Glyco hydro 10 were present in bacterial species with strong cellulase activities. Furthermore, correlation plots displayed a strong positive correlation between some protein domain groups, which was indicative of microbial adaptation in the goat rumen based on feeding habits. This is the first metagenomic analysis of cellulase activity protein domains from the goat rumen using bioinformatics.
INTRODUCTION
Goats have an extremely varied diet that includes the tips of woody shrubs, trees, and lignocellulosic agricultural byproducts. Symbiotic microbes in the rumen of these herbivores play key roles in providing the host with various nutrients. Enzymes secreted by rumen microbes are essential for the conversion of cellulose and hemicellulose into simple sugars, which rumen microbes then metabolize to volatile fatty acids. These volatile fatty acids serve as energy sources for ruminants. Many studies have investigated the symbiotic microorganisms in the rumen because of their link to economically or environmentally important traits such as feed conversion efficiency and methane production (Hegarty, 1999; Guan et al., 2008; Hess et al., 2011). Various studies have examined the correlation between rumen microbiota and their role in nutrient digestion in sheep and cattle. In particular, information on the microbial digestion consortia in the goat rumen is expected to reveal characteristics that distinguish goats from other ruminant animals (McAllister et al., 1994).
A key challenge in this study was identifying rumen microbial profiles that are associated with, and potentially predictive of, these traits. Thus, methods for profiling the rumen microbial population should be relatively inexpensive and efficient to allow a large number of individuals to be profiled (Ross et al., 2012). Deep sequencing of pooled, untargeted samples shows that rumen bacterial communities contain numerous novel gene sequences that reflect true biological variation. The rumen metagenome profile included the counts of reads that aligned to each contig, which could be analyzed using metagenomic tools and correlation plots. The composition of the microbial population differs between goat species and according to diet. Analysis of microorganisms in the rumen fluid of different herbivores revealed bacteria (10^10 to 10^11 cells/ml, representing more than 50 genera), ciliate protozoa (10^4 to 10^6/ml, from 25 genera), anaerobic fungi (10^3 to 10^5 zoospores/ml, representing six genera), and bacteriophages (10^8 to 10^9/ml). These numbers represent only a small fraction of the microbial species in the rumens of animals on fiber-based diets because less than 10 to 20% of microbial populations are cultivable on synthetic media (Zhou et al., 2011). Metagenomic research, however, has generated genetic information on the entire microbial community, which is important because 99% of microbes cannot be isolated or cultured. The metagenomic method provides a global microbial gene pool without the need to culture the microorganisms. In this study, we analyzed the whole genomic DNA of goat rumen microbes obtained using a shotgun sequencing method. This approach differs from previous studies of microbes based on 16S rRNA. In addition, our results were filtered under strict conditions and provide high-quality information on the rumen microbial community and cellulase activity protein domains.
MATERIALS AND METHODS
Sampling and extraction of genomic DNA
Rumen fluid was collected from a 1-yr-old Korean native goat and Saanen hybrid raised on Timothy (Phleum pratense) hay at a private goat farm in the Cheonan City area and slaughtered at a local slaughterhouse. Rumen fluid was filtered through four layers of cheesecloth. Genomic DNA was isolated from rumen fluid using the Wizard Genomic DNA Purification Kit (Promega, US) according to the manufacturer's protocol. Gel electrophoresis was performed with a 1% agarose gel at 50 V for 2 h to check both the quality and quantity of the isolated genomic DNA.
DNA shotgun paired-end library preparation
Random DNA fragmentation was performed using the Covaris S2 System, and the DNA library was prepared using the TruSeq DNA Sample Prep Kit (Illumina, US). Briefly, DNA fragments were repaired to blunt ends by fill-in and exonuclease activity, after which A-tailing was conducted to prevent the formation of adapter dimers and concatemers. Adapters were ligated to genomic DNA inserts at a molar ratio of 10:1. The DNA samples were then amplified via polymerase chain reaction (PCR) using two universal primers. One primer contained an attachment site for the flow cell and the other contained sequencing sites for the index read. After gel electrophoresis of the PCR product, 600 to 700-bp fragments (including the insert and adapter) were selected and purified for genomic sequencing.
Genomic sequencing
Genomic DNA sequences were generated using the Illumina HiSeq 2000 platform. Briefly, only library fragments with proper adapters at both ends were amplified using P5 and P7 primers on the flow cell. Clonal clusters were generated using the TruSeq PE Cluster Kit v3-cBot-HS (ILPE-401-3001; Illumina). Using the HiSeq 2000 platform with the TruSeq SBS Kit v3-HS (200 cycles; ILFC-401-3001; Illumina), 435,784,218 reads were obtained.
Metagenomic bioinformatics application
The read pairs, scaffolds, and contigs from the shotgun sequencing of goat rumen microbes are summarized in Supplementary Figure 1 and Supplementary Table 1. Whole genomic DNA of the collected goat rumen microbes was extracted for Illumina sequencing without DNA targeting. This shotgun sequencing generated 217,892,109 read pairs, which were filtered based on 70% identity, a match of more than 100 bp, and an E-value threshold below 10^-10 using METAIDBA (Peng et al., 2011). The filtered 1,373,011 scaffolds were assembled and annotated to 114,031 contigs using blastN against the NCBI nucleotide database. Protein domains of the 201 cellulase-related (EC 3.2.1.4) protein sequences were assigned using the HMMER package with E-value thresholds below 10^-5. Finally, the annotated genomic sequences were used to identify both microbial species and cellulase-like protein domains.
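The three filtering thresholds described above (identity, match length, and E-value) can be illustrated with a minimal sketch. It assumes the alignments are available in the standard 12-column BLAST tabular layout, which the text does not state; the function name filter_hits and the example rows are hypothetical and are not data from this study.

```python
import csv
import io

# Thresholds taken from the text; the tabular layout is an assumption.
MIN_IDENTITY = 70.0   # percent identity
MIN_ALIGN_LEN = 100   # aligned length in bp
MAX_EVALUE = 1e-10    # hits must have an E-value below this cutoff

# Column order of the standard 12-column BLAST tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle):
    """Return the hits (as dicts) that pass all three thresholds."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        if (float(hit["pident"]) >= MIN_IDENTITY
                and int(hit["length"]) >= MIN_ALIGN_LEN
                and float(hit["evalue"]) < MAX_EVALUE):
            kept.append(hit)
    return kept


# Made-up rows for illustration only (not results from the study).
EXAMPLE = (
    "contig_1\tref_A\t95.2\t250\t12\t0\t1\t250\t10\t259\t1e-80\t400\n"
    "contig_2\tref_B\t65.0\t300\t105\t0\t1\t300\t1\t300\t1e-20\t120\n"  # low identity
    "contig_3\tref_C\t88.0\t80\t9\t0\t1\t80\t5\t84\t1e-15\t90\n"        # short match
    "contig_4\tref_D\t72.5\t140\t38\t0\t1\t140\t1\t140\t1e-05\t60\n"    # weak E-value
)

kept_ids = [hit["qseqid"] for hit in filter_hits(io.StringIO(EXAMPLE))]
print(kept_ids)
```

In this sketch a hit is retained only when all three conditions hold together, so a long, significant match is still discarded if its identity falls below 70%.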
RESULTS AND DISCUSSION
Microbial community structure in goat rumen
The genes isolated from rumen fluid were classified into a total of 1,704 organisms, of which 181 and 1,431 IDs corresponded to plants and bacteria, respectively. Using the METAIDBA metagenomic bioinformatic program, 114,031 sequences were classified into 1,431 species; their population structure at the species level is graphically depicted in Figure 1. Prevotella ruminicola 23 and Butyrivibrio proteoclasticus B316 were the dominant populations, accounting for 16% and 11%, respectively.
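The species-level proportions reported above can be thought of as the share of annotated contigs assigned to each species. The following is a minimal sketch under that assumption; the text does not specify how the percentages were computed, and the function name relative_abundance and the toy assignment list are hypothetical.

```python
from collections import Counter


def relative_abundance(assignments):
    """Map each species to its fraction of all assigned contigs.

    `assignments` holds one species label per annotated contig.
    Species are returned in descending order of contig count.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    return {species: n / total for species, n in counts.most_common()}


# Toy assignments for illustration only (not the study's data).
toy_assignments = (
    ["Prevotella ruminicola 23"] * 4
    + ["Butyrivibrio proteoclasticus B316"] * 3
    + ["Butyrivibrio fibrisolvens"] * 3
)

abundance = relative_abundance(toy_assignments)
for species, fraction in abundance.items():
    print(f"{species}: {fraction:.0%}")
```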
The majority of goat rumen bacteria identified in this study have been previously reported in the rumens of cows or lambs, such as Prevotella ruminicola 23, Butyrivibrio proteoclasticus B316, and Butyrivibrio fibrisolvens (Bryant and Small, 1956; Van Gylswyk and Van Der Toorn, 1986; McKain et al., 1992; Moon et al., 2008). Some of the microorganisms, such as butyrate-producing bacterium SS3/4, have also been identified in the human colon. Previous studies have revealed the detailed rumen metabolism of Fibrobacter succinogenes subsp. succinogenes S85 and Selenomonas ruminantium (Heinrichova et al., 1989; Chow and Russell, 1992).
Protein domains with cellulase activity
Cellulase protein IDs were obtained from the enzyme.dat file provided by the NCBI database. As a result, 201 sequences related to cellulase activity were obtained through searches using the NCBI BLAST program. In total, 28 protein domains with cellulase activity are summarized in Table 1. For other ruminant animals, Toyoda and coworkers analyzed the cellulose-binding proteins from sheep rumens, which consisted of endo-glucanases, proteins from fiber-degrading bacteria, and exo-glucanases (Toyoda et al., 2009). For cattle, Ferrer and coworkers constructed a metagenomic library and identified 22 clones with distinct hydrolytic activities, comprising 12 esterases, nine endo-β-1,4-glucanases, and one cyclodextrinase (Ferrer et al., 2005). Considering the close correlation between rumen microbial ecology and its enzymatic functions reported for other ruminal livestock (Krause et al., 2013), the list of cellulase-like protein domains in this study can provide a clue to the characterization of the Korean native goat rumen.
Profile of protein domains with cellulase activity
After 28 protein domains with cellulase activity were identified, the richness of each domain was analyzed (Supplementary Figure 2). Some protein domains overlapped on the same region of a sequence, and these were also counted. The dominant bacteria had a larger number of protein domains, which suggested that strong cellulase activities were related to bacterial survival in the goat rumen. Protein domains with high richness such as lipase GDSL, cellulase, and Glyco hydro 10 were also identified in the goat rumen microbes. Both lipase GDSL and lipase GDSL_2 have been reported to have the molecular function of cellulose binding (Galagan et al., 2005; Wortman et al., 2009). This is speculated to be one of the main reasons for their high detection. Next, the number of protein domains in each microbe was investigated (Supplementary Figure 3). Prevalent bacteria such as Prevotella ruminicola 23 and Butyrivibrio proteoclasticus B316 contained a large number of cellulase protein domains, implying that these bacteria play a role in the degradation of cellulose in the goat rumen. Finally, the protein domain ratio in each bacterial species, defined as the proportion of cellulase-like protein domains to the assembled and annotated contigs, was analyzed to evaluate the richness of protein domains with cellulase activity in the dominant bacterial species (Supplementary Figure 4). The dominant bacteria showed a ratio greater than 1, suggesting that they have high cellulase activity. A correlation plot among the 28 protein domains (Figure 2) confirmed the strong positive correlation between some protein domain groups. For example, CHB_HEX_c and CHB_HEX_c-1, CHB_HEX_c and fn3 asso, and CHB_HEX_c-1 and fn3 asso had a positive correlation greater than 0.99.
Another group, lipase GDSL and lipase GDSL_2, also showed a positive correlation greater than 0.99. To determine whether the goat rumen microbe profile was predictive of the rumen fluid metagenome profile, we correlated every rumen metagenome profile with every cellulase activity protein domain. We then determined whether the correlations were higher for samples from the same animal than for samples from different animals. The results suggested that rumen fluid samples had strong correlations with each protein domain. Microbial community structure and specific protein domains with cellulase activity in the goat rumen have thus been identified using metagenomic analysis with both shotgun sequencing and bioinformatics. This study demonstrated that specific dominant bacterial species and protein domains have strong positive correlations, suggesting adaptation to the unique feeding habits of goats.
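The pairwise correlations among protein domains discussed above can be sketched as follows. The text does not name the correlation coefficient or the unit of observation, so this example assumes Pearson's coefficient computed over per-species domain counts; the domain labels and counts are hypothetical placeholders rather than values from Figure 2.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / (sd_x * sd_y)


def correlation_matrix(counts):
    """Correlate every domain profile with every other domain profile."""
    names = list(counts)
    return {(a, b): pearson(counts[a], counts[b]) for a in names for b in names}


# Hypothetical counts of each domain in four species (illustration only).
DOMAIN_COUNTS = {
    "domain_A": [10, 4, 2, 0],
    "domain_B": [20, 8, 4, 0],  # exactly twice domain_A, so r = 1
    "domain_C": [0, 3, 1, 5],
}

corr = correlation_matrix(DOMAIN_COUNTS)
print(round(corr[("domain_A", "domain_B")], 3))
print(round(corr[("domain_A", "domain_C")], 3))
```

A coefficient close to 1, as reported for the CHB_HEX_c group, means that two domains rise and fall together across the profiles being compared; it does not by itself show that they occur in the same gene.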
CONCLUSIONS
In this study, microbial community structure and specific protein domains with cellulase activity in the goat rumen were identified using metagenomic analysis with both shotgun sequencing and bioinformatics. As a result, specific dominant bacterial species such as Prevotella ruminicola 23, Butyrivibrio proteoclasticus B316, and Butyrivibrio fibrisolvens were identified among 1,431 bacterial species in rumen fluid. At the same time, 28 protein domains with cellulase-like activity such as lipase GDSL, cellulase, and Glyco hydro 10 were identified with strong positive correlations, suggesting adaptation to the unique feeding habits of goats.
Supplementary Materials
Each pair read, scaffold and contig from the shotgun sequencing of goat rumen microbes.
Richness of each protein domain with cellulase activity in the goat rumen microbial population.
Protein domains with cellulase activity were present in over 1% of the dominant microbe species.
Ratio of protein domains in each bacterial species.
Acknowledgements
This work was supported by a grant (PJ0081912013) from the Bio-Green 21 Program, Rural Development Administration, Republic of Korea.