16S rRNA extraction of assembled genome bins
5.9 years ago
luyang1005 ▴ 20

Hi community, I am new to the analysis of (draft) genome bins. I have multiple assembled genome bins from shotgun metagenomic data, and I want to extract the 16S rRNA sequences from them. I used CheckM and barrnap to find the 16S rRNA sequences in the bins; both tools use HMMER searches to get the result.

For some genome bins I find two 16S rRNA hits. When I run the search in both archaea and bacteria mode, some fragments are reported as archaeal hits and also as bacterial hits within the same genome bin. For example, in one bin I found two 16S hits in archaea mode and two hits in bacteria mode.

The headers of the hits in the bacteria output are:
>16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-)
>16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-)

The headers of the hits in the archaea output are:
>16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-)
>16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-)

I then submitted both sets of FASTA hits to the RDP classifier. The outputs for the archaea hits are:
16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-);+;Bacteria;100%;"Bacteroidetes";98%;"Bacteroidia";96%;"Bacteroidales";96%;"Rikenellaceae";38%;Mucinivorans;33%
16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-);+;Bacteria;99%;Firmicutes;70%;Clostridia;61%;Clostridiales;61%;Ruminococcaceae;43%;Hydrogenoanaerobacterium;14%

The outputs for the bacteria hits are:
16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-);+;Bacteria;100%;"Bacteroidetes";98%;"Bacteroidia";94%;"Bacteroidales";94%;"Rikenellaceae";34%;Mucinivorans;24%
16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-);+;Bacteria;99%;Firmicutes;78%;Clostridia;53%;Clostridiales;53%;Ruminococcaceae;40%;Hydrogenoanaerobacterium;14%

So my questions are:
(1) The results for the bacteria and archaea hits are the same: both are classified as Bacteria. Why are the hits split into two sets, bacteria and archaea?
(2) The two hits come from one genome bin. Why are two 16S sequences predicted, and why do they have different taxonomic classifications?

Can anyone help me with this? I appreciate it!

RNA-Seq genome gene • 2.7k views

They're not classified as archaea; the genes just match the archaeal 16S HMM well enough to produce a hit. You would probably get hits against the eukaryotic 18S and mitochondrial 16S HMMs as well. Why? Because it is the same gene in all of these cases, so it matches each model well enough.
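One way to convince yourself that the bacteria-mode and archaea-mode hits are the same loci is to compare their coordinates. The sketch below is only an illustration: it assumes headers of the form shown in the question (`gene::contig:start-end(strand)`), and the helper names `parse_header`, `overlap_fraction` and `shared_loci` as well as the 0.9 overlap cutoff are made up for this example, not part of barrnap or CheckM.

```python
import re
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    gene: str
    contig: str
    start: int
    end: int
    strand: str


# Assumed header layout, taken from the question: gene::contig:start-end(strand)
HEADER_RE = re.compile(
    r"^>?(?P<gene>[^:]+)::(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$"
)


def parse_header(header: str) -> Hit:
    """Split one FASTA header into gene, contig, coordinates and strand."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"Unexpected header format: {header!r}")
    return Hit(
        match["gene"],
        match["contig"],
        int(match["start"]),
        int(match["end"]),
        match["strand"],
    )


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Shared span divided by the shorter hit; 0.0 if contig or strand differ."""
    if a.contig != b.contig or a.strand != b.strand:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / min(a.end - a.start, b.end - b.start)


def shared_loci(
    first: List[str], second: List[str], min_fraction: float = 0.9
) -> List[Tuple[Hit, Hit]]:
    """Pair up hits from two runs that cover essentially the same region.

    min_fraction is an arbitrary illustrative cutoff, not a recommended value.
    """
    pairs = []
    for a in map(parse_header, first):
        for b in map(parse_header, second):
            if overlap_fraction(a, b) >= min_fraction:
                pairs.append((a, b))
    return pairs


# Headers copied from the question.
BACTERIA_HITS = [
    ">16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-)",
    ">16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-)",
]
ARCHAEA_HITS = [
    ">16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-)",
    ">16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-)",
]

if __name__ == "__main__":
    for bac, arc in shared_loci(BACTERIA_HITS, ARCHAEA_HITS):
        print(f"{bac.contig}: {bac.start}-{bac.end} (bac) ~ {arc.start}-{arc.end} (arc)")
```

With the four headers from the question, both archaea-mode hits pair up with a bacteria-mode hit on the same contig and strand, so the bin contains two 16S loci rather than four.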


Thank you so much, that makes sense! Could I have some more suggestions on how to retrieve the 16S sequences from the assembled genome bins? I appreciate your help!

5.7 years ago
Asaf 10k

Usually the 16S from short reads will be a mess. The assembler collapses similar sequences together, and since the 16S gene is long and contains many highly conserved regions, it will either fail to assemble or the assembled sequence cannot be trusted. Try cutting the 16S sequence in two and running each part through RDP; you might get different answers.
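A minimal sketch of the "cut it in two" check, in plain Python with no external libraries. The function names `read_fasta` and `split_in_two` and the `|part1_...` header suffix are invented for this example, and splitting at the exact midpoint is just one simple choice; each half would then be submitted to RDP separately.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_in_two(header: str, seq: str) -> List[Tuple[str, str]]:
    """Cut a sequence at its midpoint and label each half with 1-based coordinates."""
    mid = len(seq) // 2
    return [
        (f"{header}|part1_1-{mid}", seq[:mid]),
        (f"{header}|part2_{mid + 1}-{len(seq)}", seq[mid:]),
    ]


if __name__ == "__main__":
    # Toy record for illustration only; replace with the real 16S FASTA file.
    example = io.StringIO(">toy_16S\nACGTACGTAC\nGTACGT\n")
    for name, sequence in read_fasta(example):
        for part_name, part_seq in split_in_two(name, sequence):
            print(f">{part_name}\n{part_seq}")
```

If the two halves of one hit are assigned to clearly different taxa, that suggests the assembled 16S may be chimeric, which fits the point above about collapsed conserved regions.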
