Hi, community,
I am new to the analysis of (draft) genome bins. I have multiple genome bins assembled from shotgun metagenomic data, and I want to extract the 16S rRNA sequences from them. To do this, I ran CheckM and barrnap on the bins. Both tools use HMMER searches to find the 16S rRNA sequences.
For some genome bins, I get two 16S rRNA hits. When I run the search in both archaea and bacteria mode, the same fragments are reported as archaeal hits and also as bacterial hits within a single genome bin. For example, one bin has two 16S hits in archaea mode and two in bacteria mode.
The headers of the hits in the bacteria output are:
>16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-)
>16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-)
The headers of the hits in the archaea output are:
>16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-)
>16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-)
I then submitted both sets of hit sequences to the RDP Classifier. The results for the archaea hits are:
16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-);+;Bacteria;100%;"Bacteroidetes";98%;"Bacteroidia";96%;"Bacteroidales";96%;"Rikenellaceae";38%;Mucinivorans;33%
16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-);+;Bacteria;99%;Firmicutes;70%;Clostridia;61%;Clostridiales;61%;Ruminococcaceae;43%;Hydrogenoanaerobacterium;14%
The results for the bacteria hits are:
16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-);+;Bacteria;100%;"Bacteroidetes";98%;"Bacteroidia";94%;"Bacteroidales";94%;"Rikenellaceae";34%;Mucinivorans;24%
16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-);+;Bacteria;99%;Firmicutes;78%;Clostridia;53%;Clostridiales;53%;Ruminococcaceae;40%;Hydrogenoanaerobacterium;14%
So my questions are:
(1) The hits from the bacteria and the archaea output are all classified as Bacteria. Why are the same sequences reported in both the bacteria and the archaea output?
(2) Both hits come from one genome bin. Why does one bin contain two 16S sequences with different taxonomic classifications?
Can anyone help?
Appreciate it!
They're not classified as archaea; the genes just match the archaeal 16S HMM well enough to produce a hit. You would probably get hits against the eukaryotic 18S and mitochondrial 16S HMMs as well. Why? Because it's the same gene in all of these cases, and it matches each model well enough.
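One way to see that the two modes are reporting the same loci is to compare the coordinates in the hit headers. The following is only a minimal sketch, not part of barrnap or CheckM: the header layout (`16S_rRNA::<contig>:<start>-<end>(<strand>)`) is assumed from the hits quoted in the question, and the function names `parse_header` and `group_by_locus` are hypothetical helpers made up for illustration.

```python
import re
from collections import defaultdict

# Header layout assumed from the hits quoted in the question:
#   16S_rRNA::<contig>:<start>-<end>(<strand>)
HEADER_RE = re.compile(
    r"^>?(?P<gene>[^:]+)::(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$"
)


def parse_header(header):
    """Return (contig, start, end, strand) from one hit header (hypothetical helper)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    return (
        match["contig"],
        int(match["start"]),
        int(match["end"]),
        match["strand"],
    )


def group_by_locus(headers):
    """Group hits that overlap on the same contig and strand (hypothetical helper)."""
    by_contig = defaultdict(list)
    for header in headers:
        contig, start, end, strand = parse_header(header)
        by_contig[(contig, strand)].append((start, end, header))

    loci = []
    for hits in by_contig.values():
        hits.sort()  # sort by start coordinate
        current = [hits[0][2]]
        current_end = hits[0][1]
        for start, end, header in hits[1:]:
            if start <= current_end:
                # Overlaps the locus being built, so it is the same gene copy.
                current.append(header)
                current_end = max(current_end, end)
            else:
                loci.append(current)
                current = [header]
                current_end = end
        loci.append(current)
    return loci


if __name__ == "__main__":
    hits = [
        ">16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-)",    # bacteria mode
        ">16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-)",  # bacteria mode
        ">16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-)",    # archaea mode
        ">16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-)",  # archaea mode
    ]
    for locus in group_by_locus(hits):
        print(len(locus), "hits at one locus:", locus)
```

Run on the four headers from the question, this groups them into two loci, one on NODE_2 and one on NODE_8, each hit once by the bacterial model and once by the archaeal model. The slightly different start and end coordinates just reflect where each model's alignment begins and ends.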
Thank you so much! That makes sense! Do you have any further suggestions on how to retrieve the 16S sequences from the assembled genome bins? I appreciate your help!