Question

Metagenomics vs metatranscriptomics

5.7 years ago

bioinfo ▴ 840

Hi all,

This is more of a set of technical questions spanning biology and computational science, but choosing the right approach at the start of a project is still a bioinformatics issue.

(1) I want to get an overall picture of the microbial community in tissue samples, including bacteria, viruses (both DNA and RNA viruses), fungi, nematodes, etc. Metagenomics, where the starting material is DNA, can give me an overview of everything, including DNA viruses, but not RNA viruses. To pick up the RNA viruses that the metagenomics approach misses, I'm thinking of running RNA-based metagenomics (i.e. metatranscriptomics) separately. This means I would have to spend double to get all the required information. Is there a better way to solve this in one go?

(2) Do you think k-mer-based Kraken2 or Kaiju would perform better than MetaPhlAn2 (which uses a selection of clade-specific markers) or other similar marker-based software for analysing both types of datasets (metagenomics and metatranscriptomics) across all organisms (except viruses, since viruses don't have marker genes)?

(3) Is it better to use short reads or assembled reads (contigs) for taxonomic classification? I have a feeling that during assembly we might lose some reads that can't be placed in any contig, but these could be searched separately against the reference, in parallel with the taxonomic classification of the contigs.

(4) How can we detect novel organisms (e.g. novel viruses) in these omics datasets? Is there any approach other than alignment-based ones (e.g. BLAST or VSEARCH) that can detect them? Could long contigs with low similarity to existing organisms in the reference indicate a novel organism, given that short reads would be less valuable for detecting anything novel?

Please share your views and opinions.

metagenomics metatranscriptomics assembly RNA-Seq • 1.7k views

updated 5.7 years ago by Mensur Dlakic ★ 29k • written 5.7 years ago by bioinfo ▴ 840

score 0 · Answer 1 · 2019-09-09

1) Sounds like a reasonable approach for what you need. You may get the answer to your question in a single go by doing just metatranscriptomics, but because only transcribed regions get sequenced, in the end you will not have complete genome sequences for your DNA-based species.

3) There is no doubt that taxonomic classification works better from assembled contigs than from short reads.

4) You already made good points here. I would add that, in my experience, long contigs where most proteins have no matches or only hit other uncharacterized proteins tend to be viral. Another way of detecting novelty is to build some kind of k-mer-embedding classifier based on known organisms, and compare this gold-standard distribution against your experimental data. In the image below, gold is Archaea, light purple is Bacteria and navy is viruses. On the right side is a metagenome of interest. Circled regions indicate populations of sequences that are present or over-represented in either the gold-standard data or our metagenome.

[Image not available: k-mer embedding of the gold-standard data, with the metagenome of interest on the right]
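
To make the k-mer idea in point 4 a bit more concrete, here is a minimal sketch and not the method behind the image. It assumes tetranucleotide (k = 4) frequency profiles, one centroid per known group, and plain Euclidean distance; none of those choices are stated above, and all function names and toy sequences are hypothetical. A contig that sits far from every reference centroid would be a candidate for a closer look.

```python
from collections import Counter
from itertools import product
import math

K = 4  # assumed k-mer size (tetranucleotides); not specified in the answer
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 256 possible 4-mers


def kmer_profile(seq):
    """Return normalized K-mer frequencies for a DNA sequence.

    K-mers containing anything other than A/C/G/T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[km] for km in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[km] / total for km in KMERS]


def centroid(profiles):
    """Average a list of profiles position by position."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def distance(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_group(profile, centroids):
    """Return (label, distance) for the closest reference centroid."""
    return min(
        ((label, distance(profile, c)) for label, c in centroids.items()),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical toy sequences standing in for gold-standard genomes.
    reference = {
        "Bacteria": ["ATGCGCGCCGGCGCGATCGCGGCC", "GCGCGGCCATCGCGCGGATCGCGC"],
        "Viruses": ["ATATTTAAATATTTTAAATTATAT", "TTTAAATATATTTAAATATATTTA"],
    }
    centroids = {
        label: centroid([kmer_profile(s) for s in seqs])
        for label, seqs in reference.items()
    }
    contig = "ATTTAAATATATTTTAAATATTTA"  # hypothetical metagenome contig
    label, dist = nearest_group(kmer_profile(contig), centroids)
    print(label, round(dist, 3))
```

In practice you would compute profiles on long contigs only (short reads give very noisy k-mer frequencies) and pick a distance cutoff empirically from the gold-standard data; a 2-D embedding for plotting would be an extra step on top of these profiles.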