Hi all,
This is more of a set of "technical questions" spanning biology and computational science, but they are still bioinformatics issues: how to select the right approaches at the beginning of a project.
(1) I want to get an overall "microbial community" picture, including bacteria, viruses (both DNA and RNA viruses), fungi, nematodes, etc., from tissue samples. Metagenomics, where the starting material is DNA, can give me an overview of everything, including DNA viruses but not RNA viruses. To pick up the RNA viruses that the metagenomics approach misses, I'm thinking of running RNA-based metagenomics (i.e. metatranscriptomics) separately. This means I would have to spend double to get all the required information. Is there a better way to solve this in one go?
(2) Do you think k-mer-based Kraken2 or Kaiju would perform better than MetaPhlAn2 (which uses a selection of clade-specific markers) or other similar marker-based software for analysing both types of datasets (metagenomics and metatranscriptomics) across all organisms (except viruses, since viruses lack universal marker genes)?
(3) Is it better to use short reads or assembled reads (contigs) for taxonomic classification? I have a feeling that during the assembly process we might lose some reads that can't be incorporated into any contig. Could those reads be searched separately against the reference for taxonomic classification, in parallel with the classification of the contigs?
(4) How can we detect novel organisms (e.g. novel viruses) from these omics datasets? Are there any approaches other than alignment-based ones (e.g. BLAST or VSEARCH) that can detect them? Would long contigs with low similarity to the organisms in the reference indicate a novel organism, given that short reads would be less valuable for detecting anything novel?
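To make (3) and (4) a bit more concrete, here is a rough Python sketch of the triage I have in mind. It is only an illustration, not a tested pipeline: the `Contig` class, the function names, the toy data and the thresholds (`min_length=1000`, `max_identity=70.0`) are all hypothetical placeholders, and the inputs are plain Python objects rather than the actual output of any assembler or aligner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    name: str
    length: int                      # contig length in bp
    best_identity: Optional[float]   # % identity of best reference hit, None if no hit


def unassembled_reads(all_reads, reads_in_contigs):
    """Question (3): reads that did not end up in any contig.

    These would be classified on their own, in parallel with the contigs.
    """
    return sorted(set(all_reads) - set(reads_in_contigs))


def candidate_novel(contigs, min_length=1000, max_identity=70.0):
    """Question (4): long contigs with no hit, or only a weak hit, to the reference.

    Both thresholds are arbitrary assumptions and would need tuning.
    """
    flagged = []
    for contig in contigs:
        weak_hit = contig.best_identity is None or contig.best_identity < max_identity
        if contig.length >= min_length and weak_hit:
            flagged.append(contig.name)
    return flagged


# Toy data, invented purely for illustration.
reads = ["r1", "r2", "r3", "r4", "r5"]
assembled = {"r1", "r2", "r4"}
contigs = [
    Contig("c1", 5200, 98.5),   # long, strong hit: known organism
    Contig("c2", 3100, 41.0),   # long, weak hit: candidate
    Contig("c3", 450, None),    # no hit, but too short to trust
    Contig("c4", 2400, None),   # long, no hit: candidate
]

leftover = unassembled_reads(reads, assembled)
novel = candidate_novel(contigs)
print("classify separately:", leftover)
print("candidate novel contigs:", novel)
```

In this toy example the leftover reads `r3` and `r5` would go to read-level classification, while `c2` and `c4` would be flagged for a closer look. Whether such a simple length/identity cut-off is sensible at all is part of what I am asking.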
Please share your views and opinions.