I have metagenomic data, so I am using kraken2, which processed my reads and classified them against various databases to see what species I have. This is great and what I wanted.
But I need to compare those results with metaSPAdes, and all I can find for metaSPAdes is how to get the contigs.fasta file, which I understand is the main output file. What is the next step? Do I simply BLAST these contigs (which would take very long, because there are millions of contigs)? I think I should remap my reads against the contigs to get coverage, but how do I get taxonomy information? Does metaSPAdes have a built-in database like kraken2 does that I am somehow missing (I don't see any flags or options for it in the help output)?
Any tips on this would be great!
Oh okay, so it just assembles the data. Does MEGAN then do binning of the contigs? I guess my goal would be to group all similar contigs per sample and see what species show up in each sample type. I can't imagine doing it for each individual contig (hundreds of thousands of contigs) for 100s of samples; it seems like more processing would be needed?
Also, metaSPAdes just produces contigs, so how do I get MAGs that I can analyze? Is this something MEGAN can do? I'll have to read up on it more.
MEGAN does a taxonomic (or functional) binning of your contigs using a lowest common ancestor (LCA) algorithm.
MEGAN is not a tool for the reconstruction of putative genomes (MAGs) from contigs. For MAGs, I use CONCOCT, MetaBAT2 and MaxBin2, with a final step in metaWRAP for bin refinement. Usually, binning tools for MAG reconstruction do not use taxonomic information to bin contigs into MAGs.
Taxonomic binning and MAGs are two different things.
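To make the LCA idea above concrete, here is a minimal sketch of a naive lowest common ancestor assignment. It is not MEGAN's actual implementation (which applies further filters and weighting); the function name and the example lineages are made up for illustration, and each hit is assumed to be available as a root-to-leaf list of taxon names.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-to-leaf prefix shared by all lineages.

    Hypothetical helper: each lineage is the taxonomy of one database hit
    for the same contig, ordered from the root down to the species.
    """
    if not lineages:
        return []
    shared: List[str] = []
    # zip() walks all lineages rank by rank and stops at the shortest one.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break  # the hits disagree from this rank downwards
    return shared


# Illustrative hits for a single contig (invented for this example).
hits = [
    ["Bacteria", "Proteobacteria", "Enterobacteriaceae",
     "Escherichia", "Escherichia coli"],
    ["Bacteria", "Proteobacteria", "Enterobacteriaceae",
     "Salmonella", "Salmonella enterica"],
]

contig_lca = lowest_common_ancestor(hits)
print(contig_lca[-1])  # the contig is placed at the deepest shared taxon
```

A contig whose hits disagree at the genus level is therefore placed at the family level instead of being forced onto one species, which is why taxonomic binning can report many contigs at higher ranks.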
I see... okay, so please tell me if my thought process here makes sense. If I just want to find out the species in my samples and their relative abundance, would reconstructing MAGs make more sense? My understanding is that the MAGs would then be my own "reference" genomes that I can map my trimmed reads against. This makes sense to me, as it is somewhat of a de novo approach to creating a "reference" for me to use... but I am unsure how that would give me abundance. Unless each mapped read is a "count", and that is my abundance. That is my thinking; I hope I am on the right track.
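As a rough sketch of the "each mapped read is a count" idea: raw counts favour larger genomes, so one common simplification is to divide the reads mapped to each MAG by the MAG's length before converting to proportions. The function name, bin names and numbers below are hypothetical, and reads that map to unbinned contigs (or to nothing) are ignored here.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int],
                       genome_lengths: Dict[str, int]) -> Dict[str, float]:
    """Length-normalised relative abundance per MAG (hypothetical helper).

    read_counts: reads mapped to each MAG.
    genome_lengths: total length in bp of each MAG.
    """
    # Reads per base, so that large genomes do not look more abundant
    # simply because they offer more sequence to map against.
    density = {mag: read_counts[mag] / genome_lengths[mag]
               for mag in read_counts}
    total = sum(density.values())
    if total == 0:
        return {mag: 0.0 for mag in density}
    return {mag: value / total for mag, value in density.items()}


# Invented numbers: bin_1 recruits three times as many reads as bin_2,
# but it is also three times as long.
counts = {"bin_1": 30_000, "bin_2": 10_000}
lengths = {"bin_1": 3_000_000, "bin_2": 1_000_000}

abundance = relative_abundance(counts, lengths)
print(abundance)
```

In this toy case both bins end up equally abundant after normalisation, even though the raw read counts differ threefold.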
mOTUs does something like that, but using single-copy marker genes predicted from your contigs or MAGs.
Keep in mind that not all your contigs will be binned into MAGs. First, binning algorithms for MAGs require contigs with a length > 2.5 kbp. Second, certain taxa can be much easier to bin into MAGs than others. Therefore, if you focus only on medium/high-quality MAGs (completeness > 50% and contamination < 10%), you could underestimate the complexity of your community.
I am not saying that focusing only on MAGs to calculate taxa abundance and diversity is wrong, but you should understand what the limitations are. There are a lot of high-quality papers on this topic.
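The two limitations above can be checked on your own data with a few lines of code. This is only a sketch: the class, function names and example values are hypothetical, the thresholds are the ones quoted in this post, and the completeness and contamination values are assumed to come from whatever bin-quality tool you already run.

```python
from typing import List, NamedTuple


class BinQuality(NamedTuple):
    """Quality estimates for one bin (hypothetical record type)."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def keep_medium_high_quality(bins: List[BinQuality],
                             min_completeness: float = 50.0,
                             max_contamination: float = 10.0) -> List[str]:
    """Names of bins with completeness > 50% and contamination < 10%."""
    return [b.name for b in bins
            if b.completeness > min_completeness
            and b.contamination < max_contamination]


def binnable_fraction(contig_lengths: List[int],
                      min_length: int = 2500) -> float:
    """Fraction of assembled bases on contigs longer than min_length."""
    total = sum(contig_lengths)
    if total == 0:
        return 0.0
    return sum(n for n in contig_lengths if n > min_length) / total


# Invented example values.
bins = [
    BinQuality("bin_1", 95.0, 2.0),
    BinQuality("bin_2", 40.0, 1.0),   # too incomplete
    BinQuality("bin_3", 80.0, 15.0),  # too contaminated
]
kept = keep_medium_high_quality(bins)
fraction = binnable_fraction([500, 1000, 3000, 5500])
print(kept, fraction)
```

Comparing the binnable fraction, and the share of reads that map to the kept bins, against the whole assembly gives a quick sense of how much of the community a MAG-only analysis would miss.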