3.1 years ago
robert.murphy
In an attempt to increase the quality of our metagenomic assembly and ensure we capture low-abundance species, we are co-assembling ~20 samples. However, after the assembly we need to perform some per-sample analysis, so I need to map each contig back to the sample(s) whose reads map to it. Is there a tool that can do this (I can't find one), or will I have to do it myself? If so, at what coverage of a contig should I assign that contig to a sample?
minimap2 should do the job
So just any old aligner, and then above a certain coverage threshold assign that contig to that sample? What coverage should I use, though?
Sorry, maybe you edited the question.
Help me to understand: you did a co-assembly from 20 samples, which resulted in n contigs. Now you want to know from which sample each contig was assembled. Is that right? Also, you are looking for a threshold that can be used to assign contigs to each sample. Is that right?
Apologies I did edit it! I should have noted that in the question.
Yes both of your summaries are correct.
Since you did a co-assembly of 20 samples, the most likely scenario is that the final contigs are the result of reads coming from multiple samples. By using any short read aligner you can easily calculate the coverage per sample of each contig.
Unfortunately, I am not aware of any method/tool that uses a coverage threshold to assign contigs back to samples. If that was your primary goal, perhaps the co-assembly strategy was not the best choice.
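To make the per-sample coverage calculation concrete, here is a minimal Python sketch. It assumes you have already mapped each sample's reads to the co-assembly and exported per-base depth as three tab-separated columns (contig, position, depth), which is the layout `samtools depth` writes. The function names, the contig lengths dictionary, and the two cut-offs (`min_mean_depth`, `min_breadth`) are all hypothetical placeholders for illustration, not recommended values.

```python
from collections import defaultdict


def contig_coverage(depth_lines, contig_lengths):
    """Summarise mean depth and breadth per contig for ONE sample.

    depth_lines    : iterable of "contig<TAB>position<TAB>depth" strings
    contig_lengths : dict mapping contig name -> contig length in bases
    Positions missing from depth_lines are treated as depth 0.
    """
    total_depth = defaultdict(int)   # summed depth over all reported positions
    covered = defaultdict(int)       # number of positions with depth > 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, _pos, depth = line.split("\t")
        depth = int(depth)
        total_depth[contig] += depth
        if depth > 0:
            covered[contig] += 1
    return {
        contig: {
            "mean_depth": total_depth[contig] / length,
            "breadth": covered[contig] / length,
        }
        for contig, length in contig_lengths.items()
    }


def contigs_present(stats, min_mean_depth=1.0, min_breadth=0.5):
    """Return contigs passing both cut-offs (arbitrary example thresholds)."""
    return sorted(
        contig
        for contig, s in stats.items()
        if s["mean_depth"] >= min_mean_depth and s["breadth"] >= min_breadth
    )


# Toy data for one sample: two contigs of length 4.
lengths = {"contig_1": 4, "contig_2": 4}
sample_a = [
    "contig_1\t1\t3",
    "contig_1\t2\t3",
    "contig_1\t3\t2",
    "contig_1\t4\t0",
    "contig_2\t1\t1",
]
stats_a = contig_coverage(sample_a, lengths)
present_a = contigs_present(stats_a)
print(stats_a)
print(present_a)
```

Run once per sample, this gives a contig-by-sample table of mean depth and breadth. Requiring breadth as well as depth is one way to avoid calling a contig "present" when only a short repetitive stretch attracts reads, but where to set either cut-off is a judgment call for your data.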
edit: the following link can be helpful
Do you know of another method by which we can do this?
We are trying to recover MAGs from the metagenomes and unfortunately have a lot of host in the raw reads, so this was our attempt to get good MAGs. I will try assembling by sample as well, though!
Most binning tools use differential coverage and other stats to cluster contigs into bins and eventually MAGs. Right now you have everything you need to recover MAGs from the co-assembly.
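As a rough illustration of what "differential coverage" means here: contigs from the same genome tend to rise and fall in depth together across samples. The Python sketch below compares made-up coverage profiles (mean depth in three samples) with a plain Pearson correlation. This is only a toy; real binners combine coverage with other signals and use more elaborate models, and the contig names and numbers are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / (sd_x * sd_y)


# Hypothetical mean depth of each contig in samples 1, 2 and 3.
profiles = {
    "contig_1": [10.0, 2.0, 0.5],
    "contig_2": [20.0, 4.0, 1.0],   # same shape as contig_1, twice the depth
    "contig_3": [0.5, 8.0, 12.0],   # opposite trend across samples
}

r_12 = pearson(profiles["contig_1"], profiles["contig_2"])
r_13 = pearson(profiles["contig_1"], profiles["contig_3"])
print(f"contig_1 vs contig_2: {r_12:.3f}")
print(f"contig_1 vs contig_3: {r_13:.3f}")
```

Here contig_1 and contig_2 co-vary perfectly and would be candidates for the same bin, while contig_3 follows a different pattern. This is also why co-assembling many samples helps binning: more samples mean longer, more informative profiles.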
I suppose you did not remove the host reads because you do not have a good reference genome. If that is the case, binning tools 'should' be able to discriminate host contigs from the rest.
We do de-host. It is just that a lot of the coverage we paid for ends up being on the host, so if we assemble and bin by sample we are worried we will miss a lot of low-abundance species.
Unfortunately, for the purpose of the study a by-sample analysis is needed, as it is a comparison across samples. Mapping a contig to a sample should not confound this though, I don't think?