To Coassemble or Not?
2.0 years ago

Hi folks,

I have around 60 metagenomic read sets from 60 different individuals (one sample per person). I want to find regions of differential mapping between these samples. In other words, do samples A and B share similar mapping patterns to contig A? I have tried assembling the samples separately, then combining the assemblies and deduplicating the result using dedupe.sh from BBMap. This still left a lot of similar contigs, and the duplicated sequences negatively impacted mapping quality. I can play around with increasing the minimum sequence identity for the deduplication step, but deduplicating to the extent I needed removed a good amount of sequence, which could confound our analysis.
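
To make "similar mapping patterns" concrete, here is a minimal Python sketch that correlates read-depth profiles along a single contig between samples. The depth values, the window layout and the sample names are made up for illustration, and Pearson correlation is only one possible similarity measure, not necessarily the right one for this data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length depth profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: correlation is undefined, report 0
    return cov / sqrt(var_x * var_y)


# Hypothetical read depth in consecutive windows along one contig.
depth = {
    "sample_A": [10, 12, 30, 31, 11, 9],
    "sample_B": [20, 25, 61, 60, 22, 19],
    "sample_C": [30, 29, 5, 4, 31, 33],
}

r_ab = pearson(depth["sample_A"], depth["sample_B"])  # same shape, deeper
r_ac = pearson(depth["sample_A"], depth["sample_C"])  # opposite shape
print(f"A vs B: {r_ab:.2f}, A vs C: {r_ac:.2f}")
```

Correlation ignores overall sequencing depth, so samples A and B score as similar even though B is roughly twice as deep. A contig that is duplicated in the reference would split reads between its copies and distort profiles like these.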

I have now started wondering whether co-assembling our samples would be a better approach than assembling them separately. My concern is that co-assembling possibly disparate samples would create contigs from artificial combinations of reads that come from different samples.

Please let me know if you folks have any advice!

metagenomics assembly Co-assembly • 1.3k views

I have tried assembling samples separately, then combining assemblies and deduplicating the assembly using dedupe.sh from BBMap.

Did you turn the following option on when you ran dedupe.sh?

cluster=f             (c) Group overlapping contigs into clusters.

You may want to try CD-HIT for removing redundancy as well.
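
For intuition only, here is a minimal Python sketch of containment-based deduplication: a contig is dropped when it, or its reverse complement, is an exact substring of a longer contig that was already kept. This is a toy illustration, not how dedupe.sh or CD-HIT is implemented, and the contig names and sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def remove_contained(contigs):
    """Drop contigs that exactly match part of a longer kept contig (either strand)."""
    kept = {}
    # Longest first, so a contig can only be contained in one already kept.
    ordered = sorted(contigs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue  # redundant: exact containment on one strand
        kept[name] = seq
    return kept


# Hypothetical contigs pooled from three single-sample assemblies.
contigs = {
    "s1_c1": "ACGTACGTTTGACCA",
    "s2_c1": "GTACGTTTG",      # substring of s1_c1
    "s3_c1": "TGGTCAAACG",     # reverse complement of part of s1_c1
    "s2_c2": "GGGGCCCCTTAA",   # unrelated sequence
}

kept = remove_contained(contigs)
print(sorted(kept))
```

Exact matching like this leaves behind contigs that differ by even a single base, which is one reason pooled single-sample assemblies can stay redundant until an identity threshold or clustering is applied.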


The cluster option in dedupe.sh keeps running out of memory, even when I ran it on 9 relatively short (<20 kb) contigs with 70 GB of RAM.


Depending on the size of your data, there may be no way around that. You may run into the same issue with other software.


I think we should be able to assemble just fine with the samples that we have. My concern is that the assembly might not be representative.


While representativeness is an important issue, it sounds like the quality of the assemblies you are ending up with is the more pressing problem, since you have duplication that you have not been able to address.


The duplication is at the level of contigs, which I am trying to address. Do you not think co-assembly would remove this issue?


That is difficult to answer. Your best bet may be to try it out.
