Question

To Coassemble or Not?

2.0 years ago

joshua.kirsch • 0

Hi folks,

I have around 60 metagenomic read sets from 60 different individuals (one sample per person). I want to find areas of differential mapping between these samples. In other words, do samples A and B share similar mapping patterns to contig A? I have tried assembling the samples separately, then combining the assemblies and deduplicating the combined assembly using dedupe.sh from BBMap. This still left a lot of similar contigs, and the duplicated sequences negatively impacted mapping quality. I can play around with increasing the minimum sequence identity for the deduplication step, but deduplicating to the extent that I needed removed a good amount of sequence, which could confound our analysis.

I have now started wondering if co-assembling our samples together would be a better approach than assembling samples separately. My concern is that co-assembling possibly disparate samples would create assemblies based on artificial combinations of reads from different samples.

Please let me know if you folks have any advice!
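For illustration, the comparison described in the question could be sketched as a per-contig depth comparison between two samples. This is only a rough sketch: it assumes "differential mapping" means a difference in length- and depth-normalised read counts per contig, and the function names, the pseudocount, and the toy numbers are all hypothetical.

```python
import math


def rpkm(read_counts, contig_lengths):
    """Reads per kilobase of contig per million mapped reads, for one sample."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {
        contig: count / (contig_lengths[contig] / 1e3) / (total / 1e6)
        for contig, count in read_counts.items()
    }


def log2_ratio(rpkm_a, rpkm_b, pseudocount=1.0):
    """log2(A/B) per contig; the pseudocount avoids division by zero."""
    contigs = sorted(set(rpkm_a) | set(rpkm_b))
    return {
        c: math.log2(
            (rpkm_a.get(c, 0.0) + pseudocount) / (rpkm_b.get(c, 0.0) + pseudocount)
        )
        for c in contigs
    }


# Toy data (hypothetical): contig lengths in bp, mapped read counts per sample.
lengths = {"contig_1": 2000, "contig_2": 1000}
sample_a = {"contig_1": 500, "contig_2": 500}
sample_b = {"contig_1": 900, "contig_2": 100}

ratios = log2_ratio(rpkm(sample_a, lengths), rpkm(sample_b, lengths))
for contig, value in ratios.items():
    print(contig, round(value, 2))
```

Values near zero suggest similar relative depth in both samples; large positive or negative values flag contigs worth a closer look. Duplicated contigs in the reference would split reads between copies and distort these ratios, which is the mapping-quality problem described above.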

metagenomics assembly Co-assembly • 1.3k views

ADD COMMENT • link updated 2.0 years ago by GenoMax 147k • written 2.0 years ago by joshua.kirsch • 0

I have tried assembling samples separately, then combining assemblies and deduplicating the assembly using dedupe.sh from BBMap.

Did you turn the following option on when you ran dedupe.sh?

cluster=f             (c) Group overlapping contigs into clusters.

You may want to try CD-HIT for removing redundancy as well.
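For reference, a CD-HIT run for this purpose might be set up roughly as follows. This is only a sketch: the file names, the 0.95 identity threshold, and the helper names `word_size_for` and `build_cdhit_cmd` are assumptions for illustration, and the word sizes follow the ranges suggested in the CD-HIT user guide for `cd-hit-est`.

```python
def word_size_for(identity):
    """Pick a cd-hit-est word size (-n) suited to the identity threshold (-c)."""
    if identity >= 0.95:
        return 10
    if identity >= 0.90:
        return 8
    if identity >= 0.88:
        return 7
    if identity >= 0.85:
        return 6
    if identity >= 0.80:
        return 5
    raise ValueError("identity thresholds below 0.80 are not handled in this sketch")


def build_cdhit_cmd(in_fasta, out_fasta, identity=0.95, threads=0, memory_mb=0):
    """Build the argument list for a cd-hit-est run.

    -T 0 uses all available threads; -M 0 removes the memory limit.
    """
    return [
        "cd-hit-est",
        "-i", in_fasta,
        "-o", out_fasta,
        "-c", str(identity),
        "-n", str(word_size_for(identity)),
        "-T", str(threads),
        "-M", str(memory_mb),
    ]


if __name__ == "__main__":
    # Hypothetical file names for the combined per-sample assemblies.
    cmd = build_cdhit_cmd("combined_contigs.fa", "combined_contigs.nr.fa")
    print(" ".join(cmd))
    # To actually run it, pass cmd to subprocess.run(cmd, check=True)
    # on a machine where cd-hit-est is installed.
```

Lowering the identity threshold removes more near-duplicates but also discards more sequence, so it is worth checking how much total sequence survives at each setting.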

ADD REPLY • link 2.0 years ago by GenoMax 147k

The cluster option in dedupe.sh keeps running out of memory, even when I ran it on 9 relatively short (<20 kb) contigs with 70 GB of RAM.

ADD REPLY • link 2.0 years ago by joshua.kirsch • 0

Depending on the size of your data there may be no way around that. You may run into the same issue with other software.

ADD REPLY • link 2.0 years ago by GenoMax 147k

I think we should be able to assemble just fine with the samples that we have. My concern is that the assembly might not be representative.

ADD REPLY • link 2.0 years ago by joshua.kirsch • 0

While representativeness is an important issue, it sounds like the quality of the assemblies you are ending up with is a more pressing problem, since you have duplication that you have not been able to address.

ADD REPLY • link 2.0 years ago by GenoMax 147k

The duplication is on the level of contigs, which I am trying to address. Do you not think co-assembly would remove this issue?