dedupe.sh from bbmap excludes contigs with insufficient overlap
7.4 years ago
biomagician ▴ 410

Hi,

I am using the following command:

grep '>' celegans.fa | wc -l >> numberOfContigsBefore.txt
dedupe.sh -Xmx28g in=celegans.fa out=merged.fa minoverlap=100000000 maxedits=50 minidentity=98 overwrite=t > mergedNotes.txt
grep '>' merged.fa | wc -l >> numberOfContigs.txt

celegans.fa contains a de novo C. elegans assembly. The C. elegans genome is about 100 Mbp. I am running dedupe.sh with an extreme overlap value to see how it behaves. I start with 127 contigs, and dedupe.sh removes contigs until 109 remain. I do not understand how this is possible, given that I require a minimum overlap of 100 Mbp (minoverlap=100000000) and no contig is that long.
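To back up the claim that no contig reaches the requested overlap, the contig count and the longest contig length can be read straight from the FASTA file. The following is a minimal sketch: `fasta_stats` is a hypothetical helper name, and the small demo file only stands in for celegans.fa.

```shell
# Hypothetical helper: print the contig count and the longest contig length.
fasta_stats() {
  awk '
    /^>/ { if (len > max) max = len; len = 0; n++; next }  # header: close the previous record
         { len += length($0) }                             # sequence line: add to current contig
    END  { if (len > max) max = len
           printf "contigs=%d longest=%d\n", n, max }
  ' "$1"
}

# Tiny stand-in for celegans.fa (three contigs of 6, 10, and 1 bases).
demo=$(mktemp)
printf '>c1\nACGT\nAC\n>c2\nACGTACGTAC\n>c3\nA\n' > "$demo"

fasta_stats "$demo"   # prints: contigs=3 longest=10
```

Running `fasta_stats celegans.fa` and comparing `longest` against the minoverlap value would show directly whether any contig could satisfy the threshold.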

Can anybody explain this to me, please?

Thanks. Best, C.

dedupe bbmap contig assembly • 1.7k views

You could try asking Brian Bushnell, the author of BBTools / BBMap.


Looks like the documentation is incorrect; minoverlap is only used when looking for non-containment overlaps (for clustering). I'll either change the behavior or the documentation.
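To illustrate the distinction between a containment and a non-containment overlap, here is a toy sketch. It is not dedupe.sh's algorithm: `classify_pair` is a hypothetical helper that uses exact string matching only, ignores reverse complements and mismatches, and checks the overlap in one direction (suffix of the first sequence against prefix of the second).

```shell
# Hypothetical helper, not part of BBTools: classify the relation between two sequences.
classify_pair() (
  a=$1 b=$2
  # Containment: one sequence appears entirely inside the other.
  case $a in *"$b"*) echo containment; exit ;; esac
  case $b in *"$a"*) echo containment; exit ;; esac
  # Non-containment overlap: longest suffix of a that is also a prefix of b.
  s=$a
  while [ -n "$s" ]; do
    case $b in "$s"*) echo "overlap=${#s}"; exit ;; esac
    s=${s#?}   # drop the first character and try a shorter suffix
  done
  echo none
)

classify_pair ACGTACGTTT GTACG      # prints: containment
classify_pair ACGTACGTTT GTTTCCAA   # prints: overlap=4
```

Under the behavior described above, a length threshold such as minoverlap would be compared against the second kind of result only, which would explain why contained contigs are still removed.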


Hi Brian,

Can you please advise me on how to best use your tool? I would like to remove small contigs that are contained within other contigs and possibly also merge contigs that overlap.

Best,

C.
