7.4 years ago
biomagician
Hi,
I am using the following commands:
# count contigs in the input assembly
grep -c '^>' celegans.fa >> numberOfContigsBefore.txt
# BBTools write their log/statistics to stderr, hence 2>
dedupe.sh -Xmx28g in=celegans.fa out=merged.fa minoverlap=100000000 maxedits=50 minidentity=98 overwrite=t 2> mergedNotes.txt
grep -c '^>' merged.fa >> numberOfContigs.txt
celegans.fa contains a de novo C. elegans assembly; the C. elegans genome is about 100 Mb. I am testing dedupe.sh with an extreme overlap value to see how it works. My starting number of contigs is 127, and after dedupe 109 remain, so 18 contigs were removed. I do not understand how this is possible, given that I require a minimum overlap of 100 Mb (minoverlap=100000000) and no contig is that long.
Can anybody explain this to me, please?
Thanks. Best, C.
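One way to see which 18 contigs were removed (so they can be inspected) is to compare the FASTA header names before and after dedupe.sh. A minimal POSIX shell sketch, assuming dedupe.sh keeps the original names of the contigs it retains; fasta_names and removed_contigs are hypothetical helpers, not part of BBTools:

```shell
#!/bin/sh
# Sketch: list contigs present in an input FASTA but missing from a filtered one.
# Assumption: retained contigs keep their original header names (no renaming).

# Print header names (text after '>' up to the first whitespace), sorted.
fasta_names() {
  grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort
}

# Print names found in FASTA file $1 but not in FASTA file $2.
removed_contigs() {
  kept=$(mktemp) || return 1
  fasta_names "$2" > "$kept"
  # -v: keep non-matching lines, -x: match whole lines, -F: fixed strings.
  # grep exits with status 1 when every name was kept, which is not an error here.
  fasta_names "$1" | grep -vxF -f "$kept" || true
  rm -f "$kept"
}
```

For example, removed_contigs celegans.fa merged.fa > removedContigs.txt would write one removed contig name per line (the output file name is only illustrative).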
You could try asking Brian Bushnell; he is the author of BBTools/BBMap.
It looks like the documentation is incorrect: minoverlap is only used when looking for non-containment overlaps (for clustering). I'll change either the behavior or the documentation.
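Because minoverlap does not restrict containment, the contigs removed in the test above were presumably duplicates of, or contained within, other contigs. A rough way to check the exact-match case is sketched below in POSIX shell and awk. The function name contained_contigs is hypothetical, and unlike dedupe.sh it ignores mismatches (maxedits/minidentity), so it may report fewer contigs than dedupe.sh removed.

```shell
#!/bin/sh
# Sketch: report contigs whose sequence, or its reverse complement, occurs
# EXACTLY inside another contig of the same FASTA file.
# Assumption: only exact containment is checked; dedupe.sh also tolerates
# mismatches, so this can miss contigs it absorbed.
# Every pair of contigs is compared, which is fine for ~100 contigs.

contained_contigs() {
  # Step 1: one line per contig, "name<TAB>SEQUENCE" (wrapped lines joined, uppercased).
  awk '/^>/ { if (NR > 1) printf "\n"; printf "%s\t", substr($1, 2); next }
       { printf "%s", toupper($0) }
       END { printf "\n" }' "$1" |
  awk -F '\t' '
    # Step 2: print "contained<TAB>container" for each contig found inside another.
    BEGIN { comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C" }
    # Reverse complement; long strings are split in half recursively to avoid
    # quadratic character-by-character concatenation. Non-ACGT letters are kept.
    function revcomp(s,    n, h, i, c, r) {
      n = length(s)
      if (n > 64) {
        h = int(n / 2)
        return revcomp(substr(s, h + 1)) revcomp(substr(s, 1, h))
      }
      r = ""
      for (i = n; i > 0; i--) {
        c = substr(s, i, 1)
        c = (c in comp) ? comp[c] : c
        r = r c
      }
      return r
    }
    { count = NR; name[NR] = $1; seq[NR] = $2 }
    END {
      for (i = 1; i <= count; i++) {
        if (seq[i] == "") continue
        rc = revcomp(seq[i])
        for (j = 1; j <= count; j++) {
          if (j == i) continue
          if (index(seq[j], seq[i]) > 0 || index(seq[j], rc) > 0) {
            print name[i] "\t" name[j]   # identical contigs are reported both ways
            break
          }
        }
      }
    }'
}
```

For example, contained_contigs celegans.fa prints one line per contained contig, followed by the name of a contig that contains it.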
Hi Brian,
Can you please advise me on how to best use your tool? I would like to remove small contigs that are contained within other contigs and possibly also merge contigs that overlap.
Best,
C.