10.1 years ago
fhsantanna
Hi.
I used SPAdes to assemble paired-end MiSeq data from four different bacterial species (from the genera Lysobacter, Bacillus, Paenibacillus and Rhizobium). The assemblies contain roughly 100-900 contigs each.
However, I have noticed that every assembly includes contigs consisting only of C's (or only of A's), about 130 nt long. In addition, in one species there are long G stretches (100-200 nt) embedded within otherwise normal contigs.
Do you know how I could automatically discard these artifacts?
Thanks in advance.
I used the --careful option and many of these sequences were removed, but some contigs still have this problem.
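For what it's worth, one way to discard the remaining homopolymer contigs automatically is a small post-assembly filter. Below is a minimal sketch, not a SPAdes feature: the function names and the 0.9 threshold are my own assumptions for illustration. It drops any contig in which a single base makes up at least 90% of the sequence, and separately reports the longest single-base run in each contig so that the embedded G stretches can be flagged for manual inspection rather than removed blindly.

```python
import re
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dominant_base_fraction(seq):
    """Fraction of the sequence taken up by its most frequent base."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    return counts.most_common(1)[0][1] / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    runs = re.finditer(r"(.)\1*", seq.upper())
    return max((len(m.group(0)) for m in runs), default=0)


def filter_contigs(records, max_fraction=0.9):
    """Split records into (kept, dropped).

    A contig is dropped when one base accounts for at least
    max_fraction of its length. The 0.9 default is an assumed,
    arbitrary cutoff; tune it for your data.
    """
    kept, dropped = [], []
    for header, seq in records:
        if dominant_base_fraction(seq) >= max_fraction:
            dropped.append((header, seq))
        else:
            kept.append((header, seq))
    return kept, dropped
```

To use it on real data you could open the assembly (SPAdes writes it to contigs.fasta in the output directory), pass the file handle to read_fasta, and write the kept records back out. Contigs that pass the filter but have a large longest_homopolymer value are the ones with internal G stretches. Whether to trim or split those is a separate decision that probably deserves a look at the read coverage first.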