6.6 years ago
bgbrink
I have a data set where the genome is known to contain about 10% telomeric repeats. However, when I BLAST a query of 4 x TTAGGG against my reads, fewer than 1% of them show a hit. This makes me wonder whether low-complexity reads are removed by the basecalling pipeline and never end up in subreads.fastq.
Here is my blast command, in case I made a mistake on my side. I also tried less stringent values for reward/penalty and gap costs (5/-4, 10/6), but the result stays the same.
blastn -db "all_reads.fasta" -query "telo_sequence.fasta" -word_size 6 -dust no -soft_masking false -outfmt 6
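As a cross-check that does not depend on any BLAST parameters, the reads could also be scanned directly for the repeat. The following is only a minimal sketch, assuming the reads are in plain FASTA and that a "hit" means an exact run of 4 x TTAGGG or its reverse complement 4 x CCCTAA. The function names and the pattern are made up for illustration. Raw long reads usually carry insertion/deletion errors, so exact matching will likely undercount and should be read as a lower bound.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# Assumption: a hit is an exact run of four TTAGGG units, or the
# reverse complement (CCCTAA) for reads from the opposite strand.
TELO_PATTERN = re.compile(r"(?:TTAGGG){4}|(?:CCCTAA){4}")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA lines; sequences are upper-cased."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def count_telomeric_reads(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (reads containing the repeat, total reads)."""
    hits = 0
    total = 0
    for _, seq in read_fasta(lines):
        total += 1
        if TELO_PATTERN.search(seq):
            hits += 1
    return hits, total


if __name__ == "__main__":
    # Usage: python count_telo.py all_reads.fasta
    with open(sys.argv[1]) as handle:
        hits, total = count_telomeric_reads(handle)
    share = 100.0 * hits / total if total else 0.0
    print(f"{hits} of {total} reads ({share:.2f}%) contain the repeat")
```

If this count comes out far above the fraction of reads reported by blastn, the search settings are the more likely cause. If it is equally low, that would be more consistent with the reads being missing upstream.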