Does the PacBio basecaller filter out reads with low complexity?
6.6 years ago
bgbrink ▴ 60

I have a data set where the genome is known to contain about 10% telomeric repeats. However, when I BLAST a sequence of 4 x TTAGGG against my reads, fewer than 1% show a hit. This makes me wonder whether low-complexity reads are removed by the basecalling pipeline and never end up in the subreads.fastq.

Here is my BLAST command, in case I did something wrong on my side. I also tried less stringent values for reward/penalty and gap costs (5/-4, 10/6), but the result remains the same.

blastn -db "all_reads.fasta" -query "telo_sequence.fasta" -word_size 6 -dust no -soft_masking false -outfmt 6
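As a cross-check that does not depend on any BLAST settings, the reads could also be scanned directly for the repeat. Below is a minimal sketch, assuming subreads.fastq is a plain four-line-per-record FASTQ; the function names are made up for illustration. It only counts exact matches of 4 x TTAGGG (or its reverse complement, 4 x CCCTAA), so with the indel errors typical of raw PacBio reads it will likely undercount and should be treated as a lower bound, not as a replacement for the alignment.

```python
import re
import sys

# Four tandem copies of the telomeric hexamer, on either strand.
TELO_FWD = "TTAGGG" * 4
TELO_REV = "CCCTAA" * 4  # reverse complement of TELO_FWD
TELO_RE = re.compile(f"{TELO_FWD}|{TELO_REV}")


def read_fastq_sequences(handle):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def telomeric_fraction(sequences):
    """Return (reads with an exact 4x repeat, total reads)."""
    total = hits = 0
    for seq in sequences:
        total += 1
        if TELO_RE.search(seq):
            hits += 1
    return hits, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_telo.py subreads.fastq
    with open(sys.argv[1]) as fh:
        hits, total = telomeric_fraction(read_fastq_sequences(fh))
    pct = 100.0 * hits / total if total else 0.0
    print(f"{hits} of {total} reads ({pct:.2f}%) contain an exact 4x repeat")
```

If this count is also far below the expected 10%, that would suggest the reads really are missing from the file and the BLAST parameters are not the cause.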
sequencing