How to search the human genome for sequences that differ from a given sequence by a set number of mismatches?
2.4 years ago
azhai ▴ 10

Part of a program I'm writing takes an input query sequence of 18 to 21 bp and needs to find all, or as many as possible, sequences that differ from it by 1, 2, 3, or 4 bp, i.e. any alignments with 1 to 4 mismatches. I've been trying to use blastn for this, but I only get 1 hit: the place where the query sequence actually aligns. BLAST doesn't seem to let you specify how many mismatches to allow. Is there a way to lower the score cutoff for what is considered a hit? I'm also open to suggestions that don't use BLAST, since I've experimented a lot with it and had little success. Thanks in advance for any advice!

BLAST Alignment • 773 views
2.4 years ago
Mensur Dlakic ★ 28k

BLAST operates with E-value thresholds. If there are multiple hits for your sequences that satisfy the E-value threshold, with or without mismatches, BLAST would identify them. I think the default threshold is E=10. Depending on the E-value of your perfect match, you may want to increase the threshold to 100 or so. It is possible that there are no hits with 1-4 mismatches that BLAST can identify with its default parameters, so you may have to play with word size as well. If I remember correctly, smaller word sizes will make the search slower but more sensitive to short matches.

2.4 years ago

See this answer.

2.4 years ago

A simple, naive solution in C. Not really tested.

compilation:

gcc -Wall -o biostar9528296 biostar9528296.c

usage:

./biostar9528296 ATATCATCGTACTAGCGATGTTTAGGGGA 2 < in.fa
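
For readers who want to see the general idea, here is a minimal sketch of a brute-force mismatch scan in C. It is not the biostar9528296.c source, only an illustration under stated assumptions: the function names (count_mismatches, scan_sequence, scan_fasta) and the tab-separated output format are hypothetical, only the forward strand is searched, and each FASTA record is held in memory in full.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count mismatches between query and window (both len characters long),
   ignoring case. Stops early: once the count exceeds max_mm it returns
   max_mm + 1 rather than the full count. */
int count_mismatches(const char *query, const char *window, size_t len, int max_mm)
{
    int mm = 0;
    for (size_t i = 0; i < len; i++) {
        if (toupper((unsigned char)query[i]) != toupper((unsigned char)window[i])) {
            if (++mm > max_mm) break;
        }
    }
    return mm;
}

/* Slide the query along seq and report every window with at most max_mm
   mismatches. Each hit is written to out (if not NULL) as
   name <TAB> 0-based start <TAB> matched window <TAB> mismatch count.
   Returns the number of hits. Forward strand only (an assumption of this sketch). */
long scan_sequence(const char *name, const char *seq, const char *query,
                   int max_mm, FILE *out)
{
    size_t qlen = strlen(query), slen = strlen(seq);
    long hits = 0;

    assert(max_mm >= 0);
    if (qlen == 0 || slen < qlen) return 0;

    for (size_t pos = 0; pos + qlen <= slen; pos++) {
        int mm = count_mismatches(query, seq + pos, qlen, max_mm);
        if (mm <= max_mm) {
            if (out != NULL)
                fprintf(out, "%s\t%zu\t%.*s\t%d\n", name, pos, (int)qlen, seq + pos, mm);
            hits++;
        }
    }
    return hits;
}

/* Read FASTA records from in, concatenate each record's sequence lines, and
   scan the record. Returns the total number of hits, or -1 if memory runs out.
   Limitation: header lines longer than the line buffer are not handled. */
long scan_fasta(FILE *in, const char *query, int max_mm, FILE *out)
{
    char line[4096], name[256] = "";
    char *seq = NULL;
    size_t len = 0, cap = 0;
    long total = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';           /* strip the line ending */
        if (line[0] == '>') {
            if (len > 0) total += scan_sequence(name, seq, query, max_mm, out);
            snprintf(name, sizeof name, "%s", line + 1);
            len = 0;                                  /* start a new record */
        } else {
            size_t n = strlen(line);
            if (len + n + 1 > cap) {                  /* grow the sequence buffer */
                size_t new_cap = (len + n + 1) * 2;
                char *tmp = realloc(seq, new_cap);
                if (tmp == NULL) { free(seq); return -1; }
                seq = tmp;
                cap = new_cap;
            }
            memcpy(seq + len, line, n + 1);           /* copies the terminating NUL too */
            len += n;
        }
    }
    if (len > 0) total += scan_sequence(name, seq, query, max_mm, out);
    free(seq);
    return total;
}
```

To get a command-line tool like the one in the usage line above, a main function could pass argv[1] as the query, atoi(argv[2]) as the mismatch limit, and call scan_fasta(stdin, query, max_mm, stdout). The scan costs roughly (genome length × query length) comparisons in the worst case, although the early exit in count_mismatches keeps most windows cheap. To cover the reverse strand as well, the same scan would have to be repeated with the reverse complement of the query.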