Best Tool For Aligning Short Sequence Against Genome
11.2 years ago
BruceB ▴ 340

I have a short sequence (34bp) that I would like to align against the mouse genome. Probably a bit of an odd question, so let me explain in more detail...

At the moment, I can predict regions where this sequence could be present (based on experimental data from our lab). So I take the reference sequence for these regions and align my 34bp sequence against it with ClustalW2. It aligns where I expected it to. The alignment is poor, with mismatches and gaps, but this is to be expected, as the purpose of this is to troubleshoot a problem in our targeted resequencing.

Now I've exhausted the regions where we know or think this sequence occurs, and I would like to generate a list of other positions where it could also be found. The alignment doesn't need to be perfect; I'm after an indication of where these sequences are found.

Of course, ClustalW2 isn't suitable for this alignment, as the reference is simply too large. What I'm looking for is a tool that can perform the gapped and mismatched alignment I'm getting from ClustalW2, but across the whole genome.

Is there such a tool, and does anyone have experience with doing something similar?

short sequence alignment • 9.8k views
11.2 years ago

Hello: As NicoBxl suggested, you can run BLAST locally on your own computer. A handy way to get results when you BLAST a very short sequence (like yours) against a large DNA database is to use the parameter "-task blastn-short" and/or set "-evalue" very high (10 or more). At least, these work fine for me. I hope this helps.
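
For anyone who wants to script this, here is a rough Python sketch of the approach above. It only builds the blastn command line with "-task blastn-short" and a high "-evalue", and parses the tabular output ("-outfmt 6") into a list of genome positions. The file names (probe.fa, mouse_genome, hits.tsv) are made up for illustration, and it assumes you have already built a nucleotide database from the genome with makeblastdb.

```python
import csv
import io

# Column order of BLAST's default tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def build_blastn_short_cmd(query_fasta, db_name, evalue=10, out_path="hits.tsv"):
    """Return the argument list for a short-query blastn search.

    query_fasta, db_name and out_path are hypothetical names; pass the
    list to subprocess.run() once the files and database exist.
    """
    return [
        "blastn",
        "-task", "blastn-short",   # settings tuned for very short queries
        "-query", query_fasta,
        "-db", db_name,
        "-evalue", str(evalue),    # permissive threshold so weak hits are kept
        "-outfmt", "6",            # tab-separated, one hit per line
        "-out", out_path,
    ]


def parse_tabular_hits(text):
    """Turn -outfmt 6 text into a list of hit positions on the subject."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        sstart, send = int(rec["sstart"]), int(rec["send"])
        hits.append({
            "chrom": rec["sseqid"],
            "start": min(sstart, send),
            "end": max(sstart, send),
            # In tabular output, minus-strand hits have sstart > send.
            "strand": "+" if sstart <= send else "-",
            "mismatches": int(rec["mismatch"]),
            "gap_opens": int(rec["gapopen"]),
        })
    return hits


if __name__ == "__main__":
    # Print the command rather than running it, since the inputs are placeholders.
    print(" ".join(build_blastn_short_cmd("probe.fa", "mouse_genome")))
```

With a 34bp query you will probably get a lot of weak hits at a high e-value, so filtering the parsed list on mismatches and gap opens is a reasonable next step.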

11.1 years ago
Prakki Rama ★ 2.7k

You can use BLAT as well. The standalone is available here
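
If you go the standalone BLAT route, a sketch like the following could help with scripting it. It just assembles a blat command line with relaxed thresholds, since the defaults (minScore 30, minIdentity 90, tileSize 11) are fairly strict for a 34bp query with mismatches and gaps. The specific values and the file names (mm.2bit, probe.fa, out.psl) are my own assumptions for illustration, not recommendations from this thread.

```python
def build_blat_cmd(genome, query_fasta, out_psl,
                   min_score=20, min_identity=80, tile_size=8):
    """Return the argument list for a relaxed standalone BLAT search.

    All threshold values here are illustrative guesses; tune them
    against regions where you already know the sequence aligns.
    """
    return [
        "blat",
        genome,                          # .2bit, .nib or FASTA database
        query_fasta,                     # FASTA file with the short query
        f"-minScore={min_score}",        # lower than the default of 30
        f"-minIdentity={min_identity}",  # lower than the default of 90
        f"-tileSize={tile_size}",        # smaller seeds tolerate more mismatches
        out_psl,                         # output in PSL format
    ]


if __name__ == "__main__":
    # Print the command; the file names are placeholders.
    print(" ".join(build_blat_cmd("mm.2bit", "probe.fa", "out.psl")))
```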


Wonderful suggestion. I was using the default parameters, which weren't dealing with the mismatches or gaps very well. A bit of tweaking has given me a list of positions that almost perfectly predicts the regions where the sequencing problem has occurred. A little more tweaking and I should have a really good list of potentially problematic regions.
