Question

Best Tool For Aligning Short Sequence Against Genome

11.1 years ago

BruceB ▴ 340

I have a short sequence (34bp) that I would like to align against the mouse genome. Probably a bit of an odd question, so let me explain in more detail...

At the moment, I can predict regions where this sequence could be present (based on experimental data from our lab). So I take the reference sequence for these regions and align my 34bp sequence against it with ClustalW2. It aligns where I expected it to. The alignment is poor, with mismatches and gaps, but this is to be expected because the purpose is to troubleshoot a problem in our targeted resequencing.

Now I've exhausted the regions where we know or think this sequence occurs, and I would like to generate a list of other positions where it could also be found. The alignment doesn't need to be perfect; I'm after an indication of where these sequences are found.

Of course, ClustalW2 isn't suitable for this alignment because the reference is simply too large. What I'm looking for is a tool that can perform the same kind of gapped, mismatch-tolerant alignment I'm getting from ClustalW2, but across the whole genome.

Is there such a tool, and does anyone have experience doing something similar?

short sequence alignment • 9.8k views

score 2 · Answer 1 · 2013-10-03

Hello. As NicoBxl suggested, you can run BLAST locally on your own computer. When you BLAST a very short sequence (like yours) against a large DNA database, a handy way to get results is to use the parameter "-task blastn-short" and/or set "-evalue" very high (10 or more). At least, these work fine for me. I hope this helps.
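For illustration, here is a minimal Python sketch of how such a local search might be assembled. It only builds and prints the command line. The file name `probe34.fa`, the database name `mouse_genome`, and the output name `probe34_hits.tsv` are hypothetical placeholders, and the sketch assumes BLAST+ is installed and that a nucleotide database has already been built from the genome with `makeblastdb`.

```python
import shlex


def blastn_short_command(query_fasta, db_name, out_tsv, evalue=10):
    """Build a blastn command line tuned for a very short query.

    All file and database names passed in are placeholders chosen by the
    caller; nothing here checks that they exist.
    """
    return [
        "blastn",
        "-task", "blastn-short",   # preset intended for short queries
        "-query", query_fasta,     # FASTA file holding the 34 bp sequence
        "-db", db_name,            # nucleotide database built with makeblastdb
        "-evalue", str(evalue),    # permissive E-value threshold
        "-outfmt", "6",            # tab-separated hits, one per line
        "-out", out_tsv,
    ]


# Hypothetical names, for illustration only.
cmd = blastn_short_command("probe34.fa", "mouse_genome", "probe34_hits.tsv")
print(shlex.join(cmd))

# To actually run the search (requires BLAST+ on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

The E-value is left as a parameter because the most useful threshold depends on how many weak hits you are willing to sift through.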

score 2 · Answer 2 · 2013-10-28

11.0 years ago

Prakki Rama ★ 2.7k

You can use BLAT as well. The standalone is available here

score 1 · Answer 3 · 2013-10-03

11.1 years ago

Nicolas Rosewick 11k

Try good old BLAST: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

Wonderful suggestion. I was using the default parameters which weren't dealing with the mismatches or gaps very well. A bit of tweaking has given me a list of positions that almost perfectly predict the regions where said sequencing problem has occurred. A little more tweaking and I should have a really good list of potentially problematic regions.

Reply • 11.1 years ago by BruceB ▴ 340
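As a rough sketch of how a list of positions could be derived from BLAST results, the Python snippet below parses the default 12-column tabular output (`-outfmt 6`) into sorted genomic intervals. It is not the poster's actual workflow: the `min_length` cutoff is an assumed filter, and the three example rows are made-up data used only to show the format.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def hits_to_regions(tsv_text, min_length=20):
    """Turn tabular BLAST hits into sorted (chrom, start, end, strand,
    mismatches, gap_openings) tuples with 1-based inclusive coordinates."""
    regions = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if int(hit["length"]) < min_length:
            continue  # assumed filter: drop very short alignments
        sstart, send = int(hit["sstart"]), int(hit["send"])
        # Minus-strand hits are reported with sstart greater than send.
        strand = "+" if sstart <= send else "-"
        regions.append((
            hit["sseqid"], min(sstart, send), max(sstart, send), strand,
            int(hit["mismatch"]), int(hit["gapopen"]),
        ))
    return sorted(regions)


# Made-up example rows, not real results.
example = (
    "probe\tchr2\t94.12\t34\t2\t0\t1\t34\t5000\t5033\t1e-05\t52.0\n"
    "probe\tchr1\t90.00\t30\t2\t1\t3\t32\t1230\t1201\t0.5\t30.0\n"
    "probe\tchr1\t100.00\t12\t0\t0\t1\t12\t77\t88\t9.0\t20.0\n"
)
regions = hits_to_regions(example)
for region in regions:
    print(*region, sep="\t")
```

Keeping the mismatch and gap counts alongside each interval makes it easier to rank candidate regions by how closely they resemble the 34bp sequence.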