blastn for very short sequences
7.6 years ago
ashish ▴ 680

I have some promoter motifs that range from 4 to 10 bp. I want to run a blastn search with these short motifs, using 2000 bp gene upstream sequences as the query. The problem is that I get zero hits in local BLAST. I've tried using blastn-short and decreasing the word size to 4, but I am still not getting any hits. How do I proceed with this? Thank you.

blastn local blast blastn-short • 5.6k views

How about using e.g. Bowtie2 instead of blast?


This is a really nice idea. I'll download Bowtie2 and try it.


In fact, for very short sequences bowtie (v1) is a better option than bowtie2 because it allows mismatches :).


I agree. Bowtie1 is a lot more sensitive than bowtie2 for such short query sequences.

Note that bowtie1 will report matches that are not fully identical to your subject. You will need to filter the output (e.g. with grep) on the NM tag to control how many bases are allowed to differ: NM:i:0 means 100% identity, whereas NM:i:2 means that two bases are different. Whether a sequence matches and how identical that match is are two different concepts in this mapping context.
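
As a rough illustration of the NM filtering described above, here is a minimal Python sketch. It assumes bowtie1 was run with SAM output (`-S`) and that each aligned record carries an `NM:i:` optional field. The function names and the sample records in the usage note are made up for illustration.

```python
def edit_distance(sam_line):
    """Return the integer value of the NM:i tag of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[len("NM:i:"):])
    return None


def perfect_matches(sam_lines):
    """Yield aligned SAM records whose NM tag is 0 (no differing bases)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        if int(fields[1]) & 4:
            continue  # FLAG bit 0x4 set: read is unmapped
        if edit_distance(line) == 0:
            yield line
```

Usage would look like `with open("hits.sam") as fh: exact = list(perfect_matches(fh))`, where `hits.sam` is a hypothetical file name. Changing the `== 0` test to `<= 1` would keep hits with at most one differing base.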


I think that, as suggested by Jean-Karim Heriche, building HMM profiles will be a better idea than using bowtie.


Thanks for the information Buffo

7.6 years ago

For short sequences, you also need to increase the E-value threshold because short sequences are more likely to occur by chance. However, I don't think BLAST is the right tool for sequences this short. A short-read aligner might do, or even regular expressions, depending on what you're trying to get.
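
To put a number on "more likely to occur by chance", here is a back-of-the-envelope sketch in Python. It assumes independent, equiprobable bases (p = 0.25 each), which real promoters do not satisfy, so treat the figures as orders of magnitude only. The function name is made up for illustration, and this is not how BLAST computes its E-value.

```python
def expected_chance_hits(motif_len, seq_len, both_strands=True):
    """Expected number of exact occurrences of one fixed motif in a random sequence.

    Assumption: bases are independent and each occurs with probability 0.25.
    Note: a palindromic motif is counted twice when both_strands is True.
    """
    positions = max(seq_len - motif_len + 1, 0)  # possible start positions
    per_position = 0.25 ** motif_len             # chance of a match at one position
    strands = 2 if both_strands else 1
    return positions * per_position * strands


if __name__ == "__main__":
    for k in (4, 6, 8, 10):
        print(k, expected_chance_hits(k, 2000))
```

Under these assumptions a given 4-mer is expected roughly 15 times in a single 2000 bp upstream region when both strands are counted, while a given 10-mer is expected far less than once. That is why hits for the shortest motifs get very large E-values and are filtered out at default thresholds.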


Can you suggest any such tool?


I am not even sure short-read aligners can handle sequences this short. They're mostly designed for NGS reads, which are typically longer, and for alignment to large sequences such as genomes. If you're trying to match 4-10 bp against a 2000 bp sequence, you could just use the standard pattern-matching functions of your favorite scripting language.
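
For example, a minimal Python sketch of that pattern-matching approach could look like the following. It assumes motifs are written in IUPAC nucleotide codes and that hits on both strands are wanted. All function names are made up for illustration.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA string, IUPAC codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile a motif into a regex; the lookahead lets overlapping hits be found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(motif, sequence):
    """Return sorted (0-based start, strand) pairs for every occurrence.

    Minus-strand hits are reported at the forward-strand position where the
    reverse complement of the motif starts.
    """
    sequence = sequence.upper()
    hits = [(m.start(), "+") for m in motif_to_regex(motif).finditer(sequence)]
    rc = reverse_complement(motif)
    if rc != motif.upper():  # skip the second pass for palindromic motifs
        hits += [(m.start(), "-") for m in motif_to_regex(rc).finditer(sequence)]
    return sorted(hits)
```

Calling `find_motif(motif, upstream_seq)` for every motif and every upstream sequence gives a complete list of exact occurrences, with no word-size or E-value limits to worry about.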


Hey, thanks for the suggestion. Can you also tell me how the regulatory motif finding tools available online perform this kind of search?


You have to be more precise. There are many online tools with different approaches. It also depends on what the goal is (e.g. discovering motifs de novo or finding known motifs) and what the data is. You haven't told us exactly what kind of problem you're trying to solve. I suspect a case of the XY problem (i.e. you're asking about the solution you tried, which failed, instead of about the actual problem).


I am trying to find cis-acting regulatory elements (CAREs) in the promoter regions of some selected genes in a plant species. I took the 2000 bp upstream sequence of each of these genes to search for these elements. To do this, I downloaded the sequences of all the CAREs ever published in plants. These CARE sequences are between 4 and 10 bases long. So I want to find known motifs in the promoter regions of some genes. Also, if it's important, the genome sequence of the plant I am working on is available with very good coverage.


There are three standard ways to go about this: one, which I already mentioned, is pattern matching; another is to use profiles (i.e. position-specific weight matrices); and the last is to use HMMs.
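
To make the profile option concrete, here is a minimal Python sketch of a position-specific weight matrix. It builds a log-odds matrix from aligned, equal-length instances of one motif and scores every window of a sequence. The pseudocount, the uniform 0.25 background, and the function names are assumptions for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def build_pwm(instances, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from aligned, equal-length motif instances."""
    width = len(instances[0])
    if any(len(s) != width for s in instances):
        raise ValueError("all motif instances must have the same length")
    pwm = []
    for i in range(width):
        column = [s[i].upper() for s in instances]
        total = len(column) + 4 * pseudocount
        # Smoothed base frequency at this position, relative to the background.
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(pwm, sequence, threshold):
    """Return (0-based start, window, score) for windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(column[b] for column, b in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits
```

Unlike an exact pattern, the matrix tolerates variation at positions where the known instances themselves vary. The threshold has to be chosen per motif, for example relative to the score of the consensus sequence.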


I have sequences of the same motifs from different plant species, so I think creating an HMM profile for each motif from its multiple sequence alignment will be a good approach.
