BLAST not showing any hits
8.2 years ago

Starting from a nucleotide sequence, I'm trying to find all genes in the human genome that contain this sequence. However, whenever I run BLAST, I never get any hits. I tried increasing the Expect value because I saw documentation saying that's a good parameter to change if you're BLASTing a short sequence. If anybody can help me with this, I'd be most appreciative. Below is a screenshot of what I'm entering.

[Screenshot: BLAST search form as submitted]

A 7 bp sequence is too short to search with BLAST. You are better off getting the sequences of the genes you are interested in and then using a pattern search program (e.g. fuzznuc from EMBOSS), or just grep as a quick option.
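
As a rough illustration of what an exact pattern search does, here is a minimal Python sketch (not fuzznuc itself, just the same idea) that reports every position of a short motif in a sequence, including overlapping matches and matches on the reverse-complement strand. The function names and the example sequence are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(table)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return 1-based start positions of exact motif matches on both strands.

    Matches may overlap. Positions are given on the forward strand of `seq`,
    paired with "+" (motif found as-is) or "-" (reverse complement found).
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i + 1, "+"))
        # Skip palindromic motifs here so they are not reported twice.
        if window == rc_motif and rc_motif != motif:
            hits.append((i + 1, "-"))
    return hits


if __name__ == "__main__":
    print(find_motif("AATGCGCACTTGTGCGCAA", "TGCGCAC"))
```

A brute-force scan like this is fine for a single 7-mer; dedicated tools add conveniences such as mismatches and formatted reports.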

Thank you for the reply, genomax2. What I'm trying to do is find which genes in the entire human genome contain this sequence, so I don't have any particular genes of interest. Do you have any recommendations for how I might go about this?

You can get RefSeq mRNA sequences from UCSC or transcript FASTA sequences from GENCODE and then run fuzznuc on them.
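
Once you know which transcripts match, you still need to collapse them to genes. As far as I know, GENCODE transcript FASTA headers are '|'-separated with the transcript ID first, the gene ID second and the gene name sixth; treat that layout as an assumption and check it against the file you download. A minimal sketch, with hypothetical function names and example IDs:

```python
# ASSUMPTION: GENCODE-style header layout; verify against your file.
FIELD_NAMES = ["transcript_id", "gene_id", "havana_gene", "havana_transcript",
               "transcript_name", "gene_name", "length", "biotype"]


def parse_gencode_header(header: str) -> dict[str, str]:
    """Split a '|'-separated GENCODE-style FASTA header into named fields."""
    fields = header.lstrip(">").rstrip("|").split("|")
    return dict(zip(FIELD_NAMES, fields))


def genes_from_headers(headers: list[str]) -> dict[str, list[str]]:
    """Group matching transcript IDs by gene name (a gene can have many)."""
    genes: dict[str, list[str]] = {}
    for header in headers:
        record = parse_gencode_header(header)
        # Fall back to the transcript ID if no gene name field is present.
        gene = record.get("gene_name", record["transcript_id"])
        genes.setdefault(gene, []).append(record["transcript_id"])
    return genes


if __name__ == "__main__":
    hypothetical = [
        ">ENST00000000001.1|ENSG00000000001.1|-|-|GENEA-201|GENEA|1500|protein_coding|",
        ">ENST00000000002.1|ENSG00000000001.1|-|-|GENEA-202|GENEA|900|retained_intron|",
    ]
    print(genes_from_headers(hypothetical))
```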

8.2 years ago
Marge

While the suggestions you received are perfectly valid, I think you can still use BLAST to do what you want.

What is probably limiting your search on the NCBI BLAST web service is the "Word size" parameter, i.e. the length of the exact match used to seed alignments. I think the minimum you can set for BLASTN on the NCBI website is 7, whereas you would want something smaller to seed alignments that can be at most 7 bp long.
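
To see why the word size matters so much for a 7 bp query, here is a toy Python illustration of word-based seeding. It is a simplification (real BLAST uses lookup tables, extends seeds and then applies score and E-value cut-offs), and the function names and sequences are hypothetical. A subject region can only be found if it shares at least one exact word of length `word_size` with the query, and a 7 bp query contains just one 7-letter word but three 5-letter words.

```python
def query_words(query: str, word_size: int) -> set[str]:
    """All distinct words (k-mers) of length word_size in the query."""
    return {query[i:i + word_size] for i in range(len(query) - word_size + 1)}


def seed_hits(query: str, subject: str, word_size: int) -> list[tuple[int, int]]:
    """Return 0-based (query_pos, subject_pos) pairs where an exact word matches."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    query, subject = "TGCGCAC", "AATGCGCAGG"
    for w in (7, 5):
        print(w, sorted(query_words(query, w)), seed_hits(query, subject, w))
```

With word size 7 this partial match produces no seed at all, while word size 5 produces two.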

If you use the BLASTN version hosted at Ensembl (http://www.ensembl.org/Multi/Tools/Blast?db=core), you can reduce the word size used for seeding. I did a quick test with your query sequence: with the word size set to 5, the E-value threshold set to the maximum available, and the low-complexity filtering options unticked, you do get a number of exact matches within genes. If you are only interested in matches within genes, make sure to select cDNAs as the database to search against.
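
For anyone who prefers a local search, similar settings can be passed to the NCBI BLAST+ `blastn` program. The sketch below only builds the argument list and does not run anything. It assumes BLAST+ is installed and that a nucleotide database (called `cdna_db` here, a hypothetical name) was built with `makeblastdb`; the default E-value and hit limit are arbitrary illustrative values. The options used (`-task blastn-short`, `-word_size`, `-evalue`, `-dust`, `-strand`, `-max_target_seqs`, `-outfmt`) are real `blastn` options, but check `blastn -help` for the ranges your version accepts.

```python
import shlex


def build_blastn_cmd(query_fasta: str, db: str, word_size: int = 5,
                     evalue: float = 1000.0, max_hits: int = 5000,
                     strand: str = "both") -> list[str]:
    """Build (but do not run) a blastn command suited to a very short query.

    Small word size, permissive E-value, low-complexity (DUST) filter off.
    The evalue and max_hits defaults are illustrative, not recommendations.
    """
    return [
        "blastn",
        "-task", "blastn-short",       # preset intended for short queries
        "-query", query_fasta,
        "-db", db,
        "-word_size", str(word_size),
        "-evalue", str(evalue),
        "-dust", "no",                 # disable low-complexity filtering
        "-strand", strand,             # "plus" keeps only same-strand matches
        "-max_target_seqs", str(max_hits),
        "-outfmt", "6",                # tab-separated output
    ]


if __name__ == "__main__":
    cmd = build_blastn_cmd("query.fa", "cdna_db")
    print(shlex.join(cmd))
    # To actually run it (requires BLAST+ on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```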

Hope this helps.

Interesting. While this does seem to work, I wonder whether fuzznuc would find more hits.

Subject name     Gene hit        Subject start  Subject end  Subject ori  Genomic location                           Orientation  Query name  Query start  Query end  Query ori  Length  Score  E-val  %ID
ENST00000628916  RP11-468N14.10  549            555          Forward      CHR_HSCHR4_1_CTG9:68855516-68855522        Reverse      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000619770  RP11-147B8.1    162            168          Forward      CHR_HSCHR15_1_CTG8:28469235-28469241       Reverse      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000426903  UBTFL8          882            888          Reverse      3:5082810-5082816                          Reverse      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000458502  ABHD17AP6       601            607          Forward      17:20841515-20841521                       Forward      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000444082  NBPF13P         170            176          Forward      1:147056418-147056424                      Reverse      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000532318  RP11-15D14.2    284            290          Forward      11:13489344-13489350                       Forward      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000621765  ENPP7P14        908            914          Forward      16:5110919-5110925                         Reverse      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000447604  OR2I1P          300            306          Reverse      CHR_HSCHR6_MHC_MCF_CTG1:29553329-29553335  Forward      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000575506  CCDC92B         347            353          Forward      17:2724859-2724865                         Reverse      Query_1     1            7          Forward    7       14.4   27119  100.00
ENST00000454844  AGAP10P         1110           1116         Forward      10:45699224-45699230                       Forward      Query_1     1            7          Forward    7       14.4   27119  100.00
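
The E-values in this table (27119) look alarming, but they are expected for a 7 bp query. BLAST relates a bit score S' to the E-value by E = m · n · 2^(-S'), where m and n are the (effective) query and database lengths. The exact 7/7 matches here score only 14.4 bits, so against a database the size of a transcriptome they are expected to occur by chance thousands of times. The sketch below simply evaluates that formula; the database length in the example is a made-up round number, not Ensembl's actual search space.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits, E = m * n * 2**(-S'), for bit score S'.

    Uses raw lengths; BLAST itself uses effective (edge-corrected) lengths,
    so its reported values will differ somewhat.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_needed(target_evalue: float, query_len: int, db_len: int) -> float:
    """Bit score a hit would need to reach the given E-value."""
    return math.log2(query_len * db_len / target_evalue)


if __name__ == "__main__":
    db_len = 300_000_000  # hypothetical database length, for illustration only
    print(f"E for 14.4 bits: {evalue_from_bits(14.4, 7, db_len):.0f}")
    print(f"bits needed for E = 0.05: {bits_needed(0.05, 7, db_len):.1f}")
```

This is why a permissive E-value cut-off is needed for such short queries: no 7 bp match can score high enough to pass the usual thresholds.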

I'm not sure whether fuzznuc would find more hits. I tried a quick grep on the Ensembl FASTA and there definitely seem to be many more than a few hundred matches, so I guess yes.

Would it be possible for you to post that command? I very rarely work with the shell; I'm mainly a Python/R user.

If you have the sequences of the transcripts of interest in a FASTA file called transcripts.fa, the plain grep command would be:

grep TGCGCAC transcripts.fa 

The big limitation is that you only get the line of the FASTA file that contains the match, not the whole entry, so you lose the corresponding gene ID. Also, if the sequences are wrapped over several lines (as in most FASTA files), a match that spans a line break will be missed. This is fine for getting an idea of how many matches you would find, but something extra (or something else entirely) is needed to get the actual genes.
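
For a Python user, a minimal sketch of a grep replacement that keeps track of which entry each match belongs to: it reads a (optionally gzipped) FASTA file record by record, joins wrapped sequence lines before searching so that matches spanning a line break are not missed, and reports each matching header with its number of (possibly overlapping) matches. It searches only the strand as written in the file; the function names are hypothetical.

```python
import gzip
import re
from collections.abc import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_matches(seq: str, motif: str) -> int:
    """Count exact, possibly overlapping, occurrences of motif in seq."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return sum(1 for _ in re.finditer(f"(?={re.escape(motif)})", seq))


def search_records(records: Iterable[tuple[str, str]],
                   motif: str) -> list[tuple[str, int]]:
    """Return (header, match_count) for every record containing motif."""
    motif = motif.upper()
    results = []
    for header, seq in records:
        n = count_matches(seq, motif)
        if n:
            results.append((header, n))
    return results


def search_fasta(path: str, motif: str) -> list[tuple[str, int]]:
    """Search a plain or .gz FASTA file for an exact motif."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return search_records(read_fasta(handle), motif)
```

Usage would be something like `search_fasta("transcripts.fa", "TGCGCAC")`, which returns a list of (header, count) pairs that can be written out or loaded into pandas/R.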

Are these all the hits for my query sequence? Ensembl found 288 with the parameters given by Marge.

It's probably just because of the "maximum number of hits to report" setting, which Ensembl sets automatically (to 10).

Important note: BLASTN reports matches on both strands (direct and reverse complement). If, as you stated in the question, you are only interested in genes containing your query sequence, remember to filter out the unwanted reverse-strand matches.
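
If the search is run locally with BLAST+ and tabular output (`-outfmt 6`), one way to do this filtering is from the coordinates: as far as I know, in the default 12 tabular columns the subject start (column 9) is greater than the subject end (column 10) when the query aligns to the subject's minus strand. A minimal sketch under that assumption, with a hypothetical function name:

```python
from collections.abc import Iterable


def plus_strand_hits(lines: Iterable[str]) -> list[str]:
    """Return subject IDs of BLAST+ tabular hits on the subject's plus strand.

    ASSUMPTION: default -outfmt 6 column order, so sstart and send are
    columns 9 and 10 (0-based indices 8 and 9) and sstart > send marks
    a minus-strand hit.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        sstart, send = int(cols[8]), int(cols[9])
        if sstart <= send:
            kept.append(cols[1])  # column 2 is the subject ID
    return kept
```

With a cDNA database, the remaining subject IDs are the transcripts whose sense strand contains the query.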

Thanks. I'm curious: BLASTN doesn't seem to run when the word size is 4, 3, or 2, but runs fine with 5. Do you know why?

Hello, what do you mean by "it doesn't run"? Do you get no hits?

It isn't that I don't get hits; the job status says "Failed".

Out of curiosity, I just tried it and it seems to work. Maybe it was only a temporary problem.

In general, for "Failed" runs I would suggest contacting the website's support team.

Would you mind posting the fuzznuc command you used, genomax2? Thank you; I'm not much of a shell user.

The link in my original post points to the fuzznuc help page, which shows how to run the program.

Could you possibly guide me on how to install fuzznuc on Cygwin?
