How To Use Blast To Find Exact Matches Of Short Sequences?
11.4 years ago
Free Man ▴ 180

Hi, I am using tblastn (BLAST 2.2.25+) for exact peptide mapping (no gaps, no mismatches).
I want to map a few peptides (about 6 to 50 amino acids long) to a genome.
However, when I tested a known peptide of 6 amino acids, tblastn failed to map it.
I have read the BLAST documentation but could not find a solution. What did I miss?
Thank you!
PS. I have also tried PGM (ProteogenomicMapping). It maps the known peptide tested above correctly, but it is too slow on my computer for large-scale mapping.

11.4 years ago
SRKR ▴ 180

Command-line BLAST has a -perc_identity option. Set it to 100 and BLAST will report a hit only when it is 100% identical. The command would look like this:

blastn -db dbname -query input_file -out output_file -perc_identity 100

You can try this; I believe it will work. You can also use -word_size to get hits for even shorter peptides, as in your case. For tblastn its value must be at least 2:

-word_size 3

You can list all available options by adding -h (brief) or -help (detailed) after the program name:

tblastn -help

Hope this helps.


Hi, I got this error: "Error: (CArgException::eInvalidArg) Unknown argument: "perc_identity"".
I did not find anything like 'perc_identity' in the help output for tblastn. It seems to be available only for blastn. Which version are you using?


Yes, sorry, I just noticed that -perc_identity is not available in tblastn. The best option seems to be -ungapped, which prevents gaps but can still return hits with mismatches.


What is your genome size? If it isn't too big, a script could get you the positions: six-frame translate the genome and search for your amino acid sequences in the translations. That gives you every matching position across the genome.
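
A minimal sketch of that six-frame approach, in plain Python with no external libraries. It assumes the standard genetic code (NCBI translation table 1), a genome small enough to hold in memory as one string, and peptides written in one-letter code. The function names (reverse_complement, translate, find_peptide) are made up for this illustration and do not come from any existing tool.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA from its first base; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_peptide(genome, peptide):
    """Return (strand, start, end) for every exact match of peptide.

    Coordinates are 1-based, inclusive, and always given on the forward strand.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos         # 0-based offset on seq
                end = start + 3 * len(peptide)  # exclusive
                if strand == "+":
                    hits.append((strand, start + 1, end))
                else:
                    # Map reverse-strand offsets back to forward coordinates.
                    hits.append((strand, n - end + 1, n - start))
                pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "CCATGGCCAAGTGGTTTCATG"  # made-up sequence for illustration
    print(find_peptide(toy_genome, "MAKWFH"))
```

For a multi-chromosome genome you would call find_peptide once per sequence. Each peptide search is a linear scan over the six translations, so for many peptides it is worth translating each frame once and reusing the result.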


Thanks for your suggestions! After tedious attempts with various parameters, I found a solution for my project:
Key parameters: -comp_based_stats 0 -ungapped -matrix PAM30 -seg no
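
As noted earlier in the thread, -ungapped removes gaps but can still report hits with mismatches, so the output may need a final filter. Below is a rough sketch of one. It assumes tblastn was additionally run with -outfmt "6 qseqid sseqid pident length qlen sstart send" (a column layout chosen for this example, not something the posts above specify), and it keeps only hits that are 100% identical over the whole query. The function name exact_hits is hypothetical.

```python
def exact_hits(tabular_lines):
    """Keep hits that are 100% identical across the full query length.

    Assumes tab-separated columns in this order:
    qseqid, sseqid, pident, length, qlen, sstart, send
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, pident, length, qlen, sstart, send = line.split("\t")
        # For an ungapped alignment, 100% identity over an alignment as long
        # as the query means every residue of the peptide matched.
        if float(pident) == 100.0 and int(length) == int(qlen):
            kept.append((qseqid, sseqid, int(sstart), int(send)))
    return kept


if __name__ == "__main__":
    # Made-up example rows, not real BLAST output.
    rows = [
        "pep1\tchr1\t100.000\t6\t6\t3\t20",
        "pep1\tchr2\t83.333\t6\t6\t100\t117",
    ]
    for hit in exact_hits(rows):
        print(hit)
```

The same function can be fed an open file handle for the tblastn output file, since it only iterates over lines.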
