Hi everyone,
I am using command-line BLASTn to align a set of nucleotide sequences against a custom database of around 450 FASTA sequences. Most of the queries give good hits, but some give no hits at all against my database. I checked the no-hit queries manually, and some of them should hit: when I run blastn on them manually, I do get hits.
To narrow down the cause, I took one of the FASTA sequences that gave no hit against the custom database on the command line (sequence1.fasta) and ran it through web BLAST (blastn with default parameters, using "Align two or more sequences") against the same database sequences.
This is the result:
Description  Max Score  Total Score  Query Cover  E value  Per. Ident  Accession
HPV226       222        222          100%         3e-61    84.50%      Query_49167
Range 1: 5419 to 5618
Alignment statistics for match #1
Score          Expect  Identities    Gaps       Strand
222 bits(245)  3e-61   169/200(85%)  0/200(0%)  Plus/Plus
Query  1     ACACTGAAAATCCTGCTAATTATCAAAAAgggggggCTAAGGACACTCGTCAGAATGTGT  60
             |||| ||||||||||||   ||||||||||||||||| |||||||||||||| ||||| |
Sbjct  5419  ACACAGAAAATCCTGCTGCATATCAAAAAGGGGGGGCAAAGGACACTCGTCAAAATGTAT  5478

Query  61    CCCTGGATCCCAAACAAACTCAGTTGTTTGTTGTAGGCTGTACCCCTTGTAAGGGTGAGC  120
             |  ||||||| |||||||| ||||| |||||||| || ||||||||||| |||||||| |
Sbjct  5479  CTTTGGATCCTAAACAAACCCAGTTATTTGTTGTGGGGTGTACCCCTTGCAAGGGTGAAC  5538

Query  121   ATTGGGATGTTGCTACTGCTTGTTCCAGGCTTAACAAGGGTGATTGCCCTCCTATACAGC  180
             |||||||||| ||  |||| ||||| | ||||  ||| || || ||||||||||| ||||
Sbjct  5539  ATTGGGATGTGGCCCCTGCCTGTTCTAAGCTTGGCAAAGGGGACTGCCCTCCTATTCAGC  5598

Query  181   TTGTGCCTTCTGTAATTGAG  200
             ||||| | ||||| ||||||
Sbjct  5599  TTGTGTCCTCTGTTATTGAG  5618
So my query is clearly similar to the HPV226 sequence in my custom database.
I then created a database containing only that sequence (HPV226) and ran my query against it on the command line, and I still get no hit:
blastn -db HPV226.fasta -query gi1185315504gbKY063012.1HumanpapillomavirusisolateCT14majorcapsidproteinL1genepartialcds.fasta -out result.out
Since the web output shows some lowercase letters, I thought masking might be the cause, so I tried -dust no and -soft_masking false, but I still don't get the hit. Any idea what I am missing here? I have searched the forum and did not find an answer :( Thanks a lot!
Your query sequence above produces a single perfect hit at NCBI.
Yes, that is my query sequence (Human papillomavirus isolate CT14 major capsid protein (L1) gene, partial cds), and I am BLASTing it against a database that contains only complete HPV genome sequences, no partial cds. It is most closely related to HPV226. So why don't I get the hit when running BLAST on the command line?
Can you try simply BLASTing those two sequences against each other, using the bl2seq approach?
Likely unrelated (let alone the cause of all this), but that is a very long FASTA file name you have there. Out of curiosity, can you also try with a much shorter file name?
I shortened the FASTA file names. Still no hits :(
Code:
blastn -query HPVisolateCT14.fasta -subject HPV226.fasta
Result, no hits:
Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.
Hmm, OK, I see (I tried the same and indeed got the same output :/ )
A bit of trial and error later: try the command again, but this time add
-task blastn
to it. This invokes the classic blastn algorithm rather than megablast, which is nowadays the default task when calling blastn.
It worked!! Thank you very much! Really thankful for this! Best
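Why the default task misses this hit can be checked roughly from the web alignment itself. The sketch below assumes the BLAST+ default seed word sizes (28 for -task megablast, 11 for -task blastn) and the idea that a task can only start a hit from an exact match at least one word long. It counts identical bases and the longest exact run in the gap-free Query 1-200 / Sbjct 5419-5618 alignment above. The function name check_seed is just an illustrative helper, not part of BLAST+.

```shell
#!/usr/bin/env bash
# Rough sketch: count identities and the longest exact (gap-free) run in the
# Query 1-200 / Sbjct 5419-5618 alignment from the web output, then compare
# that run with the ASSUMED BLAST+ default word sizes (megablast 28, blastn 11).
set -euo pipefail

# Aligned rows copied from the web output. There are no gaps, so base i of
# the query lines up with base i of the subject.
query="ACACTGAAAATCCTGCTAATTATCAAAAAgggggggCTAAGGACACTCGTCAGAATGTGT"
query+="CCCTGGATCCCAAACAAACTCAGTTGTTTGTTGTAGGCTGTACCCCTTGTAAGGGTGAGC"
query+="ATTGGGATGTTGCTACTGCTTGTTCCAGGCTTAACAAGGGTGATTGCCCTCCTATACAGC"
query+="TTGTGCCTTCTGTAATTGAG"

subject="ACACAGAAAATCCTGCTGCATATCAAAAAGGGGGGGCAAAGGACACTCGTCAAAATGTAT"
subject+="CTTTGGATCCTAAACAAACCCAGTTATTTGTTGTGGGGTGTACCCCTTGCAAGGGTGAAC"
subject+="ATTGGGATGTGGCCCCTGCCTGTTCTAAGCTTGGCAAAGGGGACTGCCCTCCTATTCAGC"
subject+="TTGTGTCCTCTGTTATTGAG"

# check_seed QUERY SUBJECT (hypothetical helper): prints the identity count,
# the longest run of identical bases, and whether that run could hold a
# 28-base or an 11-base exact seed word.
check_seed() {
  awk -v q="$1" -v s="$2" 'BEGIN {
    q = toupper(q); s = toupper(s)   # the web output shows part of the query in lowercase
    n = length(q); ident = 0; run = 0; best = 0
    for (i = 1; i <= n; i++) {
      if (substr(q, i, 1) == substr(s, i, 1)) {
        ident++; run++
        if (run > best) best = run
      } else {
        run = 0                      # a mismatch breaks the exact run
      }
    }
    printf "identities=%d/%d longest_exact_run=%d\n", ident, n, best
    printf "megablast (word_size 28) can seed: %s\n", (best >= 28 ? "yes" : "no")
    printf "blastn (word_size 11) can seed: %s\n", (best >= 11 ? "yes" : "no")
  }'
}

check_seed "$query" "$subject"
```

On these rows it reports 169/200 identities (the 84.50% in the table) and a longest exact run of 17 bases. That is too short for a 28-base megablast seed but long enough for an 11-base blastn seed, which fits -task blastn finding the hit. The check ignores masking and the later extension and E-value steps, so it is only a necessary condition, not a full model of BLAST.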