Find All Similar Sequences In Genbank
12.2 years ago
Whetting ★ 1.6k

Hi,
I would like to get a feel for how many sequences related to my virus of interest are available in GenBank. I am thinking about using a recursive BLAST approach: start with one genome, run BLASTn, and add every hit above a certain cutoff to a list. Then run BLASTn on the next hit in the list, and keep going until no new sequences are found.
Does this sound reasonable? I would take any and all suggestions/improvements.
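
To make the idea concrete, here is a minimal sketch of the loop described above, written as a breadth-first expansion. The `search` callable is a hypothetical stand-in for "run BLASTn and return the accessions above the cutoff"; it is not a real BLAST API, and `max_sequences` is an assumed safety cap so the expansion cannot grow without bound.

```python
from collections import deque
from typing import Callable, Iterable, Set


def collect_related(
    seed: str,
    search: Callable[[str], Iterable[str]],
    max_sequences: int = 10000,
) -> Set[str]:
    """Expand from one seed until no new sequences turn up.

    `search` is a hypothetical placeholder: given a sequence ID, it should
    return the IDs of all hits that pass the chosen cutoff.
    """
    seen = {seed}
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit in seen:
                continue  # already collected, do not search it again
            if len(seen) >= max_sequences:
                return seen  # assumed safety cap reached
            seen.add(hit)
            queue.append(hit)
    return seen


# Toy data standing in for BLAST results (made-up IDs, not real accessions).
toy_hits = {"A": ["B", "C"], "B": ["A", "D"], "C": [], "D": ["B"], "E": ["A"]}
related = collect_related("A", lambda q: toy_hits.get(q, []))
```

In the toy data, `E` hits `A` but nothing reachable from `A` hits `E`, so it is never collected: the result depends on which seed you start from.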

thanks

blast • 2.5k views

Did you decide on a tool/strategy for doing this? I'm trying to do something similar on enzyme sequences, but am trying not to reinvent the wheel. If you found a tool that's helpful, I'd like to take a look at it.


No... the idea got moved to the back burner. Please let me know if you come up with anything.

12.2 years ago
Rm 8.3k

You can search the NT database directly, since it includes everything, and set the -v and -b options to very large numbers so that the output reports as many similar hits as you would expect from different genomes.
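
As a rough sketch of that suggestion, the snippet below builds the argument list for a legacy `blastall` run with large `-v` and `-b` values, plus what I believe is the closest BLAST+ equivalent (`-max_target_seqs`), and parses the 12-column tabular output. The file name `virus.fasta`, the default of 5000 hits, and the helper names are assumptions for illustration only; nothing is executed here.

```python
from typing import Dict, List

# Standard 12 columns of BLAST tabular output (blastall -m 8, BLAST+ -outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blastall_args(query_fasta: str, database: str = "nt",
                  max_hits: int = 5000) -> List[str]:
    """Legacy blastall: -v and -b raise the number of reported hits."""
    return ["blastall", "-p", "blastn", "-d", database, "-i", query_fasta,
            "-v", str(max_hits), "-b", str(max_hits), "-m", "8"]


def blastplus_args(query_fasta: str, database: str = "nt",
                   max_hits: int = 5000) -> List[str]:
    """BLAST+ blastn: -max_target_seqs plays the same role for tabular output."""
    return ["blastn", "-query", query_fasta, "-db", database,
            "-max_target_seqs", str(max_hits), "-outfmt", "6"]


def parse_tabular(text: str) -> List[Dict[str, object]]:
    """Turn tabular BLAST output into one dict per hit line."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row: Dict[str, object] = dict(zip(COLUMNS, line.split("\t")))
        row["pident"] = float(row["pident"])
        row["length"] = int(row["length"])
        row["evalue"] = float(row["evalue"])
        hits.append(row)
    return hits


# Made-up example line, not a real GenBank record.
sample = "query1\tgi|123\t98.50\t1000\t15\t0\t1\t1000\t5\t1004\t0.0\t1800\n"
parsed = parse_tabular(sample)
legacy_cmd = blastall_args("virus.fasta")
plus_cmd = blastplus_args("virus.fasta")
```

Either argument list could be passed to `subprocess.run` on a machine with the corresponding BLAST installation and a local copy of nt.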

12.2 years ago
Ketil 4.1k

I don't think it sounds reasonable at all. Remember that BLAST performs local alignment: you'd find a plethora of sequences that share only some small region of similarity, and in the next iteration you'd find sequences that match completely different parts of the first-round results. If you are looking for remote relationships, you are better off using a more sensitive alignment method, or stochastic models like HMMs.
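
One way to put a number on the "small region of similarity" problem is to measure how much of the query each hit actually covers. The sketch below merges the query-side coordinates of a hit's alignments and reports the covered fraction. This is my own illustration, not something BLAST does for you in this form, and the function names and example coordinates are assumptions.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged: List[Interval] = []
    # Normalise each pair so start <= end, then process in sorted order.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def query_coverage(hsps: Iterable[Interval], query_length: int) -> float:
    """Fraction of the query covered by at least one alignment."""
    covered = sum(end - start + 1 for start, end in merge_intervals(hsps))
    return covered / query_length


# Made-up alignments on a 1000 bp query: two overlapping, one at the far end.
example_hsps = [(1, 100), (50, 150), (901, 1000)]
coverage = query_coverage(example_hsps, 1000)
```

A hit like this one covers only a quarter of the query, so requiring a minimum coverage before adding a hit to the list would be one possible guard against chaining through unrelated local matches.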
