Hi,
I would like to get a feel for how many sequences related to my virus of interest are available in GenBank. I am thinking about using a recursive BLAST approach: start with one genome, run BLASTn, and add every hit above a certain cutoff to a list. Then run BLASTn on the next hit in the list, and keep going until no new unique sequences turn up.
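The loop described above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not a working BLAST client: `collect_related`, `search`, `min_score`, and `max_queries` are hypothetical names, and `search` stands in for whatever actually runs BLASTn and returns (hit ID, score) pairs. The `max_queries` cap is an added assumption, there so that a loose cutoff cannot make the expansion run indefinitely.

```python
from collections import deque


def collect_related(seed, search, min_score, max_queries=1000):
    """Expand a set of related sequence IDs outward from one seed.

    `search(query_id)` is a hypothetical callable that returns an iterable
    of (hit_id, score) pairs; in practice it would wrap a BLASTn run.
    """
    seen = {seed}            # every ID collected so far, seed included
    queue = deque([seed])    # IDs that still need to be used as queries
    queries = 0
    while queue and queries < max_queries:
        query = queue.popleft()
        queries += 1
        for hit_id, score in search(query):
            # Keep only hits above the cutoff that have not been seen before.
            if score >= min_score and hit_id not in seen:
                seen.add(hit_id)
                queue.append(hit_id)
    return seen


# Toy stand-in for BLAST results, just to exercise the loop.
toy_hits = {
    "A": [("B", 95.0), ("C", 60.0)],
    "B": [("A", 95.0), ("D", 90.0)],
    "D": [("B", 90.0)],
}
related = collect_related("A", lambda q: toy_hits.get(q, []), min_score=80.0)
print(sorted(related))  # ['A', 'B', 'D']
```

Tracking IDs in a set means each sequence is queried at most once, so the loop stops on its own once a round of searches returns nothing new.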
Does this sound reasonable? I would take any and all suggestions/improvements.
Thanks
Did you decide on a tool/strategy for doing this? I'm trying to do something similar on enzyme sequences, but am trying not to reinvent the wheel. If you found a tool that's helpful, I'd like to take a look at it.
No... the idea got moved to the back burner. Please let me know if you come up with anything.