Question

How To Get Sequence Around Blast Result?

5

Entering edit mode

11.8 years ago

dustar1986 ▴ 380

Hi,

I blasted a serires of inquiry sequences against the a pre-built blast-database file using blastall and the xml result was parsed by biopython.

I got the genomic coordinates information of the inquiry sequences.

Now I also want to know the 500bp sequences upstream and downstream of each inquiry sequences.

I know biopython can achieve this by extracting sequences from fasta file of each chromosome or from online database.

But I don't want to do this because I don't have enough spaces to keep fasta file of each chromosome locally. And online fetching is too slow when the inquiry file is large.

Is that possible to fetch such sequences using genomic coordinates from the pre-built blast-database file (So I can just keep this database file on the disk for each species)?

biopython • 8.3k views

ADD COMMENT • link updated 11.8 years ago by Leszek 4.2k • written 11.8 years ago by dustar1986 ▴ 380

score 21 · Answer 1 · 2013-02-08

21

Entering edit mode

11.8 years ago

Andrzej Zielezinski 11k

Of course, it's very easy using BLAST+, so install it right away!

First, make sure you use the -parse_seqids parameter while creating blast database:

makeblastdb -in tair10.fa -dbtype prot -parse_seqids

Then, use blastdbcmd to fetch sequences with a specific range.

blastdbcmd -db tair10.fa -dbtype prot -entry AT1G50920.1 -range 1-10

The output is.

>AT1G50920.1:1-10 | Symbols:  | Nucleolar GTP-binding protein | chr1:18870555-18872570 FORWARD LENGTH=671
MVQYNFKRIT

ADD COMMENT • link 11.8 years ago by Andrzej Zielezinski 11k

1

Entering edit mode

Thanks Zielezinski. This is really helpful.

ADD REPLY • link 11.8 years ago by dustar1986 ▴ 380

0

Entering edit mode

I'm glad I could help!

ADD REPLY • link 11.8 years ago by Andrzej Zielezinski 11k

0

Entering edit mode

What if we have a query with 100 sequences? Manual parsing may not be feasible.

ADD REPLY • link 5.3 years ago by adhirajnath14 ▴ 40

1

Entering edit mode

If you have a query with multiple accession numbers, you provide them in a text file (e.g., query.txt):

AT1G50920.1
AT1G50930.1
AT1G50940.1
AT1G50950.1

and you run the command:

blastdbcmd -db tair10.fa -dbtype prot -entry_batch query.txt

You will get FASTA sequences for the accession numbers from the query.txt file. However, when you use -batch_entry argument, you can't use the -range argument.

ADD REPLY • link 5.3 years ago by Andrzej Zielezinski 11k

0

Entering edit mode

Thank you so much. What if we want to parse 50 nt flanks with the aligned sequence?

ADD REPLY • link 5.3 years ago by adhirajnath14 ▴ 40