Hi,
I BLASTed a series of query sequences against a pre-built BLAST database using blastall, and parsed the XML output with Biopython.
I now have the genomic coordinates of the query sequences.
Now I also want the 500 bp of sequence upstream and downstream of each query sequence.
I know Biopython can do this by extracting the sequences from a FASTA file of each chromosome or from an online database.
But I don't want to do that, because I don't have enough disk space to keep a FASTA file of each chromosome locally, and online fetching is too slow when the query file is large.
Is it possible to fetch such sequences by genomic coordinates directly from the pre-built BLAST database (so that I only need to keep the database files on disk for each species)?
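For illustration, here is a minimal Python sketch of one way this could work, assuming the BLAST+ blastdbcmd tool is installed and the database was built with parsed sequence IDs (formatdb -o T or makeblastdb -parse_seqids). The function names, the database name genome_db and the sequence ID chr1 are hypothetical. The coordinates are 1-based and inclusive, as BLAST reports them, and "upstream" here means lower coordinates on the plus strand, so the two windows would need to be swapped for minus-strand hits.

```python
import subprocess


def flank_ranges(start, end, flank=500, seq_len=None):
    """Return (upstream, downstream) 1-based inclusive ranges around a hit.

    A range is None when there is no room for it, e.g. the hit starts at
    position 1 or ends at the last base of the subject sequence.
    """
    # BLAST reports start > end for minus-strand hits, so normalise first.
    lo, hi = min(start, end), max(start, end)
    up = (max(1, lo - flank), lo - 1) if lo > 1 else None
    # Clip at the sequence end when the length is known; blastdbcmd may
    # reject a range that runs past the end of the sequence.
    down_end = hi + flank if seq_len is None else min(seq_len, hi + flank)
    down = (hi + 1, down_end) if down_end > hi else None
    return up, down


def blastdbcmd_args(db, seq_id, rng, strand="plus"):
    """Build the argument list for one blastdbcmd range query."""
    return [
        "blastdbcmd", "-db", db, "-entry", seq_id,
        "-range", f"{rng[0]}-{rng[1]}", "-strand", strand,
    ]


def fetch(db, seq_id, rng, strand="plus"):
    """Run blastdbcmd and return the FASTA text it prints."""
    result = subprocess.run(
        blastdbcmd_args(db, seq_id, rng, strand),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

With a hit at chr1:1000-1200, flank_ranges(1000, 1200) gives the windows 500-999 and 1201-1700, and each of them can be passed to fetch("genome_db", "chr1", window).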
Thanks Zielezinski. This is really helpful.
I'm glad I could help!
What if we have a query with 100 sequences? Manual parsing may not be feasible.
If you have a query with multiple accession numbers, you provide them in a text file (e.g., query.txt) and run the command with that file as the batch input. You will get FASTA sequences for the accession numbers from the query.txt file. However, when you use the -entry_batch argument, you can't use the -range argument.
Thank you so much. What if we want to extract 50 nt flanks together with the aligned sequence?
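One possible approach, sketched below in Python: as far as I know, each line of the blastdbcmd batch file may carry its own range and strand after the sequence ID (e.g. "chr1 950-1250 plus"), which would get around the -range restriction. The sketch widens each aligned region by 50 nt on both sides and writes one such line per hit. The function names and the (seq_id, start, end) hit tuples are hypothetical; with Biopython's NCBIXML parser the coordinates would typically come from hsp.sbjct_start and hsp.sbjct_end.

```python
def batch_line(seq_id, start, end, flank=50, seq_len=None):
    """Format one batch-file line: '<id> <from>-<to> <strand>'.

    The range covers the aligned region plus `flank` bases on each side,
    clipped at position 1 and, if the length is known, at the sequence end.
    """
    lo, hi = sorted((start, end))
    # BLAST reports start > end when the hit is on the minus strand.
    strand = "plus" if start <= end else "minus"
    left = max(1, lo - flank)
    right = hi + flank if seq_len is None else min(seq_len, hi + flank)
    return f"{seq_id} {left}-{right} {strand}"


def write_batch(hits, path, flank=50):
    """Write one batch line per (seq_id, start, end) hit to `path`."""
    with open(path, "w") as handle:
        for seq_id, start, end in hits:
            handle.write(batch_line(seq_id, start, end, flank) + "\n")
```

The resulting file would then be passed to blastdbcmd through -entry_batch instead of a plain list of accession numbers. Whether your blastdbcmd version accepts per-line ranges is worth checking with blastdbcmd -help first.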