9.8 years ago
biotech
I would like to extract nucleotide sequences in FASTA format from BLAST hits. I'm trying to get the syntenic region around a gene.
I'm using BLASTn to search the query gene and its adjacent 5' and 3' genes against a database of bacterial genomes. The database contains 200 genomes, and for each strain I need the FASTA sequences of the query gene and its 5' and 3' neighbours in a separate file.
One problem is that the gene is sometimes split by the assembly, so its hits fall on two contigs and I will also have to parse the second-best BLAST hit.
I'm open to alternatives if this approach does not seem appropriate.
Thanks!
Maybe an alternative would be to extract the selected genomic regions from a multiple genome alignment of the 200 strains? The caveat is that an alignment of that size will almost certainly crash.
That's what I would do: write the BLAST results as tabular output, then extract the hit regions with blastdbcmd.
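A minimal sketch of that tabular-output-then-blastdbcmd route, in Python. It assumes the default 12-column `-outfmt 6` layout and a database built with `makeblastdb -parse_seqids` so that `-entry` can look contigs up by ID. The helper names (`top_hits`, `blastdbcmd_args`) and the `STRAIN_contig` naming convention used to group hits by strain are hypothetical; adapt them to your own sequence IDs. Keeping the two best hits per query and strain is one way to cover a gene that is split across two contigs.

```python
import csv
import io
from collections import defaultdict

# Default column order of `blastn -outfmt 6`.
COLS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
        "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(tabular_text, n=2, strain_of=lambda sid: sid.split("_")[0]):
    """Keep the n best hits (by bit score) per (query, strain).

    `strain_of` is a hypothetical rule mapping a contig ID such as
    "S1_c2" to its strain "S1"; replace it with your own convention.
    """
    grouped = defaultdict(list)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLS, row))
        grouped[(hit["qseqid"], strain_of(hit["sseqid"]))].append(hit)
    return {
        key: sorted(hits, key=lambda h: float(h["bitscore"]), reverse=True)[:n]
        for key, hits in grouped.items()
    }


def blastdbcmd_args(hit, db):
    """Build a blastdbcmd argument list that extracts the hit region."""
    start, end = int(hit["sstart"]), int(hit["send"])
    # In tabular output, sstart > send means the hit is on the minus strand.
    strand = "plus" if start <= end else "minus"
    lo, hi = min(start, end), max(start, end)
    return ["blastdbcmd", "-db", db, "-entry", hit["sseqid"],
            "-range", f"{lo}-{hi}", "-strand", strand]
```

Each argument list can be passed to `subprocess.run(..., capture_output=True, text=True)` and the resulting FASTA text appended to a per-strain file. Running this once per query (the gene and its 5' and 3' neighbours) gives the three regions for each strain.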
In this thread, hits were extracted in FASTA format, but with BLASTp and a single protein as the query: Taking Only Aligned Sequences In A Blast. The difference is that I also need the nucleotide sequences of the adjacent 5' and 3' genes.