I have the nucleotide sequences of 59 genes from strain NC_000913.3, and I want to BLAST them against 20 other strains to extract the corresponding genes from each one.
To do this, I run the command below in a loop in the terminal, where $que is one of the 59 genes and ${i} is one of the 20 strain databases.
blastn -query $que -db ${i} -outfmt '6 sseq' -max_target_seqs 1
This mostly works, but the stop codon in NC_000913.3 is not always the same as in the other strains, because it can be TAG, TGA or TAA. I tried removing the stop codons from the query genes, but then some of the extracted sequences were one nucleotide short.
For example, I BLASTed the cyaA sequence from NC_000913.3, with and without its TGA stop codon, against strain AE014075. Without the stop codon, the alignment also loses the last aligned position before it (Query: G, Sbjct: T), as shown below. Is there a way to retain the whole sequence apart from the stop codon? Or, even better, is there a way to run the BLAST so that it ignores differences in the stop codon?
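One possible workaround, sketched under assumptions rather than offered as a tested pipeline: keep the stop codons in the queries, ask blastn for coordinates as well as (or instead of) sseq, for example with -outfmt '6 sseqid qlen qstart qend sstart send', and then project the query CDS (every base except the final three) onto subject coordinates. That range could then be extracted with blastdbcmd -db ${i} -entry <sseqid> -range low-high -strand plus|minus, which requires the database to have been built with makeblastdb -parse_seqids. The helper name subject_range_without_stop is hypothetical, and the projection assumes there are no indels in any unaligned query ends. Using the cyaA coordinates from the alignment below (qend 2547, send 4493899) and assuming the query is the full 2547 bp gene, the CDS end maps to 4493896, the subject T opposite the query G.

```python
def subject_range_without_stop(qlen, qstart, qend, sstart, send, stop_len=3):
    """Map query positions 1..(qlen - stop_len) onto subject coordinates.

    Hypothetical helper. Assumes the unaligned query ends map base-for-base
    onto the subject (no indels outside the BLAST alignment), and that
    qstart/qend/sstart/send are BLAST's 1-based tabular fields.
    Returns (low, high, strand); check that low/high lie inside the subject.
    """
    cds_end = qlen - stop_len  # last query base before the stop codon
    if sstart <= send:
        # Plus strand: subject coordinates increase along the query.
        low = sstart - (qstart - 1)
        high = send + (cds_end - qend)
        return low, high, "plus"
    # Minus strand: query position p maps to sstart - (p - qstart).
    low = send - (cds_end - qend)
    high = sstart + (qstart - 1)
    return low, high, "minus"


if __name__ == "__main__":
    # qlen, qend and send come from the cyaA alignment with the stop codon;
    # qstart=1 and sstart=4491353 are hypothetical placeholders.
    low, high, strand = subject_range_without_stop(
        qlen=2547, qstart=1, qend=2547, sstart=4491353, send=4493899)
    # These would become blastdbcmd's -range and -strand arguments.
    print(f"-range {low}-{high} -strand {strand}")
```

Because the range comes from coordinates rather than from the aligned sseq, it no longer matters whether BLAST trims a mismatched base or a different stop codon at the end of the alignment.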
End of alignment with stop codon:
Query  2521     CCGCTATTACAGCAATATTTTTCGTGA  2547
                ||||| || ||||| |||||||| |||
Sbjct  4493873  CCGCTGTTGCAGCAGTATTTTTCTTGA  4493899
End of alignment without stop codon (missing TGA, but also the G/T position):
Query  2521     CCGCTATTACAGCAATATTTTTC  2543
                ||||| || ||||| ||||||||
Sbjct  4493873  CCGCTGTTGCAGCAGTATTTTTC  4493895
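The lost G/T position is consistent with how local alignment scoring works: BLAST reports the highest-scoring segment, and a mismatch at the very end of an alignment can only lower the score, so it is trimmed. With TGA present, the three matching bases after the mismatch outweigh its penalty, so the alignment extends through it. The sketch below is a simplified ungapped model of just that tail, assuming blastn's default megablast scores (reward 1, penalty -2); best_prefix is a hypothetical helper, not part of BLAST, and real BLAST uses gapped X-drop extension over the whole alignment.

```python
def best_prefix(query, subject, reward=1, penalty=-2):
    """Score an ungapped alignment left to right and return (length, score)
    of the highest-scoring prefix. Only a strictly higher score moves the
    end point, so an alignment never ends on a mismatch."""
    score = best_score = best_len = 0
    for i, (q, s) in enumerate(zip(query, subject), start=1):
        score += reward if q == s else penalty
        if score > best_score:
            best_len, best_score = i, score
    return best_len, best_score


# Last 27 query bases (with TGA) and the aligned subject bases, from the post.
with_stop = best_prefix("CCGCTATTACAGCAATATTTTTCGTGA",
                        "CCGCTGTTGCAGCAGTATTTTTCTTGA")
# Same region without TGA: the query now ends on the G opposite subject T.
without_stop = best_prefix("CCGCTATTACAGCAATATTTTTCG",
                           "CCGCTGTTGCAGCAGTATTTTTCT")
print(with_stop, without_stop)  # (27, 15) (23, 14)
```

In this model the 24th position (G/T) would drop the score from 14 to 12, so the alignment stops at 23 bases, which matches the query end of 2543 shown above; with TGA, the score recovers to 15 and all 27 bases are kept.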