7.1 years ago
bio90029
Hi, I have run BLAST on the same gene across different bacterial strains, and now I would like to curate the extracted sequences. Some of the genes have missing bases at the beginning of the sequence and others at the end. I am trying to work out how to tell whether the missing bases are at the beginning or the end of each sequence so that I can add them correctly. Below is a sample of my gene files; I have 93 sequences of the same gene from different bacterial strains.
>gene_3 ['recombination protein F'] XXX_gnl|BL_ORD_ID|99 NODE_100_ seq length: 1045
TTTCCGCAATATCGAAAACGCGGATCTCGCTTTATCCCCTGGCTTTAATTTCCTGGTTGG
CGCGAACGGCAGCGGCAAAACCAGCGTGCTTGAGGCCATCTACACGCTCGGCCATGGCCG
GGCGTTTCGCAGTCTGCAAATTGGCCGCGTCATTCGCCACGAACAGGAAGCCTTTGTTCT
GCACGGGCGTTTGCAGGGCGAGGAGCGTGAAACGGCCATCGGTCTGACCAAAGACAAGCA
GGGCGACAGCAAGGTTCGTATCGACGGTACTGACGGCCACAAAGTGGCTGAGCTCGCGCT
GCTGATGCCGATGCAGCTGATTACGCCGGAGGGGTTTACTTTACTCAATGGCGGCCCCAA
ATACAGAAGAGCCTTCCTTGACTGGGGATGCTTTCACAACGAAGCCGGTTTCTTTAACGC
CTGGAGCAACCTGAAGCGCCTGCTTAAGCAGCGTAACGCTGCACTGCGCCAGGTGACACG
CTACGCCCAGCTGCGCCCGTGGGACAAGGAATTAATTCCCCTTGCGGAACAAATCAGCTG
CTGGCGTGCCGAATACAGCGCGGGTATCGCCGACGATATGGCCGACACCTGCAAACAGTT
TTTACCTGAATTCTCTCTCACCTTCTCCTTCCAGCGCGGCTGGGAGAAAGAGACAGATTA
TGCCGAAGTGTTAGAGAGAAATTTCGAGCGCGACCGCATGCTGACCTACACCGCACATGG
CCCGCACAAGGCGGATTTCCGCATTCGTGCCGACGGGGCGCCGGTGGAAGACACGCTGTC
GCGTGGGCAGCTCAAGCTTTTGATGTGCGCGCTGCGCCTGGCGCAGGGAGAGTTTTTGAC
CCGTGAGAGCGGGCGACGCTGCCTGTACCTGATAGATGATTTTGCCTCGGAACTTGACGA
CGCGCGGCGCGGGCTGCTTGCCAGCCGCTTAAAAGCCACGCAGTCACAGGTTTTCGTCAG
TGCGATTAGCGCTGAACACGTTATAGACATGTCGGACGAAAATTCGAAGATGTTTACCGT
GGAAAAGGGTAAAATAACGGATTAA
>gene_3 ['recombination protein F'] YYY_gnl|BL_ORD_ID|178 NODE_183 seq length: 1074
ATGTCGCTCACCCGTCTGTTGATCCGCGACTTTCGCAATATCGAAAGCGCGGATCTCGCT
TTATCCCCTGGCTTTAACTTCCTGGTTGGCGCGAACGGCAGCGGCAAAACCAGCGTGCTG
GAAGCCATCTATACGCTCGGCCACGGCCGGGCGTTTCGCAGTTTGCAGATTGGTCGCGTG
ATTCGCCACGAGCAGGAATCTTTTGTTCTGCACGGGCGTTTGCAGGGCGCAGAGCGGGAA
ACCGCCATCGGCCTGACCAAAGACAAGCAGGGCGACAGCAAGGTGCGCATTGACGGCACC
GATGGCCACAAGGTGGCGGAGCTGGCGCTGCTGATGCCGATGCAGCTGATTACGCCCGAG
GGGTTTACTTTACTCAACGGCGGCCCCAAATACAGAAGAGCGTTCCTCGATTGGGGATGC
TTTCACAATGAAGCCGGTTTCTTTAACGCCTGGAGCAACCTGAAGCGTCTGCTTAAACAG
CGTAACGCCGCATTGCGCCAGGTCACGCGCTACGCTCAGCTGCGTCCGTGGGACATGGAA
CTCATCCCTCTTGCGGAACAAATCAGCCGCTGGCGTGCCGAATACAGCGCAGGTATCGCC
GAAGACATGGCCGACACCTGCAAACAGTTTTTACCCGAGTTCTCTCTCACCTTCTCTTTC
CAGCGTGGCTGGGAAAAAGAGACGGATTATGCCGAGGTGTTAGAGAGAAGCTTCGAGCGC
GATCGCATGTTGACCTACACCGCGCACGGCCCGCACAAGGCGGATTTCCGCATTCGTGCC
GACGGTGCGCCGGTGGAAGACACGCTGTCGCGCGGGCAGCTGAAGCTCCTGATGTGCGCG
CTGCGCCTGGCGCAGGGGGAGTTCCTCACTCGAGAGAGCGGGCGACGCTGCCTGTACCTG
ATAGATGATTTTGCCTCGGAACTTGACGACGCGCGGCGCGGGCTGCTTGCCAGCCGCTTA
AAAGCCACGCAGTCGCAGGTTTTCGTCAGCGCCATTAGCGCTGAACACGTTATAGACATG
TCGGACGAAAATTCGAAGATGTTTACCGTGGAAAAGGGTAAAATAACGGATTAA
Any help would be really appreciated, as I am quite lost with this part.
Use the alignment start and end positions reported in the BLAST results.
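As a rough sketch of that idea, assuming the full-length reference gene was used as the BLAST query and the assemblies as the database: the number of query bases left unaligned before the hit starts and after it ends tells you which side is truncated. The function name `missing_ends` and the coordinates in the example call are made up for illustration; they are not taken from the poster's actual BLAST output.

```python
def missing_ends(qstart, qend, qlen):
    """Count query bases left unaligned at each end of a BLAST hit.

    qstart, qend: alignment start/end on the query (1-based, inclusive)
    qlen: full length of the query (the complete reference gene)
    """
    # Order the coordinates defensively in case they arrive reversed.
    lo, hi = min(qstart, qend), max(qstart, qend)
    return {
        "missing_at_start": lo - 1,   # bases before the alignment begins
        "missing_at_end": qlen - hi,  # bases after the alignment ends
    }


# Hypothetical hit: a 1074 bp query aligned only from position 30 to 1074.
print(missing_ends(30, 1074, 1074))
# {'missing_at_start': 29, 'missing_at_end': 0}
```

With tabular output these values would come from the `qstart`, `qend` and `qlen` columns (for example `-outfmt "6 qseqid sseqid qstart qend qlen sstart send"`). If the hit is on the minus strand of the contig, the start of the gene corresponds to the higher subject coordinate, so check `sstart`/`send` before extending the extracted sequence.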
Thanks, but how do I know whether the bases are missing at the end or the beginning of the sequences? Which part of the BLAST XML gives me that information? Thanks.
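In BLAST XML (`-outfmt 5`), the relevant fields are `Iteration_query-len` for the query length and `Hsp_query-from` / `Hsp_query-to` for where the alignment starts and ends on the query (`Hsp_hit-from` / `Hsp_hit-to` give the same for the subject). A minimal sketch of reading them with Python's standard library is below. It assumes the complete reference gene was the query, looks only at the first HSP of each hit, and uses a tiny hand-written XML snippet with invented values rather than real output; `truncation_report` and `example_contig` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed stand-in for real BLAST XML output.
EXAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-len>1074</Iteration_query-len>
<Iteration_hits><Hit><Hit_def>example_contig</Hit_def><Hit_hsps><Hsp>
<Hsp_query-from>30</Hsp_query-from><Hsp_query-to>1074</Hsp_query-to>
</Hsp></Hit_hsps></Hit></Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>"""


def truncation_report(xml_text):
    """Return (hit name, bases missing at start, bases missing at end) per hit."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        qlen = int(iteration.findtext("Iteration_query-len"))
        for hit in iteration.iter("Hit"):
            hsp = hit.find("Hit_hsps/Hsp")  # first HSP only
            if hsp is None:
                continue
            qfrom = int(hsp.findtext("Hsp_query-from"))
            qto = int(hsp.findtext("Hsp_query-to"))
            lo, hi = sorted((qfrom, qto))
            # Unaligned query bases before and after the alignment.
            rows.append((hit.findtext("Hit_def"), lo - 1, qlen - hi))
    return rows


for name, at_start, at_end in truncation_report(EXAMPLE):
    print(name, "missing at start:", at_start, "missing at end:", at_end)
```

For a real file, `ET.parse("results.xml").getroot()` could replace `ET.fromstring(...)`. A non-zero first number means the extracted gene is truncated at the beginning; a non-zero second number means it is truncated at the end.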