Adding missing bases at the beginning of a sequence
7.2 years ago
bio90029 ▴ 10

Hi, I have run BLAST for the same gene against different bacterial strains. Now I would like to curate the extracted sequences. Some of the genes are missing bases at the beginning of the sequence and others at the end. I am trying to work out how to tell whether the missing bases are at the beginning or the end of a sequence so that I can add them correctly. This is a sample of my gene files; I have 93 sequences of the same gene, but from different bacterial strains.

>gene_3 ['recombination protein F'] XXX_gnl|BL_ORD_ID|99 NODE_100_  seq length: 1045
TTTCCGCAATATCGAAAACGCGGATCTCGCTTTATCCCCTGGCTTTAATTTCCTGGTTGG
CGCGAACGGCAGCGGCAAAACCAGCGTGCTTGAGGCCATCTACACGCTCGGCCATGGCCG
GGCGTTTCGCAGTCTGCAAATTGGCCGCGTCATTCGCCACGAACAGGAAGCCTTTGTTCT
GCACGGGCGTTTGCAGGGCGAGGAGCGTGAAACGGCCATCGGTCTGACCAAAGACAAGCA
GGGCGACAGCAAGGTTCGTATCGACGGTACTGACGGCCACAAAGTGGCTGAGCTCGCGCT
GCTGATGCCGATGCAGCTGATTACGCCGGAGGGGTTTACTTTACTCAATGGCGGCCCCAA
ATACAGAAGAGCCTTCCTTGACTGGGGATGCTTTCACAACGAAGCCGGTTTCTTTAACGC
CTGGAGCAACCTGAAGCGCCTGCTTAAGCAGCGTAACGCTGCACTGCGCCAGGTGACACG
CTACGCCCAGCTGCGCCCGTGGGACAAGGAATTAATTCCCCTTGCGGAACAAATCAGCTG
CTGGCGTGCCGAATACAGCGCGGGTATCGCCGACGATATGGCCGACACCTGCAAACAGTT
TTTACCTGAATTCTCTCTCACCTTCTCCTTCCAGCGCGGCTGGGAGAAAGAGACAGATTA
TGCCGAAGTGTTAGAGAGAAATTTCGAGCGCGACCGCATGCTGACCTACACCGCACATGG
CCCGCACAAGGCGGATTTCCGCATTCGTGCCGACGGGGCGCCGGTGGAAGACACGCTGTC
GCGTGGGCAGCTCAAGCTTTTGATGTGCGCGCTGCGCCTGGCGCAGGGAGAGTTTTTGAC
CCGTGAGAGCGGGCGACGCTGCCTGTACCTGATAGATGATTTTGCCTCGGAACTTGACGA
CGCGCGGCGCGGGCTGCTTGCCAGCCGCTTAAAAGCCACGCAGTCACAGGTTTTCGTCAG
TGCGATTAGCGCTGAACACGTTATAGACATGTCGGACGAAAATTCGAAGATGTTTACCGT
GGAAAAGGGTAAAATAACGGATTAA

>gene_3 ['recombination protein F']    YYY_gnl|BL_ORD_ID|178 NODE_183 seq length: 1074
ATGTCGCTCACCCGTCTGTTGATCCGCGACTTTCGCAATATCGAAAGCGCGGATCTCGCT
TTATCCCCTGGCTTTAACTTCCTGGTTGGCGCGAACGGCAGCGGCAAAACCAGCGTGCTG
GAAGCCATCTATACGCTCGGCCACGGCCGGGCGTTTCGCAGTTTGCAGATTGGTCGCGTG
ATTCGCCACGAGCAGGAATCTTTTGTTCTGCACGGGCGTTTGCAGGGCGCAGAGCGGGAA
ACCGCCATCGGCCTGACCAAAGACAAGCAGGGCGACAGCAAGGTGCGCATTGACGGCACC
GATGGCCACAAGGTGGCGGAGCTGGCGCTGCTGATGCCGATGCAGCTGATTACGCCCGAG
GGGTTTACTTTACTCAACGGCGGCCCCAAATACAGAAGAGCGTTCCTCGATTGGGGATGC
TTTCACAATGAAGCCGGTTTCTTTAACGCCTGGAGCAACCTGAAGCGTCTGCTTAAACAG
CGTAACGCCGCATTGCGCCAGGTCACGCGCTACGCTCAGCTGCGTCCGTGGGACATGGAA
CTCATCCCTCTTGCGGAACAAATCAGCCGCTGGCGTGCCGAATACAGCGCAGGTATCGCC
GAAGACATGGCCGACACCTGCAAACAGTTTTTACCCGAGTTCTCTCTCACCTTCTCTTTC
CAGCGTGGCTGGGAAAAAGAGACGGATTATGCCGAGGTGTTAGAGAGAAGCTTCGAGCGC
GATCGCATGTTGACCTACACCGCGCACGGCCCGCACAAGGCGGATTTCCGCATTCGTGCC
GACGGTGCGCCGGTGGAAGACACGCTGTCGCGCGGGCAGCTGAAGCTCCTGATGTGCGCG
CTGCGCCTGGCGCAGGGGGAGTTCCTCACTCGAGAGAGCGGGCGACGCTGCCTGTACCTG
ATAGATGATTTTGCCTCGGAACTTGACGACGCGCGGCGCGGGCTGCTTGCCAGCCGCTTA
AAAGCCACGCAGTCGCAGGTTTTCGTCAGCGCCATTAGCGCTGAACACGTTATAGACATG
TCGGACGAAAATTCGAAGATGTTTACCGTGGAAAAGGGTAAAATAACGGATTAA

Any help would be really appreciated, as I am quite lost with this part.

python biopython • 1.4k views

Use the start/end positions of each alignment reported in the BLAST results.
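One way to read this hint, as a minimal sketch: assuming the full-length reference gene was the BLAST query, comparing the query start and end of each alignment with the query length shows which end of a hit is incomplete. The function names below are hypothetical, the coordinates are made up for illustration, and padding with N is only one possible way to mark the gap.

```python
def missing_ends(query_len, q_start, q_end):
    """Return (bases missing at the start, bases missing at the end).

    Assumes 1-based, inclusive BLAST coordinates on the reference (query).
    """
    first, last = min(q_start, q_end), max(q_start, q_end)
    return first - 1, query_len - last


def pad_with_n(fragment, n_start, n_end):
    """Mark the uncovered ends with N so the sequence reaches reference length.

    Assumes the fragment is already in the same orientation as the reference
    and that the alignment contains no gaps.
    """
    return "N" * n_start + fragment + "N" * n_end


# Hypothetical coordinates: the alignment covers reference positions 30..1074
n_start, n_end = missing_ends(1074, 30, 1074)
print(n_start, n_end)  # bases missing at the start, bases missing at the end
```

In the sample from the question, the 1045-base sequence ends with the same TAA-terminated tail as the 1074-base one but does not begin with ATG, which would be consistent with a truncated start; the BLAST coordinates would confirm or refute that.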


Thanks, but how do I know whether the bases are missing at the end or the beginning of the sequences? Which part of the BLAST XML gives me that information? Thanks
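As a hedged sketch of where that information sits: in BLAST XML output (`-outfmt 5`), each `<Hsp>` element carries `Hsp_query-from` and `Hsp_query-to`, and each `<Iteration>` carries `Iteration_query-len`. Assuming the full-length reference gene was the query, a query start above 1 points to bases missing at the beginning, and a query end below the query length points to bases missing at the end. The example below uses only the standard library; the XML snippet is a tiny invented stand-in for a real report, and `uncovered_ends` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-made stand-in for a BLAST XML report; all values are invented.
EXAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>recF_reference</Iteration_query-def>
      <Iteration_query-len>1074</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_def>NODE_100</Hit_def>
          <Hit_len>1045</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_query-from>30</Hsp_query-from>
              <Hsp_query-to>1074</Hsp_query-to>
              <Hsp_hit-from>1</Hsp_hit-from>
              <Hsp_hit-to>1045</Hsp_hit-to>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def uncovered_ends(root):
    """Yield (hit name, bases missing at start, bases missing at end) per HSP."""
    for iteration in root.iter("Iteration"):
        query_len = int(iteration.findtext("Iteration_query-len"))
        for hit in iteration.iter("Hit"):
            name = hit.findtext("Hit_def")
            for hsp in hit.iter("Hsp"):
                q_from = int(hsp.findtext("Hsp_query-from"))
                q_to = int(hsp.findtext("Hsp_query-to"))
                first, last = min(q_from, q_to), max(q_from, q_to)
                yield name, first - 1, query_len - last


# For a real report, ET.parse("results.xml").getroot() would replace this line
# ("results.xml" is a placeholder file name).
root = ET.fromstring(EXAMPLE_XML)
for name, at_start, at_end in uncovered_ends(root):
    print(name, at_start, at_end)
```

If the searches were run the other way round (each strain's sequence as the query against the reference gene), `Hsp_hit-from`, `Hsp_hit-to` and `Hit_len` would play the same role. A hit on the reverse strand typically shows up as `Hsp_hit-from` being larger than `Hsp_hit-to`, in which case the extracted sequence needs reverse-complementing before deciding which end to extend.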
