Hi there,
I am using Illumina sequencing to sequence the whole genomes of avian avulaviruses. My output is a FASTA file of the whole genome in one long sequence. As part of my analysis I want to assign a genotype, but this is done from only one gene, the fusion gene, or even just a portion of it.
Does anyone have suggestions for how to extract just the section I want from my full genome, so I can use it in an alignment and tree to determine the genotype?
I want to do this from the command line in a script. The rest of my script is a bash/shell script, so it would be nice if this was also in that format.
The section of the gene I want is not always in exactly the same place: genomes can differ in length, and so can the genes within them.
My first thought is that I could align my full genome against a selection of sequences of the gene I am interested in, then count the number of - characters the alignment adds before the gene in one of those sequences, and the number of - characters after it. I could then use those numbers to delete that many characters from the beginning and end of my full genome, if that makes sense. Does anyone know how to do that, or a better way of doing this?
So for instance I have
>Avian-Avualavirus-full-genome
AGATCCCTGGGGGCGGAACCCTAGAGTATAAAGTAAATTTTGTCTCTTTGACTGTGGTGCCGAGAAGGGATGTCTACAGGATCCCAACTGCAGTATTAAAAGTATCTGGTTCAAGCCTATACAATCTTGCGCTCAATGTCACTATAGACGTGGATGTGGATCCGAAAAGCCCATTGGTCAAGTCCCTTTCTAAGTCTGATAGCGGATACTATGCGAATCTTTTTTTGCATATCGGGCTTATGTCCACTGTAGACAAGAAAGGAAAGAAAGTGACGTTTGACAAGATAGAGGAAAAGATAAGAAGACTCAATCTATCTGTCGGGCTCAGTGACGTGCTCGGACCCTCTGTGCTTGTAAAGGCGAGAGGTGCACGGACCAAGCTACTCGCCCCCTTCTTCTCTAGCAGTGGGACAGCCTGCTATCCTATAGCAAATGCCTCTCCTCAGGTTGCCAAGATACTCTGGAGCCAAACCGCGCACCTGCGGAGTGTGAAAGTCATCATTCAGGCCGGCACCCAGCGTGCTGTTGCAGTGACCGCTGATCACGAGGTAACCTCCACTAAAATAGAGAAGAAGCACGCCATTGCAAAATACAATCCTTTCAAGAAGTAAGTTGCTTCTTTAAAGCTGCGATTCACCTGTTTTCTTGAATCACCATGATACCAGATAACGATTCACCTCAACTGCTTATAGTTAGCTCACCTGTCTAGCAAATTAGAAAAAACACGGGTAGAAGAGTCTGGATCCCAACTAGTACATTCTAGGCGTAACATGGGCTCCAAACCTTCTATCAGGATCCCGGTACCTCTGATGCTGATCACTCAGATCATGCTGATATTAAGTTATATCTGTCTGACAAGCTCTCTTGACGGCAGGCCTCTTGCAGCTGCAGGGATTGTAGTAACAGGAGATAAGGCAGTCAATGTATACACCTCATCTCAGACAGGGTCAATCATAGTCAAGTTGCTCCCGAATATGCCTAAGGATAAAGAGGCGTGTGCAAAGGCCCCATTGGAGGCATACAACAGAACACTGACTACCTTGCTCACTCCCCTTGGAGATTCCATCCGTAAGATCCAAGGGTCAGTGGCCACGTCCGGAGGAAGGAGACAGAAACGCTTTATAGGTGCCGTTATTGGCAGTATAGCTCTCGGGGTTGCGACAGCGGCACAGATAACAGCAGCTGCGGCCTTAATACAAGCCAACCAAAATGCCGCCAACATCCTCCGGCTTAAAGAGAGCATTGCTGCAACCAATGAAGCTGTGCACGAAGTTACCGATGGATTATCACAACTATCAGTGGCAGTTGGAAAGATGCAACAATTCGTCAATGACCAGTTTAATAATACGGCGCGAGAATTGGACTGTATAAAGATTACACAACAAGTTGGTGTAGAACTCAACCTATACCTAACTGAGTTGACTACAGTATTCGGGCCACAGATCACTTCCCCTGCCTTAACTCAGCTGACTATACAGGCACTCTACAATCTAGCTGGCGGCAATATGGATTACTTGTTAACTAAGTTAGGTGTAGGGAACAATCAACTCAGCTCATTAATTGGTAGCGGCTTGATCACCGGCTACCCTATATTGTATGACTCACAGACTCAACTCTTAGGCATACAAGTAAATTTGCCCTCAGTCGGGAACCTAAATAATATGCGTGCCACCTACTTGGAGACTTTATCTGTAAGTACAACCAAGGGGTTTGCTTCAGCACTTGTCCCGAAGGTGGTGACACAAGTTGGTTCCGTGATAGAAGAACTTGACACCTCTTACTGTATAGAATCTGATCTGGATCTATATTGTACAAGAATAGTGACATTCCCCATGTCCCCAGGTATTTACTCTTGCTTGAGCGGTAACACATCAGCTTGCATGTATTCAAAGACTGAAGGCGCACTCACTACGCCATATATGGCCCTTAAGGGCTCAGTTATTGCCAATTGTAAGATAACAACATGTAGATGTGCAGACCCCCCTGGTATCATATCGCAA
AATTATGGAGAAGCTGTATCTCTGATAGACAGACATTCGTGCAATGTCTTATCATTAGACGGGATAACTCTGAGGCTCAGTGGGGAATTTGATGCAACTTATCTCAAGAATATCTCAATACTAGATTCTCAAGTCATCGTGACAGGCAATCTCGATATATCAACTGAGCTTGGTAATGTCAACAATTCAATCAGCAATGCCTTGGACAAGTTGACAGAAAGCAACAGCAAGCTAGACAAAGTCAATGTCAGACTAACCAGTACATCTGCTCTCATCACCTATATCGCTCTAACTGTCATTTCTCTTTTCTTCGGTGTACTTAGTCTGGGTTTAGCATGTTACCTGATGTACAAGCAGAAGGCACAACAAAAGACCTTGCTATGGCTTGGGAATAATACTCTCGATCAGATGAGAGCCACCACAAGAGCGTGAATGTAGATAAGAGGTAGATGTCCACCCAGCTGCCGCCCGTGCGCTAACTCTTACGGCCTGTCAAGTAGAAGACTTAAGAAAAAACTGCTGGGTACAAGCGACCAAAGAACGATACACGGGTAGAACGGTCAGAGGATCCACCCTTCAGTCGGAAGCCAGGCTTCACAAAATCCGTTCTACCGCATCGCCAGCCACAGAGGCCAGCCATGAGCCGCGCGGTCAACAGAGTCATGCTAGAAAATGAGGAAAGGGAAGCAAAGAACACATGGCGCTTGATTTTCCGGATCGCGGTCCTACTTTTAATGATAATGATTCTAGCTATCTCCGCAGCTGCCTTGGCATACAGCATGGGGACCAGTACGCCGCGAGACCTCACAAGCATATCGATAGCGATCTCCAAGACAGAGGATAAGGTCACATCTTTACTCAGTTCAAGTCAAGATGTGATAGATAGGATATATAAGCAGGTGGCTCTCGAATCTCCGCTGGCGCTACTAAACACTGAGTCTATAATTATGAATGCTATAACCTCTCTCTCCTATCAAATCAACGGGGCCGCGAATAATAGCGGGTGTGGGGCGCCTGTTCATGACCCAGATTATATCGGGGGGATAGGCAAAGAACTCATAGTAGACGACACGAGTGATGTCACATCATTTTATCCTTCTGCCTATCAAGAACACTTGAATTTCATCCCAGCACCTACTACAGGATCCGGTTGCACTCGGATACCCTCATTCGACATGAGCACCACTCACTACTGTTACACTCACAATGTGATATTATCTGGTTGCAGAGATCACTCACATTCACATCAATACTTAGCACTTGGTGTGCTTCGGACATCTGCAACGGGGAAGGTATTCTTCTCTACTCTGCGTTCTATCAATTTAGATGACACCCAAAACCGGAAGTCCTGCAGTGTGAGTGCAACCCCTTTAGGCTGTGATATACTGTGCTCTAAGGTCACAGAGACTGAGGAGGAGGATTACAAGTCAGTTACCCCCACATCAATGGTGCACGGAAGGTTAGGGTTTGACGGTCAATACCATGAGAAGGACTTAGACACCACAGCCTTATTCAAGGATTGGGTGGCAAATTACCCAGGAGTGGGAGGTGGATCTTTTGTTGACGAGCGTGTATGGTTCCCAGTTTATGGAGGGCTCAAACCCAATTCACCCAGTGACACTGCGCAAGAAGGGAAATATGCAATATATAAGCGCTATAATGATACATGCCCCGATGAACAAGATTACCAAATTCGGATGGCTAAGTCTTCATATAAACCTGGGCGATTTGGTGGAAAGCGCGTACAGCAAGCCATCTTATCTATCAAAGTGTCAACGTCCTTAGGTGAGGACCCAATGCTGACTATTCCACCTAATACAATTACACTCATGGGGGCCGAAGGCAGAATTCTCACGGTAGGGACATCTCACTTCTTGTACCAACGAGGGTCTTCATATTTCTCCCCCGCTTTATTATACCCCATGACAATATTTAACAAAACAGCTACTCTTCATAGCCCTTATACATTTAATGCCTTCACTCGGCCAGGGAGTGTCCCT
TGCCAGGCATCAGCAAGATGCCCCAACTCATGCATCACTGGAGTCTATACTGATCCATATCCCTTGATCTTTCATAGGAATCATACCCTACGAGGGGTTTTCGGGACGATGCTTGATGATGGGCAAGCAAGACTTAACCCTGTATCTGCAGTATTTGACAACATATCCCGCAGTCGTGTAACCCGGGTGAGTTCAAGCAGCACCAAGGCAGCATACACAACATCGACATGTTTTAAAGTTGTCAAGACCAATAAAACTTATTGTCTTAGTATCGCAGAAATATCCAATACCCTATTCGGGGAGTTTAGGATTGTTCCTTTACTAGTTGAGATCCTCAAGGATAATAGGGCTTAAGAAGCTAGGCTTGGCCGACCGAGTCAGCCACAAGACAGTCGGAAGGATGACACCGCACCAATCCTCTCCCACGATGCACAGAGACAGGCCGAGTATTAACATGAGCCAGGATCCCATGCTGCCAGGCAGCCACAATTCGACAACGCTGACATGATTAATTTGAGTCCCGTCTACAGTCACTTTATTAAGAAAAAATAACAAAAGCAGTGAGATACAAGAGAAAACAACCCTCAGAAGAAAGCACGGGTAGGACATGGCGGGCTCCGGTCCCGAAAGGGCAGAGCACCGGATTATCCTACCAGAGTCACATCTATCTTCCCCATTGGTCAAGCACAAATTGCTCTATTATTGGAAATTAACTGGGCTGCCGCTTCCTGACGAATGCGACTTTGATCATCTCATTATAAGCAGGCAATGGAAAAAAATACTGGAATCGGCCACTCCTGACACGGAAAGAATGATCAAACTCGGACGGGCAGTGCACCAGACCCTCAACCACAATTCCAAGATAACCGGAGTGCTCCATCCCAGGTGTTTAGAAGAACTGGCTAGTATTGAAATCCCTGACTCAACCAACAAATTTCGGAAGATTGAGAAGAAGATCCAAATTCATAATACAAGGTATGGAGAATTGTTCACAAAACTGTGCACGCATGTTGAAAAGAAATTGCTAGGATCATCCTGGTCTAACAATGTCCCACGATCAGAGGAATTCAGCAGCATCCGTACGGATCCGGCATTCTGGTTTCACTCAAAGTGGTCCAAAGCCAAGTTCGCATGGCTCCATATAAAACAGGTCCAAAGGCATCTGATTGTAGCAGCAAGAACAAGGTCTGCAGTCAACAAGTTAGTAACATTAACTCATAAGGTAGGCCACGTCTTTGTCACCCCTGAGCTTGTCATTGTGACACATACAGATGATAACAAGTTCACATGCCTCACCCAGGAACTTGTATTGATGTATGCAGATATGATGGAAGGCAGGGACATGGTCAACATAATATCTTCTACAGCGGCACATCTTAGGAACCTATCCGAGAAAATTGATGACATCCTGCGGTTAGTAGATGCCCTGGCAAGGGATTTGGGTAATCAAGTCTATGATGTTGTAGCATTAATGGAGGGATTCGCATACGGTGCCGTTCAGCTGCTTGAGCCTTCAGGTACATTTGCAGGAGATTTTTTTGCATTCAACCTACAGGAGCTCAAGGACACTCTAATCGAACTTCTCCCAAACAATATAGCGGAATCAGTAACTCACGCAATCGCCACTGTGTTCTCTGGCTTAGAACAGAATCAAGCAGCAGAGATGCTATGCTTGCTGCGTTTGTGGGGTCATCCACTGCTTGAGTCCCGTAGTGCAGCAAGAGCGGTCAGGAGCCAGATGTGCGCACCAAAGATGGTAGACTTCGATATGATCCTCCAGGTATTATCCTTCTTTAAAGGAACAATAATCAATGGATATAGAAAGAAGAACTCAGGTGTGTGGCCACGTGTCAAAGTAGATACAATATACGGGAATGTCATTGGGCAGCTGCATGCTGATTCAGCAGAGATCTCACATGAGGTCATGTTAAGGGAGTACAGGAGTTTATCTGCCCTTGAATTTGAGCCATGTATAGAGTATGACCCTGTTACCAATCTAA
GCATGTTTCTAAAAGATAAGGCAATCGCACATCCGAATGATAACTGGCTTGCCTCGTTTAGGCGGAACCTTCTCTCTGAGGACCAGAAGAAACAGATAAAGGAGGCGACCTCAACTAACCGCCTCCTGATAGAGTTTTTAGAGTCAAATGATTTTGATCCATACAAAGAGATGGAATACCTGACAACCCTTGAGTATCTAAGAGATGATAATGTGGCAGTATCGTACTCACTCAAAGAGAAGGAGGTGAAAGTGAATGGGCGAATTTTTGCTAAGCTAACAAAGAAACTAAGGAACTGCCAGGTGATGGCAGAAGGAATTCTAGCTGACCAGATTGCACCTTTCTTCCAGGGGAATGGTGTCATCCAAGATAGCATATCCTTGACTAAGAGTATGTTAGCAATGAGTCAACTGTCCTTTAACAGCAATAAGAAACGTATCACCGACTGCAAGGAAAGGGTTTCCTCAAACCGCAATCATGATCCAAAAAACAAGAATCGTCGAAGGGTTGCCACTTTTATCACGACTGACTTGCAAAAGTATTGTCTTAACTGGAGATATCAGACAGTAAAATTATTCGCCCATGCCATCAATCAGCTGATGGGCCTGCCCCACTTTTTTGAGTGGATTCATCTTAGATTAATGGACACTACGATGTTTGTAGGGGATCCTTTCAATCCTCCGAGTGACCCGACTGATTGTGACTTATCAAGAGTCCCAAATGATGACATATATATTGTCAGTGCTAGAGGGGGCATTGAGGGACTCTGCCAGAAGCTATGGACAATGATCTCAATTGCTGCAATCCAACTTGCTGCGGCAAGAGCTCATTGTCGAGTTGCCTGCATGGTACAAGGTGACAATCAAGTAATAGCTGTAACGAGAGAGGTAAGATCTGACGACTCCCCGGAAATGGTGTTGACACAGTTACATCAAGCTAGTGATAATTTCTTCAAGGAATTGATCCACGTCAATCATCTGATCGGCCATAACCTGAAGGATCGTGAAACCATCAGATCAGACACATTCTTTATATACAGCAAGCGAATATTCAAAGATGGAGCAATACTCAGTCAGGTTCTCAAGAACTCATCTAAGTTGGTGCTAATATCAGGCGACCTTAGCGAAAACACTGTAATGTCCTGTGCCAATATTGCATCCACTGTAGCAAGACTTTGTGAGAACGGGCTTCCTAAGGATTTCTGCTACTATTTGAACTACCTAATGAGTTGCGTGCAGACATACTTTGATTCAGAATTTTCTATTACCCACAGCACTCAACCAGATTCCAACCAATCCTGGATCGAGGATATCTCTTTCGTACACTCATACGTGTTAACTCCTGCCCAGCTGGGGGGATTGAGCAACCTTCAATACTCAAGGCTCTACACAAGGAATATTGGTGATCCAGGGACTACTGCTTTCGCAGAGGTCAAGCGATTAGAAGCAGTAGGGTTGCTGAGTCCTAGCATTATGACTAACATCTTAACCAGACCACCTGGCAACGGAGACTGGGCCAGCCTGTGCAATGATCCGTACTCCTTCAATTTTGAGACTGTTGCAAGCCCCAACATTGTCCTCAAGAAACATACACAGAAAGTCTTATTCGAGACTTGCTCAAACCCCTTATTATCTGGGGTACACACAGAGGACAATGAGGCTGAAGAGAAAGCATTGGCTGAATTCTTACTCAACCAAGAAGTGATTCACCCACGTGTCGCACATGCTATCATGGAAGCAAGCTCTGTAGGTAGAAGAAAGCAAATTCAAGGGCTCGTTGACACAACGAACACTGTGATTAAGATTGCACTGACTAGGAGGCCCCTCGGTATTAAAAGGCTGATGCGGATAATCAATTACTCAAGCATGCATGCAATGTTATTCAGAGATGATATTTTCTTATCCAATAGATCCAACCACCCATTGGTTTCTTCCACTATGTGCTCGCTGACGCTTGCAGACTATGCCCGGAACAGAAGCTGGTCACCCCTGACAGGGGGC
AGGAAAATACTGGGTGTATCCAACCCCGATACCATAGAACTTGTGGAGGGAGAGATTCTCAGTGTCAGTGGAGGGTGCACAAAGTGTGACAGTGGAGATGAGCAGTTTACTTGGTTCCATCTTCCAAGCAATATAGAGCTGACTGACGACACCAGCAAAAATCCCCCAATGAGAGTGCCGTATCTCGGGTCGAAGACTCAAGAGAGGAGAGCTGCCTCACTTGCGAAAATAGCTCATATGTCACCACATGTGAAAGCAGCGCTAAGGGCATCATCCGTGTTAATCTGGGCTTATGGGGACAACGAAGTAAACTGGACTGCTGCTCTTAATATCGCAAGATCTCGATGCAACATAAGCTCAGAGTATCTTCGGCTATTGTCACCCCTGCCCACAGCTGGGAATCTCCAACATAGATTGGATGATGGCATAACCCAGATGACATTTACCCCTGCATCTCTCTACAGAGTATCACCTTATGTTCACATATCCAATGATTCTCAAAGACTATTCACCGAAGAAGGGGTCAAAGAGGGGAATGTGGTTTATCAACAAATCATGCTCTTGGGTTTATCTCTAATTGAGTCGCTCTTCCCAATGACAACGACCAGAACGTATGATGAGATCACATTACACCTCCACAGCAAATTTAGCTGCTGTATCCGGGAAGCGCCTGTTGCGGTCCCCTTTGAACTCCTTGGGCTGGCACCAGAATTAAGGATGGTAACCTCAAATAAGTTCATGTATGATCCCAGTCCTATATCAGAGAAAGATTTTGCGAGACTTGACTTAGCTATCTTCAAGAGTTACGAGCTTAATCTGGAATCATATCCTACGCTGGAGCTAATGAACATCCTTTCAATATCTAGCGGGAAGTTGATTGGTCAGTCCGTGGTTTCTTACGATGAGGACACTTCTATAAAGAATGATGCTATAATAGTATATGACAACACACGAAATTGGATTAGTGAGGCTCAGAATTCAGACGTGGTCCGCCTATTCGAGTATGCGGCACTTGAAGTGCTCCTTGACTGTTCCTACCAACTCTACTATCTGAGGGTGAGGGGCCTAAACAACATCGTCTTGTACATGAATGACTTATATAAGAACATGCCAGGAATCCTACTCTCCAATATTGCGACTACGATATCCCACCCCATCATTCACTCAAGGTTGAATGCAGTCGGCTTAATTAACCATGACGGGTCACACCAGCTTGCAGATATAGATTTCATTGAGATGTCGGCAAAATTGTTAGTCTCTTGCACTCGACGTGTGGTCTCAGGCTTATATGCAGGGAATAAGTACGATCTGCTGTTTCCGTCTGTCTTAGATGATAACCTAAATGAGAAGATGCTTCAGCTGATTTCCCGATTGTGCTGTCTATACACAGTGCTCTTTGCTACAACAAGGGAAATCCCAAAAATAAGAGGCCTATCAGCAGAAGAAAAATGCTCAGTACTCACTGAGTACCTACTTTCAGATGCTGTGAAACCATTGCTTAGGTCCGAACAATTGAGCTCTGTCATGTCTCCTAACATAATTACGTTCCCAGCGAATCTATATTACATGTCTAGAAAGAGCCTTAATTTGATCAGGGAACGCGAGGACAGAGATACTATCTTGTCGTTGTTGTTCCCTCAGGAACCACTGCTTGAACTTCGTCCAGTACAAGACATTGGTGCTCGAGTGAAAGACCCGTTTACCAGGCAACCAGCATCATTCATACAAGAGCTAGATTTGAGCGCCCCAGCAAGGTATGACGCATTCACATTTAATAGGGCTTGCTTCGAGCACACATTACCGAACCCAAGGGAAGATCACCTAGTACGGTACTTGTTCAGAGGAATAGGAACTGCCTCATCTTCTTGGTACAAGGCGTCTCATCTTCTTTCCGTACCCGAGGTCAGATGTGCAAGACATGGGAACTCCTTATACTTGGCGGAAGGAAGCGGAGCTATCATGAGTCTTCTTGAATTGCATATACCGCATGAGACTATCTA
TTATAATACGCTTTTCTCGAATGAGATGAACCCTCCACAGCGACATTTCGGACCTACGCCAACACAATTTCTAAATTCAGTCGTTTATAGGAATCTACAAGCGGAAGTGCCATGTAAAGACGGATATGTCCAGGAGTTTTGCCCACTATGGAGAGAGAATGCAGAAGAAAGCGACCTGACATCAGATAAAGCAGTTGGTTATATCACATCTGTGGTACCCTACAGGTCTGTATCATTACTACATTGTGACATTGAAATTCCTCCGGGGTCCAATCAAAGCTTATTAGATCAACTAGCTACTAATATATCTCTGATTGCCATGCATTCTGTGAAGGAGGGCGGGGTAGTGATCATCAAAGTACTGTATGCAATGGGGTACTACTTTCATCTACTCATAAACTTATTCACTCCATGTTCCACAAAAGGATATATACTCTCCAACGGCTACGCCTGTAGAGGGGATATGGAGTGTTACCTGATCTTTGTGATGGGCCACTTAGGCGGGCCTACATTCGTGCATGAAGTGGTAAGGATGGCAAAAACTCTAATACAGCGACACGGTACACTCCTATCCAAATCAGATGAAATCACACTGACTAAGCTATTTACCTCACAGCAGCGTCGTGTAACAGATATCCTATCCAGCCCCTTACCGAGGCTAATGAAGTTCTTGAGGGAAAATATTGATGCTGCATTAATTGAAGCCGGGGGACAGCCCGTCCGTCCATTCTGTGCAGAAAGTTTAGTGAGCACGCTAACAGATATGACCCAGACGACTCAGATCATTGCCAGCCACATTGACACAGTCATTCGCTCCGTAATTTACATGGAAGCTGAGGGTGACCTTGCCGACACAGTATTTTTATTTACACCCTACAATCTCTCTACAGACGGTAAAAAGAGAACATCACTTAAACAGTGCACAAGACAGATCTTGGAAGTCACAATACTGGGCCTCAGAGCCAAAGATGTCAATAAGGTCGGCAATCTAATTAGCTTGGTACTCAAAGGTGCGGTTTCTCTAGAGGACCTTATCCCATTAAGGACATATCTGAAGCGCAGTACCTGCCCTAAGTACCTGAAGGCAGTCCTAGGTATCACAAAACTCAAAGAAATGTTCACAGATACCTCCTTACTCTACTTGACTCGTGCTCAACAAAAATTCTACATGAAAACCATAGGTAATGCTACCAAGGGGTATTACAGTAATAATGACTCTTAAAGGCAATCGCATGCCAATAAACTATCTCCTTAACTGATTATTCCCTCATTGACCTAATTATACCAGATTAGAAAAAAGTTGGACTCCGACTCCTTGGAACTCGTACTCGGATTCAGTTAGTTAACTTTAAACAGGAGTGCGCGTAGTTGTCCCTAGTTATAGTCCTGTCGTTCACCAAATCTCTGTTTGGT
but I want
>AvianavulaVirus-partial-fusion-gene
ATGGGCTCCAAACCTTCTATCAGGATCCCGGTACCTCTGATGCTGATCACTCAGATCATGCTGATATTAAGTTATATCTGTCTGACAAGCTCTCTTGACGGCAGGCCTCTTGCAGCTGCAGGGATTGTAGTAACAGGAGATAAGGCAGTCAATGTATACACCTCATCTCAGACAGGGTCAATCATAGTCAAGTTGCTCCCGAATATGCCTAAGGATAAAGAGGCGTGTGCAAAGGCCCCATTGGAGGCATACAACAGAACACTGACTACCTTGCTCACTCCCCTTGGAGATTCCATCCGTAAGATCCAAGGGTCAGTGGCCACGTCCGGAGGAAGGAGACAGAAACGCTTTATAGGTGCCGTTATTGGCAGTAT
Many thanks,
James
This is not clear: how do you know where the gene is in the FASTA? Do you have the coordinates, or do you just have the sequence?
I just have sequences from other isolates that cover the short section of the gene I want. I would normally align my new full genome with several short target sequences from previous isolates, then trim manually using something like MEGA. I'm hoping to find a way to automate this.
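For illustration, the manual trimming step could be automated along these lines. This is only a sketch: it assumes the alignment has already been written to an aligned FASTA file by some aligner (MAFFT is mentioned in the comment purely as an example), and the function name `trim_to_reference`, the file names and the sequence IDs are all hypothetical. It finds the first and last alignment columns where the short reference has a base, cuts the same columns out of the genome row, and removes any gap characters from the result.

```bash
#!/usr/bin/env bash
# Sketch only. Assumes an aligned FASTA already exists, for example from
#   mafft --auto genome_plus_refs.fasta > aligned.fasta
# (MAFFT and all file names here are assumptions, not part of the thread).

# trim_to_reference ALIGNED_FASTA GENOME_ID REF_ID
# Prints the part of the genome that lines up with the reference fragment.
trim_to_reference() {
    awk -v g="$2" -v r="$3" '
        /^>/ { id = substr($1, 2); next }   # ID = header up to first space
        { seq[id] = seq[id] $0 }            # join wrapped sequence lines
        END {
            ref = seq[r]
            # first alignment column where the reference has a base
            if (!match(ref, /[^-]/)) exit 1
            first = RSTART
            # last alignment column where the reference has a base
            tmp = ref
            sub(/-+$/, "", tmp)
            last = length(tmp)
            # same columns from the genome row, with gap characters removed
            out = substr(seq[g], first, last - first + 1)
            gsub(/-/, "", out)
            print ">" g "_trimmed"
            print out
        }' "$1"
}

# Hypothetical usage:
#   trim_to_reference aligned.fasta Avian-Avualavirus-full-genome ref_isolate_1 > fusion.fasta
```

Working on alignment columns rather than on counted gap characters means the result stays correct even if the aligner also puts gaps inside the genome row.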
Since it's not necessarily going to be in exactly the same position, you will have to align the sequences, e.g. via BLAST, to get the coordinates of the best match.
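A minimal sketch of that idea in bash, assuming NCBI BLAST+ (`blastn`) is installed and the genome FASTA holds a single record. The helper names `best_hit` and `extract_region` and all file names are made up for illustration. The short reference sequences are used as the query and the genome as the subject, so the `sstart` and `send` columns are coordinates in the genome.

```bash
#!/usr/bin/env bash
# Sketch only; helper and file names are hypothetical.

# best_hit HITS_TSV
# HITS_TSV is tabular BLAST output with the columns
#   qseqid sseqid sstart send bitscore
# Prints "sstart send" for the hit with the highest bitscore.
best_hit() {
    awk 'NR == 1 || $5 + 0 > max { max = $5 + 0; s = $3; e = $4 }
         END { if (NR > 0) print s, e }' "$1"
}

# extract_region FASTA START END NAME
# Prints bases START..END (1-based, inclusive) of a single-record FASTA.
# If START > END the hit is on the minus strand, so the slice is
# reverse-complemented.
extract_region() {
    er_lo=$2
    er_hi=$3
    if [ "$2" -gt "$3" ]; then er_lo=$3; er_hi=$2; fi
    er_seq=$(grep -v '^>' "$1" | tr -d '\n\r' | cut -c "${er_lo}-${er_hi}")
    if [ "$2" -gt "$3" ]; then
        er_seq=$(printf '%s\n' "$er_seq" | rev | tr 'ACGTacgt' 'TGCAtgca')
    fi
    printf '>%s\n%s\n' "$4" "$er_seq"
}

# Hypothetical usage:
#   blastn -query fusion_refs.fasta -subject genome.fasta \
#       -outfmt "6 qseqid sseqid sstart send bitscore" > hits.tsv
#   coords=$(best_hit hits.tsv)
#   extract_region genome.fasta "${coords% *}" "${coords#* }" partial-fusion-gene > fusion.fasta
```

Note that the best local hit may be shorter than the reference fragment if the ends diverge, so it is worth checking the length of the extracted sequence before using it in a tree.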