Hello, I have a simple question, but I am having trouble working out how to solve it. I have many alignment files, and for each of them I want to remove the overhanging bases from the alignment. The twist is that I want this done relative to a specific sequence, which is always the first in the alignment and is always named "Scaffoldxxxx" (where the x's are numbers and other characters). Here is a picture:
As you can see, I want to trim everything upstream of the start and downstream of the end of the first sequence. That's easy in a sequence editor such as Jalview, but as I have thousands of alignments, I need to automate it. Surprisingly, I don't even know where to start.
Many thanks for any help or insight.
As requested, here is a sample alignment:
>Scaffold_2:57492774-57492872
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
-------------------------------cttgagctggggtctggccatggggtaaa
gaagcagcagcagagacagaccaatgccaatgaggattccatactgcacacagtcacaag
catgggtta---------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
-----------
>_R_TRINITY_DN28760_c0_g2_i7
tgtttgtgtgtcttgatttacaaaaatgatgcacagtaaatgttgataatttttacgact
gctgaggagatacaaggaacatggtaattgtgtaatgaagacaatgccagcttactaaat
gtattactttctgctgtgtgacaatgatacttacacgggtgcggcataaaactatctcaa
ctcctttccgttccccttccaacaccatgcttcgtataccatgtagactggagaagtcga
ggcccgatatgtggaccatgtccaggatgacggccctggggacgtcctcctccagggcca
ggcccatgaccttgtcctccaggtactctatggctggaaagaacaggccctggtcaggct
ggatcagcaccagccccggctgctccttcaccttgagttgtggccgggccatggggtaga
gcaggagcaacaaggacagcccgataccgatcaggatcccatactcaatgcccacacata
gtgaacccaaaaaggtggccacatgcacaaacagatcccatttattggtgcgccacaaaa
cggggatgattttgtagtcgaccatctgcatgacggccatgatgatgaccgcggccagcg
ctgacttggggatgtagtaacagtagggcaccaggaaggccagtactaacaggatcaggg
accctgtgaaaagaccattcgccggtgttcttacaccgctctgtgagttgacagcagttc
tggaaaaactgccggtgacaggataggaatgaacaaaggaactgagaatgttggcagtac
ctatagctatcaactcttgtgtaggatcaatcttatagttattcacacgagctgcaacat
aaacaaaagtgtcatgaatttcatatcagcgacaaaaacttttacctaataaataaagtt
ttaaaaaggag
I need to trim this alignment so that its length is the length of the first sequence. I don't want anything in the second sequence that lies upstream or downstream of the bases matching the first sequence.
It would be easier to understand the issue if you posted the data instead of images.
I have updated accordingly.
The first sequence is not in the second sequence.
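The first sequence does not have to occur verbatim in the second one if the trimming is done by alignment column rather than by sequence match. Here is a minimal sketch of that idea in plain Python (standard library only). It assumes each file is an aligned FASTA in which every record has the same length, the gap character is `-`, and the Scaffold sequence is the first record. The function names `read_fasta`, `trim_to_reference` and `write_fasta` are made up for illustration and do not come from any library.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples, joining wrapped lines."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def trim_to_reference(records, gap="-"):
    """Keep only the columns between the first and last non-gap base of the first record."""
    ref = records[0][1]
    if any(len(seq) != len(ref) for _, seq in records):
        raise ValueError("sequences differ in length; input is not an alignment")
    if not ref.strip(gap):
        raise ValueError("reference sequence contains only gaps")
    start = len(ref) - len(ref.lstrip(gap))  # index of the first non-gap column
    end = len(ref.rstrip(gap))               # one past the last non-gap column
    return [(name, seq[start:end]) for name, seq in records]


def write_fasta(records, width=60):
    """Format (name, sequence) tuples as FASTA text wrapped at `width` columns."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy alignment standing in for a real file; the reference is the first record.
    toy = ">Scaffold_1:1-6\n---acg\nt-a---\n>read_1\nttgacg\ntcaggg\n"
    print(write_fasta(trim_to_reference(read_fasta(toy))), end="")
```

Gaps inside the reference span are kept, so the trimmed block is still a valid alignment. For thousands of files, the same three calls could be run in a loop over the file names (for example with `glob.glob`), reading each file's text and writing the result to a new file.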