For example, this is my multiple sequence alignment,
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
I want to trim this alignment based on a location in the first sequence, like this:
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
Finally, I want to de-align all the sequences (remove the gaps). Please suggest some Perl or R scripts, or existing tools. Note that I would like to use this in a programming pipeline, so scripts are more suitable.
Thanks in advance!
Since your aligned sequences are all identical, it is not clear what the problem specification actually is. The output you show can be produced from your input with a simple slice of each string, but that is almost certainly not what you need.
Yes, in this alignment there is no variation. But in many cases there will be a lot of variation among the aligned sequences. My interest is to cut the alignment, de-align the sequences, and look for variation in the trimmed regions. I have a separate script for scanning the variation. All I need is the sequences in a multi-FASTA file with the regions of interest.
The problem is that when you give an overly simplistic example, the solutions may only solve that simple case. Here a solution could be had by simply doing
line[start:end]
or
line.replace('-', '')
But those solutions would not work for a more complex case.
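For a more general case, here is a minimal Python sketch. It rests on an assumption the question does not confirm: that the region of interest is given as a 0-based, half-open interval on the ungapped first sequence. The function names (`reference_columns`, `trim_and_degap`, `to_fasta`) and the record names are made up for illustration. The idea is to map the reference coordinates to alignment columns first, cut every row at those columns, and only then strip the gaps.

```python
def reference_columns(ref_aligned, start, end, gap="-"):
    """Map a 0-based, half-open [start, end) interval on the UNGAPPED
    reference to a half-open interval of alignment columns."""
    # Alignment column index of every non-gap residue in the reference row.
    cols = [i for i, ch in enumerate(ref_aligned) if ch != gap]
    if not 0 <= start < end <= len(cols):
        raise ValueError("interval lies outside the ungapped reference")
    return cols[start], cols[end - 1] + 1


def trim_and_degap(records, start, end, gap="-"):
    """records: list of (name, aligned_sequence); the first one is the reference.
    Returns (trimmed_records, degapped_records)."""
    col_start, col_end = reference_columns(records[0][1], start, end, gap)
    trimmed = [(name, seq[col_start:col_end]) for name, seq in records]
    degapped = [(name, seq.replace(gap, "")) for name, seq in trimmed]
    return trimmed, degapped


def to_fasta(records):
    """Format (name, sequence) pairs as a multi-FASTA string."""
    return "\n".join(f">{name}\n{seq}" for name, seq in records) + "\n"


if __name__ == "__main__":
    # First row is taken from the question; the record names are hypothetical.
    aln = [
        ("seq1", "TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT"),
        ("seq2", "TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT"),
    ]
    # Assumed region: ungapped reference positions 28..47 (0-based).
    trimmed, degapped = trim_and_degap(aln, 28, 48)
    print(to_fasta(trimmed))
    print(to_fasta(degapped))
```

Because the cut is made on alignment columns, rows whose gaps differ from the reference are still trimmed at the same columns. Their de-aligned lengths may then differ, which is presumably what the downstream variation scan needs. This has only been checked against the single alignment shown in the question.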
It is essential to provide typical examples of your multiple sequence alignments and not just rely on someone else producing them. The programs need to be tested and verified.