How to trim a multiple sequence alignment and de-align the sequences?
10.7 years ago
second_exon ▴ 210

For example, this is my multiple sequence alignment,

TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT
TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT

I want to trim this alignment based on positions in the first sequence, like this:

TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG
TTCT---CTGATGA---GACTAAAAG

Finally, I want to de-align all the sequences. Please suggest some Perl or R scripts, or existing tools. Remember, I would like to use this in a programming pipeline, so scripts are more suitable.

Thanks in advance!

alignment sequence • 3.2k views

Since your aligned sequences are all identical, it is not clear what the problem specification actually is. For the inputs and outputs you show, a simple slice of each string would do the job, but that is almost certainly not what you need.


Yes, in this alignment there is no variation. But in many cases there will be a lot of variation among the aligned sequences. My aim is to cut the alignment, de-align the sequences, and look for variation in the trimmed regions. I have a separate script for scanning the variation. All I need is a multi-FASTA file containing the sequences for the regions of interest.
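For what it's worth, the cut / de-align / multi-FASTA steps described here can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the alignment is already in memory as (name, aligned sequence) pairs, `-` is the only gap character, and the region is given as 0-based, half-open alignment columns. The names `trim_and_dealign`, `to_fasta` and `seq1`..`seq3` are made up for illustration.

```python
def trim_and_dealign(records, start, end):
    """Slice alignment columns [start, end) and strip gap characters.

    records: list of (name, aligned_sequence) tuples of equal length.
    start, end: 0-based, half-open column indices (assumed convention).
    """
    trimmed = []
    for name, aligned in records:
        window = aligned[start:end]                      # keep only the columns of interest
        trimmed.append((name, window.replace("-", "")))  # de-align: drop the gaps
    return trimmed


def to_fasta(records):
    """Format (name, sequence) tuples as a multi-FASTA string."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# One row of the alignment from the question, repeated under hypothetical names.
row = "TTCTTCTGTGGATAGCTCAGTTCCCTCATTCT---CTGATGA---GACTAAAAGCAATCT"
alignment = [(f"seq{i}", row) for i in range(1, 4)]

# Columns 28..53 correspond to the trimmed block shown in the question.
result = trim_and_dealign(alignment, 28, 54)
print(to_fasta(result))
```

The column numbers here refer to the alignment itself, not to the ungapped first sequence, so they only match the first sequence's coordinates when no gaps precede the region.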


The problem is that when you give an overly simplistic example, the solutions may only solve that simple case. Here a solution could be as simple as line[start:end] to trim or line.replace('-', '') to de-align.

But those solutions would not work for a more complex case.

It is essential to provide typical examples of your multiple sequence alignments and not just rely on someone else producing them. The programs need to be tested and verified.
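One place where the plain slice goes wrong is when the region is defined on the first (reference) sequence and that sequence itself contains gaps, so reference positions and alignment columns no longer coincide. Below is a rough Python sketch of the coordinate mapping, tested only on a made-up toy alignment. It assumes 0-based, half-open coordinates on the ungapped reference and `-` as the gap character; `ref_to_columns` and the toy sequences are hypothetical.

```python
def ref_to_columns(ref_aligned, ref_start, ref_end, gap="-"):
    """Map a 0-based, half-open interval on the ungapped reference
    to a 0-based, half-open interval of alignment columns."""
    if not 0 <= ref_start < ref_end:
        raise ValueError("need 0 <= ref_start < ref_end")
    col_start = col_end = None
    pos = 0  # number of reference residues seen so far
    for col, char in enumerate(ref_aligned):
        if char == gap:
            continue                 # gap columns do not advance the reference position
        if pos == ref_start:
            col_start = col
        if pos == ref_end - 1:
            col_end = col + 1        # half-open: one past the last wanted column
            break
        pos += 1
    if col_start is None or col_end is None:
        raise ValueError("interval extends past the end of the reference")
    return col_start, col_end


# Toy alignment with gaps in the reference (made up for illustration).
rows = {
    "ref": "AC--GTA-CGT",
    "s2":  "ACTTGTAACGT",
    "s3":  "A---GTA-CGT",
}

# Reference residues 1..4 (C, G, T, A) span alignment columns 1..6.
start, end = ref_to_columns(rows["ref"], 1, 5)
dealigned = {name: seq[start:end].replace("-", "") for name, seq in rows.items()}
print(start, end, dealigned)
```

Note that `s2` keeps its insertion relative to the reference after trimming, which is exactly the kind of variation a plain slice on reference coordinates would cut in the wrong place.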

10.7 years ago
Torst ▴ 980

The HMMER toolkit comes with a set of tools (called Easel) for manipulating multiple sequence alignments, and they can do everything you want and more:

http://selab.janelia.org/people/eddys/blog/?p=394#comments
