Removing - from sequence data fasta file
6.7 years ago
James ▴ 20

Hi

Please can anyone help with this problem.

I have been using Aliview to look at and edit fasta files. When saving the fasta file again Aliview adds gap characters '-' to the end of sequence data to make all sequences the same length.

Is there an easy way of getting rid of these again? There are too many to do it by hand, and I don't want to remove all - using a text editor, as that would also change my sequence headers.

Please help

Thank you
James

fasta aliview • 3.8k views

If the - characters are not present anywhere except in the places you want to remove them from, you could do sed 's/\-//g' your_file > new_file.

6.7 years ago
h.mon 35k

The following will not remove - from header lines, and remove all - from sequences:

sed '/^>/! s/\-//g' test.fas

The following will not remove - from header lines, and removes only the run of - at the end of each sequence line:

sed '/^>/! s/\-*$//' test.fas
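To make the difference between removing every gap and removing only trailing gaps concrete, here is a small sketch on made-up data. The sequence names and bases are invented for illustration, and it assumes each sequence sits on a single line (wrapped sequences would need different handling). The hyphen is written without a backslash, since it is an ordinary character in a sed pattern outside brackets, and the trailing pattern is -*$ so that a whole run of hyphens at the end of the line is matched.

```shell
# A tiny padded alignment held in a variable; names and sequences are made up.
fasta=$(printf '%s\n' '>seq-1' 'AC-GT---' '>seq-2' 'ACGGTTAA')

# Remove every '-' from sequence lines; header lines (starting with '>') are skipped.
degap_all=$(printf '%s\n' "$fasta" | sed '/^>/! s/-//g')

# Remove only the run of '-' at the end of each sequence line; internal gaps stay.
degap_trailing=$(printf '%s\n' "$fasta" | sed '/^>/! s/-*$//')

printf '%s\n' "$degap_all"
printf '%s\n' "$degap_trailing"
```

In both results the header >seq-1 keeps its hyphen. The first result turns AC-GT--- into ACGT, the second turns it into AC-GT.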

But the AliView page states it can:

Delete all gaps in all sequences: Degap the whole alignment.


Thank you for this, works perfectly and doesn't touch the header names. I have used the delete all gaps in all sequences of aliview, maybe I'm doing something wrong but it still seems to put the - at the end of all shorter sequences to make them as long as the longest. Thanks again!


So if I break those commands down into English, the s/-//g is saying to replace all - with nothing? And the /^>/! is saying to ignore all lines that have a > in them. Is that right?


Breaking in parts:

/^>/!

skip (//!) lines that start (^) with >.

s/\-$//g

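substitute (s///) a hyphen (\-, the \ is an escape character) at the end of the line ($) with nothing, globally (g). Without g, only the first match in each line is replaced. Without $, the match can occur anywhere in the line. Note that because $ anchors the match, \-$ can only match the single last hyphen, so g makes no difference there; to strip a whole run of trailing hyphens, use \-*$ instead.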
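The effect of $ and g is easiest to see by running each variant on the same line. This is only an illustrative sketch: the sample line is made up, and the hyphen is written without the backslash, which sed does not require here.

```shell
line='AC-GT---'   # made-up sequence line with one internal gap and three trailing gaps

# Anchored single hyphen: only the last '-' can match, so g changes nothing.
last_only=$(printf '%s\n' "$line" | sed 's/-$//g')      # AC-GT--

# Anchored run of hyphens: the whole trailing run is removed.
trailing_run=$(printf '%s\n' "$line" | sed 's/-*$//')   # AC-GT

# No anchor, no g: only the first '-' in the line is removed.
first_only=$(printf '%s\n' "$line" | sed 's/-//')       # ACGT---

# No anchor, with g: every '-' in the line is removed.
all_gaps=$(printf '%s\n' "$line" | sed 's/-//g')        # ACGT

printf '%s\n' "$last_only" "$trailing_run" "$first_only" "$all_gaps"
```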


Thanks so much for explaining, I'm trying to learn how all this works.

6.7 years ago
Hugo ▴ 380

Hi James, you can use the "Undo alignment" function of SEDA (http://www.sing-group.org/seda/). Regards.
