Remove ambiguous bases from alignment sequence
7.4 years ago
a.moner • 0

Hi, could you please help me with this? I have a very large alignment file that contains many ambiguous bases, and I would like to do two things:

1- Replace the ambiguous bases (Y, W, S, R, M and K) with an underscore ( _ ).

2- Remove every column that contains an ambiguous base or a gap, so that only columns made up of the four bases A, T, C and G remain.

Thanks. Here is an example:

S1 CCGCCGCCGCCTCC

S2 CCGCCGCCGWCTCC

S3 CCGCCMCCGCCTCC

S4 CCGCCTCCGCCTCC

For the first question, I want the output to look like this:

S1 CCGCCGCCGCCTCC

S2 CCGCCGCCG_CTCC

S3 CCGCC_CCGCCTCC

S4 CCGCCTCCGCCTCC

For the second question, I want the output to look like this:

S1 CCGCCCCGCTCC

S2 CCGCCCCGCTCC

S3 CCGCCCCGCTCC

S4 CCGCCCCGCTCC

genome next-gen sequencing alignment sequence • 2.7k views

t_coffee provides very good alignment reformatting options through its seq_reformat tool. For example, to change every A to 1 and every T to a gap:

t_coffee -other_pg seq_reformat -in=input.aln -output=clustalw_aln -out=output.aln -action  +convert 'A1' 'T-'
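If t_coffee is not at hand, the same kind of character substitution can be sketched in plain Python for the first question. This is only a minimal sketch, not part of t_coffee: the function name `mask_ambiguous` and the in-memory `alignment` dictionary are made up for illustration, and it assumes the sequences have already been read into strings.

```python
# Ambiguity codes named in the question; extend this string if other IUPAC codes occur.
AMBIGUOUS = "YWSRMK"

# Translation table mapping each ambiguity code (upper and lower case) to "_".
MASK_TABLE = str.maketrans({c: "_" for c in AMBIGUOUS + AMBIGUOUS.lower()})


def mask_ambiguous(seq: str) -> str:
    """Return seq with every listed ambiguity code replaced by '_' (hypothetical helper)."""
    return seq.translate(MASK_TABLE)


# Example data copied from the question.
alignment = {
    "S1": "CCGCCGCCGCCTCC",
    "S2": "CCGCCGCCGWCTCC",
    "S3": "CCGCCMCCGCCTCC",
    "S4": "CCGCCTCCGCCTCC",
}

masked = {name: mask_ambiguous(seq) for name, seq in alignment.items()}

for name, seq in masked.items():
    print(name, seq)
```

A translation table handles all codes in one pass over each sequence, which should matter for a very large alignment.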

The command for removing gapped columns is:

t_coffee -other_pg seq_reformat -in=a.aln -output=clustalw_aln -out=test.aln -action  +convert +rm_gap  n

The n after +rm_gap is a placeholder and has to be set to a value that suits your alignment.
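For the second question, a column filter can also be sketched directly in Python. Again this is an illustrative sketch under assumptions rather than the t_coffee behaviour: the helper name `drop_non_acgt_columns` is hypothetical, the sequences are assumed to be aligned strings of equal length already in memory, and any character other than A, C, G or T (ambiguity codes, gaps, underscores) causes the whole column to be dropped.

```python
def drop_non_acgt_columns(seqs):
    """Keep only the columns in which every sequence has A, C, G or T.

    Hypothetical helper: seqs is a list of aligned sequences of equal length.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same length")
    allowed = set("ACGT")
    # zip(*seqs) walks the alignment column by column.
    keep = [
        i
        for i, column in enumerate(zip(*seqs))
        if all(base.upper() in allowed for base in column)
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


# Example data copied from the question.
names = ["S1", "S2", "S3", "S4"]
seqs = [
    "CCGCCGCCGCCTCC",
    "CCGCCGCCGWCTCC",
    "CCGCCMCCGCCTCC",
    "CCGCCTCCGCCTCC",
]

cleaned = drop_non_acgt_columns(seqs)

for name, seq in zip(names, cleaned):
    print(name, seq)
```

On the example from the question, the two columns containing M and W are removed, so each sequence shrinks from 14 to 12 bases.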
