Hide alignment positions with 99% gap ("-") characters to ignore single-sequence insertions?
Asked 7 months ago by Broccoli

I have two groups of strains that I'm comparing, each with ~100 sequences, and I align both groups together with MAFFT. When I then visualize the alignment, there are numerous sites where one or two sequences have an insertion, which opens a gap in every other sequence at those positions. This makes it difficult to assess the overall differences between the groups. Is there a way to solve this? I'm thinking of hiding positions where xx% of the characters are gaps ("-"). That way, I could ignore positions where only a single sequence has an insertion.
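The column filter described in the question can be sketched in a few lines of Python. This is a minimal sketch, not part of MAFFT or any existing tool: the function names `drop_gappy_columns` and `read_fasta` are hypothetical, only `-` is treated as a gap by default (MAFFT writes gaps as `-`), and a column is dropped when its gap fraction is at least the chosen threshold.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (sequence name, aligned sequence)


def read_fasta(text: str) -> List[Record]:
    """Minimal FASTA parser (hypothetical helper): '>' starts a record."""
    records: List[Record] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_gappy_columns(records: List[Record], threshold: float,
                       gap_chars: str = "-") -> List[Record]:
    """Remove columns in which at least `threshold` of the sequences
    carry a gap character (hypothetical helper)."""
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all aligned sequences must have the same length")
    n = len(records)
    keep = []
    for col in range(length):
        gaps = sum(1 for _, seq in records if seq[col] in gap_chars)
        # Keep the column only if its gap fraction is below the threshold.
        if gaps / n < threshold:
            keep.append(col)
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


if __name__ == "__main__":
    example = """>seq1
ACGT----ACGT
>seq2
ACGT----ACGT
>seq3
ACGT----ACGT
>seq4
ACGTGTACACGT
>seq5
ACGT----ACGT
"""
    # Columns 5-8 are gaps in 4 of 5 sequences (0.8), so 0.5 removes them.
    for name, seq in drop_gappy_columns(read_fasta(example), 0.5):
        print(">" + name)
        print(seq)
```

Writing the filtered records back out as FASTA and opening them in the alignment viewer would hide those columns; note that this also deletes the inserted residues from the sequences that carried them.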

Answered 7 months ago by Jesse

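seqmagick has a --squeeze-threshold option that does exactly this: it removes columns in which at least the given proportion of sequences have a gap. For example, take an MSA of five sequences where one sequence has a four-base insertion (so those four columns are 80% gaps). A threshold of 0.9 keeps those columns, while a threshold of 0.5 removes them: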

```console
$ seqmagick convert --squeeze-threshold 0.9 aln.fa -
>seq1
ACGT----ACGT
>seq2
ACGT----ACGT
>seq3
ACGT----ACGT
>seq4
ACGTGTACACGT
>seq5
ACGT----ACGT
$ seqmagick convert --squeeze-threshold 0.5 aln.fa -
>seq1
ACGTACGT
>seq2
ACGTACGT
>seq3
ACGTACGT
>seq4
ACGTACGT
>seq5
ACGTACGT
```
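Choosing the threshold is simple arithmetic: with n sequences, a column in which only k sequences have a residue has a gap fraction of (n - k)/n. Under a rule that drops a column when its gap fraction is at least the threshold (assumed here to match --squeeze-threshold), any threshold greater than (n - k - 1)/n and at most (n - k)/n removes exactly the columns with k or fewer residues. The helper below is a hypothetical sketch and assumes the two groups together contain about 200 sequences.

```python
from fractions import Fraction


def threshold_for(n_sequences: int, max_residues: int) -> Fraction:
    """Return (n - k) / n: the largest gap-fraction threshold that still
    drops every column in which at most k = max_residues of the
    n = n_sequences sequences have a residue, assuming a column is
    dropped when gap_fraction >= threshold (hypothetical helper)."""
    if not 0 <= max_residues < n_sequences:
        raise ValueError("need 0 <= max_residues < n_sequences")
    return Fraction(n_sequences - max_residues, n_sequences)


if __name__ == "__main__":
    # Two groups of ~100 sequences -> about 200 sequences (assumption).
    for k in (1, 2):
        t = threshold_for(200, k)
        print(f"k={k}: threshold {t} = {float(t)}")
```

For about 200 sequences, k = 2 gives 0.99, matching the 99% in the question title. For the five-sequence example above, k = 1 gives 0.8: the insertion columns have a gap fraction of 0.8, which is at least 0.5 but below 0.9, so 0.5 removes them and 0.9 does not.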
Answered 7 months ago

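You can also try trimAl (https://github.com/inab/trimal), which trims alignment columns by gap content; for example, its -gt option sets the minimum fraction of sequences without a gap that a column needs in order to be kept.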

