Hi all,
I have a file listing site-specific substitution rates (file.rate; see below). I would like to remove the fastest-evolving sites (those with the highest values in the "Rate" column) from my AA alignment (file.ali) in several steps of 4,000 sites each. Any help would be much appreciated (Python, Perl, or awk would all be fine).
Site Rate Cat C_Rate
1 2.74582 4 2.74582
2 0.31646 2 0.31554
3 0.77656 3 0.88431
4 0.08958 1 0.05433
5 ... ...
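A minimal Python sketch of the first step, assuming file.rate is whitespace-separated with the header shown above and the rate in the second column. It ranks the sites from fastest to slowest, so you can take the top 4,000, 8,000, ... sites without choosing any rate threshold by hand. The function name `rank_sites_by_rate` is hypothetical.

```python
def rank_sites_by_rate(lines):
    """Return site numbers sorted from fastest to slowest Rate.

    Assumes whitespace-separated rows of the form
    'Site Rate Cat C_Rate', preceded by a header line starting with 'Site'.
    """
    rates = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "Site":
            continue
        rates.append((int(fields[0]), float(fields[1])))
    # Highest rate first; Python's sort is stable, so ties keep file order.
    rates.sort(key=lambda pair: pair[1], reverse=True)
    return [site for site, _ in rates]


if __name__ == "__main__":
    # Demo on the four example rows above; for the real data, pass
    # an open file handle, e.g. rank_sites_by_rate(open("file.rate")).
    sample = [
        "Site Rate Cat C_Rate",
        "1 2.74582 4 2.74582",
        "2 0.31646 2 0.31554",
        "3 0.77656 3 0.88431",
        "4 0.08958 1 0.05433",
    ]
    print(rank_sites_by_rate(sample))  # prints [1, 3, 2, 4]
```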
The alignment file is a standard FASTA file with each sequence on a single line:
>Spec1
TDKCKPKKCHLECKKNCPIVKTGKSSKIAFISEMLCIGCGICVK
>Spec2
TDKCKPK----EKKKNCPIVGTGKSSKIAFISEMLCIGCGICVK
>Spec3
????KKCHLECKKNCPIVKTGK-----IAFISEMLCIGCGICVK
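Given a fastest-first ranking of sites, the alignment can then be thinned column by column. This sketch assumes that all sequences in file.ali have the same length and that the site numbers in file.rate are 1-based alignment columns. `read_fasta`, `remove_sites`, `stepwise_removal` and `write_fasta` are hypothetical helper names, not part of any library.

```python
def read_fasta(lines):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        else:
            # Concatenating also copes with sequences wrapped over several lines.
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def remove_sites(records, sites):
    """Drop the given 1-based alignment columns from every sequence."""
    drop = {s - 1 for s in sites}  # convert to 0-based indices
    return [
        (name, "".join(c for i, c in enumerate(seq) if i not in drop))
        for name, seq in records
    ]


def stepwise_removal(records, ranked_sites, step=4000):
    """Yield (n_removed, records) after removing the step, 2*step, ... fastest sites.

    ranked_sites must be ordered fastest first. Every step starts from the
    full alignment, so the original column numbers stay valid. A final
    partial step (fewer than `step` sites left) is not produced.
    """
    for n in range(step, len(ranked_sites) + 1, step):
        yield n, remove_sites(records, ranked_sites[:n])


def write_fasta(records, path):
    """Write records as one-line-per-sequence FASTA."""
    with open(path, "w") as out:
        for name, seq in records:
            out.write(f">{name}\n{seq}\n")
```

For example, `for n, recs in stepwise_removal(read_fasta(open("file.ali")), ranked):` followed by `write_fasta(recs, f"file.minus{n}.ali")` would write one reduced alignment per step, where `ranked` is the fastest-first list of site numbers; the output file names are illustrative only. Removing columns from the untouched alignment at each step, rather than from the previous step's output, avoids having to renumber sites after each removal.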
Thanks, but how does this remove sites from the alignment?
OK, I understand now. What does your AA file look like? I think you should apply this awk filter first and then filter the remaining sites. Please show us your AA file so we can see how this can be done.
I have updated my answer, hope this can be useful to you.
No, this is very cumbersome, as I would have to work out all the different thresholds myself. Anyway, thanks for your suggestions.