3.1 years ago
ramsha
I want to remove the columns that contain gaps from an MSA file.
Is there any Python code that could help me with this?
Not sure if it's Python code, but I know that trimAl can be used for this.
Why Python code, specifically? Unless you want to practice your programming skills, there are good tools out there for this. Also, do you want to remove all gaps (un-align), remove columns above a certain gap fraction (e.g. columns with > 50% gaps), or remove uninformative columns? Still, it is nice to have all the options.
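If you do want the threshold variant in Python, here is a minimal sketch. The function name `drop_gappy_columns` and its parameters are made up for illustration, and it assumes the alignment is already loaded as a list of equal-length strings with "-" as the gap character.

```python
def drop_gappy_columns(seqs, max_gap_fraction=0.5, gap_chars="-"):
    """Keep only alignment columns whose gap fraction is <= max_gap_fraction.

    seqs is a list of aligned sequences (equal-length strings).
    gap_chars is an assumption: adjust it if your file uses other gap symbols.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Indices of the columns that pass the gap-fraction filter.
    keep = [
        i for i in range(length)
        if sum(s[i] in gap_chars for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


aligned = ["AC-GT", "A--GT", "A-TG-", "A--GT"]

# Drop columns with more than 50% gaps.
print(drop_gappy_columns(aligned, max_gap_fraction=0.5))

# Drop every column that contains at least one gap.
print(drop_gappy_columns(aligned, max_gap_fraction=0.0))
```

Setting `max_gap_fraction=0.0` removes every column with at least one gap, which is roughly what `trimal -nogaps` does.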
trimal -nogaps
or
trimal -noallgaps
should work (trimAl can be installed via conda). Note that -nogaps removes every column that contains a gap, whereas -noallgaps removes only the columns that consist entirely of gaps. trimAl can also clip your sequence identifiers into a shorter, compatible format. Some older phylogenetic software is darn picky about identifiers (PHYLIP, and thereby ProtTest3: max. 10 characters per sequence ID; MrBayes: no length restriction, but sub-string 1:15 must uniquely identify the sequence), and it looks like you might run into problems with yours. I also have a Perl script that attempts to keep the identifiers unique and readable; let me know if you need that too.
sed '/^[^>]/s/-//g' input_file
should also do as a quick command-line hack without any installation. Be aware that it strips every gap character from the sequence lines (i.e. it un-aligns the sequences) and leaves you with FASTA lines of unequal length, which most tools are completely fine with; otherwise, pipe the output through EMBOSS seqret to fix the output.
Actually, I want to apply complete deletion to the MSA file. Complete deletion means that sites containing missing data or alignment gaps are removed before the analysis begins.
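For complete deletion in plain Python, something along these lines might work. This is only a sketch: the helper names `read_fasta` and `complete_deletion` are hypothetical, it assumes an aligned FASTA input, and it assumes "-" marks gaps and "?" marks missing data (change `drop_chars` if your file encodes missing data differently, e.g. with "N" or "X").

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a list of (identifier, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def complete_deletion(records, drop_chars="-?"):
    """Remove every column in which any sequence has a gap or missing-data symbol.

    drop_chars is an assumption about how gaps and missing data are encoded.
    """
    if not records:
        return []
    seqs = [seq for _, seq in records]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("input is not aligned: sequence lengths differ")

    # A column survives only if no sequence has a dropped symbol there.
    keep = [i for i in range(length) if not any(s[i] in drop_chars for s in seqs)]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


# Small in-memory example; for a real file use open("alignment.fasta") instead
# (the file name is a placeholder).
example = io.StringIO(">seq1\nACG-TA\n>seq2\nA-GTTA\n>seq3\nACGT?A\n")
for name, seq in complete_deletion(read_fasta(example)):
    print(f">{name}\n{seq}")
```

The identifiers are passed through unchanged, so any renaming needed for PHYLIP or MrBayes would still have to be done separately.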