Hi all,
This is a follow-up to my previous questions on POCP (percentage of conserved proteins) and the related command lines.
Before POCP can be calculated, the BLASTP hits must meet three criteria: "(i) E value of less than 1e-5; (ii) a sequence identity of more than 40% and (iii) an alignable region of the query protein sequence of more than 50%".
I've managed to handle (i) and (ii) using blastp and an awk command, respectively, but I'm struggling with (iii). According to https://github.com/wum5/JaltPhylo, it seems I have to run 'python blast_to_mcl <file>' to filter out hits whose aligned region covers less than 50% of the query.
I tried searching for 'blast_to_mcl'. It looks like some sort of software, but I'm not sure what it is or where to obtain it.
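In case it helps to show what I'm after, here is a minimal sketch of the filter as I understand it. It is not blast_to_mcl, and I don't know how that script works internally. It assumes the BLAST results were written with -outfmt "6 std qlen", so that the query length is in a 13th column, and it assumes "alignable region" means (qend - qstart + 1) / qlen. The function names and thresholds are my own.

```python
# Sketch only: assumes tab-separated BLAST output from -outfmt "6 std qlen"
# Columns (0-based): 2 = pident, 6 = qstart, 7 = qend, 10 = evalue, 12 = qlen

def passes_pocp_filters(line, max_evalue=1e-5, min_identity=40.0, min_coverage=0.5):
    """Return True if one BLAST hit line meets criteria (i), (ii) and (iii)."""
    fields = line.rstrip("\n").split("\t")
    pident = float(fields[2])
    qstart, qend = int(fields[6]), int(fields[7])
    evalue = float(fields[10])
    qlen = int(fields[12])
    # Assumption: "alignable region" = aligned span of the query / query length
    coverage = (qend - qstart + 1) / qlen
    return evalue < max_evalue and pident > min_identity and coverage > min_coverage


def filter_hits(lines):
    """Keep only the non-empty hit lines that pass all three criteria."""
    return [line for line in lines if line.strip() and passes_pocp_filters(line)]
```

If this is roughly what blast_to_mcl does for the 50% criterion, I could run filter_hits over the lines of my blastp output file instead, but I would like to confirm that first.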
I would appreciate any comments if you know anything about this.
Thanks!