Hello,
I am trying to use MCL to cluster a group of sequences. I am following this protocol: http://micans.org/mcl/
I first took my ~1000 sequences and ran CD-Hit, and I used the Cluster65 file, which reduced the set to 164 clusters. I then did an all-against-all blastp and used the result as input to MCL.
My problem is that no matter which inflation value I use (.4 to 20), I always get 1 cluster containing all 164 sequences. It always shows the exact same "INFO" values:
efficiency=0.84025 massfrac=1.00000 areafrac=1.00000 source=out.K02405_MCL.mci.I14 clusters=1 max=164 ctr=164.0 avg=164.0 min=164 DGI=1 TWI=0 TWL=164 sgl=0 qrt=0
I looked at the BLAST output, and while most of the sequences are very similar (E-values smaller than E-10), some pairs have E-values as high as 0.3. What am I doing wrong? Are these sequences so similar that they cannot be split up further?
Have you considered that there may not be enough structure in your data? Try hierarchical clustering to visualise your data.
http://postimg.org/image/ciq2ckjox/
Here is a neighbor-joining tree of the data. It's very frustrating that I can't get MCL to work with it. These were the commands I used:
BLAST
makeblastdb -in K02405.65.fa -dbtype prot -out K02405.65DB
blastp -query K02405.65.fa -db K02405.65DB -outfmt 6 -out K02405_Blasted
MCL
cut -f 1,2,11 K02405_Blasted > K02405_Blasted.abc
mcxload -abc K02405_Blasted.abc -write-tab K02405_MCL.dict -o K02405_MCL.mci --stream-mirror --stream-neg-log10 -stream-tf 'ceil(200)'
mcl K02405_MCL.mci -I 1.4
mcl K02405_MCL.mci -I 2
mcl K02405_MCL.mci -I 6
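One way to check how much of the graph is made up of weak hits is to count the pairs in the .abc file on either side of an E-value cutoff, and optionally write a filtered copy to load instead. This is only a rough sketch: the function names evalue_summary and filter_abc, the 1e-10 cutoff, and the filtered file name are assumptions made for illustration, not part of the MCL protocol, and filtering may or may not change the clustering result.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a three-column .abc file
# (query, subject, E-value), as produced by `cut -f 1,2,11`.
# Function names and the default cutoff (1e-10) are illustrative assumptions.

# Count non-self pairs at or below the cutoff (strong) and above it (weak).
evalue_summary() {
    local abc="$1" cutoff="${2:-1e-10}"
    awk -v c="$cutoff" '
        $1 != $2 { if ($3 + 0 <= c + 0) strong++; else weak++ }
        END { printf "strong=%d weak=%d\n", strong, weak }
    ' "$abc"
}

# Print only the lines whose E-value is at or below the cutoff.
filter_abc() {
    local abc="$1" cutoff="${2:-1e-10}"
    awk -v c="$cutoff" '$3 + 0 <= c + 0' "$abc"
}

# Example usage (file names follow the commands above; the output name is an assumption):
#   evalue_summary K02405_Blasted.abc 1e-10
#   filter_abc K02405_Blasted.abc 1e-10 > K02405_Blasted.filtered.abc
```

The filtered file could then be passed to the same mcxload command in place of K02405_Blasted.abc. If nearly every pair counts as strong, the graph is close to fully connected with uniformly heavy edges, which would be consistent with the suggestion that there is little structure to find.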