Hello everyone, I'm trying to use the cluster function from the kmer package to obtain a dendrogram of a large set of already-aligned protein sequences (FASTA). The cluster function requires input in AAbin format. I used as_AAbin, but the result does not seem to be what cluster expects. With the example data "woodmouse", which seems to be an array, it works; but in my case, "cadcbin" is not an array. Could you help?
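library(seqinr)
# Read the aligned protein sequences
cadcalignfil <- read.fasta("CadCfilalign.fasta", seqtype = "AA")
library(bioseq)
# Convert to a bioseq amino-acid vector, then to AAbin
cadcv <- aa(cadcalignfil)
cadcbin <- as_AAbin(cadcv)
library(ape)
library(kmer)  # provides cluster()
cluster(cadcbin, k = 4)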
Converting to Dayhoff(6) compressed alphabet for k > 3
Classes: AGPST, C, DENQ, FWY, HKR, ILMV
Error in kcount(x, k = k, residues = residues, gap = gap, named = FALSE) : minimum sequence length is less than k
Best regards, Pam
I'm not familiar with this package, but generally speaking, k-mer-based clustering takes raw (unaligned) sequences as input, not sequence alignments.
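As a rough illustration of that suggestion, here is a minimal base-R sketch that strips gap characters and checks each sequence length against k, since the error message complains that the shortest sequence is shorter than k. The toy sequences and the variable names (`aligned`, `ungapped`, `too_short`, `usable`) are made up for the example, and it assumes the data is a list with one character vector of residues per sequence and "-" as the gap character; it does not call kmer itself.

```r
# Toy stand-in for a list of aligned protein sequences:
# one character vector of residues per sequence, "-" marks a gap.
# (These sequences are invented for illustration only.)
aligned <- list(
  seq1 = c("M", "K", "-", "-", "L", "V", "A", "G"),
  seq2 = c("M", "-", "R", "L", "L", "V", "-", "G"),
  seq3 = c("-", "-", "-", "M", "L", "-", "-", "-")
)

k <- 4

# Drop gap characters so each sequence is raw (unaligned).
ungapped <- lapply(aligned, function(s) s[s != "-"])

# Number of residues left in each sequence.
seq_lengths <- lengths(ungapped)
print(seq_lengths)

# Sequences shorter than k cannot yield a single k-mer.
too_short <- names(seq_lengths)[seq_lengths < k]
print(too_short)

# Keep only the sequences that are long enough for the chosen k.
usable <- ungapped[seq_lengths >= k]

# Assumption, not run here: a list like `usable` would still need to be
# converted to AAbin (for example with ape) before being passed to cluster().
```

Running the same kind of length check on the real object (for instance `lengths(cadcbin)`, if it is a list) may show whether the conversion produced very short or empty elements, which would explain the error.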