11.1 years ago
Nicolas Rosewick
Hi,
I have a small question about consensus sequences. I have a set of short, overlapping sequences that come from different groups. I want to put the sequences from the same group together and compute a consensus sequence for each group. The catch is that I don't know which group each sequence comes from.
A small example makes this easier to understand (here there are three groups: lines 1-4 are g1, lines 5-8 are g2, lines 9-11 are g3):
AAATTTGGGCCC
AAATTTGGG
AAATTTG
TTTGGGCCCAAA
ATGCATGCAT
ATGCATGC
GCATGCATGC
TGCATGCAT
ACGTACGTACGT
ACGTACGTA
GTACGTACGTAC
The expected output would be:
Group1 : AAATTTGGGCCCAAA
Group2 : ATGCATGCATGC
Group3 : ACGTACGTACGTAC
The problem is clustering the sequences to form the groups. After that, computing the consensus sequence is straightforward.
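A minimal sketch of the clustering step in Python, under some loud assumptions: reads are on the same strand, overlaps are exact (no sequencing errors), and two reads belong together when one contains the other or they share a suffix/prefix overlap of at least `min_overlap` bases (5 here, an arbitrary threshold). Reads are linked pairwise and the groups are the connected components, tracked with a union-find. `overlaps` and `cluster_reads` are hypothetical helper names, not from any library.

```python
from itertools import combinations


def overlaps(a, b, min_overlap):
    """True if a and b share an exact overlap of at least min_overlap bases.

    Checks containment and suffix/prefix overlaps in both orientations.
    Same strand only: no mismatches, no reverse complements.
    """
    short, long_ = sorted((a, b), key=len)
    if len(short) >= min_overlap and short in long_:
        return True
    for x, y in ((a, b), (b, a)):
        for k in range(min_overlap, min(len(x), len(y)) + 1):
            if x[-k:] == y[:k]:  # suffix of x == prefix of y
                return True
    return False


def cluster_reads(reads, min_overlap=5):
    """Group reads into connected components of the overlap graph."""
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of overlapping reads (single linkage).
    for i, j in combinations(range(len(reads)), 2):
        if overlaps(reads[i], reads[j], min_overlap):
            parent[find(i)] = find(j)

    # Collect reads by component root, preserving input order.
    groups = {}
    for i, read in enumerate(reads):
        groups.setdefault(find(i), []).append(read)
    return list(groups.values())


reads = [
    "AAATTTGGGCCC", "AAATTTGGG", "AAATTTG", "TTTGGGCCCAAA",
    "ATGCATGCAT", "ATGCATGC", "GCATGCATGC", "TGCATGCAT",
    "ACGTACGTACGT", "ACGTACGTA", "GTACGTACGTAC",
]
for n, group in enumerate(cluster_reads(reads), start=1):
    print(f"Group{n} :", group)
```

On the example this recovers the three groups. Comparing all pairs is quadratic in the number of reads, so it only suits small inputs. Single linkage also means one chimeric or repetitive read can join two groups that should stay separate.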
Does anyone have an idea?
Thanks
N.
If you are just looking for clustering, blastclust would do.
Or maybe CD-HIT: http://weizhong-lab.ucsd.edu/cd-hit/
"After processing" with what?
I edited my question.
Maybe use CAP3?
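CAP3 is an overlap-based assembler. The idea behind the consensus step for this toy example can be sketched with a much simpler greedy merge, which is not CAP3's actual algorithm. First drop reads contained in other reads. Then repeatedly join the two contigs with the longest exact suffix/prefix overlap until no overlap of at least `min_overlap` bases remains. This assumes error-free, same-strand reads; `greedy_consensus`, `best_overlap` and `drop_contained` are hypothetical names.

```python
def drop_contained(seqs):
    """Remove duplicates and any sequence that is a substring of another one."""
    uniq = list(dict.fromkeys(seqs))
    return [s for s in uniq if not any(s != o and s in o for o in uniq)]


def best_overlap(x, y, min_overlap):
    """Length of the longest suffix of x equal to a prefix of y (0 if below min_overlap)."""
    for k in range(min(len(x), len(y)), min_overlap - 1, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def greedy_consensus(reads, min_overlap=5):
    """Greedily merge error-free, same-strand reads into contigs.

    Returns a list: a single contig if all reads chain together,
    several contigs if some reads lack a sufficient overlap.
    """
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        # Find the ordered pair (i, j) with the longest suffix(i)/prefix(j) overlap.
        best_k, best_i, best_j = 0, -1, -1
        for i, x in enumerate(contigs):
            for j, y in enumerate(contigs):
                if i != j:
                    k = best_overlap(x, y, min_overlap)
                    if k > best_k:  # strict '>' keeps the first pair on ties
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs


groups = [
    ["AAATTTGGGCCC", "AAATTTGGG", "AAATTTG", "TTTGGGCCCAAA"],
    ["ATGCATGCAT", "ATGCATGC", "GCATGCATGC", "TGCATGCAT"],
    ["ACGTACGTACGT", "ACGTACGTA", "GTACGTACGTAC"],
]
for n, group in enumerate(groups, start=1):
    print(f"Group{n} : {', '.join(greedy_consensus(group))}")
```

On the example groups this yields AAATTTGGGCCCAAA, ATGCATGCATGC and ACGTACGTACGTAC. For the two periodic groups, both merge orders have equally long overlaps. The tie goes to the first pair found, which happens to match the expected output but shows why repeats are hard for greedy assembly.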