Understanding blastclust results
7.6 years ago

I need to group DNA sequences by similarity and build a non-redundant (NR) database from them.

In my first attempt, I started the NR database with the first sequence and built a BLAST database from the sequences already added. Before adding each new sequence, I ran BLAST against that database to check whether the sequence was already present. This gave me 52 results from a total of 85 sequences.

blastn -db dna.fasta.db -query temp.fasta -evalue 1e-3 -max_target_seqs 1 -outfmt '6 qseqid sseqid sstart send evalue'

If this returns a hit, the sequence is skipped.
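
For reference, the loop described above can be sketched roughly as follows. This is only an illustration: it assumes the database holds just the sequences kept so far, and `toy_has_hit` is a made-up stand-in for "blastn returned a hit" (a shared k-mer), not a real BLAST call. The names `greedy_nonredundant` and `toy_has_hit` are hypothetical.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (sequence id, sequence)


def greedy_nonredundant(
    records: List[Record],
    has_hit: Callable[[Record, Record], bool],
) -> List[Record]:
    """Keep a record only if it has no hit against any record kept so far."""
    kept: List[Record] = []
    for rec in records:
        if not any(has_hit(rec, k) for k in kept):
            kept.append(rec)
    return kept


def toy_has_hit(a: Record, b: Record, k: int = 4) -> bool:
    """Hypothetical stand-in for a BLAST hit: True if the two sequences share a k-mer."""
    kmers = {a[1][i:i + k] for i in range(len(a[1]) - k + 1)}
    return any(b[1][i:i + k] in kmers for i in range(len(b[1]) - k + 1))


if __name__ == "__main__":
    # A overlaps B, B overlaps C, but A and C share nothing.
    recs = [("A", "AAAACCCC"), ("B", "CCCCGGGG"), ("C", "GGGGTTTT")]
    print([r[0] for r in greedy_nonredundant(recs, toy_has_hit)])
    # Same records, different input order.
    reordered = [recs[1], recs[0], recs[2]]
    print([r[0] for r in greedy_nonredundant(reordered, toy_has_hit)])
```

One thing this sketch makes visible is that a greedy filter of this kind depends on input order, so its count need not match the number of clusters produced by a clustering tool that applies different criteria.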

On the second attempt I used blastclust. From what I've read, it should give the same result. I set the same e-value in the config file:

-e 1e-3

With the following command, however, I obtained 71 clusters (I expected 52) from a total of 85 sequences:

blastclust -i known.numbered.fasta -o known.numbered.fasta.cluster -p F -c config

Am I missing something about blastclust? The documentation is very vague.

blast blastclust • 1.6k views

I think blastclust has a length coverage threshold (default = 0.9).
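
To illustrate why such a threshold could matter, here is a rough sketch of a length coverage check. It is an assumption about how the filter behaves (alignment length divided by sequence length, compared with the threshold, optionally required for both sequences), not blastclust's actual code; `passes_length_coverage` and its parameters are hypothetical names.

```python
def passes_length_coverage(
    aln_len: int,
    len_a: int,
    len_b: int,
    threshold: float = 0.9,
    both: bool = True,
) -> bool:
    """Return True if the alignment covers enough of the sequences.

    Coverage is taken here as alignment length / sequence length.
    With both=True, both sequences must reach the threshold;
    otherwise reaching it on either sequence is enough.
    """
    cov_a = aln_len / len_a
    cov_b = aln_len / len_b
    if both:
        return cov_a >= threshold and cov_b >= threshold
    return cov_a >= threshold or cov_b >= threshold


if __name__ == "__main__":
    # A significant local hit covering only part of the longer sequence.
    print(passes_length_coverage(200, 500, 220))              # both required
    print(passes_length_coverage(200, 500, 220, both=False))  # either suffices
```

Under this reading, a short but significant local alignment would count as a hit for the e-value-only blastn filter yet fail the coverage check, which would leave more sequences in separate clusters.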


I tried changing that, but the result is the same.


It is strange that the result is the same. Without the length coverage threshold, I would expect the number of clusters to be lower than 71, even if not exactly 52. Beyond the other blastclust options, I have no further ideas.
