Blastclust Output Problem.
13.2 years ago
Pawan_K • 0

Hi,

I have installed the latest standalone BLAST, version 2.2.25, following the installation guide at http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/pc_setup.html. I ran BLASTCLUST on a 14 MB FASTA file with the parameters -S 1.5 -L 0.9, but the clustering does not look right: the cluster file contains only 3 clusters, and the run leaves behind an empty error.log text file. The sequences do appear to be read, since the number of sequences is reported while the program runs. When I submitted some of my sequences online, I got a few clusters. The other BLAST programs in the package work; only this one does not give the correct result. Kindly help me get the correct clusters using BLASTCLUST as soon as possible.

With best regards, K. Pawankumar

13.2 years ago
Suren ▴ 110

If you can describe the problem in a little more detail, it would be easier to suggest a solution.

When BLASTCLUST runs, if a sequence header is too long, it replaces that header in the result file with text starting with "Temp....", so make sure your headers are not too long.
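To check headers before a run, a minimal sketch along these lines could help. The length at which BLASTCLUST starts renaming headers is not stated here, so `max_len=50` is only a placeholder assumption, and `long_headers` is a hypothetical helper name.

```python
import io


def long_headers(fasta_lines, max_len=50):
    """Return (line_number, header) pairs for FASTA headers longer than max_len.

    max_len is an assumed cutoff, not a documented BLASTCLUST limit.
    """
    flagged = []
    for lineno, line in enumerate(fasta_lines, start=1):
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            if len(header) > max_len:
                flagged.append((lineno, header))
    return flagged


# Small in-memory example; for a real file, pass an open file handle instead.
example = io.StringIO(">seq1\nACGT\n>" + "x" * 60 + "\nACGT\n")
for lineno, header in long_headers(example):
    print(f"line {lineno}: header has {len(header)} characters")
```

Any header this flags can be shortened to a brief unique ID before clustering.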

I hope you know that BLASTCLUST does not produce a "ready-to-use" FASTA-formatted file, and that you understand the output. Just for information: the number of lines equals the number of clusters, and all the sequences grouped into one cluster are listed on one line. The cluster with the most sequences is at the top, and the rest follow in decreasing order of size; if clusters contain the same number of sequences, alphabetical order comes into effect. So before running BLASTCLUST, name your sequence headers appropriately so that you can tell apart the sequences falling in each cluster.
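The output layout described above (one cluster per line, largest first) can be read with a short script. This is a sketch that assumes the sequence IDs on each line are separated by whitespace; `read_clusters` and `summarize` are hypothetical helper names.

```python
def read_clusters(lines):
    """Parse cluster lines into a list of clusters (each a list of IDs)."""
    return [line.split() for line in lines if line.strip()]


def summarize(clusters):
    """Return (cluster_number, size, first_member) for every cluster."""
    return [(i, len(c), c[0]) for i, c in enumerate(clusters, start=1)]


# Example with made-up IDs; for a real run, pass open("your_output_file").
sample = ["seqA seqB seqC\n", "seqD seqE\n", "seqF\n"]
for number, size, first in summarize(read_clusters(sample)):
    print(f"cluster {number}: {size} sequence(s), first member {first}")
```

Seeing only 3 lines in the output therefore means only 3 clusters were formed, and the size of each tells you whether nearly everything collapsed into one cluster.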

The parameters you are running with are "-S 1.5 -L 0.9".

-S parameter

if < 3 then the threshold is set as a BLAST score density
(0.0 to 3.0; default = 1.75)
if >=3 then the threshold is set as a percent of identical
residues (3 to 100)
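To make the two readings of -S concrete, here is a small sketch of the rule quoted above. The function name is hypothetical, and it only mirrors the help text; it is not BLASTCLUST's own code.

```python
def describe_score_threshold(s):
    """Classify an -S value according to the quoted help text."""
    if s < 0 or s > 100:
        raise ValueError("-S is expected to lie between 0 and 100")
    if s < 3:
        # Values below 3 are read as a BLAST score density.
        return ("score density", s)
    # Values of 3 and above are read as percent identical residues.
    return ("percent identity", s)


print(describe_score_threshold(1.5))
print(describe_score_threshold(90))
```

So "-S 1.5" is treated as a score density, slightly below the default of 1.75, not as 1.5% identity.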

Try a stricter or a more relaxed percent identity / score density threshold and see how the number of clusters returned changes.
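One way to organise such a comparison is to generate one command line per threshold. The sketch below only builds and prints the commands; it does not run them. The `-i` (input) and `-o` (output) flags are written from memory of the legacy blastclust program, so please confirm them against the help output of your installation. The file names are placeholders.

```python
def blastclust_commands(fasta, thresholds, coverage=0.9):
    """Build one blastclust argument list per -S threshold.

    -i / -o are assumed flag names; -S and -L are the ones used in this thread.
    """
    commands = []
    for s in thresholds:
        out = f"clusters_S{s}.txt"  # placeholder output name
        commands.append(
            ["blastclust", "-i", fasta, "-o", out, "-S", str(s), "-L", str(coverage)]
        )
    return commands


for cmd in blastclust_commands("seqs.fasta", [1.0, 1.5, 90, 95]):
    print(" ".join(cmd))
```

After running each command, counting the lines in each output file gives the number of clusters at that threshold.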

You can also use CD-HIT or USEARCH for clustering sequences. Both are much faster than BLASTCLUST.
