blastn output and methods to quantify similarities between sequences
8.9 years ago
cl10101 ▴ 80

I would like to quantify similarities between sequences using blastn from the BLAST+ package, but I'm not sure it is the right tool. The output I get from blastn contains only the statistically significant alignments and their E-values. Is there any way to get an alignment for every sequence in the database?
I have approximately 100,000 sequences and want to quantify the similarity between each pair of them; my blastn database contains only these sequences.

blast output • 2.7k views
8.9 years ago
satanicodr ▴ 160

What you really want is to align all your sequences together and build a large distance matrix (100k x 100k). You can then cluster on that matrix to see how many groups you have at a given level of similarity. Alternatively, you can use a greedy clustering program such as cd-hit, which avoids the slow distance-matrix step.
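To get a feel for why the full matrix is the slow part, here is a back-of-the-envelope sketch of its size. The function name and the 4 bytes per value (a 32-bit float) are my own assumptions for illustration; nothing here comes from a particular tool.

```python
def pairwise_matrix_cost(n_sequences, bytes_per_value=4):
    """Return (number of unordered pairs, bytes for a dense n x n matrix).

    bytes_per_value=4 assumes 32-bit floats; that is an illustrative choice.
    """
    n_pairs = n_sequences * (n_sequences - 1) // 2
    full_matrix_bytes = n_sequences * n_sequences * bytes_per_value
    return n_pairs, full_matrix_bytes


n_pairs, n_bytes = pairwise_matrix_cost(100_000)
print(f"{n_pairs:,} pairs, {n_bytes / 1e9:.0f} GB as a dense float32 matrix")
# prints: 4,999,950,000 pairs, 40 GB as a dense float32 matrix
```

Storing only the upper triangle halves the memory, but the number of pairwise comparisons stays the same.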

You can also use cd-hit to dereplicate your original set of sequences and then align only the cluster representatives.
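A rough sketch of the greedy idea, in case it helps: process sequences from longest to shortest, compare each one only against the representatives found so far, and start a new cluster when none is similar enough. This is not cd-hit's code. The k-mer Jaccard similarity is a cheap stand-in I chose for illustration (cd-hit reports an alignment-based identity), and the function names, `k` and `threshold` are all hypothetical.

```python
def kmers(seq, k=8):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (stand-in for sequence identity)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs, threshold=0.9, k=8):
    """Cluster a dict of name -> sequence; return representative -> members."""
    # Longest sequences first, so each representative is the longest member.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    rep_kmers = {}   # representative name -> its k-mer set
    clusters = {}    # representative name -> list of member names
    for name in order:
        ks = kmers(seqs[name], k)
        for rep, rks in rep_kmers.items():
            if jaccard(ks, rks) >= threshold:
                clusters[rep].append(name)   # join the first matching cluster
                break
        else:
            rep_kmers[name] = ks             # no match: new representative
            clusters[name] = [name]
    return clusters


# Toy input, invented for illustration only.
toy = {
    "a": "ACGTACGTACGTTTGA",
    "b": "ACGTACGTACGTTTGA",
    "c": "TTTTTTTTGGGGGGGGCC",
}
clusters = greedy_cluster(toy, threshold=0.9, k=4)
```

Each sequence is compared with the current representatives rather than with every other sequence, which is how the all-against-all matrix is avoided. The keys of `clusters` are the dereplicated set you would carry forward to the alignment.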


Thank you for the response, but how do I create a distance matrix for all sequences using blastn, which gives me an E-value or bit score only for significant alignments?


BLAST is not the right tool here because it limits how many hits it reports for each query, and it only reports alignments that pass its significance cutoff. In theory you could make one database per sequence, or split the sequences into many small groups, and then run BLAST for each sequence against each of those databases, but that would be inefficient.
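If you still want to see what blastn gives you, the reporting limits can be loosened with the `-evalue` and `-max_target_seqs` options, and `-outfmt 6` writes a tab-separated table that is easy to parse. Below is a minimal sketch that collapses such a table into one score per unordered pair. The column order is the default for `-outfmt 6`; the function name is hypothetical and the three sample lines are made up purely to show the format.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Map each unordered pair to (best bit score, percent identity of that hit)."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(BLAST6_COLUMNS, row))
        q, s = rec["qseqid"], rec["sseqid"]
        if q == s:
            continue  # self-hits carry no pairwise information
        pair = tuple(sorted((q, s)))
        score = float(rec["bitscore"])
        if pair not in best or score > best[pair][0]:
            best[pair] = (score, float(rec["pident"]))
    return best


# Invented sample lines, for illustration only (not real BLAST results).
sample = (
    "seq1\tseq1\t100.000\t50\t0\t0\t1\t50\t1\t50\t1e-20\t93.5\n"
    "seq1\tseq2\t92.000\t50\t4\t0\t1\t50\t1\t50\t1e-12\t70.1\n"
    "seq2\tseq1\t92.000\t50\t4\t0\t1\t50\t1\t50\t1e-12\t70.1\n"
)
hits = best_hits(io.StringIO(sample))
```

Pairs that BLAST did not report have no entry at all, so the result is a sparse table rather than a complete distance matrix. Percent identity here also covers only the aligned region, not the whole sequences.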

A more efficient way is to use a program such as MOTHUR once you have the sequences aligned. It will compute the distance matrix for you, and you can then work with that data. MOTHUR is used for microbiome analysis, where you compare libraries of the same gene and define groups based on similarity.
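To make "distance matrix from aligned sequences" concrete, here is a small sketch of an uncorrected pairwise distance (the fraction of mismatching columns) computed on a toy alignment. It is not MOTHUR's implementation. Skipping every column where either sequence has a gap is just one possible convention, and the names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b, gap_chars="-."):
    """Fraction of mismatches over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else float("nan")


def distance_table(aligned):
    """Distances for every unordered pair in a dict of name -> aligned sequence."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Toy alignment, invented for illustration only.
aligned = {
    "s1": "ACGT-ACGTA",
    "s2": "ACGTTACGTT",
    "s3": "ACCT-ACGTA",
}
dist = distance_table(aligned)
```

Unlike the BLAST table, every pair gets a value here, including very dissimilar ones, which is what a clustering step needs.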
