Question

blastn output and methods to quantify similarities between sequences

9.3 years ago

cl10101 ▴ 80

I would like to quantify similarities between sequences using blastn from the BLAST+ package, but I'm not sure it is the right tool. The output I get from blastn contains only statistically significant alignments and their E-values. Is there any way to get alignments for every sequence in the database?
I have approximately 100,000 sequences, and I'm trying to quantify the similarity between each pair of them. My blastn database contains only these sequences.

blast output • 2.8k views

updated 9.3 years ago by satanicodr ▴ 160 • written 9.3 years ago by cl10101 ▴ 80

Ram · Answer 1 · 2016-01-07

9.3 years ago

satanicodr ▴ 160

What you really want is to align all your sequences together and build a pairwise distance matrix (100k x 100k). You can then run a clustering analysis on that matrix to see how many groups you have at a defined level of similarity. Alternatively, you can use a greedy algorithm such as cd-hit, which avoids the slow distance-matrix step.
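To make the matrix-then-cluster idea concrete, here is a minimal Python sketch. It assumes the sequences are already aligned to equal length, uses a plain p-distance (fraction of differing columns, gaps skipped), and groups by single linkage; the answer does not say which distance or clustering method to use, so both are illustrative choices, and the toy sequences are made up. Note that 100,000 sequences give roughly 5 billion unordered pairs, so a pure-Python loop like this only illustrates the logic and would not scale to the full data set.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing columns between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 1.0


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Distance for every unordered pair, keyed by (name_a, name_b)."""
    return {
        (a, b): p_distance(aligned[a], aligned[b])
        for a, b in combinations(sorted(aligned), 2)
    }


def single_linkage_groups(names, dist, cutoff):
    """Merge any two sequences whose distance is <= cutoff (union-find)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for (a, b), d in dist.items():
        if d <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy alignment, not data from the question.
aligned = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAT",
    "seq3": "TTGTACCTGG",
    "seq4": "TTGTACCTGA",
}
dist = distance_matrix(aligned)
groups = single_linkage_groups(list(aligned), dist, cutoff=0.10)
print(len(groups), "groups:", groups)
```

Changing `cutoff` is what "a defined level of similarity" means here: a cutoff of 0.10 groups sequences that are at least 90% identical over the compared columns.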

You can also use cd-hit to dereplicate your original set of sequences and then align only the representative sequence of each cluster.
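For intuition, the greedy idea can be sketched in a few lines of Python. This is not cd-hit's actual implementation: the `identity` function below is a crude ungapped stand-in for a real alignment, and the names and toy sequences are hypothetical. It only shows why no full distance matrix is needed, since each sequence is compared against the current representatives rather than against every other sequence.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter sequence.

    Assumption: ungapped, position-by-position comparison from the
    start of both sequences, used here instead of a real alignment.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy incremental clustering, longest sequences first.

    Each sequence joins the first representative it matches at
    >= threshold; otherwise it becomes a new representative.
    """
    clusters: dict[str, list[str]] = {}
    order = sorted(seqs, key=lambda name: (-len(seqs[name]), name))
    for name in order:
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no representative matched
    return clusters


# Hypothetical toy sequences, not data from the question.
seqs = {
    "a": "ACGTACGTACGT",
    "b": "ACGTACGTAC",
    "c": "ACGTACGTTCGT",
    "d": "GGGGCCCCTTTT",
}
clusters = greedy_cluster(seqs, threshold=0.9)
representatives = list(clusters)
print(representatives, clusters)
```

The keys of `clusters` are the representatives, which is the dereplicated set you would carry forward into the alignment step.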

9.3 years ago by satanicodr ▴ 160

Thank you for the response, but how do I create a distance matrix for all sequences using blastn, which gives me an E-value or bit score only for significant alignments?

9.3 years ago by cl10101 ▴ 80

BLAST is not the right tool because there is a limit on how many hits it will report for each query. In theory you could make one database for each sequence, or divide the sequences into many small groups, and then run BLAST for each sequence against those databases, but that would be inefficient.
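To see why BLAST output leaves holes in an all-vs-all matrix, here is a small Python sketch that reads BLAST+ tabular output (`-outfmt 6`, default twelve columns) and keeps the best-scoring alignment per query/subject pair. The sample rows are invented for illustration. BLAST+ does expose `-evalue` and `-max_target_seqs` to loosen what gets reported, but a pair with no local alignment at all still produces no row, so those pairs have no score to put in the matrix.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Return {(query, subject): (pident, bitscore)}.

    Keeps the highest-scoring alignment per pair and skips self-hits.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(COLUMNS, row))
        q, s = rec["qseqid"], rec["sseqid"]
        if q == s:
            continue
        score = float(rec["bitscore"])
        if (q, s) not in best or score > best[(q, s)][1]:
            best[(q, s)] = (float(rec["pident"]), score)
    return best


# Hypothetical output rows, not real results.
sample = "\n".join([
    "seq1\tseq1\t100.000\t200\t0\t0\t1\t200\t1\t200\t1e-105\t370",
    "seq1\tseq2\t95.000\t200\t10\t0\t1\t200\t1\t200\t2e-88\t315",
    "seq1\tseq2\t90.000\t50\t5\t0\t1\t50\t300\t349\t1e-12\t62.1",
    "seq2\tseq1\t95.000\t200\t10\t0\t1\t200\t1\t200\t2e-88\t315",
])
hits = best_hits(io.StringIO(sample))

# Pairs missing from `hits` were simply not reported by BLAST.
print(hits.get(("seq1", "seq2")), hits.get(("seq1", "seq3")))
```

Also keep in mind that percent identity here is computed over the locally aligned region only, so it is not directly comparable to a distance taken from a full multiple alignment.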

A more efficient way is to use a program such as MOTHUR once you have the sequences aligned. It will compute the distance matrix for you, and you can then explore the data from there. MOTHUR is used for microbiome analysis, where you compare libraries of the same gene and define groups based on similarity.

updated 5.4 years ago by Ram 45k • written 9.3 years ago by satanicodr ▴ 160