sequence similarity
3.7 years ago
lorenzinip • 0

Hi, I have a FASTA file with hundreds of sequences, each 300 nt long. I would like to check the similarity of one sequence against all the other sequences. Any suggestions on how to approach this? Thanks

blast linux fasta • 1.3k views

Are you starting from a multiple sequence alignment?

You might have a look at creating distance matrices (e.g. for phylogenetic studies). The 'distance' will often not be expressed as % similarity, but it will still give you a measure of how similar the sequences are.
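To make the distance-matrix idea concrete, here is a minimal Python sketch. It assumes the sequences are already aligned to the same length and uses the simple p-distance (the fraction of differing columns, skipping gapped ones). The function names and the toy alignment are made up for illustration; a phylogenetics package would normally apply a proper substitution model instead.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Build a symmetric nested-dict distance matrix from {name: sequence}."""
    names = list(seqs)
    dist = {n: {n: 0.0} for n in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = p_distance(seqs[a], seqs[b])
    return dist


# Toy alignment (hypothetical data, not from the question).
aligned = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACG-TCGA"}
matrix = distance_matrix(aligned)
for name, row in matrix.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

A percentage identity can be read off as `100 * (1 - distance)` if that is the number you are after.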

3.7 years ago
5heikki 11k

Use e.g. cd-hit or vsearch


Another tool that can be used, and is much faster, is MMseqs2.

3.7 years ago
Joe 21k

If your sequences are already aligned and/or the same length, or you do not want to align them, you can use a simple edit-distance measure such as the Levenshtein distance, or a k-mer-based method.

This will be quick but less accurate, and it won't necessarily capture meaningful biological patterns. Depending on your use case, though, it may be appropriate.

I keep a few examples of string comparison metrics along with some implementation code here:

https://github.com/jrjhealey/bioinfo-tools/blob/master/StringComparisons.py
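For the one-against-all case in the question, a rough Python sketch of the edit-distance approach could look like the following. This is not the code from the linked repository. The helper names, the normalisation (1 minus distance divided by the longer length), and the in-memory toy FASTA are all assumptions made for illustration.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Edit distance rescaled to 0..1 (assumed normalisation)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def one_vs_all(query_name, records):
    """Score every other record against the query, best match first."""
    query = records[query_name]
    scores = [(name, similarity(query, seq))
              for name, seq in records.items() if name != query_name]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy input; in practice pass open("your_sequences.fasta") instead.
toy = io.StringIO(">q\nACGTACGTAC\n>a\nACGTACGTAC\n>b\nACGTTCGTAC\n>c\nTTTTTTTTTT\n")
records = dict(read_fasta(toy))
hits = one_vs_all("q", records)
for name, score in hits:
    print(f"{name}\t{score:.3f}")
```

With a few hundred 300 nt sequences this pure-Python version should be fast enough for a single query. For much larger inputs, a compiled implementation or one of the dedicated tools mentioned in the other answers would be the better choice.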
