Hi,
Is there a way to calculate the overlapping nucleotides numbers between every two DNA sequences by doing multiple sequence alignment? I would like to get a matrix output showing the overlapping nucleotide numbers. Thank you!
Hi,
Is there a way to calculate the overlapping nucleotides numbers between every two DNA sequences by doing multiple sequence alignment? I would like to get a matrix output showing the overlapping nucleotide numbers. Thank you!
Use Genedoc program that runs under Windows. It is extremely flexible and configurable, to that extend, that you will need to go to the tutorial to learn from it. But it will give you that matrix you are asking for
If you know Julia, you can use OpenGene(https://github.com/OpenGene/OpenGene.jl) to do this
using OpenGene, OpenGene.Algorithm
r1=dna("CAGCTTGAGGCTGAAGTTCTCCTTCTTCAGGTCATTGAGGTGCTGGGACAGAGTGCGATATCCATTAGACATGATGGGCAACCCATGGGGAGGAGCGTGCCCGATTGCCCCCTCAACCCAGGAACATGGTGCAGCTGCACCGCA")
r2=dna("AGCCGGCTGGCTTCATGCTGCGGTGCAGCTGCACCATGTTCCTGGGTTGAGGGGGCAATCGGGCACGCTCCTCCCCATGGGTTGCCCATCATGTCTAATGGATATCGCACTCTGTCCCAGCACCTCAATGACCTGAAGAAGGA")
offset, overlap_len, distance = overlap(r1, r2)
# it returns (19,125,0), which means 125 base pairs are overlapped
Note that this overlap function takes r1 as forward strand and r2 as reverse complement strand, which is the output of pair-end sequencing.
If you want to implement this by yourself, you can use editdistance method to do this, here I give u a piece of python code, which is used in AfterQC(https://github.com/OpenGene/AfterQC):
COMP = {"A" : "T", "T" : "A", "C" : "G", "G" : "C", "a" : "t", "t" : "a", "c" : "g", "g" : "c", "N":"N"}
def complement(base):
return COMP[base]
def reverseComplement(origin):
length = len(origin)
revCompArr = ['' for x in xrange(length)]
for i in xrange(length):
orig = origin[length - i -1]
if orig in COMP:
revCompArr[i] = COMP[orig]
else:
revCompArr[i] = 'N'
return ''.join(revCompArr)
def overlap(r1, r2):
len1 = len(r1)
len2 = len(r2)
reverse_r2 = reverseComplement(r2)
overlapped = False
overlap_len = 0
offset = 0
distance = 0
# a match of less than 10 is considered as unconfident
for offset in xrange(0,len1-10):
# the overlap length of r1 & r2 when r2 is move right for offset
overlap_len = min(len1-offset, len2)
# remind that Julia is a 1-based coordination system
distance = editDistance(r1[offset : offset+overlap_len], reverse_r2[0 : overlap_len])
if distance <= distance_threshold(overlap_len):
# now we find a good candidate
# we verify it by moving r2 one more base to see if the distance is getting longer
# if yes, then current is the best match, otherwise it's not
next = offset + 1
next_overlap_len = min(len1-next, len2)
distance2 = editDistance(r1[next : next+next_overlap_len], reverse_r2[0 : next_overlap_len])
if distance <= distance2:
overlapped = True
break
if overlapped and offset == 0:
# check if distance can get smaller if offset goes negative
# this only happens when insert DNA is shorter than sequencing read length, and some adapter/primer is sequenced but not trimmed cleanly
# we go reversely
for offset in xrange(0,-(len2-10), -1):
# the overlap length of r1 & r2 when r2 is move right for offset
overlap_len = min(len1, len2- abs(offset))
distance = editDistance(r1[0:overlap_len], reverse_r2[-offset : -offset + overlap_len])
if distance <= distance_threshold(overlap_len):
next = offset - 1
next_overlap_len = min(len1, len2- abs(next))
distance2 = editDistance(r1[0:next_overlap_len], reverse_r2[-next : -next + next_overlap_len])
if distance <= distance2:
return (offset, overlap_len, distance)
elif overlapped:
return (offset, overlap_len, distance)
return (0,0,0)
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Please clearify what you are meaning by 'overlap'. The term overlap is usually used to indicate the relative position of two contiguous stretches of sequence, for example the overlap between two annotated features. You may want to calculate the number of identical positions of two sequences in a multiple sequence alignment. Nearly every phylogenetical program will do this. This task is also a nice exercise if you are learning programming.