How to find if sequences perfectly overlap
3.8 years ago
gero.knittel ▴ 10

Hi all,

I'm trying to figure out a way to identify pairs of short sequences that have a perfect overlap.

This can be either one being contained in the other, such as:

  1. AAACCCTTTGGG
  2. CCTTTG

They can also be overlapping, such as:

  1. AAACCCTTTGGG
  2. TTTGGGTCGA

I want to differentiate those scenarios from situations where the match is not perfect (a single gap or mismatch needs to directly disqualify the pairwise comparison). Basically, I need to know whether both sequences could stem from the same template, but they don't need to be from the same position on the template.

I've been playing around with the penalty parameters of pairwise2, but I couldn't find a way to write an if/else statement that automatically decides whether the sequences have a perfect overlap or not. The sequences differ in length, and so do the overlapping regions, so I cannot just set a constant score threshold.

It would be great if someone could help me out here. I'm sure this is an easy exercise for many of you.

Best and many thanks in advance! Gero

biopython pairwise2

I want to differentiate those scenarios from situations where the match is not perfect (a single gap or mismatch needs to directly disqualify the pairwise comparison).

So, are you interested in retaining examples where there is a mismatch (of exactly 1, or of 1 or more)? If you are only interested in perfect matches, this will be quite a lot easier.
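
For the perfect-match-only case, here is a minimal sketch that avoids alignment scoring altogether and uses plain string comparison. The function names and the `min_overlap` parameter are hypothetical and not part of Biopython. The sketch assumes both sequences are on the same strand (no reverse complement) and that an overlap shorter than `min_overlap` bases should not count, since a one- or two-base suffix/prefix match will occur by chance very often.

```python
def suffix_prefix_overlap(left, right, min_overlap):
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    # Try the longest candidate first and stop at the first exact match.
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def perfect_overlap(a, b, min_overlap=4):
    """Classify a pair of sequences (hypothetical helper).

    Returns "contained" if one sequence is an exact substring of the other,
    "overlap" if the end of one exactly matches the start of the other over
    at least `min_overlap` bases, and None otherwise. Any gap or mismatch
    in the shared region makes the exact comparison fail.
    """
    if a in b or b in a:
        return "contained"
    if suffix_prefix_overlap(a, b, min_overlap) or suffix_prefix_overlap(b, a, min_overlap):
        return "overlap"
    return None


print(perfect_overlap("AAACCCTTTGGG", "CCTTTG"))      # containment example from the question
print(perfect_overlap("AAACCCTTTGGG", "TTTGGGTCGA"))  # overlap example from the question
print(perfect_overlap("AAACCCTTTGGG", "TTTGAGTCGA"))  # one mismatch in the shared region
```

Because the result is a label or `None`, it can go straight into an if/else statement, and no score threshold is needed even though the sequence and overlap lengths vary. The default of 4 for `min_overlap` is an arbitrary choice for illustration and should be set according to how short an overlap you are willing to trust.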
