compare two list of nucleotide sequences
1
0
Entering edit mode
6.1 years ago
karthic ▴ 130

Dear all,

I have a list of nucleotide fasta sequences which has to be compared to another list of nucleotide sequences and output which two sequences are an exact match without any gaps or mismatches.

Eg.

List A

>query1
sequence
>query2
sequnce
>query3
sequence

List B

    >subject1
    sequence
    >sub2
    sequence
    >sub3
    sequence

Example output

query2 == sub3

I want to know, if the solution already exists, if not how shall I approach it

Thanks KK

blast bowtie mummer sequence • 2.6k views
ADD COMMENT
3
Entering edit mode
6.1 years ago
Joe 21k

You can do this easily with BioPython.

This is some code just off the top of my head so may not be 100% correct, but should get you started.

import sys
from Bio import SeqIO

GenesA = list(SeqIO.parse(sys.argv[1], ‘fasta’))
GenesB = list(SeqIO.parse(sys.argv[2], ‘fasta’))

matches = []
for i in GenesA:
    for j in GenesB:
        if i.seq == j.seq:
             matches.append(i)

for seq in matches:
    print seq.format('fasta')

This isn’t the fastest or most elegant way to do it, but it is safe because it doesn’t assume your lists are the same length or anything.

ADD COMMENT
1
Entering edit mode

Hello jrj.healey ,

but this would print out all sequences from GenesA that have a match in GenesB, wouldn't it. If I understood OP correct, (s)he want's a table o ids which have the same sequence. So your code should look something like this:

import sys
from Bio import SeqIO

GenesA = list(SeqIO.parse(sys.argv[1], ‘fasta’))
GenesB = list(SeqIO.parse(sys.argv[2], ‘fasta’))

matches = []
for i in GenesA:
    for j in GenesB:
        if i.seq == j.seq:
             matches.appendi.id+" == "+j.id)

for m in matches:
    print(m)

(Also just off the top of my head)

fin swimmer

Edit: There seems to be a bug in biostars. It removes the open brace here matches.append(.

ADD REPLY
0
Entering edit mode

Right you are, I didn’t read that bit properly.

OP will have access to the whole SeqRecord object inside matches and can print it out however they wish (I originally had the printing lines commented out because I wasn’t sure what output was desired (e.g. to a file or to screen etc)).

ADD REPLY
0
Entering edit mode

Hi fin swimmer

I have noticed that and added that open brace, thank you for the solution.

Thanks KK

ADD REPLY
0
Entering edit mode

Dear Jrj Healey and fin swimmer,

Thank you for the solution and the code. It is working and the results are just as I expected.

Thank you very much KK

ADD REPLY
0
Entering edit mode

Ok great, you're welcome.

Be sure to accept the answer (the check mark on the left of the post) if this is resolved for you.

ADD REPLY
0
Entering edit mode

Hi jrj healey,

Thanks for the code, I will start with that and let you know.

Thanks KK

ADD REPLY

Login before adding your answer.

Traffic: 1800 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6