How to compare protein sequences in Python
10.4 years ago
Jason Lin • 0

Hi there,

I have a list of entries, each containing a PDB ID and the corresponding protein sequence, such as:

>102L MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL

>103L MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL

>104L MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSAAELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL

I have about 1000 entries and I want to make sure that all the sequences are unique. That means comparing the sequences across different PDB IDs and deleting or ignoring any sequence that appears more than once. I'm doing this in Python. Could anyone help me?

Thank you so much guys!

sequence lists protein python • 4.3k views

You need to tell us what you have tried so far. Show us the Python code. If you have not done anything yet, consider creating a Python dictionary with the PDB ID as the key and the sequence as the value, and then comparing the values.
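One possible reading of the dictionary suggestion above, as a minimal sketch: once the entries are in a dictionary keyed by PDB ID, the IDs can be grouped by sequence to see which ones share the same protein. The function name `group_ids_by_sequence` and the toy IDs and sequences are made up for illustration; they are not from the original question.

```python
from collections import defaultdict


def group_ids_by_sequence(id_to_seq):
    """Map each distinct sequence to the list of PDB IDs that carry it."""
    groups = defaultdict(list)
    for pdb_id, sequence in id_to_seq.items():
        groups[sequence].append(pdb_id)
    return groups


# Toy data (hypothetical IDs, shortened made-up sequences)
id_to_seq = {"1AAA": "MKV", "1BBB": "MKL", "1CCC": "MKV"}

groups = group_ids_by_sequence(id_to_seq)

# Sequences that occur under more than one PDB ID
duplicates = {seq: ids for seq, ids in groups.items() if len(ids) > 1}
print(duplicates)
```

This reports the duplicates rather than removing them, which can be handy for checking which IDs would be dropped before actually filtering.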


Check this post

10.4 years ago
João Rodrigues ★ 2.5k

Open your sequence file, then create an empty list and an empty set. Iterate over your sequences and, for each one, test whether it is already in the set. If it is not, append a tuple of sequence and identifier to the list and add the sequence to the set. When the loop finishes, the list holds one entry per unique sequence.
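The steps above can be sketched roughly as follows. This assumes each entry sits on a single line in the form `>ID SEQUENCE`, as in the question; the function name `unique_entries` and the toy entries are made up for illustration, and a real FASTA file with the sequence on separate lines would need a proper parser instead.

```python
def unique_entries(lines):
    """Return (sequence, identifier) tuples, keeping the first entry per sequence."""
    seen = set()   # sequences already encountered
    unique = []    # first occurrences, in file order
    for line in lines:
        line = line.strip()
        if not line.startswith(">"):
            continue  # skip blank lines and anything that is not an entry
        fields = line[1:].split(None, 1)
        if len(fields) != 2:
            continue  # identifier without a sequence
        identifier, sequence = fields
        if sequence not in seen:
            seen.add(sequence)
            unique.append((sequence, identifier))
    return unique


# Toy data: hypothetical IDs with shortened, made-up sequences
example = [">1AAA MNIFEMLRID", "", ">1BBB MNIFEMLKID", ">1CCC MNIFEMLRID"]
result = unique_entries(example)
for sequence, identifier in result:
    print(identifier, sequence)
```

To run it on a file, the open file object can be passed directly, since iterating over it yields lines: `with open(path) as handle: result = unique_entries(handle)`, where `path` is whatever your file is called.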

Why a set? Because testing for membership in a set takes roughly constant time on average, whereas a list has to be scanned element by element.

Why not a dictionary? Because this way you 1. preserve the order of your initial file, and 2. make the comparisons more efficient via the set. With a dictionary you would have to use the sequence as the key in order to test for membership, which is a bit counterintuitive.

To parse the sequences you might want to look at the Biopython project and its Bio.SeqIO module.
