Import All Protein Sequences In A Chromosome
1
0
Entering edit mode
12.7 years ago
User 3206 • 0

Hi guys I have managed to write up the code below in python that accesses a file with protein ids and import their sequences from genbank. I now wanted to write one that would import all the protein sequences in a given chromosome since I don't have all their ids. i.e number. to import the entire protein sequences given the chromosome number.

Any suggestions would be appreciated!

Thank you

<h6>#</h6>

from numpy import * z=genfromtxt('C:\Users\Mohammed\Desktop\ProteinIDs.txt', dtype='S12', delimiter=',', usecols=[0],unpack=True) exit

for i in range (500):

prot= '"%s"' %((z)[i])

print prot

from Bio import Entrez , SeqIO

Entrez.email = 'me@uga.edu'

handle = Entrez.efetch(db="protein", id="prot", rettype="fasta",retmode="text")

record = SeqIO.read(handle,"fasta")

String=str(record)

f= open('C:\Users\Mohammed\Desktop\protein_seqs\%s.txt' % (z)[i], 'w') for i in range (1): SeqIO.write(record, f, "fasta") print record f.close()

biopython • 2.8k views
ADD COMMENT
0
Entering edit mode

Can you indent your code? Put 4 spaces in front of it (this will make it a code block), and make sure that the for loops are indented correctly. This makes it easier for us to evaluate your code.

ADD REPLY
2
Entering edit mode
12.7 years ago
Alex ★ 1.5k

Take a look at the following code that sorts proteins by chromosome, the list of all proteins you can get for example from NCBI's ftp site.

from Bio import Entrez , SeqIO

protein_ids = [61677879, 61677880, 61677881, 60637879, 60637579]

Entrez.email = 'me@uga.edu'

chr_to_proteins = {"unknown":[]}

for pid in protein_ids:

    handle = Entrez.efetch(db="protein", id=pid, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "gb")
    feature_found = False
    for feature in record.features:
        if hasattr(feature, "qualifiers"):
            if "chromosome" in feature.qualifiers:
                for chr in feature.qualifiers["chromosome"]:
                    if not feature_found:
                        chr_to_proteins.setdefault(chr, [])
                        chr_to_proteins[chr].append(record)
                        feature_found = True

    if not feature_found:
        chr_to_proteins["unknown"].append(record)

for chr, proteins in chr_to_proteins.items():
    print chr, len(proteins)
ADD COMMENT

Login before adding your answer.

Traffic: 2697 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6