If GI number in list1 is present in genbank file, make new list and append location information
0
1
Entering edit mode
9.7 years ago
pawlowac ▴ 80

Hi all,

I'm fairly new to python.

I have a list of GI numbers (list 1) and my ultimate goal is to slice sequence data for 5 kb upstream and 5 kb downstream of my target CDS in a genbank file, which is also in a list (list 2). I've made an attempt to do this. I do not get any errors, however I do not get any response from printing the new list (combo_list).

I would really appreciate your help.

from Bio import SeqIO

def add_location(gi_list):
    gb_file = 'file.gb'
    gb_records = SeqIO.parse(open(gb_file,"r"), "genbank")
    combo_list = []
    for gb_record in gb_records :
        if gb_record.features :
            for feature in gb_record.features :
                if feature.qualifiers == "db_xref" :
                    for gi_list in gb_record :
                        start = feature.location._start.position
                        end = feature.location._end.postion
                        combo_list.append(start, end)
                        print combo_list
                        stop()
    return combo_list

def main():

    gi_file = "gi.txt"
    text_file = open("gi.txt", "r").readlines()
    gi_list = []
    for line in text_file :
        gi_list.append(int(line.strip()))

    location_gi_list = add_location(gi_list)

if __name__ == '__main__':
    main()

From Ram's assistance;

for genome in gbank:
    for gene in genome.features:
        if gene.type=="CDS": 
            if 'db_xref' in gene.qualifiers:
                for gi in gi_list:
                    if str(gi) in gene.qualifiers['db_xref'][0]:
python genbank biopython • 2.9k views
ADD COMMENT
0
Entering edit mode

Looks like qualifiers is a collection. == might not be the right way to compare if that's the case.

ADD REPLY
0
Entering edit mode

Thank you for pointing that out, I've now switched the line to

if "db_xref" in feature.qualifiers
ADD REPLY
0
Entering edit mode
I should mention that the code overall still does not work.
ADD REPLY
0
Entering edit mode

Documentation states that qualifiers is a key value pair. You might wanna check if the db_xref key exists and has a value. And please read the documentation - these are well documented modules you're using.

ADD REPLY
0
Entering edit mode

http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank/

db_xref does exist. It's the GI number and GeneID. I'm interested in matching GI numbers and pulling out the location associated with it in the GenBank file.

I have read the documentation.

And since I'm new at this, I want to make sure the documentation you're referring to is the biopython cookbook, right?

ADD REPLY
0
Entering edit mode

I know db_xref exists(duh), I was saying you should check if your features dict has a key named db_xref. And I'm not sure if it is the cookbook - try this: Google BioPython Bio Seq and check out Records and Features under Bio Seq GenBank.

ADD REPLY
0
Entering edit mode

How do I check if my features dict has a key named db_xref? I thought it was standard much like using 'CDS' and 'source'?

ADD REPLY
0
Entering edit mode

if key in dict is the syntax you're looking for. Please try Google, "dict has key" gives you the answer on the first link!

ADD REPLY
0
Entering edit mode

Gee, thanks for telling me so much to Google for something. Like I've already said, I'm new to this (literally started 2 days ago). If I understood the concept, I wouldn't be asking that question here, I would've googled for it...

ADD REPLY
0
Entering edit mode

Sarcasm is not a good strategy on forums, pawlowac. I stick by my statement. You learn these terms only by Googling stuff repeatedly and not making the same mistake twice. And you've written "how do I check if my dict has a key" - try typing that in google and check what it suggests.

ADD REPLY
0
Entering edit mode

Sorry, I shouldn't have been sarcastic. I was frustrated because you really weren't clarifying anything for me by just telling me to google things considering most of those websites go over my head. I guess the fault is partially mine since I am not asking specific questions to your responses. I also didn't appreciate you saying duh since it was obvious I completely misunderstood your question.

Reading back now, I think by this;

You might wanna check if the db_xref key exists and has a value.

you were talking about a particular syntax and not telling me to look if db_xref exists in general or in my genbank file. Correct?

If that's the case, what is the dict? does the code read;

if db_xref in gi_list

Since I am looking for gi number from gi_list within gb_records and want to extract a feature associated with it?

ADD REPLY
0
Entering edit mode

It's OK, pawlowac, we are all used to plenty of misinterpretations :)

Think of the GenBank object like this:

GenBank object can have any number features - 0 or more. So, you should check if the genbank object has any valid features entry.

Each feature has multiple qualifiers, where one or more of the qualifiers might have a key "db_xref". So, you're looking for something along the lines of if "db_xref" in feature.qualifiers print(feature["db_xref"]

ADD REPLY
0
Entering edit mode

Got it! Thanks for your direction Ram. (Sorry for the late response, benchwork got in the way)

for genome in gbank:
    for gene in genome.features:
       if gene.type=="CDS": 
            if 'db_xref' in gene.qualifiers:
               for gi in gi_list:
                    if str(gi) in gene.qualifiers['db_xref'][0]:
ADD REPLY

Login before adding your answer.

Traffic: 2311 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6