Python 3.4, BioPython 1.65: Specifying dictionary keys in FASTA file and writing sequences with certain key values to file
9.8 years ago

I have a multi-fasta file containing sequences with headers such as:

>AAST01014508.1|(1..6240)|LTR/Pao|ROOA_I-int:ROOA_LTR
>AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int

The second field of the header gives the position. I want to index all of the sequences by this field and write to a file only those sequences from the complement (c) strand, e.g. >AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int
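
For reference, the header layout described above can be sketched in plain Python. This is only an illustration: the helper name `parse_header`, the dictionary keys it returns, and the assumption that the location field always looks like `(start..end)` or `c(start..end)` are not part of the original post.

```python
import re

# Assumed layout: accession|location|class/family|repeat name, where the
# location is "(start..end)", prefixed with "c" for the complement strand.
LOCATION_RE = re.compile(r"^(c?)\((\d+)\.\.(\d+)\)$")


def parse_header(header):
    """Split a header like the ones above into named fields (hypothetical helper)."""
    parts = header.lstrip(">").split("|")
    if len(parts) != 4:
        raise ValueError("expected 4 pipe-separated fields: %r" % header)
    match = LOCATION_RE.match(parts[1])
    if match is None:
        raise ValueError("unrecognised location field: %r" % parts[1])
    return {
        "accession": parts[0],
        "complement": match.group(1) == "c",  # True when the field starts with "c"
        "start": int(match.group(2)),
        "end": int(match.group(3)),
        "repeat_class": parts[2],
        "repeat_name": parts[3],
    }
```

Calling `parse_header` on the second example header would report `complement` as true, and on the first as false.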

I was easily able to write a function that extracts this field:

def get_compstrandTE(record):
    parts = record.id.split("|")
    assert len(parts) == 4
    return parts[1]

However, I am now stuck on how to search through the keys in the dictionary, find only those containing the 'c', and write the matching records to a file. I tried using the example from the BioPython manual but kept running into difficulties.

If anyone has any suggestions I would really appreciate it.

BioPython Python • 3.1k views
9.8 years ago
Peter 6.0k

You don't need a dictionary for this task.

Something like this should work. It is loosely based on one of the filtering examples from the Biopython Tutorial, and uses a generator expression so that only one record is held in memory at a time:

from Bio import SeqIO
input_file = "big_file.fasta"
output_file = "complements.fasta"

def wanted(record):
    """Return True if the naming scheme suggests the complement strand."""
    parts = record.id.split("|")
    assert len(parts) == 4
    return parts[1].startswith("c")

records = (r for r in SeqIO.parse(input_file, "fasta") if wanted(r))
count = SeqIO.write(records, output_file, "fasta")
print("Saved %i records from %s to %s" % (count, input_file, output_file))
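
If you do want the dictionary route from the question, a rough sketch might look like the following. The helper names `is_complement`, `complement_ids` and `write_complement_records` are made up for illustration. The sketch keys the dictionary on the full record id rather than on the location field alone, on the assumption that a location such as `(1..6240)` could occur on more than one contig, in which case `SeqIO.to_dict` would raise a `ValueError` for the duplicate key.

```python
def is_complement(record_id):
    """True if the second pipe-separated field starts with 'c' (assumed naming scheme)."""
    parts = record_id.split("|")
    return len(parts) == 4 and parts[1].startswith("c")


def complement_ids(record_dict):
    """Return the keys of an id -> record mapping that look like complement-strand hits."""
    return [key for key in record_dict if is_complement(key)]


def write_complement_records(input_file, output_file):
    """Index the FASTA file by full record id, then write the complement-strand records."""
    # Imported here so the two helpers above can be tried without Biopython installed.
    from Bio import SeqIO

    # Keys are the full record ids, which should be unique within the file.
    record_dict = SeqIO.to_dict(SeqIO.parse(input_file, "fasta"))
    wanted = (record_dict[key] for key in complement_ids(record_dict))
    # SeqIO.write returns the number of records written.
    return SeqIO.write(wanted, output_file, "fasta")
```

Bear in mind that this loads every record into memory, and on older Python versions a plain dictionary does not preserve the order of the input file, so the generator approach above is probably still the better fit for a big file.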

Wonderful, thanks again!
