Parsing FASTA file using class in Python
7.7 years ago
mrth ▴ 30

Hello, I am new to the world of Biopython and Python in general. I am trying to parse a FASTA file using a class. Here is my code so far:

from itertools import groupby

class OpenFastaFile:
    def __init__(self, path):
        self.path = path
        self._map = {}
        __fasta_sequences = self.__fasta_iter()

    def __str__(self):
        return self._map.__str__()

    def __fasta_iter(self):
        fh = open(self.path)
        faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
        for header in faiter:
            header = header.__next__()[1:].strip()
            seq = "".join(s.strip() for s in faiter.__next__())
            self._map[header] = seq

of = OpenFastaFile("sample.fa")
print(of)

However, I receive this output:

Traceback (most recent call last):
  File "UniProtFile.py", line 25, in <module>
    of = OpenFastaFile("sample.fa")
  File "UniProtFile.py", line 12, in __init__
    __fasta_sequences = self.__fasta_iter()
  File "UniProtFile.py", line 21, in __fasta_iter
    header = header.__next__()[1:].strip()
AttributeError: 'itertools._grouper' object has no attribute '__next__'

Process finished with exit code 1

My expected output was something along the lines of a dictionary like this: {'name': 'ACCAGT', 'name1': 'ACGGCTA', ...}

Can someone please show me the error of my ways?


I would guess that faiter in the following line should be header:

seq = "".join(s.strip() for s in faiter.__next__())

Thanks for replying! Unfortunately, it still gives the same error message.


Is there any reason that you're trying to implement this as a class?

7.7 years ago
mrth ▴ 30

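For those wondering, I have solved the problem. The traceback came from running Python 2, where iterators provide a `.next()` method instead of `__next__()`; calling the built-in `next()` works in both Python 2 and 3. I used this code: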

from itertools import groupby

class FastaFile:
    def __init__(self, path):
        self.path = path
        self._map = {}
        self.__fasta_iter()

    def __str__(self):
        return self._map.__str__()

    def __fasta_iter(self):
        # "with" closes the file once parsing is done
        with open(self.path) as fh:
            # groupby yields alternating runs of header lines and sequence lines
            faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
            for header in faiter:
                # built-in next() works in Python 2 and 3 (.next() is Python 2 only)
                header = next(header)[1:].strip()
                seq = "".join(s.strip() for s in next(faiter))
                self._map[header] = seq

ff = FastaFile("sample.fa")
print(ff)
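To see why the loop pairs each header with its sequence, here is a small standalone sketch of the same `groupby` technique. It reads from an in-memory `io.StringIO` with made-up records instead of `sample.fa`, and the function name `parse_fasta` is only illustrative. `groupby` splits the lines into alternating runs of header lines and sequence lines; the built-in `next()` takes the header from one run and then advances to the following sequence run. Like the class, it assumes every header is followed by at least one sequence line.

```python
import io
from itertools import groupby


def parse_fasta(handle):
    """Return a {header: sequence} dict from an open FASTA text handle."""
    records = {}
    # Runs alternate: [">name"], ["ACC", "AGT"], [">name1"], ["ACGGCTA"], ...
    groups = (lines for _, lines in groupby(handle, lambda line: line.startswith(">")))
    for header_group in groups:
        header = next(header_group)[1:].strip()  # drop ">" and the newline
        # next(groups) advances to the sequence run; join its wrapped lines
        records[header] = "".join(line.strip() for line in next(groups))
    return records


example = io.StringIO(">name\nACC\nAGT\n>name1\nACGGCTA\n")
print(parse_fasta(example))  # {'name': 'ACCAGT', 'name1': 'ACGGCTA'}
```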
7.7 years ago

If you want a FASTA file to act like a sequence dictionary, just use pyfaidx:

import pyfaidx
fa = pyfaidx.Fasta("sample.fa")
for key in fa:
  print(key) # sequence name
  print(fa[key]) # sequence object

pyfaidx builds (or reuses) a `.fai` index next to the FASTA file and reads sequence data from disk only when you access it, so it doesn't load all of your sequences into memory up front.
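To illustrate the on-demand idea (this is not pyfaidx's actual implementation), here is a minimal standard-library sketch: it scans the file once to record the byte offset where each record's sequence starts, then seeks to that offset and reads a sequence only when you index it. The class name `LazyFasta` and the temporary file are hypothetical. pyfaidx's real `.fai` index also stores sequence lengths and line widths so it can fetch subranges; this sketch skips that.

```python
import os
import tempfile


class LazyFasta:
    """Sketch: map header -> byte offset; read a sequence only on access."""

    def __init__(self, path):
        self.path = path
        self._offsets = {}  # header -> byte offset of its first sequence line
        with open(path, "rb") as fh:
            while True:
                line = fh.readline()
                if not line:
                    break
                if line.startswith(b">"):
                    header = line[1:].strip().decode()
                    self._offsets[header] = fh.tell()

    def __iter__(self):
        return iter(self._offsets)

    def __getitem__(self, name):
        parts = []
        with open(self.path, "rb") as fh:
            fh.seek(self._offsets[name])  # raises KeyError for unknown names
            for line in fh:
                if line.startswith(b">"):  # stop at the next record
                    break
                parts.append(line.strip())
        return b"".join(parts).decode()


# Usage: write a small made-up FASTA file to a temporary path
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">name\nACC\nAGT\n>name1\nACGGCTA\n")

fa = LazyFasta(tmp.name)
for key in fa:
    print(key, fa[key])  # the sequence is read from disk here, not in __init__
os.remove(tmp.name)
```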


Thanks for this information. I never knew pyfaidx existed.

7.7 years ago

Hi, you can try the following code to generate your result:

from Bio import SeqIO

seqdic = {}
with open('sample.fa', 'r') as input_fasta_file:
    for seq_record in SeqIO.parse(input_fasta_file, 'fasta'):
        header = seq_record.id         # identifier: first word of the header line
        seqs = str(seq_record.seq)     # Seq object -> plain string
        seqdic[header] = seqs

As a comment: SeqIO.parse also accepts a filename, not only a file handle, so you could write:

for seq_record in SeqIO.parse('sample.fa', 'fasta'):

You could also simplify your code with a dict comprehension, which is more concise and typically slightly faster:

seqdic={seq_record.id: str(seq_record.seq) for seq_record in SeqIO.parse('sample.fa', 'fasta')}

I was going to post code similar to this, but the OP's question looked like an assignment because the approach is overly complicated.


You were correct. It is for an assignment. I usually use `with open` to parse my files, but I need to try something new this time around: classes.


Hi there, I really appreciate the response! However, I'm trying to use a class with magic methods to parse it, as I need to add it somewhere within my code (for an assessment). I feel as though I am really close but yet so far away!
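Since the assessment asks for magic methods, one possible direction is a sketch that keeps the `groupby` parsing from the accepted solution and adds dict-like behavior through `__len__`, `__getitem__`, `__iter__` and `__contains__`. The thread doesn't say which magic methods are actually required, so this selection is an assumption. The sketch also assumes Python 3 and accepts either a path or an already-open text handle, so it can be tried without a file on disk.

```python
import io
from itertools import groupby


class FastaFile:
    """Dict-like FASTA reader (sketch; the magic-method choice is illustrative)."""

    def __init__(self, source):
        self._map = {}
        # Accept a path (str) or an open text handle such as io.StringIO
        if isinstance(source, str):
            with open(source) as fh:
                self._parse(fh)
        else:
            self._parse(source)

    def _parse(self, fh):
        groups = (lines for _, lines in groupby(fh, lambda line: line.startswith(">")))
        for header_group in groups:
            header = next(header_group)[1:].strip()
            self._map[header] = "".join(line.strip() for line in next(groups))

    def __len__(self):             # len(ff) -> number of records
        return len(self._map)

    def __getitem__(self, name):   # ff["name"] -> sequence string
        return self._map[name]

    def __iter__(self):            # for name in ff: ...
        return iter(self._map)

    def __contains__(self, name):  # "name" in ff
        return name in self._map

    def __str__(self):             # print(ff) shows the underlying dict
        return str(self._map)


ff = FastaFile(io.StringIO(">name\nACCAGT\n>name1\nACGG\nCTA\n"))
print(len(ff), "name1" in ff, ff["name1"])  # 2 True ACGGCTA
print(ff)  # {'name': 'ACCAGT', 'name1': 'ACGGCTA'}
```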
