Parsing GFF with Biopython throws an error
4.4 years ago
Juliofdiaz

I am using Biopython to parse a GFF file, starting from the sample code on their website (Basic GFF parsing section):

from BCBio import GFF

in_file = "my_genome.gff"

with open(in_file) as in_handle:
    for rec in GFF.parse(in_handle):
        print(rec)

When I run it on my system, I get the following error:

Traceback (most recent call last):
  File "/home/zoo/zool2417/anaconda2/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 42, in __init__
    self.stream = open(source, "r" + mode)
TypeError: expected str, bytes or os.PathLike object, not FakeHandle

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    for rec in GFF.parse(in_handle):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 746, in parse
    target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 322, in parse_in_parts
    for results in self.parse_simple(gff_files, limit_info, target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 343, in parse_simple
    for results in self._gff_process(gff_files, limit_info, target_lines):
  File "/home/zoo/zool2417/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 637, in _gff_process
    for out in self._lines_to_out_info(line_gen, limit_info, target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 699, in _lines_to_out_info
    fasta_recs = self._parse_fasta(FakeHandle(line_iter))
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 560, in _parse_fasta
    return list(SeqIO.parse(in_handle, "fasta"))
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 627, in parse
    i = iterator_generator(handle)
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py", line 181, in __init__
    super().__init__(source, alphabet=alphabet, mode="t", fmt="Fasta")
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 46, in __init__
    if source.read(0) != "":
TypeError: read() takes 1 positional argument but 2 were given

I am running Python 3.7 and Biopython 1.77, with bcbio-gff 0.6.6 installed through bioconda.

Any clues?
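The last frame of the traceback hints at the mechanism: Biopython (1.77 here) probes the handle with `source.read(0)`, but the `FakeHandle` that bcbio-gff builds in `_parse_fasta` apparently defines `read()` with no size argument, hence `read() takes 1 positional argument but 2 were given`. The stdlib-only sketch below reproduces that mismatch; `NoSizeHandle` and `looks_like_text_handle` are hypothetical stand-ins, not the real bcbio-gff or Biopython code.

```python
import io


class NoSizeHandle:
    """Hypothetical stand-in for a line-wrapping handle whose read() takes no size."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def read(self):
        # Returns all remaining text; unlike a file object, there is no size parameter.
        return "".join(self._lines)


def looks_like_text_handle(source):
    # Same probe as the last traceback frame: `if source.read(0) != "":`
    return source.read(0) == ""


lines = [">seq1\n", "ACGT\n"]

try:
    looks_like_text_handle(NoSizeHandle(lines))
    error_message = None
except TypeError as err:
    # Calling read(0) passes two positional arguments (self and 0) to read(self).
    error_message = str(err)

print(error_message)

# A real text handle accepts read(0) and returns an empty string.
print(looks_like_text_handle(io.StringIO("".join(lines))))
```

A real text handle such as `io.StringIO` passes the same probe, which suggests the failure is specific to the code path that wraps the remaining lines in `FakeHandle`.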

gff biopython

Hi,

Not related to your problem, but you forgot a closing bracket in the print call.

António

Thanks, fixed it in the post

I didn't forget the bracket in my comment, yet it still wasn't displayed in the code block. It does show up with extra spaces.

For me, this seems to work both with and without open:

from BCBio import GFF
from tempfile import NamedTemporaryFile as TempFile

gff = """
X   Ensembl Repeat  2419108 2419128 42  .   .   hid=trf; hstart=1; hend=21
"""


with TempFile() as t:
    t.write(gff.encode())
    t.flush()

    for x in GFF.parse( open( t.name ) ):
        print("OK!", len(x))

    for x in GFF.parse( t.name ):
        print("OK!", len(x))

bcbio-gff-0.6.6, biopython==1.77
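This test record has no `##FASTA` section, so it presumably never reaches the `_parse_fasta(FakeHandle(...))` call seen in the original traceback. If the failing file does embed sequences after a `##FASTA` directive, one possible workaround (an assumption, not something confirmed in this thread) is to split the file and parse the two parts separately. `split_gff_and_fasta` is a hypothetical stdlib-only helper:

```python
def split_gff_and_fasta(lines):
    """Split GFF3 lines into (annotation_lines, fasta_lines).

    Hypothetical helper: FASTA starts at a '##FASTA' directive or at the
    first '>' header line. Works on a list of lines or an open text file.
    """
    annotation, fasta = [], []
    in_fasta = False
    for line in lines:
        if not in_fasta and (line.startswith("##FASTA") or line.startswith(">")):
            in_fasta = True
            if line.startswith("##FASTA"):
                continue  # the directive itself is not part of any FASTA record
        if in_fasta:
            fasta.append(line)
        else:
            annotation.append(line)
    return annotation, fasta


example = [
    "##gff-version 3\n",
    "ctg1\tsrc\tgene\t1\t8\t.\t+\t.\tID=g1\n",
    "##FASTA\n",
    ">ctg1\n",
    "ACGTACGT\n",
]
gff_part, fasta_part = split_gff_and_fasta(example)
print(len(gff_part), len(fasta_part))  # 2 2
```

The annotation lines could then be written to their own file for `GFF.parse`, and the sequences read with `SeqIO.parse(io.StringIO("".join(fasta_part)), "fasta")`, which hands SeqIO a real text handle.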

When I try your code I get a different exception:

Traceback (most recent call last):
  File "test2.py", line 13, in <module>
    for x in GFF.parse( open( t.name ) ):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 746, in parse
    target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 322, in parse_in_parts
    for results in self.parse_simple(gff_files, limit_info, target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 343, in parse_simple
    for results in self._gff_process(gff_files, limit_info, target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 637, in _gff_process
    for out in self._lines_to_out_info(line_gen, limit_info, target_lines):
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 667, in _lines_to_out_info
    results = self._map_fn(line, params)
  File "/home/zoo/user/anaconda2/lib/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 177, in _gff_line_map
    assert len(parts) >= 8, line
AssertionError: X   Ensembl Repeat  2419108 2419128 42  .   .   hid=trf; hstart=1; hend=21

That's because the tabs in the string gff = "..." didn't survive copy-pasting. Adding the line gff = "\t".join(gff.split()) would fix it.

Anyway, the point was: try passing the file name instead of an open handle (i.e. without open).
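A small caveat on the one-liner above: `gff.split()` splits on every run of whitespace, including the spaces inside the ninth (attributes) column, so `hid=trf; hstart=1; hend=21` ends up as three separate tab-delimited fields. That still passes the `len(parts) >= 8` assertion, but limiting the split to 8 keeps the attributes column whole. A sketch using the same example string:

```python
gff = """
X   Ensembl Repeat  2419108 2419128 42  .   .   hid=trf; hstart=1; hend=21
"""

# Split on runs of whitespace at most 8 times, so the 9th column
# (attributes) keeps its internal spaces.
fields = gff.strip().split(None, 8)
gff_line = "\t".join(fields) + "\n"

# The one-liner from the comment: every space becomes a column break.
naive = "\t".join(gff.split())

print(len(fields))             # 9
print(len(naive.split("\t")))  # 11
```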
