2.7 years ago
mrmrwinter
Hi,
When trying to load a GenBank file into DNA Features Viewer, I get the following error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/File.py:72, in as_handle(handleish, mode, **kwargs)
71 try:
---> 72 with open(handleish, mode, **kwargs) as fp:
73 yield fp
TypeError: expected str, bytes or os.PathLike object, not TextIOWrapper
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Input In [6], in <module>
7 # record.plot(figure_width=12);
9 fig, (ax1, ax2) = plt.subplots(
10 2, 1, figsize=(12, 3), sharex=True, gridspec_kw={"height_ratios": [4, 1]}
11 )
---> 15 fullseq = SeqIO.read("flanking_genes_annotations_annotations.gb", "genbank")
16 graphic_record = BiopythonTranslator().translate_record(fullseq)
17 graphic_record.plot(ax=ax1, with_ruler=False, strand_in_label_threshold=4)
File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/SeqIO/__init__.py:654, in read(handle, format, alphabet)
652 iterator = parse(handle, format, alphabet)
653 try:
--> 654 record = next(iterator)
655 except StopIteration:
656 raise ValueError("No records found in handle") from None
File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py:74, in SequenceIterator.__next__(self)
72 def __next__(self):
73 try:
---> 74 return next(self.records)
75 except Exception:
76 if self.should_close_stream:
File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/GenBank/Scanner.py:516, in InsdcScanner.parse_records(self, handle, do_features)
514 with as_handle(handle) as handle:
515 while True:
--> 516 record = self.parse(handle, do_features)
517 if record is None:
518 break
File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/GenBank/Scanner.py:499, in InsdcScanner.parse(self, handle, do_features)
493 from Bio.GenBank.utils import FeatureValueCleaner
495 consumer = _FeatureConsumer(
496 use_fuzziness=1, feature_cleaner=FeatureValueCleaner()
497 )
--> 499 if self.feed(handle, consumer, do_features):
500 return consumer.data
501 else:
File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/GenBank/Scanner.py:465, in InsdcScanner.feed(self, handle, consumer, do_features)
458 return False
460 # We use the above class methods to parse the file into a simplified format.
461 # The first line, header lines and any misc lines after the features will be
462 # dealt with by GenBank / EMBL specific derived classes.
463
464 # First line and header:
--> 465 self._feed_first_line(consumer, self.line)
466 self._feed_header_lines(consumer, self.parse_header())
468 # Features (common to both EMBL and GenBank):
File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/GenBank/Scanner.py:1571, in GenBankScanner._feed_first_line(self, consumer, line)
1569 consumer.size(line.split()[-2])
1570 else:
-> 1571 raise ValueError("Did not recognise the LOCUS line layout:\n" + line)
ValueError: Did not recognise the LOCUS line layout:
LOCUS PGA_scaffold_21__1_ 23-FEB-2022
From what I can find online, this is because the LOCUS line in the GenBank file is not formatted correctly, so Biopython raises an error.
The GenBank file was created by exporting annotations from the UGENE genome browser.
Is there a way around this? Is there an intermediate format I can convert through that will produce a properly formatted file?
Thanks
Your LOCUS line is missing some data. I am guessing that must be a part of the problem.
What other formats can you export from UGENE?
Yeah it is. My LOCUS line reads:
UGENE exports in BED, CSV, GTF, GFF, and a few other niche ones.
Depends on what you were trying to do then. Perhaps you could use GTF/GFF formats.
Agreed. I'll try to export a .bed or .gff file and convert it to a GenBank file elsewhere. Thanks
Perhaps fixing up the LOCUS line would solve the problem. It just needs a few more fields (sequence length, molecule type, topology, division), as in the NCBI sample record:
https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
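To illustrate the suggestion of filling in the missing LOCUS fields, here is a minimal sketch that rewrites the first line of the file. `fix_locus_line` is a hypothetical helper, not part of Biopython or UGENE. The molecule type (`DNA`), topology (`linear`) and division (`UNK`) are assumed defaults that you should adjust for your data. The length is counted from the ORIGIN block and will be 0 if the export contains annotations only. I believe Biopython's GenBank scanner tolerates a loosely spaced LOCUS line with these eight fields, but treat that as an assumption to check against your installed version.

```python
import re

# Matches GenBank-style dates such as 23-FEB-2022.
DATE_RE = re.compile(r"\d{1,2}-[A-Z]{3}-\d{4}")


def fix_locus_line(text, molecule="DNA", topology="linear", division="UNK"):
    """Rebuild the first LOCUS line with eight whitespace-separated fields.

    Hypothetical helper; molecule, topology and division are assumed defaults.
    """
    lines = text.splitlines()
    fields = lines[0].split() if lines else []
    if len(fields) < 2 or fields[0] != "LOCUS":
        raise ValueError("file does not start with a LOCUS line")
    name = fields[1]
    if not DATE_RE.fullmatch(fields[-1]):
        raise ValueError("no date found on the LOCUS line")
    date = fields[-1]

    # Count sequence letters between ORIGIN and the closing "//".
    # This stays 0 if the export contains annotations only.
    length = 0
    in_origin = False
    for line in lines[1:]:
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            length += sum(ch.isalpha() for ch in line)

    lines[0] = (
        f"LOCUS       {name} {length} bp    {molecule}     {topology}   "
        f"{division} {date}"
    )
    return "\n".join(lines) + "\n"
```

You would read the exported file as text, pass it through `fix_locus_line`, write the result to a new file, and then hand that file to `SeqIO.read(..., "genbank")` as in the original script.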
This turned out to be caused by a mismatch between the scaffold name and the scaffold sequence in the GenBank file.