Issue reading fasta file with Biopython
3.3 years ago
Rox ★ 1.4k

Hello everyone,

This should be very easy and I know it, but I am stuck with it and I cannot pinpoint my mistake.

I wanted a Python function that returns a boolean telling me whether a given file is in FASTA format, without having to check the extension (.fa, .fasta, etc.) myself. I found this solution, which suited me. When scanning for the files I need, my Python script now uses this "is_fasta" function.

My problem is that it works for some files but not for others. When it fails, I get an error like one of these while trying to read the fasta file:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 551: invalid continuation byte
# or
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 23: invalid start byte
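
One possible way to keep such a check from crashing is sketched below. It assumes that a file whose contents cannot be decoded as text should simply count as "not FASTA". The name is_fasta comes from the post, but the original function is not shown, so this body is a hypothetical stand-in that uses only the standard library rather than Biopython, and it only looks at the first non-blank line.

```python
def is_fasta(path, encoding="utf-8"):
    """Return True if the first non-blank line of `path` starts with '>'.

    Hypothetical stand-in for the is_fasta function mentioned in the post.
    """
    try:
        with open(path, "r", encoding=encoding) as handle:
            for line in handle:
                if line.strip():
                    # A FASTA record begins with a '>' header line.
                    return line.startswith(">")
    except UnicodeDecodeError:
        # Bytes that are not valid text in `encoding`, e.g. a binary
        # or compressed file picked up while scanning a directory.
        return False
    return False  # empty file or only blank lines
```

Because this only inspects the header line, it is a weaker test than actually parsing the records; the same try/except pattern could presumably be wrapped around a parser-based check instead.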

So I understand there might be something wrong with the encoding of the file. I usually check it with the file command, but both for files that work and for files that do not, I get "ASCII text", and when I ask for more information with file -i, it just prints "regular file". So I don't see anything about utf-8 or the like, and my understanding of file formats pretty much stops here.

I am working in a conda environment that I built with several tools; the Python version inside is 3.6.10. I added Biopython with the regular conda command from the conda-forge channel.

Does anyone have any advice about this issue? Or should I just go back to my original idea of checking the file extension?
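
To see exactly which bytes trip the decoder, one option is to read the file in binary mode and list every byte outside the ASCII range. This is only a sketch; the helper name find_non_ascii and its limit parameter are made up for illustration.

```python
def find_non_ascii(path, limit=10):
    """Return up to `limit` (offset, hex value) pairs for bytes >= 0x80."""
    hits = []
    # Binary mode: no decoding happens, so this cannot raise
    # UnicodeDecodeError whatever the file contains.
    with open(path, "rb") as handle:
        data = handle.read()
    for offset, value in enumerate(data):
        if value >= 0x80:  # outside the 7-bit ASCII range
            hits.append((offset, hex(value)))
            if len(hits) >= limit:
                break
    return hits
```

An empty result for a file that still fails would suggest that the file being opened is not the one that was inspected, for example another file found in the same directory.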

Thank you and have a nice day,

python • 1.7k views

Hmmm, that is indeed a good lead. I forgot to say that I am on a Mac (and totally new to it). It seems my LANG variable is empty... I will see whether working along those lines helps solve the issue.


Ah, sadly this was not the issue. I use a custom set of shell settings (zsh), and I set the locale properly by following the steps here: https://github.com/ohmyzsh/ohmyzsh/issues/7558 . But now, even with my LANG fixed, it is still not working and still shows encoding errors :/

