Python version: 3.5.4
I am trying to process a VCF file as shown in the code below:
with open(myfile, 'r') as datain:
    for myline in datain:
        if myline.startswith("#"):
            continue  # skip VCF header/meta lines
        # do something with the record line
On some line (after processing a couple of hundred lines) it raises this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 8077: invalid start byte
Any idea how to fix this error?
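One way to narrow this down is to read the file in binary mode and try to decode each line yourself, so you can see exactly which line holds the offending byte. This is only a rough sketch: the helper name `find_undecodable_lines` is made up for illustration, and it assumes the file is supposed to be UTF-8.

```python
def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes) for every line that fails to decode.

    Hypothetical helper: reads raw bytes so that no decoding happens
    until we attempt it ourselves, line by line.
    """
    bad = []
    with open(path, "rb") as datain:
        for lineno, raw in enumerate(datain, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                # Keep the raw bytes so the stray byte can be inspected.
                bad.append((lineno, raw))
    return bad
```

Printing the returned raw bytes (for example with `repr()`) should show where the `0x9d` byte sits and may hint at what encoding the file was written in.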
Did you try this? - https://stackoverflow.com/questions/22216076/unicodedecodeerror-utf8-codec-cant-decode-byte-0xa5-in-position-0-invalid-s
This SE answer is related, but it deals with a Python 2 encoding problem. Python 3 behaviour is fairly different in this area.
As you mentioned, it is already decoded.
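To illustrate the Python 3 behaviour: in text mode, `open()` decodes the bytes while reading, so the error is raised by the file iteration itself, before any of your own code touches the line. If no `encoding` is given, Python uses the locale's preferred encoding. A possible workaround, sketched below, is to pass `encoding` and `errors` explicitly. The function name `read_records` is hypothetical, and `errors="replace"` is an assumption that substituting U+FFFD for undecodable bytes is acceptable for your data. If the file is really in another encoding, passing that encoding would be the better fix.

```python
def read_records(path, encoding="utf-8", errors="replace"):
    """Return the non-header lines of a VCF-like text file.

    Hypothetical helper: decodes explicitly instead of relying on the
    locale default, and replaces undecodable bytes with U+FFFD.
    """
    records = []
    with open(path, "r", encoding=encoding, errors=errors) as datain:
        for myline in datain:
            if myline.startswith("#"):
                continue  # skip header/meta lines
            records.append(myline.rstrip("\n"))
    return records
```

Note that `errors="replace"` silently alters the affected fields, so it is worth checking which lines were changed before trusting downstream results.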
@Kevin Blighe If I use code like:
myline = myline.encode('utf-8').strip()
It gives this error: TypeError: startswith first arg must be bytes or a tuple of bytes, not str
Then I could use str(), but this will not help in fixing the main issue.
Are you using a European locale setting by any chance? What does
locale
produce?
You may need to force the locale to be UTF-8. See this: https://unix.stackexchange.com/questions/87745/what-does-lc-all-c-do
The error message shows that Python is using UTF-8 to decode. The question is what codec of the input