2.0 years ago
Eliza
Hi, I downloaded some data from gnomAD (https://gnomad.broadinstitute.org/downloads). It comes as a .vcf.bgz file, and I would like to read it as a VCF file. I found some code to do it in Python:
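#!/usr/bin/env python3
import gzip

# bgzip output is gzip-compatible, so the gzip module can read it
ifile = gzip.GzipFile("gnomad.genomes.r2.1.1.sites.2.vcf.bgz")
ofile = open("truncated.vcf", "wb")
LINES_TO_EXTRACT = 100000
# copy the first LINES_TO_EXTRACT lines to an uncompressed VCF
for line in range(LINES_TO_EXTRACT):
    ofile.write(ifile.readline())
ifile.close()
ofile.close()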
I tried it on my data:
import gzip

ifile = gzip.GzipFile("gnomad.exomes.r2.1.1.sites.4.vcf.bgz")
ofile = open("truncated.vcf", "wb")
LINES_TO_EXTRACT = 1000
for line in range(LINES_TO_EXTRACT):
    ofile.write(ifile.readline())
ifile.close()
ofile.close()
but it returned an error:
FileNotFoundError: [Errno 2] No such file or directory: 'gnomad.exomes.r2.1.1.sites.4.vcf.bgz'
although I downloaded the file to my computer.
Is there any way to open a .vcf.bgz file as a VCF file in Python?
Your path to the file is wrong.
Please read about .tbi files: http://www.htslib.org/doc/tabix.html
@Pierre Lindenbaum, thank you, but I still don't understand what path I should use.
You're trying to read in the index file (.tbi); that's not what you want. Download and read in the file with just the .vcf.bgz extension, without the .tbi.
4galaxy77, I tried this and got an error: No such file or directory: 'gnomad.exomes.r2.1.1.sites.4.vcf.bgz', but the file exists on my PC.
If the file is definitely on your PC, that means you aren't in the right directory. Have a read of this.
4galaxy77, I tried what you suggested and changed the directory to the correct path, but it still doesn't work.
Can't help you any more then, I'm afraid; you must be doing something wrong. If your command is right, you're in the right directory, and the file exists, then it will work, so one of the three has to be wrong.
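To narrow down which of the three is wrong, a small diagnostic along these lines may help. It is only a sketch: `diagnose_path` is a hypothetical helper name, and the glob pattern assumes the download kept its `.vcf.bgz` name. It reports the directory Python is actually running in, the full path the relative filename resolves to, whether that file exists, and any similarly named files in the working directory.

```python
import os
from pathlib import Path


def diagnose_path(filename):
    """Collect facts that explain a FileNotFoundError for a relative path.

    Hypothetical helper for illustration; not part of any library.
    """
    path = Path(filename)
    return {
        # Directory that relative paths are resolved against
        "cwd": os.getcwd(),
        # Absolute path that Python will actually try to open
        "resolved": str(path.resolve()),
        # True only if a regular file exists at that location
        "exists": path.is_file(),
        # Files in the working directory that look like bgzipped VCFs
        # (this also lists .tbi index files, which are not the data file)
        "candidates": sorted(p.name for p in Path.cwd().glob("*.vcf.bgz*")),
    }


for key, value in diagnose_path("gnomad.exomes.r2.1.1.sites.4.vcf.bgz").items():
    print(f"{key}: {value}")
```

If `exists` is False and `candidates` is empty, the script is running in a different directory from the download; if `candidates` lists only a `.tbi` file, the index was downloaded instead of the data.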
It looks like you have not saved the Python file. What is the name of the Python script? The program needs to have a filename before you can run it. If the program is untitled.py or has not been saved first, it will not find the .bgz file or write a .vcf file to your computer. So save the Python script in the same folder as the .bgz file first, and then run the program.
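One way to make the script less sensitive to where it is run from is to pass an explicit path and fail with a message that shows the full location being tried. The following is a minimal sketch, not the original poster's code: `copy_first_lines` is a hypothetical name, and it relies on bgzip files being readable by Python's standard `gzip` module because they are a series of concatenated gzip blocks.

```python
import gzip
from itertools import islice
from pathlib import Path


def copy_first_lines(src, dst, n_lines):
    """Write the first n_lines of a bgzip/gzip text file to dst.

    Returns the number of lines written. Hypothetical helper for illustration.
    """
    src = Path(src)
    if not src.is_file():
        # Show the absolute path so a wrong working directory is obvious
        raise FileNotFoundError(f"No such file: {src.resolve()}")
    written = 0
    # "rt" decompresses and decodes to text, line by line
    with gzip.open(src, "rt") as ifile, open(dst, "w") as ofile:
        for line in islice(ifile, n_lines):
            ofile.write(line)
            written += 1
    return written


# Example usage in a saved script, assuming the .bgz file sits next to it:
# here = Path(__file__).resolve().parent
# copy_first_lines(here / "gnomad.exomes.r2.1.1.sites.4.vcf.bgz",
#                  here / "truncated.vcf", 1000)
```

Building the path from the script's own location, as in the commented usage, means the file is found regardless of the directory the interpreter was started in.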