Hello, can somebody tell me how to uncompress the 1000 Genomes vcf.gz files? I am performing an RNA-editing analysis and would like to subtract annotated SNPs/INDELs. I have already done so using dbSNP data with bedtools intersect, but am still stuck with the 1000 Genomes Project *.vcf.gz files. I downloaded these for each chromosome and then concatenated them. These files are in a format that gunzip/gzip -d won't recognize. I tried using this file unzipped in bedtools intersect but it wasn't recognized. Many thanks,
"I downloaded these for each chromosome and then concatenated them" - what files are you referring to here? And can you link to an example compressed file? There is no good reason why gunzip will not work on a .gz file, unless the file is corrupt or not actually a .gz file. What error message does gunzip give?
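One quick way to tell a real .gz file from something else (for example an HTML error page saved under a .gz name) is to look at its first two bytes, which are 0x1f 0x8b for every gzip file. A minimal sketch in Python; the function names are made up for illustration and the path passed in is whatever file you downloaded:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)

def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC

def first_line(path):
    """Decompress just enough of the file to return its first text line."""
    with gzip.open(path, "rt") as handle:
        return handle.readline()
```

If looks_like_gzip returns False, the download is probably not gzip data at all. If it returns True, first_line should give back the first header line of the VCF without unpacking the whole file.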
I downloaded the *.vcf.gz files per chromosome from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ and tried unzipping each file separately, but that failed. I therefore concatenated them all and tried to unzip the result, also unsuccessfully. I downloaded them twice to rule out corrupt files. The error messages I get are "not in gunzip or gzip" format.
Don't concatenate zipped files, that will never work. EDIT: I'm wrong, see Pierre's comment below. The *.vcf.gz files should definitely be uncompressible by gzip.
Most of my money would be on some type of user error. How exactly are you trying to unzip them? Can you give us an example file that doesn't work for you?
I suppose there's a tiny chance you're on a strange computer system with an unhappy version of gzip. What OS/platform are you on?
"Don't concatenate zipped files, that will never work." : In fact, apart from the current problem, that could work: http://stackoverflow.com/questions/8005114/fast-concatenation-of-multiple-gzip-files
Nice! I didn't know that. I take back my overly broad claim.
It looks like it will concatenate the uncompressed contents into a single file, which is only sometimes what you want... but useful to know nonetheless.
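The behaviour described in the comments above can be checked with a small Python sketch. The two records are invented sample data, not taken from the 1000 Genomes files:

```python
import gzip

# Two independently compressed chunks (made-up sample records).
part_a = gzip.compress(b"chr1\t100\n")
part_b = gzip.compress(b"chr2\t200\n")

# Byte-level concatenation, the equivalent of `cat a.gz b.gz > both.gz`.
combined = part_a + part_b

# gzip.decompress reads every member and joins the uncompressed contents.
restored = gzip.decompress(combined)
print(restored.decode())
```

Note that for per-chromosome VCFs this would also repeat each file's header lines in the middle of the output, which is presumably not what you want for downstream tools.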
I believe that this is the property that makes the block compression in BGZF (used for BAM compression) possible; see "BGZF - Blocked, Bigger & Better GZIP!"
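To illustrate the point about BGZF: each BGZF block is an ordinary gzip member whose header carries an extra "BC" subfield holding the block size. Below is a rough sketch of a check for that subfield, assuming the header layout given in the SAM/BAM specification; the function name is hypothetical:

```python
import struct

def bgzf_block_size(header):
    """Return the total size of a BGZF block from its leading bytes,
    or None if the bytes do not look like a BGZF block header."""
    if len(header) < 12:
        return None
    # Fixed gzip header: ID1, ID2, CM, FLG, MTIME, XFL, OS, XLEN.
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack(
        "<BBBBIBBH", header[:12]
    )
    # Must be gzip/deflate with the FEXTRA flag (bit 2) set.
    if (id1, id2, cm) != (0x1F, 0x8B, 8) or not flg & 4:
        return None
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields looking for 'B', 'C' with a 2-byte payload.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(extra):
            (bsize,) = struct.unpack("<H", extra[pos + 4:pos + 6])
            return bsize + 1  # BSIZE stores the block size minus one
        pos += 4 + slen
    return None
```

Passing in the first 18 or so bytes of a file is enough. A plain gzip file gives None, while a bgzip-compressed file (as tabix-indexed VCFs generally are) gives the size of its first block.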