I received imputed dosage files from the Michigan Imputation Server (minimac3). The files are in VCF format and gzip-compressed (.gz). I used DosageConvertor to convert them to plink dosage files; these files are also compressed.
When I try to use a compressed plink dosage file (for example, fileName.plink.dosage.gz) in a linear regression using plink 1.07, plink returns the error "ERROR: Bad format fdr (sic) dosage file, expecting more columns". However, if I uncompress the file and run the same analysis, plink produces no error and completes the analysis. I did include the argument --Zin with the compressed file and omitted the argument when using the uncompressed file.
I used wc to count the number of lines in the compressed file, and the returned line count was what it should be. I also counted the number of “words”, and that count was correct too. The number of columns should equal the number of words divided by the number of lines, since each line should have the same number of words. But since plink is not counting the correct number of columns, does this mean there is a delimiter missing, or a delimiter where it should not be?
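Total word and line counts only confirm the average number of columns, so a per-line check is stricter. The following is a minimal sketch in Python, assuming the dosage file is gzip-compressed text with whitespace-separated fields and a header on the first line; the function names and the example file name are hypothetical.

```python
import gzip
from collections import Counter


def field_count_histogram(path):
    """Map each observed field count to the number of lines that have it."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # str.split() with no argument splits on any run of spaces or tabs.
            counts[len(line.split())] += 1
    return counts


def mismatched_lines(path, limit=5):
    """Return up to `limit` (line_number, field_count) pairs that differ from the header."""
    bad = []
    expected = None
    with gzip.open(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            n_fields = len(line.split())
            if expected is None:
                expected = n_fields  # the first line sets the expected width
            elif n_fields != expected:
                bad.append((number, n_fields))
                if len(bad) >= limit:
                    break
    return bad
```

Calling `field_count_histogram("fileName.plink.dosage.gz")` should return a single entry if every line has the same number of columns. If it does, the file contents are probably consistent, and the problem is more likely in how the compressed file is being read than in the delimiters.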
I believe plink files are whitespace (space or tab) delimited. Nonetheless, I used sed to replace each tab with a single space and to collapse consecutive spaces into a single space. But plink is still unable to read the compressed file.
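For comparison, the same normalization can be sketched in Python. This is an illustrative version, assuming gzip-compressed text input; the function name is hypothetical. Unlike a sed substitution on tabs and spaces, `str.split()` also drops carriage returns and trailing whitespace, which may or may not be relevant here.

```python
import gzip


def normalize_whitespace(src, dst):
    """Rewrite gzipped text `src` to gzipped `dst` with single-space delimiters.

    Returns the number of lines written.
    """
    written = 0
    # newline="\n" keeps Unix line endings in the output on every platform.
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt", newline="\n") as fout:
        for line in fin:
            # split() discards tabs, repeated spaces, and any trailing "\r".
            fout.write(" ".join(line.split()) + "\n")
            written += 1
    return written
```

The returned line count can be compared with the wc line count of the original file to confirm that nothing was lost in the rewrite.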
Of course, I can run the analysis with uncompressed files, but it would be nice to keep the files compressed. Can anyone suggest what the problem might be?
All advice is appreciated, Paul
Hi Paul, I was wondering whether you managed to solve the problem of using compressed dosage files. I am at the same stage right now, having received my dosage files from the Michigan Imputation Server. I have used DosageConvertor to convert the files to plink dosage format and now have a set of compressed plink.dosage files. I need to perform some QC filtering on these by MAF and HWE, and was wondering whether it is better to uncompress the files before performing these steps.