6.8 years ago
Teresa
Hi, I performed imputation on my GWAS data using the Michigan Imputation Server. Now I have two output files: 1) .dose.vcf.gz and 2) .info.gz
The Michigan Imputation Server uses minimac3 (--format GT,DS,GP), and all three formats are present in the output file ".dose.vcf.gz". I'm new to this kind of analysis, so I need some advice on how to manage and analyze this data.
I used DosageConvertor to convert them into PLINK dosage format using this command:
$DosageConvertor --vcfDose myfile.vcf.gz --info myfile.info.gz --prefix out_myfile --type plink --format 1
output files: 1) .dosage and 2) .fam.
To perform quality control steps on the imputed data, is this command right?
fcgene --dosage out_myfile.dosage --fam out_myfile.fam --filter-snp crate=0.9,hwe=1e-6,maf=0.05 --filter-indiv crate=0.9 --rsq 0.3 --oformat plink-dosage --out myfile_QC
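As background on the three FORMAT fields mentioned above: for a diploid, biallelic variant, the dosage (DS) is the expected number of ALT alleles implied by the genotype probabilities (GP), while GT is the most likely hard call. A minimal sketch of that relationship follows; the helper name gp_to_ds and the example probabilities are made up for illustration.

```shell
# Expected ALT-allele dosage from a GP triple P(0/0),P(0/1),P(1/1):
#   DS = P(0/1) + 2 * P(1/1)
# gp_to_ds is a hypothetical helper name, not part of any tool in this thread.
gp_to_ds() {
    # $1 is a comma-separated GP triple, e.g. "0.01,0.18,0.81"
    echo "$1" | awk -F "," '{ printf "%.3f\n", $2 + 2 * $3 }'
}

# Toy example: the hard call would be 1/1, the dosage is slightly below 2
gp_to_ds "0.01,0.18,0.81"   # prints 1.800
```

This is why filtering or testing on dosage keeps the imputation uncertainty that a hard call discards.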
Have you looked through the manual under 'Chapter 4 - Quality control and format conversion of plink-formatted data'? - see page 20 of http://www.bx.psu.edu/~giardine/tests/tmp/fcgene-1.0.7.pdf
Your filtering criteria per SNP are:
Your filtering criteria per sample are:
For the --rsq parameter, you have to take note, as it's not what would typically be expected for such a study. The manual states:
I would like to work with dosage genotypes rather than hard-call genotypes. It is really difficult for me to find a way to obtain final data with dosage genotypes (calculated in the imputation process performed by the Michigan Imputation Server) while also taking into account the per-marker and per-individual QC steps and the quality of imputation.
It seems that fcgene is able to perform QC steps on dosage data, but the option --rsq does not work on my input file. Rsq is an indicator of the quality of imputation, so I think it is really important to consider this parameter, but I don't know how.
Nevertheless, I found a way to work with hard-call genotypes (even if it is not what I wanted in the first place). Below is the code I used:
QC steps using PLINK
Is there a way to perform these steps on the dosage data?
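On the Rsq question above, one possible approach is to build a list of well-imputed variants directly from the .info.gz file and then use that list to subset the dosage data. The sketch below assumes the info table has a header row with columns named SNP, MAF and Rsq (as in minimac-style info files); the function name filter_info and the toy rows are hypothetical, and the thresholds simply mirror the ones in the fcgene command.

```shell
# Print IDs of variants passing minimum Rsq and MAF thresholds.
# Reads an info table on stdin; columns are located by header name
# ("SNP", "MAF", "Rsq"), which is an assumption about the file layout.
filter_info() {
    # $1 = minimum Rsq, $2 = minimum MAF
    awk -v rsq_min="$1" -v maf_min="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("SNP" in col) || !("MAF" in col) || !("Rsq" in col)) {
                print "missing SNP, MAF or Rsq column" | "cat 1>&2"
                exit 1
            }
            next
        }
        # Non-numeric entries evaluate to 0 and therefore fail the thresholds
        $(col["Rsq"]) + 0 >= rsq_min && $(col["MAF"]) + 0 >= maf_min {
            print $(col["SNP"])
        }
    '
}

# Toy table (made-up values): only the first variant passes both filters
printf 'SNP\tMAF\tRsq\n22:100:A:G\t0.20\t0.95\n22:200:C:T\t0.01\t0.99\n22:300:G:A\t0.30\t0.10\n' |
    filter_info 0.3 0.05

# On real data, something like:
#   gunzip -c myfile.info.gz | filter_info 0.3 0.05 > keep_snps.txt
```

The resulting ID list could then be used to restrict the variants in whichever downstream tool you use, so the imputation-quality filter is applied without converting to hard calls.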
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under @Kevin's answer.
Hi Teresa,
I was wondering if you have noticed that there is a discrepancy in the number of variants in the s4.bim file and when it is loaded for the next step to generate s5?
Here is the example:
I couldn't figure out where the remaining variants are (a total of 105,454). I have tried 2 other chromosomes and noticed the same issue.
I was wondering if anyone has had a similar issue?
Hi Teresa, I would like to know whether you have found a way to solve the problem with the dosage data.
What did you do with your DosageConvertor output file? How did you perform QC on it? Thanks!
Hello Molly and Teresa, I am wondering whether you have figured out how to perform QC steps on the dosage data received after using the DosageConvertor tool? If you have succeeded, please do let me know. Thank you!
Can you confirm that you have first contacted Michigan Imputation Server for help, please?
Hi,
I have imputed my data using minimac4 on the Michigan Imputation Server. Is DosageConvertor the only way to produce PLINK files? I have tried doing this using plink --vcf, but it produces odd output like this (where the 2nd column has chr:pos:A1:A2;snp), which is basically an invalid bim file.
I would really appreciate any advice.
Please show all commands used. Also, please confirm that you have first contacted the Michigan Imputation Server.
DosageConvertor says nothing about producing a bim file. Where did you read that? https://genome.sph.umich.edu/wiki/DosageConvertor#Convert_to_PLINK_Files
Hi Teresa, I imputed my data with the Michigan Imputation Server, and I would like to ask whether fcgene can convert PLINK dosage data to PLINK bed/bim data. If not, could you please give me some advice? Many thanks!
Hi, plink --vcf appears to work normally, but if you have a bim file with markers like 22:16053843:G:A;rs181029838, you can use a Unix one-liner.
To keep only the rs number:
awk -F " " '{print $1,$3,$4,$5,$6,$2}' your_bim_file.bim | awk -F ";" '{print $1,$2}' | awk -F " " '{print $1,$7,$2,$3,$4,$5}' > new_bim_file.bim
To keep only the 22:16053843:G:A part as the marker:
awk -F " " '{print $1,$3,$4,$5,$6,$2}' your_bim_file.bim | awk -F ";" '{print $1,$2}' | awk -F " " '{print $1,$6,$2,$3,$4,$5}' > new_bim_file.bim
You can then replace your .bim file with the new file.
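The chained awk calls above can also be written as a single pass that splits only the ID column on ";". This is a sketch rather than a drop-in replacement: the function name split_bim_id and its "rs"/"pos" modes are made up here, and it assumes a standard six-column .bim file.

```shell
# Rewrite column 2 of a .bim file whose IDs look like
# "22:16053843:G:A;rs181029838". Reads stdin, writes tab-separated stdout.
#   mode "rs"  -> keep the part after ";" (ID left unchanged if there is no ";")
#   mode "pos" -> keep the part before ";"
# split_bim_id is a hypothetical helper name.
split_bim_id() {
    awk -v mode="$1" 'BEGIN { OFS = "\t" }
    {
        n = split($2, part, ";")
        if (mode == "rs" && n >= 2) $2 = part[2]
        else if (mode == "pos")     $2 = part[1]
        $1 = $1   # force the record to be rebuilt with tab separators
        print
    }'
}

# Toy record taken from the marker format shown above
printf '22\t22:16053843:G:A;rs181029838\t0\t16053843\tA\tG\n' | split_bim_id rs

# On a real file, something like:
#   split_bim_id rs < your_bim_file.bim > new_bim_file.bim
```

Unlike the chained version, markers without a ";" pass through with their original ID in "rs" mode, so variants lacking an rs number keep a usable identifier.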