6.0 years ago
Sharon
Hi everyone,
I am trying to use the Michigan Imputation Server. I run checkVCF first so that the job does not fail on the server.
python checkVCF.py -r checkVCF/hs37d5.fa -o test chr3.vcf
checkVCF reported some duplicated sites and inconsistent reference sites:
> checkVCF.py -- check validity of VCF file for meta-analysis
> version 1.4 (20140115)
> contact zhanxw@umich.edu or dajiang@umich.edu for problems.
> Python version is [ 2.7.5.final.0 ]
> Begin checking vcfFile [ chr3.vcf ]
> Duplicated site [ 3:14187449 ]
> Duplicated site [ 3:21307401 ]
> Duplicated site [ 3:38608045 ]
> Duplicated site [ 3:39146429 ]
> Duplicated site [ 3:41912651 ]
> [ 10000 ] lines processed
> Duplicated site [ 3:48618728 ]
> Duplicated site [ 3:79399575 ]
> Duplicated site [ 3:95176677 ]
> Duplicated site [ 3:96472739 ]
> Duplicated site [ 3:99067458 ]
> [ 20000 ] lines processed
> Duplicated site [ 3:113876275 ]
> Duplicated site [ 3:120522716 ]
> Duplicated site [ 3:121633904 ]
> Duplicated site [ 3:128622922 ]
> [ 30000 ] lines processed
> Duplicated site [ 3:171926373 ]
> Duplicated site [ 3:183371250 ]
> --------------- REPORT ---------------
> Total [ 37146 ] lines processed
> Examine [ 33 ] VCF header lines, [ 37113 ] variant sites, [ 378 ] samples
> [ 16 ] duplicated sites
> [ 0 ] NonSNP site are outputted to [ test.check.nonSnp ]
> [ 6995 ] Inconsistent reference sites are outputted to [ test.check.ref ]
> [ 0 ] Variant sites with invalid genotypes are outputted to [ test.check.geno ]
> [ 0 ] Alternative allele frequency > 0.5 sites are outputted to [ test.check.af ]
> [ 0 ] Monomorphic sites are outputted to [ test.check.mono ]
> --------------- ACTION ITEM ---------------
> * Remove duplicated sites and rerun checkVCF.py
> * Read test.check.ref, for autosomal sites, make sure the you are using the forward strand
> * Upload these files to the ftp server (so we can double check): test.check.log test.check.dup test.check.noSnp test.check.ref test.check.geno test.check.af test.check.mono
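For reference, the two checks that failed can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not checkVCF.py's actual code: a "duplicated site" is taken to be two or more data lines sharing the same CHROM and POS (alleles ignored), and an "inconsistent reference site" one whose REF allele differs from the reference FASTA at that position. The reference is passed in as a plain dict of chromosome name to sequence, a hypothetical stand-in for reading hs37d5.fa, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def parse_site(line: str) -> Tuple[str, int, str]:
    """Return (CHROM, POS, REF) from one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), fields[3]


def duplicated_sites(lines: Iterable[str]) -> List[str]:
    """List 'chrom:pos' keys that occur on more than one data line.

    Alleles are ignored, so two records at the same position with
    different ALT alleles still count as one duplicated site.
    """
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _ = parse_site(line)
        counts[f"{chrom}:{pos}"] += 1
    return [site for site, n in counts.items() if n > 1]


def inconsistent_ref_sites(lines: Iterable[str], genome: Dict[str, str]) -> List[str]:
    """List 'chrom:pos' keys whose REF does not match the reference sequence.

    `genome` (hypothetical) maps chromosome name to sequence; VCF POS is 1-based.
    """
    bad = []
    for line in lines:
        if line.startswith("#"):
            continue
        chrom, pos, ref = parse_site(line)
        expected = genome[chrom][pos - 1 : pos - 1 + len(ref)].upper()
        if ref.upper() != expected:
            bad.append(f"{chrom}:{pos}")
    return bad


# Toy data; `genome` stands in for the real FASTA.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "3\t2\trs1\tC\tT",
    "3\t2\trs1b\tC\tG",  # same position, different ALT allele
    "3\t4\trs2\tA\tG",   # the reference base at 3:4 is T
]
genome = {"3": "ACGTACGT"}
print(duplicated_sites(vcf))                # ['3:2']
print(inconsistent_ref_sites(vcf, genome))  # ['3:4']
```

A REF that matches the complement of the reference base usually points to the strand issue mentioned in the action items, while a very large number of mismatches can also mean the VCF and the FASTA are on different genome builds.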
How can I remove these duplicated sites and inconsistent reference sites?
I tried the following, but it seems to exclude duplicate variants rather than duplicate sites:
plink --bfile snps_filtered --list-duplicate-vars ids-only suppress-first
plink --bfile snps_filtered --exclude plink.dupvar --make-bed --out snps.DuplicatesRemoved
plink --bfile snps.DuplicatesRemoved --recode vcf --snps-only just-acgt --out snps.final
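As far as I know, `--list-duplicate-vars` in PLINK 1.9 matches on position *and* allele codes, so two records at the same position with different alleles are not listed. That would explain why checkVCF still reports duplicated sites. One way to filter at the site level is to drop every record whose CHROM:POS occurs more than once, together with any sites you decide to exclude for a reference mismatch. The Python sketch below does that under two assumptions: the VCF is uncompressed text, and the sites to exclude have already been collected as 'chrom:pos' strings. How to parse them out of test.check.ref is left open, since that file's layout is not shown here. `filter_vcf`, `site_key` and `exclude` are hypothetical names.

```python
from collections import Counter
from typing import AbstractSet, Iterable, Iterator


def site_key(line: str) -> str:
    """Build a 'chrom:pos' key from the first two columns of a VCF data line."""
    chrom, pos = line.split("\t", 2)[:2]
    return f"{chrom}:{pos}"


def filter_vcf(lines: Iterable[str], exclude: AbstractSet[str] = frozenset()) -> Iterator[str]:
    """Yield header lines plus data lines whose site is neither duplicated nor excluded.

    Every record at a duplicated CHROM:POS is dropped (no copy is kept),
    because when the alleles differ there is no safe rule for picking one.
    `exclude` is a hypothetical set of 'chrom:pos' keys to remove as well.
    """
    lines = list(lines)  # two passes: count sites first, then filter
    counts = Counter(site_key(ln) for ln in lines if not ln.startswith("#"))
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and header lines unchanged
            continue
        key = site_key(line)
        if counts[key] == 1 and key not in exclude:
            yield line


vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "3\t100\trs1\tC\tT",
    "3\t100\trs1b\tC\tG",  # duplicated site: both records are dropped
    "3\t200\trs2\tA\tG",
    "3\t300\trs3\tG\tA",   # excluded below, e.g. a REF mismatch
]
kept = list(filter_vcf(vcf, exclude={"3:300"}))
print(len(kept))  # 2: the header line and the 3:200 record
```

On a real file you would pass an open file handle instead of the list and write the result with `writelines`. Dropping all 6995 mismatched sites discards a lot of data, though; if the mismatches come from strand flips or a genome build mismatch, fixing those first is usually preferable.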
A link to the relevant section of the PLINK documentation would be fine too.
Thanks
Good catch. I should be using GRCh37; I will check whether that removes the duplicates. Thanks a lot, Kevin. Always helpful.