20 months ago
barslmn
I have an unphased VCF file with multiple samples from a WES study. I am trying to get some genotypes from the X chromosome for some calculations. I want to split the samples and change the genotypes to 0/1, 1/1 or 1/2. However, I am seeing variants with 1/0 in the genotype column. Here is one such example:
chrX 52862555 rs5943751 C A 59294.6 PASS BaseQRankSum=0.732;DB;ExcessHet=9.9651;FS=0.622;InbreedingCoeff=-0.1559;MQ=60;MQRankSum=0;POSITIVE_TRAIN_SITE;QD=20.47;ReadPosRankSum=-0.237;SOR=0.606;VQSLOD=23.85;culprit=MQRankSum;DP=5853;AF=0.287;AS_BaseQRankSum=1.1;AS_FS=1.725;AS_InbreedingCoeff=-0.4237;AS_MQ=60;AS_MQRankSum=0;AS_QD=8.57;AS_ReadPosRankSum=0.8;AS_SOR=0.835;MLEAC=36;MLEAF=0.295;AS_FilterStatus=PASS;AS_VQSLOD=8.9354;AS_culprit=AS_MQRankSum;AN=2;AC=1 GT:AD:DP:GQ:PGT:PID:PL 1/0:8,13:36:99:.:.:591,192,397
Looking at the reads, the genotype should be 1/2. I found this in other threads, but I don't think it relates to this problem.
The command I am using (it looks like this because I am debugging):
for sample in $(bcftools query -l "$vcf"); do
echo "$sample 1/0"; # I just added 1/0 for grepping
bcftools view -c 1 -s "$sample" -r "$nonPARregion" "$vcf" | bcftools view -c 1 -f PASS -i 'FORMAT/DP >= 20'
done | grep '1/0'
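A side note on the 1/0 ordering itself: in an unphased genotype the allele order carries no meaning, so 1/0 and 0/1 describe the same pair of alleles. Below is a minimal sketch of reordering such a string in plain POSIX shell. `normalize_gt` is a hypothetical helper name, not a bcftools feature, and it only handles diploid unphased genotypes; phased, missing, or haploid values are returned unchanged. It does not address whether the call should really be 1/2.

```shell
# Hypothetical helper (not part of bcftools): put the smaller allele index
# first in a diploid, unphased genotype string, e.g. 1/0 -> 0/1.
# Note: POSIX sh has no "local", so gt, a and b are global variables here.
normalize_gt() {
    gt=$1
    case $gt in
        # Anything that is not a plain "a/b" with numeric alleles
        # (phased "1|0", missing "./.", polyploid "0/1/2") is left as-is.
        *[!0-9/]*|*/*/*|/*|*/|'') ;;
        */*)
            a=${gt%%/*}   # allele before the slash
            b=${gt#*/}    # allele after the slash
            if [ "$a" -gt "$b" ]; then
                gt="$b/$a"
            fi
            ;;
    esac
    printf '%s\n' "$gt"
}

normalize_gt "1/0"
normalize_gt "1|0"
```

Only the genotype string is touched; the PL and AD values are ordered by allele index rather than by the order written in GT, so they would not need to change.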
Variant calling command from the VCF header:
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --emit-ref-confidence GVCF --gvcf-gq-bands 10 --gvcf-gq-bands 20 --gvcf-gq-bands 30 --gvcf-gq-bands 40 --gvcf-gq-bands 50 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --use-new-qual-calculator true --contamination-fraction-to-filter 0.00867692 --output TURIBU-07-105.g.vcf.gz --intervals /cromwell_root/broad-gotc-prod-execution7/ExomeGermlineSingleSample/e0703c95-22f0-4294-98ba-4b17503f5119/call-BamToGvcf/BamToGvcf/093821c9-64da-4e1e-921e-0b9978abf6a8/call-ScatterIntervalList/glob-cb4648beeaff920acb03de7603c06f98/6scattered.interval_list --input gs://broad-gotc-prod-execution7/ExomeGermlineSingleSample/e0703c95-22f0-4294-98ba-4b17503f5119/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/b6f2c766-2765-4a94-b876-efdd49bf8918/call-GatherBamFiles/TURIBU-07-105.bam --reference /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation --annotation-group AS_StandardAnnotation --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --disable-optimizations false --just-determine-active-regions false --dont-genotype false --max-mnp-distance 0 --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --recover-dangling-heads false --do-not-recover-dangling-branches false --min-dangling-branch-length 4 --consensus false --max-num-haplotypes-in-population 128 --error-correct-kmers false --min-pruning 2 --debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation 
FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false --use-filtered-reads-for-annotations false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 10.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --genotyping-mode DISCOVERY --genotype-filtered-alleles false --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version=4.0.10.1,Date="March 26, 2019 4:12:09 PM 
UTC">