change missing genotype in bcf file from 0/0 to ./. to run plink
0
0
Entering edit mode
3.7 years ago
rah ▴ 30

I have a lot of bam files for which i need to call the genotype for, to create bcf files for and then run Plink. I am using two methods. In this is example i have 5 individuals, each bam file has its own RG tag, so they are distuingishable.


Method 1) I am for each individual sample creating the bcf file -> then merging these bcf files -> running plink on merged bcf

step 1) bcftools mpileup -I -f hg19.fa Individual_1.bam | bcftools call -c -Ob -o output_1.bcf (this i repeat for all 5 individuals)

step 2) bcftools merge -l bcfset.txt -Ob -o merge.bcf (with bcfset.txt containing the names of the individual bcf files)

step 3) bcftools view -Ob -m2 -M2 -v snps merge.bcf --regions chr21 -o filter.bcf (filter to get sites on chromosome 21)

With the final output being for a single line

chr21 9411618 . C T 57.9723 . VDB=0.0221621;SGB=-0.511536;MQ0F=0;AF1=1;AC1=2;MQ=36;FQ=-35.9869;DP=4;DP4=0,0,3,0 GT:PL ./.:. ./.:. ./.:. 1/1:90,9,0 ./.:.

step 4) Running plink -> which works!


Method 2) I am creating the pileup file by passing in a list with the bam files

step 1) bcftools mpileup -b bamfile.txt -I -f hg19.fa -Ob -o pileupfile.bcf

step 2) I have tested two methods to call

2A) bcftools call --threads 40 -m pileupfile.bcf -v -Ob -o callfile1.bcf (which i dont do filter as method 1 step 3)

chr21 9411618 . C T 65 . DP=4;VDB=0.0221621;SGB=0.823243;MQ0F=0;AC=2;AN=2;DP4=0,0,3,0;MQ=36 GT:PL ./.:0,0,0 ./.:0,0,0 ./.:0,0,0 1/1:90,9,0 ./.:0,0,0

HERE PLINK FAILS

2B) bcftools call -c -Ob pileupfile.bcf -o callfile2.bcf (similar call and filter as in method 1 using the -c option in the call and similar filtering as step 3)

chr21 9411618 . C T 57.9785 . DP=4;VDB=0.0221621;SGB=0.823243;MQ0F=0;AF1=1;AC1=10;DP4=0,0,3,0;MQ=36;FQ=-26.8149 GT:PL 0/1:0,0,0 0/1:0,0,0 0/1:0,0,0 1/1:90,9,0 0/1:0,0,0

HERE PLINK ALSO FAILS


When running plink the issues are with the missing genotypes, so since it works in method 1 where the information is ./.:. i think the issue is for method 2A and 2B that the missing genotypes are not represented either with ./.:0,0,0 or 0/1:0,0,0.

QA) Can anyone help me with either replacing the '0,0,0' to just a simple '.' without destroying the bcf format or in some way suggest how i can get the same output format for the 2A or 2B method as i get in method 1. I have looked into bcftools filter and annotate but i cant really figure out how to solve this specific issue.

bcftools plink SNP genotype bcf • 1.4k views
ADD COMMENT
0
Entering edit mode

Please post at least one full .log file showing HOW plink is failing.

ADD REPLY
0
Entering edit mode

PLINK v1.90b4.4 64-bit (21 May 2017) Options in effect: --allow-extra-chr --bcf head5callmv_21.bcf --const-fid --geno 0 --keep-allele-order --make-bed --out head5mv21 --recode --set-missing-var-ids @:#$1,$2 --vcf-idspace-to _

Hostname:

Working directory: Start time: Wed Mar 17 17:53:27 2021

Random number seed: 1616000007 772425 MB RAM detected; reserving 386212 MB for main workspace. --bcf: head5mv21-temporary.bed + head5mv21-temporary.bim + head5mv21-temporary.fam written. Warning: 224780 variant records had no GT field. 224780 variants loaded from .bim file. 224780 missing IDs set. 5 people (0 males, 0 females, 5 ambiguous) loaded from .fam. Ambiguous sex IDs written to head5mv21.nosex . Using 1 thread (no multithreaded calculations invoked). Before main variant filters, 5 founders and 0 nonfounders present. Calculating allele frequencies... done. Total genotyping rate is 0. Error: All variants excluded due to missing genotype data (--geno).

End time: Wed Mar 17 17:53:27 2021

ADD REPLY
0
Entering edit mode

There was a --bcf bugfix in February 2020; you should update your plink 1.9 build. (You can also use plink 2.0's --bcf, which should be more efficient.)

ADD REPLY

Login before adding your answer.

Traffic: 1555 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6