Hello
I made a VCF file from the BAM files of 83 samples with HaplotypeCaller, then filtered it with VariantFiltration. After that I extracted "GT" with the vcfR package in R, but I have many no-calls (./.). I want to remove the no-calls.
I also used gatk HaplotypeCaller -R reference.fasta -I sample1.bam -O variants.g.vcf -ERC GVCF to make a variants.g.vcf, because variants.g.vcf is needed as the input file for GenotypeGVCFs. But I have a second problem: for the 83 BAM files, I got an error when making variants.g.vcf: "A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file."
Could you give me some suggestions?
Thank you
No idea about the first problem. I have rarely used GATK, but I have a guess regarding the second error: 1) -ERC is the short form of emit-ref-confidence, and 2) the software is complaining that you cannot use emit-ref-confidence on multi-sample input, so I suggest you remove -ERC from your command line.
No idea about the second error, but the first error seems odd to me. What version of GATK are you using? I thought recent versions only emit segregating sites anyway. Are you sure the GT field for all samples is empty? If at least 1 sample has a variant, then the GT field will need to be there for the remaining samples - it's 1 column per sample.
Many thanks for your reply. I used GATK 4.3. For example, at a given position some accessions are no-call (./.) and some accessions are 0/0, 0/1 or 1/1, but the count of no-calls is very
Okay, if I understand your comment correctly, GATK is behaving correctly.
GATK has to fill all GT fields for any POS with at least 1 informative call. For example, if 5 of 83 samples have enough information to make a call at POS 1151, but 78 samples do not, then you will have ./. in the GT field for 78 samples, and whatever the called genotype is in the remaining 5.

In downstream analyses you can potentially remove these sites with high no-call proportions as they can be uninformative - depends on your study. You can do this using vcfR, parsing the GT field and subsetting the dataset to only include sites with a sufficient proportion of samples with calls.
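
To make the call-rate idea concrete, here is a minimal sketch of the same filter. It is written in plain Python rather than vcfR, only so that it is self-contained; the logic would be the same in R. The function names is_no_call and filter_by_call_rate, the 0.75 threshold, and the four-sample toy records are all assumptions for illustration and are not part of GATK or vcfR. It also assumes an uncompressed, tab-separated VCF with a FORMAT column.

```python
def is_no_call(sample_field, gt_index=0):
    """Return True when every allele in the GT entry is missing (e.g. ./. or .|.)."""
    gt = sample_field.split(":")[gt_index]
    alleles = gt.replace("|", "/").split("/")
    # Assumption: a partially missing genotype such as ./1 still counts as a call.
    return all(allele == "." for allele in alleles)


def filter_by_call_rate(vcf_lines, min_call_rate=0.75):
    """Keep header lines, plus sites where enough samples have a called genotype."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines pass through
            continue
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")  # column 9 is FORMAT
        if "GT" not in format_keys:
            continue  # no genotypes at this site, so nothing to count
        gt_index = format_keys.index("GT")
        samples = fields[9:]  # one column per sample
        if not samples:
            continue
        called = sum(not is_no_call(sample, gt_index) for sample in samples)
        if called / len(samples) >= min_call_rate:
            kept.append(line)
    return kept


# Toy data (hypothetical): 4 samples, 2 sites.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\ts4",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/0:9\t1/1:15\t./.:0",
    "chr1\t1151\t.\tC\tT\t50\tPASS\t.\tGT:DP\t./.:0\t./.:0\t0/1:8\t./.:0",
]

filtered = filter_by_call_rate(example, min_call_rate=0.75)
for record in filtered:
    print(record)
```

With these toy records, the site at POS 100 has 3 of 4 samples called and is kept, while the site at POS 1151 has only 1 of 4 called and is dropped. Which threshold is sensible depends on your study, as noted above.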