Assign missing genotype (./. or N/N) for a SNP in vcf file if DP=0
1
1
Entering edit mode
8.1 years ago
berge2015 ▴ 110

Hello,

I have a problem with getting the correct genotypes from a samtools mpileup generated vcf file.

In this post: A: ./ Vs "0/0" In Vcf Files, the user says DP=0 causes the genotype to be ./. which is exactly what I want. However, my genotypes are assigned 1/1 if DP=0 for the SNP for any/all individuals.

Am I calling the SNPs in a wrong way? Below are my commands (numbers as prefix for order & clarity):

1. bwa/0.7.4/bin/bwa aln -t 4 Ref.fa sample1.fq > sample1.sai
2. bwa/0.7.4/bin/bwa samse Ref.fa sample1.sai sample1.fq > sample1.sam
3. samtools-0.1.19/samtools view -bS sample1.sam > sample1.bam
4. samtools-0.1.19/samtools sort sample1.bam sample1.sorted
5. samtools-0.1.19/samtools mpileup -BuDI -f Ref.fa sample1.sorted.bam sample2.sorted.bam .. sampleN.sorted.bam | bcftools view -bvcg - > all_samples.bcf
6. samtools-0.1.19/bcftools/bcftools view all_samples.bcf | vcfutils.pl varFilter -D1000 > all_samples.vcf
7. vcftools_0.1.12a/bin/vcf-to-tab < all_samples.vcf > all_sample_SNPs.out

Only when parsing the 'all_sample_SNPs.out' file did I realize that my genotype calls are missing the N/N or NN alleles in entirety. I know my data, so no way the missing alleles are gone at ALL possible locations. So I must be doing something wrong.

Of course it is not vcf-to-tab's fault as the VCF file assigns 1/1 genotype to those with 0 read depth at that particular locus. I'd appreciate it a lot if someone could point towards where I could be messing up.

Alternatively: is there a filter or another way to convert the vcf file so that sites with DP=0 are assigned the missing value of ./. or N/N or any other similar nomenclature?

Thanks!

snp vcf variant mpileup • 3.6k views
ADD COMMENT
1
Entering edit mode
8.1 years ago
berge2015 ▴ 110

It appears that vcftools can do it. In case someone happens to visit this thread looking for solution to this problem, here's how to do it:

vcftools_0.1.12a/bin/vcftools --vcf in.vcf --out fltrd_out --minGQ 15 --minDP 1 --recode --recode-INFO-all

Found the solution here: A: How to filter vcf file on minimum genotype depth and quality for each sample

ADD COMMENT

Login before adding your answer.

Traffic: 1679 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6