Add DP Tag to Genotype Field of VCF File
11.4 years ago

Hello everyone,

I am using samtools mpileup for SNP calling, BEAGLE for haplotyping, and GATK BeagleOutputToVCF to convert the BEAGLE output back to VCF format. Everything works fine, but one tag is missing.

I want to add the DP tag to the genotype field of the VCF file. Is there an option in samtools mpileup, BEAGLE or GATK BeagleOutputToVCF that adds this information? Or do I have to use the tool mentioned in "incorporating raw read coverage per sample in merged vcf", which needs a lot of I/O and adds an extra step to my pipeline? Or is this information already somewhere in my VCF file? My VCF file looks like this:

SL2.40ch12	17	.	T	C	69.50	.	AC=1;AC1=1;AF=0.167;AF1=0.1766;AN=6;DP=39;DP4=10,7,2,3;FQ=70.3;MQ=46;NumGenotypesChanged=0;PV4=0.62,0.24,0.034,1;R2=0.922;RPB=5.484225e-01;VDB=3.008871e-02	GT:GQ:OG:PL	0|0:13:.:0,9,90	0|0:60:.:0,39,255	0|1:21:.:104,0,11

Your example already contains a DP tag: DP=39 in the INFO field.


That is the coverage summed over all samples; I need the coverage per sample.
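To make the distinction concrete, here is a minimal Python sketch that splits the record from the question into its columns and reports the INFO-level DP alongside whether the FORMAT column declares a per-sample DP. The function name `depth_tags` is made up for illustration, and the record is assumed to be whitespace-separated as pasted above.

```python
# The record from the question, with whitespace-separated columns.
RECORD = (
    "SL2.40ch12 17 . T C 69.50 . "
    "AC=1;AC1=1;AF=0.167;AF1=0.1766;AN=6;DP=39;DP4=10,7,2,3;FQ=70.3;MQ=46;"
    "NumGenotypesChanged=0;PV4=0.62,0.24,0.034,1;R2=0.922;RPB=5.484225e-01;"
    "VDB=3.008871e-02 GT:GQ:OG:PL "
    "0|0:13:.:0,9,90 0|0:60:.:0,39,255 0|1:21:.:104,0,11"
)


def depth_tags(record):
    """Return (INFO DP as int or None, True if FORMAT declares DP)."""
    cols = record.split()
    # Column 8 (index 7) is INFO: semicolon-separated KEY=VALUE pairs or flags.
    info = dict(
        item.split("=", 1) if "=" in item else (item, True)
        for item in cols[7].split(";")
    )
    # Column 9 (index 8) is FORMAT: colon-separated keys for the sample columns.
    format_keys = cols[8].split(":")
    info_dp = int(info["DP"]) if "DP" in info else None
    return info_dp, "DP" in format_keys


info_dp, has_sample_dp = depth_tags(RECORD)
print(f"INFO DP = {info_dp}, per-sample DP present: {has_sample_dp}")
# prints: INFO DP = 39, per-sample DP present: False
```

The FORMAT column here is GT:GQ:OG:PL, so the per-sample depth is not recoverable from this file as it stands.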

11.4 years ago
Bpow

If your version of samtools is recent enough (the option is present in at least 0.1.18), you can pass the '-D' option to mpileup to get the per-sample read depth of high-quality reads (DP in the genotype field) and of high-quality variant reads (DV in the genotype field). This differs from the DP tag in the INFO field, which gives the depth across all samples.
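Once the per-sample tags are in the genotype field, reading them back is a matter of pairing the FORMAT keys with each sample column. The sketch below assumes the output carries DP and DV in FORMAT as described above; the example record is invented for illustration and is not real samtools output, and `per_sample_tags` is a hypothetical helper name.

```python
def per_sample_tags(record, tags=("DP", "DV")):
    """Return one dict per sample mapping each requested tag to an int (or None)."""
    cols = record.split()
    keys = cols[8].split(":")  # FORMAT column
    result = []
    for sample in cols[9:]:  # one column per sample
        values = dict(zip(keys, sample.split(":")))
        result.append(
            {
                # A missing key or a "." placeholder is reported as None.
                tag: int(values[tag]) if values.get(tag, ".") != "." else None
                for tag in tags
            }
        )
    return result


# Hypothetical two-sample record with DP and DV in the genotype field.
EXAMPLE = "chr1 100 . T C 50 . DP=30 GT:PL:DP:DV 0/0:0,9,90:12:0 0/1:104,0,11:15:6"

for index, depths in enumerate(per_sample_tags(EXAMPLE), start=1):
    print(f"sample {index}: {depths}")
```

Note that the per-sample values count high-quality reads only, so they need not add up to the INFO-level DP.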


Thanks! This was exactly what I was looking for. I don't know why I couldn't find this option on my own. I read the manual again and saw that you were right!

8.8 years ago

Just to keep this updated: from samtools 1.2 onwards, -D is deprecated; use the option --output-tags DP (short form -t DP) instead.

For samtools 1.2, mpileup -t accepts DP, DPR, DV, DP4, INFO/DPR and SP.

In 1.3, DP4 is deprecated in favor of ADF (allelic depths on the forward strand, FORMAT) and ADR (allelic depths on the reverse strand, FORMAT).

Possible -t values for 1.3, given as a comma-separated list of FORMAT and INFO tags to output (case-insensitive):

  • AD (Allelic depth, FORMAT),
  • INFO/AD (Total allelic depth, INFO),
  • ADF (Allelic depths on the forward strand, FORMAT),
  • INFO/ADF (Total allelic depths on the forward strand, INFO),
  • ADR (Allelic depths on the reverse strand, FORMAT),
  • INFO/ADR (Total allelic depths on the reverse strand, INFO),
  • DP (Number of high-quality bases, FORMAT),
  • DV (Deprecated in favor of AD; Number of high-quality non-reference bases, FORMAT),
  • DPR (Deprecated in favor of AD; Number of high-quality bases for each observed allele, FORMAT),
  • INFO/DPR (Number of high-quality bases for each observed allele, INFO),
  • DP4 (Deprecated in favor of ADF and ADR; Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases, FORMAT),
  • SP (Phred-scaled strand bias P-value, FORMAT) [null]
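As a rough aid for pipelines, the list above can be turned into a small pre-flight check on the string passed to -t. This is only a sketch based on the 1.3 tag list quoted in this answer; `check_tags` is a hypothetical helper, not part of samtools, and other samtools versions may accept a different set of tags.

```python
# Tags listed above for samtools 1.3 mpileup -t.
KNOWN_TAGS = {
    "AD", "INFO/AD", "ADF", "INFO/ADF", "ADR", "INFO/ADR",
    "DP", "DV", "DPR", "INFO/DPR", "DP4", "SP",
}
# Tags the list marks as deprecated, with their suggested replacements.
DEPRECATED = {"DV": "AD", "DPR": "AD", "DP4": "ADF and ADR"}


def check_tags(tag_list):
    """Normalize a comma-separated tag list; return (normalized string, warnings)."""
    # Tags are case-insensitive, so compare in upper case.
    tags = [t.strip().upper() for t in tag_list.split(",") if t.strip()]
    unknown = [t for t in tags if t not in KNOWN_TAGS]
    if unknown:
        raise ValueError(f"unknown tag(s): {', '.join(unknown)}")
    warnings = [
        f"{t} is deprecated in favor of {DEPRECATED[t]}"
        for t in tags
        if t in DEPRECATED
    ]
    return ",".join(tags), warnings


normalized, warnings = check_tags("dp,dv")
print(normalized)  # prints: DP,DV
print(warnings)    # prints: ['DV is deprecated in favor of AD']
```

The normalized string can then be passed as the value of -t when the mpileup command line is assembled.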