Filtering and statistics of depth per genotype per SNP in VCF files (solved)
9.9 years ago
Hans ▴ 140

Hello

I need to get statistics from a VCF file. I would like to see the distribution of read depth for each sample at each SNP.

I would also like to filter by this depth, say, by setting a data point to NULL when its value is below a threshold.

Thank you

Hanan

SNP next-gen • 5.0k views

Thank you for your response

I used the Bioconductor VariantAnnotation R package, so I can easily manipulate the statistics.

Read the VCF file with obj <- readVcf( ) and look at the obj@assays$data@listData[["DP"]] matrix.

Hanan

9.9 years ago

If your VCF contains the depth (DP) information in the FORMAT column, use awk to find the index of the DP field and extract the DP for each sample:

$ curl -sL "https://raw.githubusercontent.com/chmille4/ngs_server/89f038f986747390d190baf09efa93a8897c0ec6/ext/vcftools/examples/valid-4.1.vcf" |\
awk -F '\t' '/^#CHROM/ {split($0,samples);next;} /^#/ {next;} {dpcol=-1;n=split($9,fmt,":");for(i=1;i<=n;++i) if(fmt[i]=="DP") { dpcol=i;break;} if(dpcol==-1) next; for(i=10;i<=NF;++i) {split($i,a,":");printf("%s\t%s\t%s\t%s\n",$1,$2,samples[i],a[dpcol]);}}'
19    14370    NA00001    1
19    14370    NA00002    8
19    14370    NA00003    5
20    17330    NA00001    3
20    17330    NA00002    5
20    17330    NA00003    3
20    1110696    NA00001    6
20    1110696    NA00002    0
20    1110696    NA00003    4
20    1230237    NA00001    7

(...)
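
For the filtering half of the question, the same DP lookup could be reused to mask low-depth genotypes instead of printing them. This is only a minimal sketch under stated assumptions: the threshold `min_dp=5` is an arbitrary illustrative value, a masked genotype is written as `./.` in the GT field (which assumes unphased diploid calls), and the two-sample record fed in through `printf` is made up so the example runs on its own.

```shell
#!/bin/sh
min_dp=5   # hypothetical threshold, chosen only for illustration

# A made-up header line and one record; pipe your own VCF into awk instead.
masked=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/0:3\n' |
awk -F '\t' -v OFS='\t' -v min_dp="$min_dp" '
  /^#/ { print; next }                      # keep header lines unchanged
  {
    dpcol = -1; gtcol = -1
    n = split($9, fmt, ":")                 # locate DP and GT in FORMAT
    for (i = 1; i <= n; ++i) {
      if (fmt[i] == "DP") dpcol = i
      if (fmt[i] == "GT") gtcol = i
    }
    if (dpcol != -1 && gtcol != -1) {
      for (i = 10; i <= NF; ++i) {
        m = split($i, a, ":")
        # Missing or non-numeric DP (".") is left untouched here.
        if (a[dpcol] ~ /^[0-9]+$/ && a[dpcol] + 0 < min_dp) {
          a[gtcol] = "./."                  # assumed missing-genotype code
          s = a[1]
          for (j = 2; j <= m; ++j) s = s ":" a[j]
          $i = s
        }
      }
    }
    print
  }')
printf '%s\n' "$masked"
```

With the made-up record above, sample S2 (DP 3) has its genotype replaced by `./.` while S1 (DP 12) is left as it was. The DP value itself is kept so the masked calls can still be counted afterwards.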

Thank you. Where do I put my local file name?
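
One way to run the same command on a local file is to drop the `curl` step and give awk the file path as its last argument. A minimal sketch: the temporary file and the sample names `S1`/`S2` below are made up so the example runs on its own, and `$vcf` stands in for the path to your own VCF.

```shell
#!/bin/sh
# Build a tiny two-sample VCF so the example is self-contained;
# in practice, set vcf to the path of your own file instead.
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/0:3\n' >> "$vcf"

# Same logic as the awk answer, but reading the file named after the program.
depths=$(awk -F '\t' '
  /^#CHROM/ { split($0, samples); next }    # remember the sample names
  /^#/ { next }                             # skip the other header lines
  {
    dpcol = -1
    n = split($9, fmt, ":")
    for (i = 1; i <= n; ++i) if (fmt[i] == "DP") { dpcol = i; break }
    if (dpcol == -1) next                   # no DP in this record
    for (i = 10; i <= NF; ++i) {
      split($i, a, ":")
      printf("%s\t%s\t%s\t%s\n", $1, $2, samples[i], a[dpcol])
    }
  }' "$vcf")
printf '%s\n' "$depths"
rm -f "$vcf"
```

The output is one line per site and sample (chromosome, position, sample name, depth), which can be redirected to a file and loaded into R or another tool to plot the depth distribution.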

9.9 years ago
jesse.hoff • 0

You can use vcftools
