How I detect if a vcf file has been already filtered
0
0
Entering edit mode
2.8 years ago
zizigolu ★ 4.3k

Hello

I have some vcf from a genome service company

I have annotated vcfs with annovar

perl table_annovar.pl /data//Downloads/vcf/snps.vcf humandb/ -buildver hg38 -out myanno -remove -protocol refGene,cytoBand,dbnsfp30a -operation g,r,f -nastring . -vcfinput

then by maftools I went to down stream analysis

At the first glance I see too many events with a median of > 1200 :(

> anno_maf
An object of class  MAF 
ID summary     Mean Median
1:             NCBI_Build      NA       NA     NA
2:                 Center      NA       NA     NA
3:                Samples       8       NA     NA
4:                 nGenes     572       NA     NA
5:        Frame_Shift_Del    1680  210.000  221.0
6:        Frame_Shift_Ins     839  104.875  102.5
7:           In_Frame_Del     124   15.500   16.0
8:           In_Frame_Ins      59    7.375    5.0
9:      Missense_Mutation    6839  854.875  852.5
10:      Nonsense_Mutation     483   60.375   60.0
11:       Nonstop_Mutation      12    1.500    0.5
12:            Splice_Site     151   18.875   20.0
13: Translation_Start_Site       3    0.375    0.0
14:                  total   10190 1273.750 1274.5

How I know if I have a filtered vcf or the reason of this many events is because my vcfs are unfiltered

Company is not responding in this time frame

This is one of my vcf files

https://www.dropbox.com/s/rqvzaex4sg6ojwi/snp.vcf?dl=0

Please have a look a help me to know what is going wrong here

vcf annovar • 1.2k views
ADD COMMENT
0
Entering edit mode

Have you examined the vcf header? It will often contain meta information lines to include information about filtering. The VCF format specification outlines a standard format for the description of filtering steps. In addition, many programs will self-document their application to the file (also true of SAM/BAM files). That is, when you apply some manipulation to the VCF file, the program will add how it was called to the end of the header, precisely for this purpose, so you can see what steps have been applied to file before you received it (or to remind yourself how you generated it). For example:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=FS60,Description="FS > 60.0">
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=MQ40,Description="MQ < 40.0">
##FILTER=<ID=MQRankSum-12.5,Description="MQRankSum < -12.5">
[...]
##bcftools_concatVersion=1.9+htslib-1.9
##bcftools_concatCommand=concat -o ./SNPs_filtered_PASS_concat.vcf.gz -f ./snps_vcf_list.txt --threads 24; Date=Tue Mar 24 07:55:08 2020
##bcftools_viewVersion=1.14+htslib-1.14
##bcftools_viewCommand=view -h variants.only.recode.gz; Date=Thu Jan 20 14:13:54 2022
ADD REPLY
0
Entering edit mode

This is what I see in the headers

##fileformat=VCFv4.1
##fileDate=20211026
##source=lofreq call --call-indels -s -S /mnt/nsa3/projects/active/bioit_development/ska/galaxy/galaxy_dev3/refDB/hsapiens/variant/hg38/dbsnp.lofreq.vcf.gz -f /mnt/nsa3/projects/active/bioit_development/ska/galaxy/galaxy_dev3/refDB/hsapiens/genome/hg38.chronly/fasta/hg38.fa --no-default-filter -r chr1:1-124478211 -o /tmp/3733965.1.devel.q/lofreq2_call_parallel39OJ6n/0.vcf.gz CLTSS_LTS_0038.bam
##reference=/mnt/nsa3/projects/active/bioit_development/ska/galaxy/galaxy_dev3/refDB/hsapiens/genome/hg38.chronly/fasta/hg38.fa
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
##INFO=<ID=SB,Number=1,Type=Integer,Description="Phred-scaled strand bias at this position">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Counts for ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant).">
##INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position">
##FILTER=<ID=min_snvqual_69,Description="Minimum SNV Quality (Phred) 69">
##FILTER=<ID=min_indelqual_50,Description="Minimum Indel Quality (Phred) 50">
##FILTER=<ID=min_dp_10,Description="Minimum Coverage 10">
##FILTER=<ID=sb_fdr,Description="Strand-Bias Multiple Testing Correction: fdr corr. pvalue > 0.001000">
##FILTER=<ID=min_snvqual_85,Description="Minimum SNV Quality (Phred) 85">
##FILTER=<ID=min_indelqual_66,Description="Minimum Indel Quality (Phred) 66">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position for this sample">
##FORMAT=<ID=AD,Number=2,Type=Integer,Description="Allelic depths (filtered) for the ref and alt alleles in the order listed">
##SnpEffVersion="4.3 (build 2016-07-03 08:26), by Pablo Cingolani"
##SnpEffCmd="SnpEff  hg38.27 -i vcf -no-downstream -no-upstream -no-intergenic -hgvs1LetterAa -csvStats CLTSS_LTS_0038.snps.snpEff_summary.csv -htmlStats CLTSS_LTS_0038.snps.snpEff_summary.html /mnt/galaxyDataDev/galaxy-3.0.0.0-2/jobs/000/148/148887/working/CLTSS_LTS_0038.snps.variant_filtered.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">

You think I still should do more filtering?

ADD REPLY

Login before adding your answer.

Traffic: 1733 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6