What are good settings for filtering VCF files?
10.2 years ago
devenvyas ▴ 760

I followed the samtools/bcftools/vcfutils pathway described here to convert a set of human hg19-aligned BAM files into a set of raw VCF files. I then used vcftools to filter down to just autosomal SNPs. These are really, really, really low-coverage genomes (they were enriched for NRY and/or mtDNA, and I am just trying to make use of the "leftovers").

Now I have the data I want, but I am trying to find out how much of it is actually usable. What are good filtering parameters for tossing/keeping human SNPs (or where can I find such parameters)? Thanks!

-Deven

samtools vcftools SNP bcftools • 20k views
10.2 years ago

These are the settings I use. I generally adjust them depending on the study, but they are more or less close to what everyone uses.

  • MinDP (Minimum read depth): 5 (indels) and 3 (SNPs)
  • MaxDP (Maximum read depth): You have low-coverage data, so I would set it to 100. Normally it is 3 times the average coverage.
  • BaseQualBias (Minimum p-value for baseQ bias): 0
  • MinMQ (Minimum RMS mapping quality for SNPs): 20, or 30 to be more stringent
  • Qual (Minimum value of QUAL field): 15 or 20
  • StrandBias (Minimum p-value for strand bias): 0.0001
  • EndDistBias (Minimum p-value for end distance bias): 0.0001
  • MapQualBias (Minimum p-value for mapQ bias): 0
  • VDB (Minimum Variant Distance Bias): 0 (more relevant to RNA-seq reads)
  • GapWin (Window size for filtering adjacent gaps): 30 bp
  • SnpGap (SNP within INT bp around a gap to be filtered): 20 bp
  • SNPcluster (Number of SNPs within a region): I usually drop all the SNPs if there are more than 3 SNPs within 10 bp.
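
To make the depth, mapping quality, and QUAL cutoffs above concrete, here is a minimal Python sketch that applies them to tab-delimited VCF records. It assumes the INFO column carries `DP`, `MQ`, and an `INDEL` flag, as samtools/bcftools mpileup output typically does; the function names (`parse_info`, `passes_filters`, `filter_lines`) are made up for illustration and are not part of any tool. The bias p-value and gap filters are not covered here.

```python
def parse_info(info_field):
    """Turn 'DP=4;MQ=37;INDEL' into {'DP': '4', 'MQ': '37', 'INDEL': True}."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item and item != ".":
            info[item] = True  # flag-style entry with no value
    return info


def passes_filters(vcf_line, min_dp_snp=3, min_dp_indel=5, max_dp=100,
                   min_mq=20, min_qual=15):
    """Return True if one VCF record passes the MinDP/MaxDP/MinMQ/Qual cutoffs.

    Assumption: DP and MQ are present in INFO; records missing either
    value (or with QUAL of '.') are rejected.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]
    info = parse_info(fields[7])

    if qual == "." or float(qual) < min_qual:
        return False
    if "DP" not in info or "MQ" not in info:
        return False

    depth = int(info["DP"])
    min_dp = min_dp_indel if "INDEL" in info else min_dp_snp
    if depth < min_dp or depth > max_dp:
        return False
    return float(info["MQ"]) >= min_mq


def filter_lines(lines, **thresholds):
    """Yield header lines unchanged and only those records that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, **thresholds):
            yield line
```

For example, `filter_lines(open("raw.vcf"), min_mq=30)` would apply the more stringent mapping quality cutoff; `raw.vcf` is a placeholder file name.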


@Ashutosh -

Could you provide the reasoning behind the thresholds you typically use? It would help researchers understand the parameters better. Thanks!


You are right to question this - indeed, there are absolutely no standards for these filtering criteria. Take a look at my take on DP alone: A: DP in VCF files?


I know vcftools can filter on DP and QUAL. Do you have any recommendations on what to use for the other filters? Thanks!


This one does almost everything that's mentioned above.


I have my own Python script. If you know Python, you can modify it for your use. Alternatively, you can use the vcftools "annotate" feature. I think the second option will be much better.
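
For anyone going the script route, here is a rough sketch of the SNP-cluster rule from the answer (drop all SNPs when more than 3 fall within 10 bp). This is not the script mentioned above; `clustered_snps` is a hypothetical helper, and "within 10 bp" is interpreted here as positions that differ by less than 10, which is an assumption.

```python
from collections import defaultdict


def clustered_snps(sites, max_snps=3, window=10):
    """Return the set of (chrom, pos) sites that sit in an over-dense window.

    sites: iterable of (chrom, pos) tuples for SNP records.
    A window is any run of positions on one chromosome spanning fewer
    than `window` bp; if it holds more than `max_snps` SNPs, all of
    them are flagged for removal.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)

    dropped = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        right = 0
        for left in range(len(positions)):
            # Extend the window to every SNP less than `window` bp
            # downstream of positions[left].
            while (right < len(positions)
                   and positions[right] - positions[left] < window):
                right += 1
            if right - left > max_snps:
                dropped.update((chrom, p) for p in positions[left:right])
    return dropped
```

Records whose `(chrom, pos)` is in the returned set can then be skipped when writing the filtered VCF.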
