Reliable Tools To Filter Vcf Format Files
11.1 years ago
Tonyzeng ▴ 310

I have VCF variant files. Can anyone provide a list of tools for variant filtering? Thank you.

vcf variant • 7.7k views

What about vcftools? http://vcftools.sourceforge.net/

Tony: For a general question like this, you should first go through similar questions on Biostar. You can easily find them using the search button. Post a question only if you don't find a good or satisfying answer. Thanks.

11.1 years ago
William ★ 5.3k

GATK has a tool, "SelectVariants", that has some standard filter options and also lets you create filter expressions based on the attributes in the VCF records: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_SelectVariants.html

You can also use SnpSift to build filter expressions on the standard VCF attributes and the ones added by SnpEff effect prediction: http://snpeff.sourceforge.net/SnpSift.html
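
To make "filter expressions based on the attributes in the VCF records" concrete, here is a minimal Python sketch of the idea. It is not how GATK or SnpSift work internally; the helper names (parse_info, filter_records, high_confidence), the thresholds, and the two-record example are all made up for illustration.

```python
def parse_info(info_field):
    """Turn 'DP=14;AF=0.5;DB' into {'DP': '14', 'AF': '0.5', 'DB': True}."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Flag-style entries such as DB have no '=' and are stored as True.
        info[key] = value if sep else True
    return info


def filter_records(lines, keep):
    """Yield header lines unchanged, and data lines for which keep(qual, info) is true."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        qual = float(fields[5]) if fields[5] != "." else None
        if keep(qual, parse_info(fields[7])):
            yield line


# Hypothetical records, used only for illustration.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5",
    "chr1\t200\t.\tC\tT\t12\tPASS\tDP=4;AF=0.5;DB",
]


def high_confidence(qual, info):
    """Rough analogue of an expression like 'QUAL >= 30 and DP >= 10' (assumed thresholds)."""
    return qual is not None and qual >= 30 and int(info.get("DP", 0)) >= 10


kept = list(filter_records(example, high_confidence))
print("\n".join(kept))
```

Only the record at position 100 passes both conditions; the header lines are passed through so the output remains a readable VCF.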

11.1 years ago

http://www.bioconductor.org/packages/2.12/bioc/html/VariantAnnotation.html

VCF objects are basically sample-subsettable GRanges, which is a very clever implementation.

11.1 years ago
dangenet ▴ 90

If you want a quick and highly customizable way to filter VCFs, try Perl one-liners.

perl -lne 'print $_ if ($_ =~ /0\/1/)' < my_vcf_file.vcf > filtered_vcf_file.vcf

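will get you all the variants where the genotype has been called as "0/1". In English, this one-liner says "print the line if the line contains the string '0/1'". The match is on the whole line, so header lines are dropped unless they happen to contain that string.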

perl -lane 'print $F[5] if ($_ !~ /^#/)' < my_vcf_file.vcf > QUAL_scores.txt

will get you a list of all the QUAL scores. In English, this says "print the value in the sixth column if the line does not start with a # character".

My favorite perl one-liner guide is here. A one-liner is no replacement for a proper filtering script, but for getting a sense of the distribution of your data there's nothing better.
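
For comparison, a small Python sketch of what a "proper filtering script" for the same two tasks could look like. It is only an illustration under assumptions: the function names and the example records are invented, and unlike the one-liner it looks at the GT field of each sample (also accepting 1/0 and the phased forms 0|1 and 1|0) and keeps the header lines.

```python
def genotypes(fields):
    """Return the GT string of every sample in an already split VCF data line."""
    if len(fields) < 10:
        return []
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return []
    gt_index = fmt.index("GT")
    return [sample.split(":")[gt_index] for sample in fields[9:]]


def is_het_ref_alt(gt):
    """True for 0/1, 1/0, 0|1 and 1|0 (one reference allele, one first ALT allele)."""
    alleles = gt.replace("|", "/").split("/")
    return sorted(alleles) == ["0", "1"]


def het_records(lines):
    """Yield header lines plus data lines where at least one sample is 0/1-like."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
        elif any(is_het_ref_alt(gt) for gt in genotypes(line.split("\t"))):
            yield line


def qual_scores(lines):
    """Collect the sixth column (QUAL) of every data line, skipping missing values."""
    quals = []
    for line in lines:
        if line.startswith("#"):
            continue
        qual = line.rstrip("\n").split("\t")[5]
        if qual != ".":
            quals.append(float(qual))
    return quals


# Hypothetical records, used only for illustration.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30\tGT:DP\t0/1:15\t0/0:15",
    "chr1\t200\t.\tC\tT\t12.5\tPASS\tDP=4\tGT:DP\t1/1:2\t0/0:2",
    "chr1\t300\t.\tG\tA\t.\tPASS\tDP=40\tGT:DP\t0/0:20\t0|1:20",
]

het = list(het_records(example))
print("\n".join(het))
print(qual_scores(example))
```

Parsing the columns costs a few more lines than a regular expression on the whole line, but it avoids accidental matches outside the genotype fields.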

11.1 years ago

While it imports your VCF into a database first, our GEMINI software is specifically designed to allow filtering of variants in VCF files based on genome annotations and sample genotypes.

See Gemini: Integrative Exploration Of Genetic Variation And Genome Annotations thread. Also, please see the documentation.

An example of a GEMINI query filtering variants based on allele frequency and functional impact:

$ gemini query -q "select * from variants \
                  where is_lof = 1 \
                  and aaf >= 0.01" my.db

Extend this to further require that sample Thelonius is a heterozygote:

$ gemini query -q "select * from variants \
                  where is_lof = 1 \
                  and aaf >= 0.01" \
         --gt-filter "gt_types.Thelonius == HET" \
         my.db
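
To show what the SQL part of such a query does, here is a toy Python sketch using the standard sqlite3 module. The table below is a made-up, four-column stand-in and not the real GEMINI schema, and the rows are invented; the --gt-filter option is a GEMINI feature on top of the SQL and is not reproduced here.

```python
import sqlite3

# In-memory toy database; a real GEMINI database has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (chrom TEXT, start INTEGER, is_lof INTEGER, aaf REAL)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        ("chr1", 100, 1, 0.25),   # loss-of-function and common: kept
        ("chr1", 200, 0, 0.40),   # not loss-of-function: dropped
        ("chr2", 300, 1, 0.001),  # loss-of-function but too rare: dropped
    ],
)

# Same WHERE clause as in the query above.
rows = conn.execute(
    "SELECT chrom, start, aaf FROM variants WHERE is_lof = 1 AND aaf >= 0.01"
).fetchall()
print(rows)
conn.close()
```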
11.1 years ago
Bioch'Ti ★ 1.1k

Hi,

You can also look at PLINK/SEQ, the extension of PLINK that handles VCF files: http://atgu.mgh.harvard.edu/plinkseq/overview.shtml

Best

11.1 years ago

Filter using JavaScript: https://github.com/lindenb/jvarkit#-filtering-vcf-with-javascript-rhino-

/** prints a VARIATION if two samples at least
have a DP<200 */ 
function myfilterFunction()
    {
    var samples=header.genotypeSamples;
    var countOkDp=0;


    for(var i=0; i< samples.size();++i)
        {
        var sampleName=samples.get(i);
        if(! variant.hasGenotype(sampleName)) continue;
        var genotype = variant.genotypes.get(sampleName);
        if( ! genotype.hasDP()) continue;
        var dp= genotype.getDP();
        if(dp < 200 ) countOkDp++;
        }
    return (countOkDp >= 2);
    }
myfilterFunction();


$ gunzip -c file.vcf.gz |\
   java -jar  dist/vcffilterjs.jar  SCRIPT_FILE=filter.js
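
For readers without jvarkit at hand, the same per-sample logic can be sketched in plain Python on a raw VCF line. This is an illustrative port, not part of jvarkit: the function names, the min_samples parameter (set to 2, following the "two samples at least" comment in the script), and the example record are assumptions.

```python
def count_low_depth_samples(vcf_line, max_dp=200):
    """Count samples whose per-sample DP is present and below max_dp."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return 0  # no FORMAT/sample columns
    fmt = fields[8].split(":")
    if "DP" not in fmt:
        return 0
    dp_index = fmt.index("DP")
    count = 0
    for sample in fields[9:]:
        values = sample.split(":")
        # Trailing FORMAT values may be omitted, and "." marks a missing value.
        if dp_index >= len(values) or values[dp_index] == ".":
            continue
        if int(values[dp_index]) < max_dp:
            count += 1
    return count


def keep_variant(vcf_line, min_samples=2, max_dp=200):
    """Keep the variant when at least min_samples samples have DP < max_dp."""
    return count_low_depth_samples(vcf_line, max_dp) >= min_samples


# Hypothetical record with four samples, used only for illustration.
record = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:150\t0/0:250\t1/1:.\t0/1:30"
low_depth = count_low_depth_samples(record)
print(low_depth, keep_variant(record))
```

In the example, two samples (DP 150 and DP 30) fall below the threshold, one is above it, and one has no DP value, so the record is kept.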
11.1 years ago

The SnpSift package has a "filter" operation that is quite powerful and fast.

http://snpeff.sourceforge.net/SnpSift.html#filter
