filtering the somatic variants
3
Dear all,
after surveying the literature (http://www.ncbi.nlm.nih.gov/pubmed/26647970) , some groups describe the following way to filter the somatic variants : wondering if any of you has any comments/suggestions. thanks !
bogdan
-- for point mutations :
$GATK -T VariantFiltration with --clusterWindowSize 10 --clusterSize 3 \
--filterExpression "MQ0 >= 4 && (( MQ0 / ( 1.0 * DP)) > 0.1)" --maskName InDel \
--filterExpression "DP < 5 " --filterName "LowCoverage" \
--filterExpression "QUAL < 30.0 " -- filterName "VeryLowQual" \
-- filterExpression "QUAL > 30.0 && QUAL < 50.0 " --filterName "LowQual" \
--filterExpression "QD < 2.0 " -- filterName "LowQD" \
--filterExpression "FS > 60.0" --filterName "StrandBiasFishers" \
--filterExpression "SB > -0.1" --filterName "StrandBias" \
--filterExpression "DP > 500 " --filterName "ExcessiveDepth"
-- for indels :
$GATK -T VariantFiltration with --filterExpression "MQ0 >= 4 && (( MQ0 / ( 1.0 * DP)) > 0.1)" \
--filterExpression "FS > 60.0" --filterName "StrandBiasFishers" \
--filterExpression "SB > -10.0" --filterName "StrandBias" \
--filterExpression "DP < 5" --filterName "LowCoverage" \
-- filterExpression "QUAL < 30.0" --filterName "VeryLowQual" \
--filterExpression "QUAL > 30.0 && QUAL < 50.0 " --filterName "LowQual"
SNP
snp
• 5.4k views
An example of filtering a VCF based on Allele Depth, Allele Fraction, FS, and SOR (metrics from GATK):
AF_AD_filters = function( x) {
af < - geno( x) $AF [ ,"TUMOR" ] >= 0.05
ad < - min( as( geno( x) $AD [ ,"TUMOR" ] , "List" )) >= 5
af & ad
}
AF_AD_rules < - FilterRules( list( AF_AD_filters = AF_AD_filters))
vcf_filtered < - filterVcf( "AML.vcf.bgzip" , "hg38" ,
"AML_out_AF_and_AD_filtered.vcf" ,
filters= AF_AD_rules)
compressVcf < - bgzip( "AML_out_AF_and_AD_filtered.vcf" ,
"AML_out_AF_and_AD_filtered.vcf.bgzip" )
idx < - indexTabix( compressVcf, "vcf" )
tab < - TabixFile( compressVcf, idx)
FS_SOR_filters = function( x) {
fs < - info( x) $FS <= 60
sor < - info( x) $SOR <= 4
fs & sor | ! isSNV( x)
}
FS_SOR_rules < - FilterRules( list( FS_SOR_filters = FS_SOR_filters))
vcf_filtered < - filterVcf( "AML_out_AF_and_AD_filtered.vcf.bgzip" , "hg38" ,
"AML_out_AF_and_AD_FS_and_SOR_filtered_way1.vcf" ,
filters= FS_SOR_rules)
A slightly modified version :
AF_AD_filters = function( x) {
af < - geno( x) $AF [ ,"TUMOR" ] >= 0.05
ad < - min( as( geno( x) $AD [ ,"TUMOR" ] , "List" )) >= 5
af & ad
}
AF_AD_rules < - FilterRules( list( AF_AD_filters = AF_AD_filters))
FS_SOR_filters = function( x) {
fs < - info( x) $FS <= 60
sor < - info( x) $SOR <= 4
fs & sor | ! isSNV( x)
}
FS_SOR_rules < - FilterRules( list( FS_SOR_filters = FS_SOR_filters))
vcf2 < - vcf[ AF_AD_filters( vcf) ]
vcf3 < - vcf2[ FS_SOR_filters( vcf2) ]
writeVcf( vcf3,"AML_AF_and_AD_FS_SOR_filtered_AS_OBJ.vcf" )
Or :
vcf < - readVcf( "AML.vcf" ,genome= "hg38" )
AF_AD_filters = function( x) {
af < - geno( x) $AF [ ,"TUMOR" ] >= 0.05
ad < - min( as( geno( x) $AD [ ,"TUMOR" ] , "List" )) >= 5
af & ad
}
AF_AD_rules < - FilterRules( list( AF_AD_filters = AF_AD_filters))
FS_SOR_filters = function( x) {
fs < - info( x) $FS <= 60
sor < - info( x) $SOR <= 4
fs & sor | ! isSNV( x)
}
FS_SOR_rules < - FilterRules( list( FS_SOR_filters = FS_SOR_filters))
vcf2 < - subsetByFilter( vcf, filter= FilterRules( list( AF_AD_filter= AF_AD_filters)) )
vcf3 < - subsetByFilter( vcf2, filter= FilterRules( list( FS_SOR_filtersr= FS_SOR_filters)) )
writeVcf( vcf3,"AML_out_AF_and_AD_FS_SOR_filtered_AS_OBJ_subsetbyfilter.vcf" )
Login before adding your answer.
Traffic: 3580 users visited in the last hour
Possibly I could add if anyone knows a good publication that details the ways we can do a good filtering of somatic SNV.
Hello I am wondering if there's any updates on how to filter the somatic variants? do you have any more insight? thanks
yes, using VariantAnnotation package from BioC : https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html
I could post an R script a bit later during the day ..
Hi jackyen,
what kind of data do you have? Matched samples or only a tumor or disease sample, aiming to call somatic events without a normal control?
hi ATpoint, we only have tumor sample. My understanding is that the standard GATK workflow works a little better with a matched normal but in real world it's difficult to design an experiment because the challenges getting both. What would you suggest how to approach this? thanks
Check this: http://mbontrager.org/blog/2016/08/17/Variant-Exploration
Hi Bogdan,
I am wondering if there's any updates or publications on how to filter the somatic variants to lower the false positive rate?
we can search the latest publications on pubmed please. Recently, i've been doing mainly scRNA-seq analyses.