fpfilter - All variants fail
5.5 years ago

Hi, I used fpfilter to filter my variant VCF from VarScan2 using the following command:

java -jar /home/art2407/miniconda3/pkgs/varscan-2.4.3-2/share/varscan-2.4.3-2/VarScan.jar fpfilter myvar.snp mybam.readcount --dream3-settings 1 > myvar.fpfilt.vcf

And I got the following output:

Loading readcounts from mybam.readcount...
Parsing variants from myvar.snp...
672 variants in input file
672 had a bam-readcount result
663 had reads1>=2
0 passed filters
672 failed filters
    0 failed because no readcounts were returned
    0 failed minimim variant count < 3
    81 failed minimum variant freq < 0.05
    0 failed minimum strandedness < 0.0
    663 failed minimum reference readpos < 0.2
    672 failed minimum variant readpos < 0.15
    663 failed minimum reference dist3 < 0.2
    672 failed minimum variant dist3 < 0.15
    0 failed maximum reference MMQS > 50
    0 failed maximum variant MMQS > 100
    0 failed maximum MMQS diff (var - ref) > 50
    0 failed maximum mapqual diff (ref - var) > 10
    0 failed minimim ref mapqual < 20
    0 failed minimim var mapqual < 30
    9 failed minimim ref basequal < 15
    672 failed minimim var basequal < 30
    0 failed maximum RL diff (ref - var) > 0.05

All variants fail at "minimum readpos" and "minimum dist3". Can anyone tell me what these parameters mean and how to adjust them?

fpfilter varscan2 • 1.9k views
I think it is the proximity of the altered base to the 3' end of the read. You can see that those bases are of poor(er) quality (672 failed minimim var basequal < 30) and are therefore more likely to be artifacts. This is a heuristic (experience-based) filter, which I would not change unless you have expert knowledge.
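
To make the position filters concrete, here is a minimal sketch of such a threshold check on a single per-allele bam-readcount record. It assumes the usual colon-separated bam-readcount layout, in which field 8 is avg_pos_as_fraction (reported as "readpos" in the summary) and field 14 is avg_distance_to_effective_3p_end ("dist3"). The function name, the sample record and the field-to-filter mapping are illustrative assumptions, not VarScan's actual code; the 0.15 limits are the variant thresholds printed in the summary above.

```python
# Illustrative sketch only; this is not VarScan's implementation.
# Assumed bam-readcount per-allele layout (colon-separated, 14 fields):
#   base:count:avg_mapq:avg_baseq:avg_se_mapq:plus:minus:avg_pos_as_fraction:
#   avg_mm_fraction:avg_sum_mm_quals:num_q2:avg_dist_q2:avg_clipped_len:avg_dist_to_3p_end


def position_failures(allele_field, min_readpos=0.15, min_dist3=0.15):
    """Return the names of the position filters this allele record would fail."""
    parts = allele_field.split(":")
    if len(parts) < 14:
        raise ValueError("expected 14 colon-separated fields, got %d" % len(parts))
    readpos = float(parts[7])   # average position within the read, as a fraction
    dist3 = float(parts[13])    # average distance to the effective 3' end
    failures = []
    if readpos < min_readpos:
        failures.append("readpos")
    if dist3 < min_dist3:
        failures.append("dist3")
    return failures


# Hypothetical record: variant base A seen in 12 reads, always close to a read end.
example = "A:12:60.00:34.50:0.00:7:5:0.08:0.01:18.20:0:0.00:148.00:0.06"
print(position_failures(example))
```

If every variant in a file looks like the hypothetical record, all of them fail both checks, which is the pattern in the summary. Inspecting a few real lines of the readcount file for these two fields would show whether the values are genuinely that low.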

Thanks for your reply. I ran the same command on another tumour DNA dataset and got similar results, except that no variants failed basequal:

Parsing variants from tumor.vcf...
49 variants in input file
49 had a bam-readcount result
49 had reads1>=2
0 passed filters
49 failed filters
    0 failed because no readcounts were returned
    1 failed minimim variant count < 4
    0 failed minimum variant freq < 0.05
    8 failed minimum strandedness < 0.01
    49 failed minimum reference readpos < 0.1
    49 failed minimum variant readpos < 0.1
    49 failed minimum reference dist3 < 0.1
    49 failed minimum variant dist3 < 0.1
    0 failed maximum reference MMQS > 100
    0 failed maximum variant MMQS > 100
    0 failed maximum MMQS diff (var - ref) > 50
    0 failed maximum mapqual diff (ref - var) > 50
    0 failed minimim ref mapqual < 15
    1 failed minimim var mapqual < 15
    0 failed minimim ref basequal < 15
    0 failed minimim var basequal < 15
    0 failed maximum RL diff (ref - var) > 0.25

I tried the same command on 2 more data files and they gave the same results. Is there any way I can improve these results? Or is there another tool I can use to filter out false positives?

What kind of data is this? Some targeted capture approach? What is the read length? It is indeed odd that all variants fail these specific filters; that points to a systematic bias.

The data was obtained using targeted sequencing on an Illumina platform. The read length is 150 bp.

Targeted as in capture beads or amplicons?

It is capture-bead based.

Hi, an update about this error. I tried setting the readpos and dist3 parameters to zero to see what result the filter gives. The result is the same as above. Although no variants now fail readpos or dist3, they seem to be failing some criterion that is not described in the result summary. The summary I got this time is below:

446 variants in input file
441 had a bam-readcount result
418 had reads1>=2
0 passed filters
446 failed filters
    5 failed because no readcounts were returned
    14 failed minimim variant count < 3
    19 failed minimum variant freq < 0.05
    0 failed minimum strandedness < 0.0
    0 failed minimum reference readpos < 0.0
    0 failed minimum variant readpos < 0.0
    0 failed minimum reference dist3 < 0.0
    0 failed minimum variant dist3 < 0.0
    0 failed maximum reference MMQS > 50
    0 failed maximum variant MMQS > 100
    0 failed maximum MMQS diff (var - ref) > 50
    3 failed maximum mapqual diff (ref - var) > 10
    0 failed minimim ref mapqual < 20
    2 failed minimim var mapqual < 30
    8 failed minimim ref basequal < 15
    43 failed minimim var basequal < 20
    0 failed maximum RL diff (ref - var) > 0.05

Why are variants near the 3' end filtered out? Is there some other parameter that can be relaxed? How do I use this filter to filter out only false positives? Please help.
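
To show the mismatch in numbers, here is a small sketch that parses the summary above and compares the total "failed filters" count with the per-reason counts. The parsing function and its regular expressions are my own assumptions about the text layout of the summary, not part of VarScan. Because one variant can fail several filters, the sum of the per-reason counts is only an upper bound on how many failures the listed reasons can explain.

```python
import re

# Summary text copied from the run with readpos and dist3 set to zero.
SUMMARY = """\
446 variants in input file
441 had a bam-readcount result
418 had reads1>=2
0 passed filters
446 failed filters
    5 failed because no readcounts were returned
    14 failed minimim variant count < 3
    19 failed minimum variant freq < 0.05
    0 failed minimum strandedness < 0.0
    0 failed minimum reference readpos < 0.0
    0 failed minimum variant readpos < 0.0
    0 failed minimum reference dist3 < 0.0
    0 failed minimum variant dist3 < 0.0
    0 failed maximum reference MMQS > 50
    0 failed maximum variant MMQS > 100
    0 failed maximum MMQS diff (var - ref) > 50
    3 failed maximum mapqual diff (ref - var) > 10
    0 failed minimim ref mapqual < 20
    2 failed minimim var mapqual < 30
    8 failed minimim ref basequal < 15
    43 failed minimim var basequal < 20
    0 failed maximum RL diff (ref - var) > 0.05
"""


def parse_summary(text):
    """Return (total_failed, {reason: count}) from an fpfilter-style summary."""
    total_failed = None
    reasons = {}
    for raw in text.splitlines():
        line = raw.strip()
        total = re.match(r"(\d+) failed filters$", line)
        if total:
            total_failed = int(total.group(1))
            continue
        reason = re.match(r"(\d+) failed (.+)$", line)
        if reason:
            reasons[reason.group(2)] = int(reason.group(1))
    return total_failed, reasons


total_failed, reasons = parse_summary(SUMMARY)
explained_at_most = sum(reasons.values())  # upper bound: filters can overlap
print("failed in total:", total_failed)
print("explained by listed reasons (at most):", explained_at_most)
print("not explained by any listed reason (at least):", total_failed - explained_at_most)
```

For this summary the listed reasons account for at most 94 of the 446 failures, so at least 352 variants are marked as failed without any reported reason.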
