What kind of preprocessing do you need to use bam-readcount and fpfilter.pl for indels?
This question has been already asked couple of times but so far I haven't found any clear answer (scripts or other examples) for them. So, I have a set of VCF files generated by VarScan2. These VCF files contain somatic indels occuring in tumor samples. I am trying to filter these indels by using bam-readcount and fpfilte.pl. The format of VCF file is following;
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr12 46236017 . CA C . PASS DP=177;SS=1;SSC=0;GPV=6.0385E-14;SPV=8.3811E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:66:44:17:27.87%:44,0,17,0 0/1:.:111:80:23:22.33%:80,0,23,0
My script is presented below;
# Prepare input file by selecting data from VCF input file.
sed '/^#/ d' $file | awk '{print $1,$2,$2}' > $file.positions
# Calculate read features by using bam-readcount tool.
$BAM_READCOUNT -w1 -f $REFERENCE ${file%.vcf}.bam -l $file.positions > $file.readcounts
rm $file.positions
# Filter variants by using fpfilter.pl.
perl $FPFILTER --var-file $file --readcount-file $file.readcounts --output-file $file.output
rm $file.readcounts
# Select variants that pass all filters.
grep "PASS" $file.output | awk '{print $1,$2-2,$2+1}' > $file.pass
rm $file.output
# Filter original VCF file so that only variants passing all filters are included.
vcftools --vcf $file --positions $file.pass --recode --out ${file%.vcf}.filtered.vcf
rm *.pass
rm *.log
rm $file
However, this approad filters all indels from my data. The problem seems to be in readcounts, because the prompt indicates that;
Filtering $file for false-positives...
4 variants
0 had a position near the ends of most supporting reads (position < 0.1)
0 had strandedness < 0.01 (most supporting reads are in the same direction)
0 had var_count < 3 (not enough supporting reads)
0 had depth < 8
...
0 passed all filters
4 had no readcounts for the variant allele
I would really appreciate tips and example scripts to see how raw variant data from VCF should be extracted and used in bam-readcount and fpfilter.
chr12 46236017 . CA C
for bam-readcount, you need to supply chr12 46236018 46236018 as the position
I have the same problem, and apparently there is no reliable answer out there by now. How did you solve the problem for your data?