indel preprocessing prior to bam-readcount
1
0
Entering edit mode
8.2 years ago
Jokhe ▴ 140

What kind of preprocessing do you need to use bam-readcount and fpfilter.pl for indels?

This question has been already asked couple of times but so far I haven't found any clear answer (scripts or other examples) for them. So, I have a set of VCF files generated by VarScan2. These VCF files contain somatic indels occuring in tumor samples. I am trying to filter these indels by using bam-readcount and fpfilte.pl. The format of VCF file is following;

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
 chr12  46236017    .   CA  C   .   PASS    DP=177;SS=1;SSC=0;GPV=6.0385E-14;SPV=8.3811E-1  GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:66:44:17:27.87%:44,0,17,0 0/1:.:111:80:23:22.33%:80,0,23,0

My script is presented below;

# Prepare input file by selecting data from VCF input file.
sed '/^#/ d' $file | awk '{print $1,$2,$2}' > $file.positions

# Calculate read features by using bam-readcount tool.
$BAM_READCOUNT -w1 -f $REFERENCE ${file%.vcf}.bam -l $file.positions > $file.readcounts
rm $file.positions

# Filter variants by using fpfilter.pl.
perl $FPFILTER --var-file $file --readcount-file $file.readcounts --output-file $file.output
rm $file.readcounts

 # Select variants that pass all filters.
grep "PASS" $file.output | awk '{print $1,$2-2,$2+1}' > $file.pass
rm $file.output

# Filter original VCF file so that only variants passing all filters are included.
vcftools --vcf $file --positions $file.pass --recode --out ${file%.vcf}.filtered.vcf
rm *.pass
rm *.log
rm $file

However, this approad filters all indels from my data. The problem seems to be in readcounts, because the prompt indicates that;

Filtering $file for false-positives...
4 variants
0 had a position near the ends of most supporting reads (position < 0.1)
0 had strandedness < 0.01 (most supporting reads are in the same direction)
0 had var_count < 3 (not enough supporting reads)
0 had depth < 8
...
0 passed all filters
4 had no readcounts for the variant allele

I would really appreciate tips and example scripts to see how raw variant data from VCF should be extracted and used in bam-readcount and fpfilter.

bam-readcount indel fpfilter.pl VarScan • 4.4k views
ADD COMMENT
1
Entering edit mode

chr12 46236017 . CA C

for bam-readcount, you need to supply chr12 46236018 46236018 as the position

ADD REPLY
0
Entering edit mode

I have the same problem, and apparently there is no reliable answer out there by now. How did you solve the problem for your data?

ADD REPLY
0
Entering edit mode
8.1 years ago
m.b • 0

I have been facing a similar problem. I see that the main issue for your filtering process is: "no readcounts for the variant allele". Can you please tell me if your 4 variants are all deletions?

I have a larger dataset with insertions and deletions; and all my deletions have not passed the filtering because there is no read count for them. This is a bam-readcount problem.

I went back to the bam-readcount github page and it seems that this is a recognised bug. It does not return readcounts at positions of known deletions. Please see link below: https://github.com/genome/bam-readcount/issues/27

Unfortunately, I am not sure how to get around this....

ADD COMMENT
0
Entering edit mode

Thank you for your answer. The example case I am presenting in this post contains only deletions so it is very likely that this is caused by the bug you indicated. However, I have similar problem with all of my samples and some of them do contain also insertions. Insertions are also filtered out due to "no readcounts".

ADD REPLY

Login before adding your answer.

Traffic: 2147 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6