Hi All
When using VarScan (v2.4.0) to call variants on Illumina-sequenced amplicons, I have a sample whose reads carry a mix of an insertion and a deletion at the same position. The problem is that VarScan detects only the deletion.
java -jar VarScan.v2.4.0.jar mpileup2cns my_sample.mpileup --strand-filter 0 --min-var-freq 0.10 --min-avg-qual 0 --min-coverage 1 --min-reads2 1 --output-vcf 1 --variants 1 > my_sample.vcf
Here is the relevant position in the mpileup input to VarScan:
ZmpylD 177 t 193 .-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.+23CGTCGACGTACCCGACGGCAACA.+23CGTCGACGTCCCCG
ACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG..-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACAA.+23CGTCGACGTCCCCGACGGCAACA.-2C
G.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.+23CGTCGACG
TCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG
.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.+23CG
TCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.+23CGTCG
ACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCCACA.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG..-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.+23TGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.+23CGTCTACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACAA.-2CGA.-2CG.-2CG.-2CG.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA..+23CGTCGACGTCCCCGACGGCAACA.-2CG.-2CG.-2CG AFGCGC5GF+G@GFFFEF=FGFG<#FFF6GFGFEGCCCGCGCGFFGCECGFGGGCGCFEFECFEFGGGFGEFCFGCGGFGGFCGFG@EG8EG?ECGEGGCFGCEG7G=F=CFCFCG@CGF>FFG?FCEGFEGCFG2ACEG?GE;GGFFGG?<GGGC=F=FGCFF:DECC8CGGGGF%C#GCGFC<=GCG6GGD
So the question is: why is the insertion not also reported, and can this be fixed? Note that I have also tried freebayes on this data but get a similar result. It is important that I be able to detect such alternate variant alleles. If VarScan or freebayes cannot do this, can anyone suggest a tool that can?
Thanks for your help
Mark
Thanks for the suggestion, but no, it is as before. Incidentally, I mistakenly said that VarScan called the deletion and not the insertion. Actually, VarScan calls the insertion and not the deletion, while freebayes calls the deletion and not the insertion.
Can you post the output of both mpileup2cns and mpileup2indel for the position?
From the original output, it looks like the 2bp deletion is observed 134 times and the 23bp insertion 53 times. So the 2bp deletion seems to be the correct consensus (so freebayes is correct). Does VarScan call a 23bp insertion or a 21bp insertion? That is, maybe it merges the insertion and deletion together, as they both start with CG.
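The counts quoted above can be checked directly from the read-bases column of the mpileup line. Below is a minimal Python sketch, assuming the standard samtools pileup encoding (`+N<seq>` for an insertion, `-N<seq>` for a deletion, `^` followed by one mapping-quality character at a read start). The function name `count_indels` is made up for illustration and is not part of VarScan or samtools.

```python
from collections import Counter

def count_indels(bases):
    """Tally indel alleles in the read-bases column of one mpileup line.

    Hypothetical helper, assuming the standard samtools pileup encoding.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # '^' marks a read start and is followed by one mapping-quality
            # character, which must be skipped (it could itself be '+' or '-').
            i += 2
        elif ch in "+-":
            # Read the decimal length, then that many bases of indel sequence.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            seq = bases[j:j + length].upper()
            counts[ch + str(length) + seq] += 1
            i = j + length
        else:
            # Match ('.' or ','), mismatch base, '*', or '$': not an indel.
            i += 1
    return counts

if __name__ == "__main__":
    # Short excerpt in the style of the pileup posted above.
    example = ".-2CG.-2CG.+23CGTCGACGTCCCCGACGGCAACA..-2CG"
    for allele, n in count_indels(example).most_common():
        print(allele, n)
```

Alleles are keyed by their exact sequence, so an insertion that differs by a single base (for example from a sequencing error) is tallied separately from the main 23bp insertion.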
The mpileup2cns VCF:
The mpileup2indel VCF:
So in neither case is the 23bp (or 21bp) insertion called. For what it's worth, IGV recognizes it as a 23bp deletion.
So if you are correct that VarScan calls only the most common variant at each locus, whether with mpileup2cns or mpileup2indel, then it is not useful in the way I have tried to use it, since multiple alleles per locus will be common in my data. Likewise for freebayes. If these tools are inadequate (as I have used them), is there another that could do the job? Alternatively, could I take a different approach with VarScan? Could I perhaps call variants on every read individually, so that multiple alternate alleles cannot occur, and then tally the VCF info across reads to find variants above my thresholds?
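As a rough illustration of the tally-and-threshold idea, here is a minimal Python sketch that keeps every allele at a locus whose read count and frequency pass the cutoffs, instead of only the most common one. It is not how VarScan or freebayes work internally. The function name `alleles_above_threshold` and its parameters are made up for this example; the default of 0.10 mirrors the `--min-var-freq` value in the command above, and the counts in the usage example are the approximate ones quoted earlier in this thread.

```python
def alleles_above_threshold(counts, depth, min_var_freq=0.10, min_reads=1):
    """Return (allele, reads, frequency) for every allele passing both cutoffs.

    Hypothetical helper: counts maps an allele label to its supporting reads,
    and depth is the total read depth at the position.
    """
    kept = []
    for allele, n in counts.items():
        freq = n / depth
        if n >= min_reads and freq >= min_var_freq:
            kept.append((allele, n, freq))
    # Most frequent allele first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept

if __name__ == "__main__":
    # Approximate counts from this thread: depth 193, deletion 134, insertion 53.
    counts = {"-2CG": 134, "+23CGTCGACGTCCCCGACGGCAACA": 53}
    for allele, n, freq in alleles_above_threshold(counts, depth=193):
        print(f"{allele}\t{n}\t{freq:.3f}")
```

With these numbers both the 2bp deletion and the 23bp insertion pass a 10% frequency cutoff, so a per-locus tally of this kind would report both alleles.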