Hi everyone, I have a problem with my exome sequencing results. Almost half of my somatic variants are A->T/T->A, which is really confusing. I checked my data and the original data look normal. But after removing dbSNP and 1000 Genomes variants, I found that there are many more A->T variants in the tumor than in the normal. Any suggestions?
Maybe by setting a (higher) threshold on the fraction of reads required to call a variant, and on the coverage? Can you look at some variants and check how many reads support each variant call? On the other hand, I don't quite understand why your result (~50% A<->T substitutions) is so unexpected?
Half is not unexpected if there are fewer than 10 mutations. A statistical test is needed.
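To make the "a test is needed" point concrete, here is a minimal sketch of a one-sided exact binomial test in plain Python. The null fraction of 1/6 is purely an assumption for illustration (it treats the six strand-collapsed substitution classes as equally likely, which real mutation spectra usually are not), and the counts in the loop are hypothetical, not the poster's data. The function name `binom_upper_tail` is made up for this example.

```python
import math


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1.0 - p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


# ASSUMPTION: uniform null over the 6 strand-collapsed substitution classes.
NULL_FRACTION = 1.0 / 6.0

# Hypothetical totals: "half of the calls are A<->T" at different sample sizes.
for n_total in (4, 10, 100):
    n_at = n_total // 2
    p_value = binom_upper_tail(n_at, n_total, NULL_FRACTION)
    print(f"{n_at}/{n_total} A<->T calls: one-sided p = {p_value:.3g}")
```

The same 50% fraction is weak evidence with a handful of calls and very strong evidence with a hundred, so the total number of somatic calls matters as much as the percentage. A better null would be the substitution spectrum seen in comparable published exomes.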
Can you give a little more background here on your sequencing technology, experimental setup, etc.? Maybe also post a table of substitution counts? What comes to mind immediately is sequencing errors, and that many of the variants called could be false positives.
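In case it helps with the substitution table, here is a small sketch that tallies SNVs into the six strand-collapsed classes (reference base reported as C or T, so A->T and T->A both land in `T>A`). The input list `calls` is hypothetical example data, and the helper names `substitution_class` and `substitution_table` are made up for this post; in practice the (ref, alt) pairs would come from your own variant file.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def substitution_class(ref, alt):
    """Collapse a single-base substitution onto the pyrimidine strand."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_table(snvs):
    """Count (ref, alt) pairs per class, including classes with zero calls."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    return {cls: counts.get(cls, 0) for cls in CLASSES}


# Hypothetical calls, for illustration only.
calls = [("A", "T"), ("T", "A"), ("G", "A"), ("C", "T"), ("A", "T")]
table = substitution_table(calls)
total = sum(table.values())
for cls, n in table.items():
    print(f"{cls}\t{n}\t{n / total:.1%}")
```

Running this separately on the calls before and after the dbSNP / 1000 Genomes filtering would show whether the A<->T excess is already present in the raw calls or only appears in the somatic set.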
I use a HiSeq 2000, with Agilent 50M to capture the exome regions. If these are sequencing errors, how can I filter them out?
Also, have you adjusted your parameters to exclude regions with read depth that is unusually low or high? Both can cause SNP calling oddities (for different reasons).
I have set depth to at least 8, with variant-supporting reads >15% in tumor and <0.5% in normal. Michael, by the way, do you think 50% A<->T substitutions is acceptable? I have not read any paper reporting an A<->T fraction that high.
I did not remove sites with depth >500, a cutoff used by some researchers. Because my average depth is 140X~160X, I think removing sites >500X may cause some real variants to be missed.
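For reference, the thresholds described above can be written down as a small filter. This is only a sketch under stated assumptions: it assumes the minimum depth of 8 applies to both tumor and normal, that the 15% and 0.5% figures are fractions of variant-supporting reads, and that the upper depth cap is optional (off by default, 500 if you want to try it). The function name and argument names are hypothetical.

```python
def passes_somatic_filter(tumor_depth, tumor_alt, normal_depth, normal_alt,
                          min_depth=8, min_tumor_vaf=0.15,
                          max_normal_vaf=0.005, max_depth=None):
    """Return True if a site passes the depth and variant-fraction thresholds."""
    # Guard against empty sites so the divisions below are always safe.
    if tumor_depth <= 0 or normal_depth <= 0:
        return False
    # ASSUMPTION: the minimum depth applies to both samples.
    if tumor_depth < min_depth or normal_depth < min_depth:
        return False
    # Optional cap for unusually deep sites (for example, max_depth=500).
    if max_depth is not None and max(tumor_depth, normal_depth) > max_depth:
        return False
    tumor_vaf = tumor_alt / tumor_depth
    normal_vaf = normal_alt / normal_depth
    return tumor_vaf > min_tumor_vaf and normal_vaf < max_normal_vaf


# Hypothetical sites: (tumor_depth, tumor_alt, normal_depth, normal_alt)
sites = [(150, 30, 140, 0), (150, 30, 140, 2), (600, 120, 140, 0)]
for site in sites:
    print(site, passes_somatic_filter(*site), passes_somatic_filter(*site, max_depth=500))
```

Note that at a normal depth of 140, a single variant read is already 1/140, about 0.7%, so the <0.5% cutoff effectively demands zero variant reads in the normal unless its depth is above 200.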
Oh, that's just the problem, I have an average of 60!
What I mean is all the mutations.