Hi:
I use GATK (with the b37 reference) to call SNPs and indels, but some of the results look wrong to me.
I picked two sites from two samples to illustrate the problem:
sample1 (WES) insertion: The site 3:8299641 was called as an insertion (G to GGAAGGAAGGAAGGAAGGAAGGAAC), but I didn't find an insertion in any of the mapped reads, and the "insertion sequence" can be found in the reference. In particular, the first and last bases of the insertion, "G" (3:8299641) and "C" (3:8299665), which I think should be called as SNPs, are not called as SNPs.
sample2 (WGS) deletion: 1:3081756 was called as a deletion (GGGACTTACCTGGCCTCAGGGGCAT to G), but I didn't find a deletion in any of the mapped reads, and the reads show that these bases are present, just as in the reference.
In addition, I have checked my code many times and I'm sure I used the right BAM file.
Has anyone else had the same problem? What causes it?
Also, I ran RealignerTargetCreator and IndelRealigner on both sample1 and sample2.
It drives me crazy that I can't put pictures in. I used samtools tview to view the position (sample1, WES, average depth above 100X, position 3:8299641) and pasted the output below:
8299641   8299651   8299661   8299671   8299681   8299691   8299701   8299711
GGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGACTTTACTTTTACCACACTGGATTGTTTGTACGATTTAAATGAGAAA
S...............................................................................
............................................................ ,,,,,,,,,,,,,,,
........................C........ ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
........................C........ ..................................
..................... ..................................................
........................C........ ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
............................................................... ,,,,,,,,,,,,,,
................................................................ ,,,,,,,,,,,,,
........................A........ ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
........................C........ ,,,,,,,,,,
C................................................................ ,,,,,,,
C.................................................... ,,,,,,
C........................................................... ,,,,,
C............................................................
sample2 (WGS, average depth above 30X) deletion: 1:3081756
     3081761   3081771   3081781   3081791   3081801   3081811   3081821
GGGACTTACCTGGCCTCAGGGGCATGGACTCACCTGGGCTCAGGGACAGTGACTTACCTGGGCTCAGGGACAGGGATGCA
......Y........................................R......Y............R............
......C......G.......A..G........................G....C............A........C
,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g,,,,,,c,,,,,,,,,,,,a,,,,,,,,c
.................................................G..A.C..A..................C...
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,c,,,,,,,,,,,,,,,,,,,,,
................................................................................
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g,,,,,,c,,,,,,,,,,,,a,,,,,,,,
,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g,,,,,,c,,,,,,,,,,,,a,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g,,,,,,c,,,,,,,,,,,,a,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
......C........................................G......C............A........
...................................................G..G........G.........A..G...
It's difficult to tell whether the problem is in GATK or in your reading of the results without actually seeing the results. Can you post a screen capture from IGV, ideally showing the insertion/deletion? (I'm not sure you can have IGV show the inserted sequence, but at least showing the region around it would be useful.)
Also, did you run the indel realigner on the samples? What was the coverage like in those areas?
I updated my question; I hope it helps. Thanks.
Somewhat, but it's still difficult to tell exactly what is going on without seeing all of the read sequences. In the future, what people generally do is upload an image somewhere and just link to it (biostars doesn't host images for you).
You might look at the realigned BAM files and see if the edit distances in those regions are now lower. If they're not, then this would seem to be a GATK error. If the edit distances are lower and there are reads spanning the insertions/deletions then that's why this is happening (and it's quite possibly correct).
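To make the edit-distance idea concrete, here is a rough Python sketch. It is not what GATK does internally. It assumes the reference window and two of the reads can be reconstructed from the sample1 tview display in the question, applies the called insertion to the reference, and counts mismatches for the best ungapped placement of each read. The names apply_variant and min_mismatches are made up for illustration.

```python
# Reference window copied from the tview display; its first base is 3:8299641.
REF_START = 8299641
REF_WINDOW = "GGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGACTTTACTTTTACCACACTGGATTGTTTGTACGATTTAAATGAGAAA"

# The call as reported in the question: REF=G, ALT=GGAAGGAAGGAAGGAAGGAAGGAAC.
POS, REF_ALLELE, ALT_ALLELE = 8299641, "G", "GGAAGGAAGGAAGGAAGGAAGGAAC"


def apply_variant(window, window_start, pos, ref_allele, alt_allele):
    """Return the window with ref_allele at pos replaced by alt_allele."""
    i = pos - window_start
    if window[i:i + len(ref_allele)] != ref_allele:
        raise ValueError("REF allele does not match the reference window")
    return window[:i] + alt_allele + window[i + len(ref_allele):]


def min_mismatches(read, haplotype):
    """Fewest mismatches over all ungapped placements of read on haplotype."""
    if len(read) > len(haplotype):
        raise ValueError("read is longer than the haplotype")
    best = len(read)
    for off in range(len(haplotype) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, haplotype[off:off + len(read)]))
        best = min(best, mm)
    return best


alt_hap = apply_variant(REF_WINDOW, REF_START, POS, REF_ALLELE, ALT_ALLELE)

# Reads rebuilt from the display (an assumption: '.' means "same as reference").
# read_mid: 33 bases from 3:8299641 with a C at 3:8299665.
read_mid = REF_WINDOW[:24] + "C" + REF_WINDOW[25:33]
# read_left: 61 bases from 3:8299641 whose first base is C.
read_left = "C" + REF_WINDOW[1:61]

for name, read in (("read_mid", read_mid), ("read_left", read_left)):
    print(name,
          "vs reference:", min_mismatches(read, REF_WINDOW),
          "vs insertion haplotype:", min_mismatches(read, alt_hap))
```

With these reconstructed reads, both go from one mismatch against the reference to zero against the haplotype carrying the insertion, which is the "lower edit distance" pattern described above. On real data you would instead compare the NM tags of the reads in the BAM before and after realignment.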