I am working with exome-seq data (actually more like targeted sequencing data).
I have my own BED file, and I am trying to calculate the GC content of the intervals in it using GATK GCContentByInterval.
However, I noticed that some of the intervals are missing from the output after I run GCContentByInterval.
For example, I have 62308 intervals in my bed file.
But when I run
java -Xmx2000m -Djava.io.tmpdir=TEMP -jar xxxx/GenomeAnalysisTK.jar -T GCContentByInterval -L mybedfile.bed -R fastfile.fa -o gc.txt
my gc.txt file only includes 62181 intervals (lines) instead of 62308.
I am not sure where the missing 127 intervals went.
I googled it and found that if two intervals are contiguous, GATK merges them and reports only one interval instead of both. However, that does not seem to be the case here.
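In case it helps anyone reproduce this, here is a minimal sketch of how the number of intervals left after merging could be counted. It assumes a plain BED file with at least three columns and 0-based, half-open coordinates. The function name count_merged_intervals and the merging rule are my own assumptions for illustration, not GATK's actual implementation.

```python
from collections import defaultdict


def count_merged_intervals(bed_lines, merge_abutting=True):
    """Count intervals left after merging overlapping (and optionally abutting) ones.

    Hypothetical helper for illustration; it only approximates what an
    interval-merging step might do and is not taken from GATK.
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged_total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            # BED is 0-based half-open, so start == current_end means
            # the two intervals abut without overlapping.
            touches = current_end is not None and (
                start < current_end or (merge_abutting and start == current_end)
            )
            if touches:
                current_end = max(current_end, end)
            else:
                merged_total += 1
                current_end = end
    return merged_total
```

Calling count_merged_intervals(open("mybedfile.bed")) and comparing the result with 62181 would show whether merging of overlapping, abutting, or duplicated intervals accounts for the difference.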
Could somebody please let me know? Am I missing something? Is it a bug in GATK?