Excuse me:
After calling indel and frameshift variants, I wanted to annotate them using the 1000 Genomes Project and ESP. Can I get the frequencies of indels and frameshifts from these datasets? It seemed that the answer was no, which confused me.
Any help understanding this issue would be much appreciated.
Many thanks in advance!
No, but most of the indels and frameshifts had no annotation information (".") from 1000g/ESP. Is that normal?
Could you give me an example?
The first round of 1000g didn't attempt to call indels. The data were very low-coverage WGS, intended for medium-frequency SNPs. They simply didn't call indels on most of the subjects, because 2-5x coverage can't do it accurately. I don't know about later data releases; maybe they do have indel data now. The same goes for ESP: since indels are harder to call, they avoided them to keep the dataset clean.
I don't know a good dataset for indels, and will monitor this thread closely :).
I know the GATK best practices for DNA-seq include a variant recalibration step that uses some kind of indel reference set, so that's worth a look. It's called the Mills set, but I don't have a reference for it.
Yes! I have recalibrated using Mills. But I wanted to filter the indels and frameshifts by their frequency in 1000g/ESP, and with so many of them unannotated that seems unreasonable. So how should I deal with this kind of variant?
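One possible way to handle this, sketched below in Python, is to treat "." as "not observed in that database" rather than as a frequency of zero or as a failed filter. This is only an illustration under assumptions: the helper names (`parse_freq`, `is_rare`), the 1% cutoff, and the example variants are all made up for the sketch and do not come from any annotation tool. Whether to keep unannotated indels at all is a judgment call, given that a missing entry may mean the database did not call indels at that site, not that the variant is rare.

```python
from typing import List, Optional


def parse_freq(field: str) -> Optional[float]:
    """Return the allele frequency, or None when the field is "." (no entry)."""
    field = field.strip()
    if field in (".", ""):
        return None
    return float(field)


def is_rare(freq_fields: List[str], max_af: float = 0.01,
            keep_missing: bool = True) -> bool:
    """Decide whether a variant passes a frequency filter.

    max_af and keep_missing are assumed settings for this sketch.
    A variant with no frequency in any database is kept or dropped
    according to keep_missing; otherwise the highest known frequency
    must not exceed max_af.
    """
    known = [f for f in map(parse_freq, freq_fields) if f is not None]
    if not known:
        return keep_missing
    return max(known) <= max_af


# Hypothetical annotated variants: (name, 1000g frequency, ESP frequency)
variants = [
    ("indel_a", ".", "."),        # absent from both databases
    ("indel_b", "0.002", "."),    # rare in 1000g, absent from ESP
    ("indel_c", "0.15", "0.12"),  # common in both
]

kept = [name for name, kg, esp in variants if is_rare([kg, esp])]
print(kept)  # ['indel_a', 'indel_b']
```

Setting `keep_missing=False` would drop `indel_a` instead, which is the stricter choice if you only trust variants with a known frequency.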