We have identified certain regions of the genome with significantly low coverage across individuals. One of them has high GC content (70%), which may be one explanation, but I am wondering what other reasons might prevent good sequencing or alignment in this region. The question can be generalized as: "What characteristics of a genomic region make it hard to sequence and align, so that reliable genotype calling cannot be performed with Illumina next-gen sequencing?" Other platforms have their own problems, but we are interested in issues specific to the Illumina platform.
Some issues we hypothesize to be important are:
Paralogous regions
Repeats
Segmental Duplications
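For the GC-content explanation mentioned above, here is a minimal sketch of how one might scan a reference sequence for high-GC windows and compare them against the low-coverage regions. The function names, the window and step sizes, and the 0.70 threshold (borrowed from the 70% figure above) are illustrative assumptions, not part of any existing pipeline.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T).

    Ambiguous bases such as N are ignored; returns 0.0 if none remain.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def high_gc_windows(seq, window=100, step=50, threshold=0.70):
    """Return (start, end, gc) for sliding windows with GC >= threshold.

    Coordinates are 0-based, end-exclusive. The defaults are hypothetical
    values chosen for illustration only.
    """
    hits = []
    # max(..., 1) ensures a sequence shorter than one window is still scanned once
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc >= threshold:
            hits.append((start, start + len(chunk), gc))
    return hits


if __name__ == "__main__":
    # Toy sequence: 100 bp of AT repeats followed by 100 bp of GC repeats
    toy = "AT" * 50 + "GC" * 50
    for start, end, gc in high_gc_windows(toy):
        print(f"{start}-{end}\tGC={gc:.2f}")
```

The flagged windows can then be intersected with the low-coverage intervals to see how much of the coverage loss overlaps GC-rich sequence.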
Are there other things we should add to this list of things to check? Thank you.
Great question. Very interested in reading what people post on this topic.