Hi Everyone,
I'd like some advice/ideas on how to confirm that a specific genomic region carries a structural variant in one population. We sequenced individuals from two populations to high coverage, ~50X of data for each population. We detected a ~500kb region of the genome where coverage in one population is double that in the other. We therefore suspect the region is duplicated in that population. We also see a large increase in heterozygosity in the higher-coverage population, presumably because reads from the two copies are being mapped to a single location in the reference.
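One quick check on the existing data is whether the depth ratio between the two populations sits close to 2 across the whole region and close to 1 outside it. Below is a minimal Python sketch, assuming you have per-base depth arrays covering the same positions in each population (for example parsed from `samtools depth -a` output). The function names and the 1.7 to 2.3 tolerance are illustrative choices, not a standard. Each sample is normalised by its own median depth, so the input should span much more sequence than the candidate region.

```python
from statistics import mean, median


def depth_ratios(depth_a, depth_b, window=1000):
    """Return (window_start, ratio) pairs, where ratio is the
    median-normalised mean depth of sample A divided by that of sample B."""
    if len(depth_a) != len(depth_b):
        raise ValueError("depth arrays must cover the same positions")
    # Normalise each sample by its own median so unequal total
    # sequencing depth does not look like a copy-number change.
    med_a = median(depth_a) or 1
    med_b = median(depth_b) or 1
    ratios = []
    for start in range(0, len(depth_a), window):
        a = mean(depth_a[start:start + window]) / med_a
        b = mean(depth_b[start:start + window]) / med_b
        ratios.append((start, a / b if b > 0 else float("inf")))
    return ratios


def flag_duplicated(ratios, low=1.7, high=2.3):
    """Window starts whose ratio is consistent with one extra copy (~2x)."""
    return [start for start, r in ratios if low <= r <= high]
```

If the region is a genuine duplication, the flagged windows should form one contiguous block with sharp edges rather than a scatter of isolated windows.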
We'd like to confirm that this really is a duplication and not just a random increase in coverage in one population at this site. One approach we are considering is to search for 'junction fragments': reads that contain part of both the original sequence and the adjacent duplicated copy. Presumably these will not have been mapped, since the reference genome contains no sequence spanning that junction. If anybody knows a good way to do this, or of any papers or software that deal with this problem, that would be great.
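With a local aligner such as BWA-MEM, many junction-spanning reads may not end up unmapped but instead be aligned with a soft clip (an `S` operation in the CIGAR string) at the breakpoint. One hedged way to look for them is to record where soft clips begin or end and keep positions shared by several reads. Below is a minimal sketch, assuming you feed it (POS, CIGAR) pairs from columns 4 and 6 of `samtools view` output for the region; `min_clip` and `min_reads` are hypothetical thresholds, not established defaults.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_positions(pos, cigar, min_clip=10):
    """Reference positions (1-based, as in SAM POS) where a soft clip
    of at least min_clip bases meets the aligned part of the read."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips, if present, sit outside any soft clips; drop them.
    while ops and ops[0][1] == "H":
        ops.pop(0)
    while ops and ops[-1][1] == "H":
        ops.pop()
    sites = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        # Left clip: junction lies just before the first aligned base.
        sites.append(pos)
    if ops and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        # Right clip: junction lies at the first reference base after the alignment.
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        sites.append(pos + ref_len)
    return sites


def clip_clusters(alignments, min_reads=3, min_clip=10):
    """Positions supported by at least min_reads soft-clipped reads."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(clip_positions(pos, cigar, min_clip))
    return sorted(p for p, n in counts.items() if n >= min_reads)
```

For a tandem duplication you would expect clusters near both ends of the ~500kb region; the clipped sequence at one end should then resemble the sequence at the other end.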
Any other ideas for confirming the presence of copy number variation would be appreciated, ideally methods that we could apply to the existing data rather than resequencing.
*I should specify that we would also like to know exactly where the structural variant begins and ends.
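Coverage alone can place the start and end of the duplication only to window resolution, but that narrows the search for junction reads. Below is a minimal sketch, assuming a list of per-window depth ratios between the two populations (one value per window, in genomic order). It fits a profile with a single shifted segment by least squares, testing every candidate start and end using prefix sums. This is a simple brute-force changepoint fit chosen for illustration, not a published CNV caller. It is O(n²) in the number of windows, so it is practical for a few thousand windows.

```python
def best_segment(values, min_len=1):
    """Return (i, j) such that treating values[i:j] as one constant level
    and everything outside it as another minimises the squared error."""
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(total, total_sq, count):
        # Sum of squared deviations from the mean of `count` values.
        return total_sq - total * total / count if count else 0.0

    best = None
    for i in range(n):
        for j in range(i + min_len, n + 1):
            inside = sse(prefix[j] - prefix[i],
                         prefix_sq[j] - prefix_sq[i], j - i)
            # Windows on both sides of the segment share one background level.
            outside = sse(prefix[i] + prefix[n] - prefix[j],
                          prefix_sq[i] + prefix_sq[n] - prefix_sq[j],
                          n - (j - i))
            cost = inside + outside
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best[1], best[2]
```

Window indices i and j map back to coordinates as i * window and j * window; base-pair breakpoints would then come from junction reads near those two boundaries.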
Many thanks in advance
What species? Are the "individuals" expected to be homogeneous, genetically?