I am not sure if I am using the right terminology or asking my question in the right way, so please do not judge this post too early :-). I have a list of Copy Number Variations (CNVs), and I was curious to know whether these regions are in the coding part of the genome (exome regions) and, if so, how much of those regions my exome data cover.
To answer this, I ran GATK's "DepthOfCoverage" against the hg19 reference. For one of the regions, "chr15 20549990 22285350" (which is also a large one!), I got the following error:
##### ERROR MESSAGE: Badly formed genome loc: Contig 16 given as location, but this contig isn't present in the Fasta sequence dictionary
My interpretation of the error is that this region is not entirely within the exome. 1) Is my interpretation correct? 2) Is there any other way to answer my initial question (how much of those CNVs do my exome data cover)?
Thank you
My interpretation is that you are using an input file that contains the term "Contig 16" and that term is not found in the reference genome. Try searching the Web using part of your error message "this contig isn't present in the Fasta sequence dictionary" - it's a common issue.
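A quick way to check for a naming mismatch is to compare the contig names in your interval list against the names in the reference's FASTA index (.fai), whose first tab-separated column is the contig name. The snippet below is only a rough sketch: the function names are made up, the sample lines are toy data, and it assumes your intervals are whitespace-separated with the contig in the first column.

```python
def reference_contigs(fai_lines):
    """Collect contig names from the lines of a FASTA index (.fai) file."""
    return {line.split("\t")[0] for line in fai_lines if line.strip()}


def missing_contigs(interval_lines, known):
    """Return contig names used in the interval lines but absent from `known`."""
    missing = []
    for line in interval_lines:
        fields = line.split()
        if not fields or line.startswith("#"):
            continue  # skip blank lines and comments
        contig = fields[0]
        if contig not in known and contig not in missing:
            missing.append(contig)
    return missing


# Toy data for illustration only; in practice read the lines from
# your own .fai file and your own interval file.
fai = ["chr15\t102531392\t6\t50\t51", "chr16\t90354753\t7\t50\t51"]
intervals = ["chr15 20549990 22285350", "16 100 200"]

known = reference_contigs(fai)
for contig in missing_contigs(intervals, known):
    hint = " (but 'chr" + contig + "' is present)" if "chr" + contig in known else ""
    print("not in reference: " + contig + hint)
```

If a name shows up as missing only because of a "chr" prefix, the interval file and the reference are using different naming conventions, which says nothing about whether the region is coding.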
Actually, the error means "this region doesn't exist in the reference file". So does that mean the CNV is corrupted, or that the reference does not contain the non-coding region?
Assuming that GATK is the right tool, can you give the entire command that you used?
If you just want to see if your CNVs overlap genes, then you can get a list of genes from UCSC (http://hgdownload.cse.ucsc.edu/downloads.html) and then use bedtools to look at the overlap.
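To make the overlap idea concrete, here is a minimal sketch in plain Python of what such an overlap calculation does for a single chromosome: merge the annotated intervals (for example exons), then count how many bases of a CNV they cover. The function names and the coordinates are invented for illustration, and it assumes BED-style half-open intervals; for real data bedtools will be much faster.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def overlap_bases(region, merged):
    """Count the bases of `region` that fall inside the merged intervals."""
    r_start, r_end = region
    total = 0
    for start, end in merged:
        lo, hi = max(start, r_start), min(end, r_end)
        if lo < hi:
            total += hi - lo
    return total


# Toy coordinates, all on the same chromosome (not real annotations).
exons = [(100, 200), (150, 250), (400, 500)]
cnv = (0, 450)

merged_exons = merge_intervals(exons)
covered = overlap_bases(cnv, merged_exons)
fraction = covered / (cnv[1] - cnv[0])
print(covered, round(fraction, 3))
```

Merging first matters because overlapping exons from different transcripts would otherwise be counted twice.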
I want to see whether my CNVs overlap coding regions in general, not any specific gene.