I'm doing an exercise that asks for two files:
- Input 1: a target file (.bed format) that contains multiple regions from chr7:40000000-50000000 of the human reference genome GRCh37 (hg19)
- Input 2: a RefSeq exon list file (.bed format) for all human coding genes (hg19 positions)
The final goal is: for all genes located in chr7:40000000-50000000, get the summary statistics of the target file coverage (for each gene, get the fraction of exonic bases that are covered by the target file).
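To check that I understand the metric, here is a minimal Python sketch of what I think is being asked, run on toy intervals rather than real data. The function names (`merge_intervals`, `covered_bases`, `gene_coverage_fraction`), the gene names and the coordinates are all made up for illustration. I'm assuming BED-style half-open coordinates, that each gene sits on a single chromosome, and that overlapping exons of the same gene (e.g. from different transcripts) should only be counted once.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bases(exons, targets):
    """Count exon bases overlapped by targets (both lists already merged)."""
    total = 0
    for e_start, e_end in exons:
        for t_start, t_end in targets:
            overlap = min(e_end, t_end) - max(e_start, t_start)
            if overlap > 0:
                total += overlap
    return total


def gene_coverage_fraction(exon_records, target_records):
    """exon_records: (chrom, start, end, gene); target_records: (chrom, start, end).

    Returns {gene: covered exonic bases / total exonic bases}.
    Assumes each gene name appears on one chromosome only.
    """
    targets_by_chrom = defaultdict(list)
    for chrom, start, end in target_records:
        targets_by_chrom[chrom].append((start, end))
    merged_targets = {c: merge_intervals(v) for c, v in targets_by_chrom.items()}

    exons_by_gene = defaultdict(list)
    for chrom, start, end, gene in exon_records:
        exons_by_gene[(gene, chrom)].append((start, end))

    fractions = {}
    for (gene, chrom), exons in exons_by_gene.items():
        merged_exons = merge_intervals(exons)
        exonic = sum(end - start for start, end in merged_exons)
        covered = covered_bases(merged_exons, merged_targets.get(chrom, []))
        fractions[gene] = covered / exonic
    return fractions


# Toy data (hypothetical genes and coordinates, not real annotations).
toy_exons = [
    ("chr7", 100, 200, "GENE_A"),
    ("chr7", 150, 300, "GENE_A"),   # overlaps the previous exon
    ("chr7", 1000, 1100, "GENE_B"),
]
toy_targets = [("chr7", 120, 180), ("chr7", 250, 400), ("chr7", 2000, 2100)]

for gene, fraction in sorted(gene_coverage_fraction(toy_exons, toy_targets).items()):
    print(gene, fraction)
```

If this is the right reading, then the per-gene number is simply covered exonic bases divided by total exonic bases, and the "summary statistics" would be computed over those per-gene fractions.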
I believe what the exercise means is that the target file should be something like the chromosome sizes file for hg19 (hg19.chr.sizes), and that refseq_exon_list is the list of exons from the RefSeq database. Both can be downloaded with tools like the Table Browser. Is that correct?
I'm not sure which files I should download to perform this task. Once I have the files, I believe I need to restrict refseq_exon_list to the requested region and then compute coverage with something like bedtools. Something along these lines:
bedtools coverage -a hg19.chr.sizes -b refseq_exon_list
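In case it helps to show what I mean by restricting the exon list to the region first, here is a rough Python sketch of that filtering step. The function name `restrict_bed` and the toy records are hypothetical, and I'm treating the region bounds as BED-style start/end values and keeping any record that overlaps the region at all, which may not be exactly what the exercise intends.

```python
REGION = ("chr7", 40_000_000, 50_000_000)


def restrict_bed(lines, region=REGION):
    """Yield the tab-split fields of BED records that overlap the region."""
    chrom, region_start, region_end = region
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        if fields[0] == chrom and start < region_end and end > region_start:
            yield fields


# Toy BED records (made-up names and coordinates).
toy_bed = [
    "chr7\t39999900\t40000100\texonA\n",   # straddles the region start
    "chr7\t45000000\t45000200\texonB\n",   # inside the region
    "chr7\t50000000\t50000100\texonC\n",   # starts at the region end
    "chr1\t45000000\t45000200\texonD\n",   # different chromosome
]

kept = [fields[3] for fields in restrict_bed(toy_bed)]
print(kept)
```

With a real file, the same function could be fed an open file handle instead of the toy list.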
Am I right here? Any input is appreciated, thanks.