understanding exercise on file coverage
3.0 years ago
alexmondaini ▴ 20

I'm doing an exercise that asks for two files:

  • Input 1: A target file (.bed format) containing multiple regions from chr7:40000000-50000000 of the human reference genome GRCh37 (hg19)
  • Input 2: Refseq exon list file (.bed format) for all human coding genes (hg19 position)

The final goal is:

For all genes located in chr7:40000000-50000000, get the summary statistics of the target file coverage. (For each gene, get the fraction of exonic bases that was covered by the target file).

I believe the exercise means that the target file should be something like the chromosome sizes of hg19 (hg19.chr.sizes), and that refseq_exon_list is the list of exons from the RefSeq database. Both can be downloaded with tools like the UCSC Table Browser. Is that correct?

I'm not sure which files I should download to perform this task. Once I have the files, I believe I need to restrict refseq_exon_list to the requested region and then compute coverage with something like bedtools. Something along these lines:

bedtools coverage -a hg19.chr.sizes -b refseq_exon_list

Am I right here? Any input is appreciated, thanks.

3.0 years ago

No part of this exercise seems to ask for manipulating entire chromosomes. So why are you even considering computing coverage over entire chromosomes?

Chromosome sizes are not intervals, they simply list how long the chromosomes are.
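To make the difference concrete, here is a minimal Python sketch. The two sample lines are illustrative, and the helper names (`parse_sizes`, `parse_bed`, `sizes_to_interval`) are made up for this example. A sizes entry has only a name and a length, so the only interval it can imply is the whole chromosome, which is not what the exercise is about.

```python
# A chromosome-sizes line has two columns: name and length.
sizes_line = "chr7\t159138663"
# A BED line has at least three: chrom, 0-based start, end (exclusive).
bed_line = "chr7\t40000000\t50000000"


def parse_sizes(line):
    """Parse one line of a chromosome-sizes file into (chrom, length)."""
    chrom, length = line.split("\t")
    return chrom, int(length)


def parse_bed(line):
    """Parse the first three BED columns into (chrom, start, end)."""
    chrom, start, end = line.split("\t")[:3]
    return chrom, int(start), int(end)


def sizes_to_interval(chrom, length):
    """The only interval a sizes entry implies is the whole chromosome."""
    return chrom, 0, length


print(parse_sizes(sizes_line))
print(parse_bed(bed_line))
print(sizes_to_interval(*parse_sizes(sizes_line)))
```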

Instead of the chromosome sizes, you need to use the target BED file in the first position.

I will say that your wording of the problem is a bit ambiguous; it seems to conflate exons and genes. Make sure that is what the problem really asks. What would the exonic coverage of a gene even mean? A gene is a long interval that includes exons and introns.
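One possible reading, and it is only an assumption about what the exercise wants, is: for each gene, merge its exons, count the exonic bases, and report the fraction of those bases that overlap any target region. A rough pure-Python sketch of that reading follows. The function names are made up, the rows are assumed to be already parsed into tuples, and the fourth exon column is assumed to hold a gene name, which a real RefSeq exon BED may not provide directly. Exons are kept whole if they overlap the chr7:40000000-50000000 window at all.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bases(exons, targets):
    """Count bases shared by two merged, sorted interval lists."""
    total = 0
    i = j = 0
    while i < len(exons) and j < len(targets):
        lo = max(exons[i][0], targets[j][0])
        hi = min(exons[i][1], targets[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if exons[i][1] < targets[j][1]:
            i += 1
        else:
            j += 1
    return total


def gene_coverage(exon_rows, target_rows, chrom="chr7",
                  lo=40_000_000, hi=50_000_000):
    """Return {gene: fraction of exonic bases covered by the targets}.

    exon_rows:   (chrom, start, end, gene) tuples (gene column is assumed)
    target_rows: (chrom, start, end) tuples
    """
    targets = merge((s, e) for c, s, e in target_rows if c == chrom)
    by_gene = defaultdict(list)
    for c, s, e, gene in exon_rows:
        # Keep exons that overlap the requested window.
        if c == chrom and s < hi and e > lo:
            by_gene[gene].append((s, e))
    result = {}
    for gene, exons in by_gene.items():
        merged = merge(exons)
        exonic = sum(e - s for s, e in merged)
        result[gene] = overlap_bases(merged, targets) / exonic
    return result


# Toy data, invented for illustration only.
exons = [
    ("chr7", 40000100, 40000200, "GENE_A"),
    ("chr7", 40000150, 40000300, "GENE_A"),
    ("chr7", 41000000, 41000100, "GENE_B"),
    ("chr1", 100, 200, "GENE_C"),
]
targets = [("chr7", 40000100, 40000200)]
print(gene_coverage(exons, targets))
```

Merging exons first avoids counting bases twice when several transcripts of the same gene share an exon. If the exercise means something else by "gene coverage", the grouping step is the part to change.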
