16 months ago
adarsh_munna
Hello,
I have BED files for several exome capture kits. Is there a method to calculate what portion (percentage) of a gene's region each capture kit covers? That way I could compare the kits on specific genes. I have loaded the BED files in IGV and they visibly cover my gene of interest, but is there a way to quantify this numerically?
Thank you
How to calculate average coverage for all genes
Aside from the programs/answers linked in @Pierre's answer, mosdepth (LINK) is the fastest way to do this.
Note: Are you asking what portion of each gene the BED file covers? i.e. you are not asking about coverage from BAM alignments?
Not from the BAM files, but from the BED files. I had the same question myself: someone else asked me for this coverage from the BED files and I was not able to come up with an answer.
You would need to do some custom coding to figure that out.
These are just interval files; you can't get sequencing coverage from them.
You can diff them and annotate the intervals, but I don't think this kind of comparison would give you insightful results. Will you be comparing the probe intervals? I ask because most of the target BED files are just the exon intervals. You can add padding to the probe intervals and compare them, but you can't really know which probes work better without the sequencing data.
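A minimal sketch of the padding-and-compare idea, in plain Python rather than any particular tool. It assumes the intervals are half-open (start, end) pairs on a single chromosome, as in BED; the function names and the toy coordinates are hypothetical, made up for illustration.

```python
def pad_and_merge(intervals, pad):
    """Extend each (start, end) interval by `pad` bases on both sides,
    then merge intervals that overlap or touch."""
    padded = sorted((max(0, s - pad), e + pad) for s, e in intervals)
    merged = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def shared_bases(a, b):
    """Count bases present in both sorted, merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        total += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Toy probe intervals for two hypothetical kits on one chromosome.
kit1 = [(100, 200), (300, 400)]
kit2 = [(190, 310)]

for pad in (0, 10):
    overlap = shared_bases(pad_and_merge(kit1, pad), pad_and_merge(kit2, pad))
    print(f"pad={pad}: {overlap} shared bases")
```

As noted, this only compares where the kits place their intervals; it says nothing about how well the probes actually capture.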
I could be off on what you're trying to accomplish, but it sounds like bedtools should be able to do this.
If, as your question asks, you want to calculate the % of a genomic region covered by a BED file, you can use bedtools annotate for this. You provide the regions for which you want to know the % covered (e.g. your genes) and then supply the files that will be "covering" these regions (the capture kit BED files).
Maybe:
Expected output (roughly)
In this hypothetical, exome kits 1 and 2 each cover 90% of gene1 while kit 3 doesn't have good overlap, but the situation is reversed for gene2.
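To make the calculation itself concrete, here is a rough pure-Python sketch of the "fraction of each gene covered by each kit" idea. It is not bedtools code; the function names, gene names and coordinates are all hypothetical. It assumes BED-style half-open coordinates and treats each gene as one interval, so for a real comparison you would more likely feed in exon intervals.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_covered(genes, kit):
    """genes: list of (chrom, start, end, name)
    kit:   list of (chrom, start, end)
    Returns {name: fraction of the gene's bases overlapped by the kit}."""
    by_chrom = defaultdict(list)
    for chrom, start, end in kit:
        by_chrom[chrom].append((start, end))
    # Merge first so bases hit by overlapping kit intervals count once.
    merged = {chrom: merge(ivs) for chrom, ivs in by_chrom.items()}

    result = {}
    for chrom, g_start, g_end, name in genes:
        covered = 0
        for start, end in merged.get(chrom, []):
            covered += max(0, min(end, g_end) - max(start, g_start))
        result[name] = covered / (g_end - g_start)
    return result


# Toy data: two genes and two hypothetical kits.
genes = [("chr1", 100, 200, "gene1"), ("chr1", 500, 700, "gene2")]
kit_a = [("chr1", 90, 150), ("chr1", 140, 190), ("chr1", 600, 620)]
kit_b = [("chr1", 180, 200), ("chr1", 500, 700)]

for label, kit in [("kit_a", kit_a), ("kit_b", kit_b)]:
    print(label, fraction_covered(genes, kit))
```

With this toy data kit_a covers most of gene1 and little of gene2, and kit_b the opposite, which is the kind of per-gene table you would then compare across kits.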