Question

Calculate the percentage of a genomic region covered by a BED file

Asked by adarsh, 22 months ago

Hello,

I have BED files from several exome capture kits. Is there a way to calculate what portion (percentage) of a gene's region each capture kit covers, so that I can compare the kits for specific genes? When I load the BED files in IGV, they visually cover my gene of interest, but is there a way to quantify this numerically?

Thank you

NGS sequencing genomics exome genes • 1.6k views

See: How to calculate average coverage for all genes

Reply by Pierre Lindenbaum, 22 months ago

Aside from the programs/answers linked in @Pierre's reply, mosdepth (LINK) is the fastest way to do this.

Note: Are you asking what portion of each gene the BED file covers? That is, you are not asking about coverage from BAM alignments?

Reply by GenoMax, 22 months ago

Not from the BAM files, but from the BED files. I had the same question myself: someone else asked me about this BED-based coverage and I was not able to give them an answer.

Reply by adarsh, 22 months ago

You would need to do some custom coding to figure that out.
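A minimal sketch of what that custom coding could look like, in Python. It assumes plain BED input on the same genome build and chromosome naming for all files, and one row per gene with a unique name in column 4 (a repeated name would overwrite the earlier row). Each kit's intervals are merged first so overlapping targets are not counted twice. The intervals below are hypothetical toy data, not from a real kit. If the gene rows span whole genes, introns count toward the denominator, so an exome kit will look low; supplying exon intervals instead is closer to what capture kits target.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples; name is '.' if absent."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        name = fields[3] if len(fields) > 3 else "."
        records.append((fields[0], int(fields[1]), int(fields[2]), name))
    return records


def merge_by_chrom(records):
    """Merge overlapping or touching intervals per chromosome so no base is counted twice."""
    by_chrom = defaultdict(list)
    for chrom, start, end, _ in records:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlaps or abuts the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def fraction_covered(genes, kit):
    """For each gene, the fraction of its bases (BED is 0-based, half-open) inside the kit's intervals."""
    merged = merge_by_chrom(kit)
    result = {}
    for chrom, start, end, name in genes:
        length = end - start
        covered = 0
        for k_start, k_end in merged.get(chrom, []):
            overlap = min(end, k_end) - max(start, k_start)
            if overlap > 0:
                covered += overlap
        result[name] = covered / length if length > 0 else 0.0
    return result


# Hypothetical toy data (not from a real capture kit).
genes = read_bed(["chr5\t100\t200\tgene1", "chr5\t300\t400\tgene2"])
kit1 = read_bed(["chr5\t90\t150", "chr5\t140\t190", "chr5\t390\t500"])
print(fraction_covered(genes, kit1))  # {'gene1': 0.9, 'gene2': 0.1}
```

The per-gene loop is a plain linear scan, which is fine for a handful of genes; for genome-wide comparisons, an indexed tool such as bedtools will scale better.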

Reply by GenoMax, 22 months ago

Quoting the question: "I have multiple exome capture kit bed files" and "coverage from BED".

These are just interval files; you can't get coverage from them.

You can diff them and annotate the intervals, but I don't think this kind of comparison would give you insightful results. Will you be comparing the probe intervals? Most kits' target BED files just list the exon intervals. You can add padding to the probe intervals and compare them, but you can't really know which probes work better without sequencing data.
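A rough sketch of the padding-and-comparing idea, assuming both kits' target files have already been reduced to (chrom, start, end) tuples on the same genome build. The intervals and the 20 bp padding are hypothetical example values. It reports bases shared by both kits, bases unique to each, and the Jaccard index (shared / union). As noted above, this only compares intervals and says nothing about which probes actually perform better. On real files, bedtools slop (which needs a genome file to clip ends), merge and jaccard cover similar ground.

```python
def pad_and_merge(intervals, pad):
    """Pad (chrom, start, end) intervals by `pad` bp on each side, then merge per chromosome.

    Starts are clipped at 0; ends are NOT clipped because no chromosome sizes are given.
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((max(0, start - pad), end + pad))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:  # overlaps or abuts the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged


def total_bases(merged):
    return sum(end - start for ivs in merged.values() for start, end in ivs)


def shared_bases(a, b):
    """Bases covered by both merged interval sets (two-pointer sweep per chromosome)."""
    shared = 0
    for chrom in a.keys() & b.keys():
        x, y = a[chrom], b[chrom]
        i = j = 0
        while i < len(x) and j < len(y):
            overlap = min(x[i][1], y[j][1]) - max(x[i][0], y[j][0])
            if overlap > 0:
                shared += overlap
            # advance whichever interval ends first; it cannot overlap anything further
            if x[i][1] < y[j][1]:
                i += 1
            else:
                j += 1
    return shared


def compare_kits(kit_a, kit_b, pad=0):
    a, b = pad_and_merge(kit_a, pad), pad_and_merge(kit_b, pad)
    both = shared_bases(a, b)
    size_a, size_b = total_bases(a), total_bases(b)
    union = size_a + size_b - both
    return {
        "shared": both,
        "only_a": size_a - both,
        "only_b": size_b - both,
        "jaccard": both / union if union else 0.0,
    }


# Hypothetical toy intervals for two kits.
kit_a = [("chr1", 100, 200)]
kit_b = [("chr1", 150, 260)]
print(compare_kits(kit_a, kit_b))  # {'shared': 50, 'only_a': 50, 'only_b': 60, 'jaccard': 0.3125}
print(compare_kits(kit_a, kit_b, pad=20))
```

Padding both kits by the same amount makes small offsets between probe and exon coordinates count as shared, which is usually the point of padding before the comparison.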

Reply by barslmn, 22 months ago

I could be off on what you're trying to accomplish, but it sounds like bedtools should be able to do this.

If, as your question asks, you want to calculate the % of a genomic region covered by a BED file, you can use bedtools annotate. You provide the regions whose % coverage you want (-i) and then supply the files that will be "covering" these regions (-files).

Maybe:

bedtools annotate -i genes.bed -files exome_kit1.bed exome_kit2.bed exome_kit3.bed

Expected output (roughly)

chr start end name exome_kit1 exome_kit2 exome_kit3
chr5 100 200 gene1 0.9 0.9 0.1
chr5 300 400 gene2 0.1 0.1 0.9

In this hypothetical example, exome_kit1 and exome_kit2 each cover 90% of gene1 while exome_kit3 covers only 10% of it; the situation is reversed for gene2.
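If you then want to rank the kits per gene from that table, a small sketch like the following could parse it. It assumes the layout of the example output above (gene name in column 4, one coverage fraction per kit at the end of each line) and that you list the kit names yourself in the same order as the files given to -files. The 80% threshold is a hypothetical cut-off, and the data is the made-up example from this reply, not real output.

```python
# Kit names in the same order as the files given to -files (assumed: the three kits in the example).
KIT_NAMES = ["exome_kit1", "exome_kit2", "exome_kit3"]

# The hypothetical output from the reply above (not real data).
EXAMPLE = """
chr start end name exome_kit1 exome_kit2 exome_kit3
chr5 100 200 gene1 0.9 0.9 0.1
chr5 300 400 gene2 0.1 0.1 0.9
"""


def summarize(lines, kit_names, min_fraction=0.8):
    """Map each gene to (best kit, its fraction, kits below min_fraction)."""
    summary = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4 + len(kit_names):
            continue  # blank or malformed line
        try:
            fractions = [float(x) for x in fields[-len(kit_names):]]
        except ValueError:
            continue  # header row with kit names instead of numbers
        name = fields[3]
        best = max(range(len(kit_names)), key=lambda i: fractions[i])  # ties go to the earlier kit
        low = [kit for kit, frac in zip(kit_names, fractions) if frac < min_fraction]
        summary[name] = (kit_names[best], fractions[best], low)
    return summary


for gene, (kit, frac, low) in summarize(EXAMPLE.splitlines(), KIT_NAMES).items():
    print(f"{gene}: best={kit} ({frac:.0%}), below 80%: {', '.join(low) or 'none'}")
```

With the example table, this reports exome_kit1 as best for gene1 (exome_kit3 below the cut-off) and exome_kit3 as best for gene2.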

Reply by rfran010, 22 months ago