I am trying to calculate the gene coverage of specific genes in a BAM file. I have a list of genes with thier start and end positions in a BED file. I would essentially like to know the number of overlaps for each gene and how well it is covered.
My bed file looks like this:
track name="tb_ncbiRefSeqCurated" description="table browser query on ncbiRefSeqCurated" visibility=3 url=
chr10 100042192 100081869 NM_001308.3 CPN1
chr10 100150093 100186029 NM_006459.4 ERLIN1
chr10 100188299 100229596 NM_001278.5 CHUK
chr10 100232297 100267638 NM_001303405.2 CWF19L1
chr10 100523728 100529923 NM_001284368.1 NDUFB8
chr10 100523739 100529871 NM_001284367.2 NDUFB8
chr10 100735395 100747437 NM_001374303.1 PAX2
chr10 100735395 100829944 NM_001304569.2 PAX2
I have tried to covert the bam file to a bed file and then used bedmap to obtain this, but I dont get the fraction of coverage. This is the code I used:
> bam2bed <../../bam_files/NA12878-200304612B-v15-450-Rep10_35x.bam | bedmap --delim '\t' --echo --count --bases-uniq --bases-uniq-f --echo-ref-size --mean ../../../reference_genes/5_selectedCols_mendel_genes - > test_bed
I get an output that looks like this:
chr10 100150093 100186029 NM_006459.4 ERLIN1 0 0 0.000000 35936 NAN
chr10 100188299 100229596 NM_001278.5 CHUK 0 0 0.000000 41297 NAN
chr10 100232297 100267638 NM_001303405.2 CWF19L1 0 0 0.000000 35341 NAN
chr10 100523728 100529923 NM_001284368.1 NDUFB8 0 0 0.000000 6195 NAN
chr10 100523739 100529871 NM_001284367.2 NDUFB8 0 0 0.000000 6132 NAN
chr10 100735395 100747437 NM_001374303.1 PAX2 0 0 0.000000 12042 NAN
chr10 100735395 100829944 NM_001304569.2 PAX2 0 0 0.000000 94549 NAN
chr10 100745581 100829944 NM_003989.5 PAX2 0 0 0.000000 84363 NAN
The counts and the fraction of bases are not calculated.
Could you help me to find a better solution. I am sure this has been done before but I just cant find it.
see How to calculate average coverage for all genes ;
Thanks! I have checked this out and found
provides the mean coverage of genes.
provides the histogram coverage of each feature that can be combined to obtain the overage percent covered and coverage uniformity.
In addition, the Jvarkit Bamstats05 is an useful package to get all this information together