Obtain fraction of gene covered from bam file using bed file
0
0
Entering edit mode
2.2 years ago

I am trying to calculate the gene coverage of specific genes in a BAM file. I have a list of genes with thier start and end positions in a BED file. I would essentially like to know the number of overlaps for each gene and how well it is covered.

My bed file looks like this:

track name="tb_ncbiRefSeqCurated" description="table browser query on ncbiRefSeqCurated" visibility=3 url=
chr10   100042192       100081869       NM_001308.3     CPN1
chr10   100150093       100186029       NM_006459.4     ERLIN1
chr10   100188299       100229596       NM_001278.5     CHUK
chr10   100232297       100267638       NM_001303405.2  CWF19L1
chr10   100523728       100529923       NM_001284368.1  NDUFB8
chr10   100523739       100529871       NM_001284367.2  NDUFB8
chr10   100735395       100747437       NM_001374303.1  PAX2
chr10   100735395       100829944       NM_001304569.2  PAX2

I have tried to covert the bam file to a bed file and then used bedmap to obtain this, but I dont get the fraction of coverage. This is the code I used:

> bam2bed <../../bam_files/NA12878-200304612B-v15-450-Rep10_35x.bam | bedmap --delim '\t' --echo --count --bases-uniq --bases-uniq-f --echo-ref-size --mean ../../../reference_genes/5_selectedCols_mendel_genes - > test_bed

I get an output that looks like this:

chr10   100150093       100186029       NM_006459.4     ERLIN1  0       0       0.000000        35936   NAN
chr10   100188299       100229596       NM_001278.5     CHUK    0       0       0.000000        41297   NAN
chr10   100232297       100267638       NM_001303405.2  CWF19L1 0       0       0.000000        35341   NAN
chr10   100523728       100529923       NM_001284368.1  NDUFB8  0       0       0.000000        6195    NAN
chr10   100523739       100529871       NM_001284367.2  NDUFB8  0       0       0.000000        6132    NAN
chr10   100735395       100747437       NM_001374303.1  PAX2    0       0       0.000000        12042   NAN
chr10   100735395       100829944       NM_001304569.2  PAX2    0       0       0.000000        94549   NAN
chr10   100745581       100829944       NM_003989.5     PAX2    0       0       0.000000        84363   NAN

The counts and the fraction of bases are not calculated.

Could you help me to find a better solution. I am sure this has been done before but I just cant find it.

coverage bed bam mapping • 632 views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Thanks! I have checked this out and found

bedtools bamtobed -i input_bam | bedtools coverage -mean -b - -a input_bed > output.file 

provides the mean coverage of genes.

bedtools bamtobed -i input_bam | bedtools coverage -hist -b - -a input_bed > output.file

provides the histogram coverage of each feature that can be combined to obtain the overage percent covered and coverage uniformity.

In addition, the Jvarkit Bamstats05 is an useful package to get all this information together

ADD REPLY

Login before adding your answer.

Traffic: 2471 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6