Hello,
I have a BAM file (hg19) and need to calculate the depth of coverage at each base in the exonic regions. I downloaded the CCDS BED file from the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=1109848749_gHXtAmBqKaR1UOA191fS812ba8pn&clade=mammal&org=Human&db=hg19&hgta_group=genes&hgta_track=ccdsGene&hgta_table=0&hgta_regionType=genome&position=chrX%3A15%2C578%2C261-15%2C621%2C068&hgta_outputType=bed&hgta_outFileName=ccds.bed) and used samtools depth.
However, I found that the CCDS file contains over 900 million bases, which far exceeds the number of bases expected in the exonic regions (roughly 1% of the genome). I checked the BED file manually in R and got a similar result: over 900 million bases (995030607) in the non-overlapping regions.
I wonder what the problem might be, and how I could obtain a BED file containing all human exonic regions.
Thank you very much!
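For anyone who wants to repeat the non-overlapping base count outside R, here is a minimal Python sketch. It assumes a plain BED file with at least three columns (chrom, start, end); the function names `merged_base_count` and `read_bed` are made up for illustration and are not part of any library.

```python
def merged_base_count(intervals):
    """Total bases covered by (chrom, start, end) intervals, counting overlaps once."""
    total = 0
    cur_chrom, cur_start, cur_end = None, 0, 0
    for chrom, start, end in sorted(intervals):
        if chrom != cur_chrom or start > cur_end:
            # Close the previous merged interval and start a new one.
            total += cur_end - cur_start
            cur_chrom, cur_start, cur_end = chrom, start, end
        else:
            # Overlapping or adjacent interval: extend the current one.
            cur_end = max(cur_end, end)
    return total + (cur_end - cur_start)


def read_bed(path):
    """Yield (chrom, start, end) from the first three columns of a BED file."""
    with open(path) as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith(("track", "browser", "#")) or not line.strip():
                continue
            chrom, start, end = line.split()[:3]
            yield chrom, int(start), int(end)
```

Calling `merged_base_count(read_bed("ccds.bed"))` should give a number comparable to the R result, provided the file name matches the one downloaded.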
You can find the current CCDS data at this NCBI site. You should be able to create a BED file from information there. Take a look at the README to understand the data.
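A rough sketch of that conversion in Python is below. It assumes each CCDS record carries a `cds_locations` string such as `[100-199, 300-349]` with 0-based, inclusive coordinates, a chromosome name without the `chr` prefix, and `-` for records with no locations; please confirm all of this against the README before relying on it. The function name and the example values are hypothetical.

```python
def ccds_locations_to_bed(chrom, ccds_id, strand, cds_locations):
    """Convert one cds_locations string, e.g. "[100-199, 300-349]", to BED rows."""
    text = cds_locations.strip()
    if text in ("", "-"):
        # Assumption: records without coordinates carry a bare "-".
        return []
    rows = []
    for block in text.strip("[]").split(","):
        block = block.strip()
        if not block:
            continue
        start, end = (int(x) for x in block.split("-"))
        # Assumed 0-based inclusive input; BED ends are exclusive, hence end + 1.
        rows.append((f"chr{chrom}", start, end + 1, ccds_id, 0, strand))
    return rows
```

Each returned tuple can be written out tab-separated as one BED6 line. Note that this only reformats coordinates; it does not move them between genome assemblies.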
To my understanding, the data there, especially the newer releases, are built on GRCm38, while my data used hg19 as the reference genome.
What did you pick on the second screen: one entry per gene, per coding exon, or something else?
On the second screen of the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=1109848749_gHXtAmBqKaR1UOA191fS812ba8pn&clade=mammal&org=Human&db=hg19&hgta_group=genes&hgta_track=ccdsGene&hgta_table=0&hgta_regionType=genome&position=chrX%3A15%2C578%2C261-15%2C621%2C068&hgta_outputType=bed&hgta_outFileName=ccds.bed), I chose "Create one BED record per: Coding Exons" (the 6th option, and also the default).
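In case the downloaded file turns out to have one 12-column record per whole gene rather than one record per coding exon, the sketch below shows how the exon blocks could be recovered from it. It assumes standard tab-separated BED12 columns (thickStart and thickEnd in columns 7 and 8, blockSizes and blockStarts in columns 11 and 12); the function name is made up for illustration.

```python
def bed12_to_coding_blocks(line):
    """Split one BED12 line into (chrom, start, end) blocks clipped to the coding span."""
    f = line.rstrip("\n").split("\t")
    chrom, tx_start = f[0], int(f[1])
    thick_start, thick_end = int(f[6]), int(f[7])
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    blocks = []
    for size, rel in zip(sizes, starts):
        # Block starts are relative to chromStart; clip each block to the thick region.
        start = max(tx_start + rel, thick_start)
        end = min(tx_start + rel + size, thick_end)
        if start < end:
            blocks.append((chrom, start, end))
    return blocks
```

Summing `end - start` over these blocks, after merging overlaps between transcripts, gives the exonic base count to compare with the whole-file total.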