over 900 million bases in CCDS regions?
3.5 years ago
YL ▴ 10

Hello,

I have a BAM file (hg19) and need to calculate the depth of coverage at each base in the exonic regions. I downloaded the CCDS BED file from the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=1109848749_gHXtAmBqKaR1UOA191fS812ba8pn&clade=mammal&org=Human&db=hg19&hgta_group=genes&hgta_track=ccdsGene&hgta_table=0&hgta_regionType=genome&position=chrX%3A15%2C578%2C261-15%2C621%2C068&hgta_outputType=bed&hgta_outFileName=ccds.bed) and used samtools depth.

However, I found that the CCDS file covers over 900 million bases, far more than expected for exonic regions (roughly 1% of the genome). I checked the BED file manually in R and got a similar total of 995030607 bases for the non-overlapping regions.
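For reference, the non-overlapping total described above can be reproduced outside R with a short sketch like the following. It is only an illustration: the function name `total_nonoverlapping_bases` and the toy intervals are made up for the example, and it assumes a plain BED layout (chrom, start, end in the first three columns, half-open coordinates).

```python
from collections import defaultdict


def total_nonoverlapping_bases(bed_lines):
    """Sum the bases covered by BED intervals, counting overlapping bases once."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and optional UCSC header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching interval: extend the current run.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current run and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Toy data (hypothetical), not taken from the CCDS file.
example = [
    "chr1\t100\t200",
    "chr1\t150\t300",
    "chr1\t400\t450",
    "chr2\t0\t10",
]
print(total_nonoverlapping_bases(example))  # 260
```

To run it on a real file, pass an open file handle, e.g. `total_nonoverlapping_bases(open("ccds.bed"))`. Note that this only looks at columns 2 and 3, so for a 12-column BED it measures the full span of each record rather than its individual blocks.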

I wonder what the problem might be, and how I could obtain a BED file containing all human exonic regions?

Thank you very much!

CCDS • 1.0k views

You can find the current CCDS data at this NCBI site. You should be able to create a BED file from information there. Take a look at the README to understand the data.


To my understanding, the data there, especially the newer releases, are built on GRCm38, while my data uses hg19 as the reference genome.


What did you pick on the second screen: one entry per gene, per coding exon, or something else?
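The choice matters because a whole-record BED12 line carries the exons only as blocks (columns 10 to 12: blockCount, blockSizes, blockStarts), while columns 2 and 3 give the full span including introns. Whether that explains the 900 million figure here is an assumption until the download option is confirmed. A minimal sketch of expanding one BED12 record into per-block intervals, with a made-up record and a hypothetical helper name `bed12_blocks`:

```python
def bed12_blocks(line):
    """Yield (chrom, start, end) for each block of one tab-separated BED12 record."""
    f = line.rstrip("\n").split("\t")
    chrom, chrom_start = f[0], int(f[1])
    # blockSizes and blockStarts are comma-separated and may end with a trailing comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    for size, offset in zip(sizes, starts):
        # blockStarts are offsets relative to chromStart.
        yield chrom, chrom_start + offset, chrom_start + offset + size


# Hypothetical record: a 4000-base span containing three blocks.
record = (
    "chr1\t1000\t5000\tCCDS_example\t0\t+\t1000\t5000\t0\t3\t"
    "100,200,300,\t0,1500,3700,"
)

fields = record.split("\t")
span = int(fields[2]) - int(fields[1])
block_total = sum(end - start for _, start, end in bed12_blocks(record))

print(span)         # 4000: what you count from columns 2 and 3 alone
print(block_total)  # 600: what you count from the blocks
```

If the file has only three to six columns, there are no blocks to expand and each line is already a single interval.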
