Coverage plot/histogram of a complete genome
1
0
Entering edit mode
3.2 years ago
Michal Nevo ▴ 140

Hi, I have 10 bam files with many contigs.

I created bed files for each one (with bedtools genomecov)

enter image description here

How can I create a histogram that shows coverage depth (In the x-axis) and the number of bases in the genome (In the y-axis) ? One histogram for each bam file.

like this:

enter image description here

Thanks!

bam histogram plot genome Coverage • 1.9k views
ADD COMMENT
1
Entering edit mode

I guess QualiMap has this option.

ADD REPLY
1
Entering edit mode
3.2 years ago

You can use deepTools' plotCoverage tool, which will generate 2 slightly different plots, but will allow you to extract the table of counts that you can use to generate a histogram in any other tool of your liking, e.g. via R Details are here

ADD COMMENT
0
Entering edit mode

Can you explain? I've already created these 2 graphs..but this is not what I need. I'm looking for a way to create the graph I need.

Did you mean I should create this file? enter image description here

I already have a coverage file.

ADD REPLY
0
Entering edit mode

correct, the --outRawCounts option will generate a file where each row corresponds to the number of reads found at a given bp position.

Here's the example code from the deepTools documentation:

$ head coverage.tab
    'H3K27me3'        'H3K4me1'       'H3K4me3'       'H3K9me3'
    0 0       0       0
    0 0       0       0
    0 0       0       0
    0 0       0       0
    0 0       0       0
    0 0       0       0
    0 0       0       0
    0 0       0       0
    0 0       0       0

$ cut -f1 coverage.tab | sort -n | uniq -c
    1 'H3K27me3'
    548190 0 # the vast majority of sampled bp had 0 overlapping reads
    127914 1
    35703 2
    12271 3
    4584 4
    1717 5
     659 6
     251 7
     106 8
      49 9
      16 10
       6 11
       3 12
       2 13
       3 14
       1 15
       1 16
       2 17
       1 19
       1 21
       2 22
       1 23
       1 24
       2 28
       1 35
       1 40 # there was one bp with 40 overlapping reads!
       1 44

uniq -c will return two columns as shown above -- the first one contains the values your example plot has on the y axis ("number of bp in the genome") and the second column contains the coverage value, which is shown on the x axis. In the example above, the peak would be to the far left, i.e. at 0x coverage.

ADD REPLY

Login before adding your answer.

Traffic: 2610 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6