Use bedtools intersect to see how much coverage you had of a specific coordinate
3
0
Entering edit mode
4.0 years ago
mropri ▴ 160

How would I compare a bed file and a bam file using bedtools intersect to see how much coverage I had of specific coordinates in my bed file? Thank you in advance for your help!

ChIP-Seq • 2.5k views
ADD COMMENT
1
Entering edit mode
4.0 years ago

You want samtools depth.

ADD COMMENT
1
Entering edit mode

mosdepth (LINK) if you want a more performant option.

ADD REPLY
0
Entering edit mode

Thanks, this is helpful, going to try out the couple of ways you guys suggested

ADD REPLY
1
Entering edit mode
4.0 years ago

Here's one way:

$ bedmap --echo --count --echo-overlap-size --echo-ref-size --delim '\t' my_intervals.bed <(bam2bed < my_reads.bam) | awk -vFS="\t" -vOFS="\t" '{ print $0, ($NF-1)/$NF; }' > answer.bed

Note: Please see my second answer in this thread, which uses your specific inputs.

ADD COMMENT
0
Entering edit mode

Thank you, will try this out and see how it goes.

ADD REPLY
1
Entering edit mode

Running bedmap --help will describe the --count, --echo-overlap-size and --echo-ref-size options. This should give you numbers much like what other tools report.

ADD REPLY
0
Entering edit mode
chr12     75943973          75954960          0                 10987       0.999909

What do the last 3 columns represent

ADD REPLY
1
Entering edit mode

I'm not sure why you would see three columns. There should be four columns: the number of reads which overlap the interval (--count), the total lengths of overlaps between reads and the interval (--echo-overlap-size), the length of the interval (--echo-ref-size), and the fraction of overlap length by interval (via awk). If you can post your inputs somewhere, I can take a look.

ADD REPLY
0
Entering edit mode

I would like to post my input so you can check, where do you suggest I upload it so you can see. I would have replied earlier but because I have a new account can only post so much in 5 hrs so couldnt reply.

ADD REPLY
0
Entering edit mode

I have copied the link where my bed and bam file is that I was using to measure coverage of my bed coordinates with the command you gave. Hope you can look at this and if the output is correct.

https://drive.google.com/drive/folders/1tXtRjoEvs3k2N-DZWayT5pR9iHy0slxM?usp=sharing

ADD REPLY
0
Entering edit mode

I'll try to take a look this weekend

ADD REPLY
0
Entering edit mode

Thank you, feel bad for bothering you. Even if you are not able to, no worries, appreciate your help.

ADD REPLY
0
Entering edit mode

I'm going to add a separate answer, to keep things clean.

ADD REPLY
0
Entering edit mode

chr12 75943973 75954960 0 10987 0.999909

This is the output I got when for one of my regions when I ran this, what are the last 3 columns representing, a little confused there. Appreciate your help.

ADD REPLY
1
Entering edit mode
4.0 years ago

First, please sort the BED file with sort-bed (not sortBed or UNIX sort):

$ sort-bed 10A_AT1_SE1.bed > 10A_AT1_SE1.sorted.bed

Then map the sorted BED file to the BAM file:

$ bedmap --skip-unmapped --delim '\t' --echo --count --bases-uniq --echo-ref-size --bases-uniq-f 10A_AT1_SE1.sorted.bed <(bam2bed < 10A_1_H3K.sorted.bam) > 10A_1_H3K.coverage.bed

The file 10A_1_H3K.coverage.bed will look something like this:

$ head 10A_1_H3K.coverage.bed
chr1    199064  200374  404 1310    1310    1.000000
chr1    6853080 6870040 204 9044    16960   0.533255
chr1    8004008 8027135 507 22308   23127   0.964587
chr1    8061658 8098833 366 37175   37175   1.000000
chr1    8174215 8211962 369 34253   37747   0.907436
chr1    8873384 8890703 434 17319   17319   1.000000
chr1    8874708 8890922 420 16214   16214   1.000000
chr1    15833039    15850691    271 17652   17652   1.000000
chr1    16643023    16645543    266 2520    2520    1.000000
chr1    16665212    16667822    281 2610    2610    1.000000

The last four columns are:

  • The number of reads in the BAM file that overlap an interval in the BED file by one or more bases
  • The number of bases in the interval in the BED file that have coverage by reads from the BAM file
  • The length of the interval in the BED file
  • The fraction of bases in the interval in the BED file that have coverage by reads from the BAM file

The order of the specified bedmap operands --count, --bases-uniq, --echo-ref-size, and --bases-uniq-f write the aforementioned column values in that same order.

If one base of read coverage is too lenient, additional operands are available to further constrain that criterion. Run bedmap --help and take a look at the "Overlap Options" section.

Hope this helps!

ADD COMMENT
0
Entering edit mode

I really appreciate this, thank you for taking the time out of your day for the detailed answer. I am so thankful to you and this truly helps further my analysis and allows me to understand what is being done. All the thanks!

ADD REPLY
0
Entering edit mode

Just one last question, I should be able to use the counts column that I have to normalize my reads by DESeq2. I ask this because i have three coverage.bed files and I want to normalize the read coverage between them. I have used DESeq2 before so was wondering would that method work here as well? Thanks for your help

ADD REPLY
0
Entering edit mode

Yes, these are raw read counts.

ADD REPLY
0
Entering edit mode

Thank you! really appreciate all your help

ADD REPLY
0
Entering edit mode

Please upvote and accept this (and/or any other answers) that provided a solution for your problem. Use "green check" mark to do so. This will provide closure to the thread.

ADD REPLY

Login before adding your answer.

Traffic: 1583 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6