How would I compare a bed file and a bam file using bedtools intersect to see how much coverage I had of specific coordinates in my bed file? Thank you in advance for your help!
How would I compare a bed file and a bam file using bedtools intersect to see how much coverage I had of specific coordinates in my bed file? Thank you in advance for your help!
Here's one way:
$ bedmap --echo --count --echo-overlap-size --echo-ref-size --delim '\t' my_intervals.bed <(bam2bed < my_reads.bam) | awk -vFS="\t" -vOFS="\t" '{ print $0, ($NF-1)/$NF; }' > answer.bed
Note: Please see my second answer in this thread, which uses your specific inputs.
I'm not sure why you would see three columns. There should be four columns: the number of reads which overlap the interval (--count
), the total lengths of overlaps between reads and the interval (--echo-overlap-size
), the length of the interval (--echo-ref-size
), and the fraction of overlap length by interval (via awk
). If you can post your inputs somewhere, I can take a look.
I have copied the link where my bed and bam file is that I was using to measure coverage of my bed coordinates with the command you gave. Hope you can look at this and if the output is correct.
https://drive.google.com/drive/folders/1tXtRjoEvs3k2N-DZWayT5pR9iHy0slxM?usp=sharing
First, please sort the BED file with sort-bed
(not sortBed
or UNIX sort
):
$ sort-bed 10A_AT1_SE1.bed > 10A_AT1_SE1.sorted.bed
Then map the sorted BED file to the BAM file:
$ bedmap --skip-unmapped --delim '\t' --echo --count --bases-uniq --echo-ref-size --bases-uniq-f 10A_AT1_SE1.sorted.bed <(bam2bed < 10A_1_H3K.sorted.bam) > 10A_1_H3K.coverage.bed
The file 10A_1_H3K.coverage.bed
will look something like this:
$ head 10A_1_H3K.coverage.bed
chr1 199064 200374 404 1310 1310 1.000000
chr1 6853080 6870040 204 9044 16960 0.533255
chr1 8004008 8027135 507 22308 23127 0.964587
chr1 8061658 8098833 366 37175 37175 1.000000
chr1 8174215 8211962 369 34253 37747 0.907436
chr1 8873384 8890703 434 17319 17319 1.000000
chr1 8874708 8890922 420 16214 16214 1.000000
chr1 15833039 15850691 271 17652 17652 1.000000
chr1 16643023 16645543 266 2520 2520 1.000000
chr1 16665212 16667822 281 2610 2610 1.000000
The last four columns are:
The order of the specified bedmap
operands --count
, --bases-uniq
, --echo-ref-size
, and --bases-uniq-f
write the aforementioned column values in that same order.
If one base of read coverage is too lenient, additional operands are available to further constrain that criterion. Run bedmap --help
and take a look at the "Overlap Options" section.
Hope this helps!
Just one last question, I should be able to use the counts column that I have to normalize my reads by DESeq2. I ask this because i have three coverage.bed files and I want to normalize the read coverage between them. I have used DESeq2 before so was wondering would that method work here as well? Thanks for your help
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
mosdepth
(LINK) if you want a more performant option.Thanks, this is helpful, going to try out the couple of ways you guys suggested