Question

[solved]bedtools coverageBed: not sorted output

0

Entering edit mode

9.5 years ago

Chico Fernandez ▴ 80

SOLVED

I need to calculate coverages for each position in several regions. I use following command:

coverageBed -abam bam_file -b name_of_bed -d

But finally the coordinates from .bed file are not sorted, in the result file! It reduces the opportunity to use cost-efficient algorithms.

Could you tell me what I am doing wrong? Is there a better way to calculate coverages for each position and keep them in the same order as in original bed file?

Example:

BED-file starts with:

chr1    762097    762270
chr1    865591    865791

(and so on, ascending order of coordinates in chromosome)

So I would like to have coverage output to be in the same order. What I have:

first line in the output file:

1       39845644        39846247        1       0

100th line:

1       917230  917836  516     11

(weird order, neither ascending nor descending)

I re-sorted bam file just in case it was not sorted. It did not help.

UPDATE: tried to calculate coverage using bedTools v. 2.25 and -sorted flag. Previously output was:

1    39845744    39846147    1    4
1    39845744    39846147    2    4
1    39845744    39846147    3    4

Now:

1    10001    10077    HWUSI-EAS1693_0012:7:41:10492:2084#0/1    0    +    10001    10077    0,0,0    1    76,    0,    1    0
1    10001    10077    HWUSI-EAS1693_0012:7:41:10492:2084#0/1    0    +    10001    10077    0,0,0    1    76,    0,    2    0
1    10001    10077    HWUSI-EAS1693_0012:7:41:10492:2084#0/1    0    +    10001    10077    0,0,0    1    76,    0,    3    0

WTF is this? HWUSI? Code for v.2.15:

coverageBed -abam Sample.realigned.dm.recalibrated.bam -b Regions.bed -d > output.txt

Code for v.2.25:

covBed -abam Sample.realigned.dm.recalibrated.bam -b Regions.bed -d -sorted > output.sorted.txt

coverage next-gen • 4.8k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.5 years ago by Chico Fernandez ▴ 80

0

Entering edit mode

How was your bed file sorted? Couldn't you just use the sort command to sort them how you'd like? Or if you'd rather re-sort them to match your original bed file you could just read both into R and use the match function (merge should work too).

ADD REPLY • link 9.5 years ago by dally ▴ 210

0

Entering edit mode

They are sorted lexicographically. The smaller position on chromosome is - the higher the region is.

ADD REPLY • link 9.5 years ago by Chico Fernandez ▴ 80

0

Entering edit mode

So to get this right, your bam and bed input is sorted, but your output doesn't come out sorted (because bedtools automatically sorts in its preferred sort order) as you'd like? Could you post a small section of the resulting bedfile and the input bed file and an example of what you want the output to be so we can help.

ADD REPLY • link 9.5 years ago by dally ▴ 210

0

Entering edit mode

I updated my question. And I have one bam file and one bed file. So I would like to obtain number of reads from bam file covering probes from bed file and to have them sorted as in the bed file.

ADD REPLY • link 9.5 years ago by Chico Fernandez ▴ 80

0

Entering edit mode

Try @dariober's solution and if that does not work i'll try to give you a R solution.

ADD REPLY • link 9.5 years ago by dally ▴ 210

0

Entering edit mode

I'm trying now. The files are approx 5 GB each. I can sort output files with a script, but I a) do not want to re-write the file with coverages (more memory + time consumption) b) do not understand how and why bedtools mixes the order of bed specified regions. It makes no sense.

ADD REPLY • link 9.5 years ago by Chico Fernandez ▴ 80

0

Entering edit mode

Your bed file is 5GB!? In this case you definitely want to use the -sorted option that comes with latest versions of bedtools otherwise it's going to take an awful amount of memory and time (and you get sorted output). By the way, if you stream the output of coverageBed to sort I think the overhead of re-sorting is going to be relatively small compared to the actual counting.

ADD REPLY • link 9.5 years ago by dariober 15k

0

Entering edit mode

Sorry, had a limit of 5 messages per day on this forum so could not answer yesterday. Yes, they are exome-seq data. I tried your gnu version - it worked for ages, I waited several hours and then interrupted it. Now will try 2.25 version, hope it will work.

ADD REPLY • link 9.5 years ago by Chico Fernandez ▴ 80

0

Entering edit mode

9.5 years ago

h.mon 35k

Sort the .bam and .bed files with samtools sort and sortBed, respectively.

ADD COMMENT • link 9.5 years ago by h.mon 35k

0

Entering edit mode

Thank you. But both of them are sorted.

ADD REPLY • link 9.5 years ago by Chico Fernandez ▴ 80

0

Entering edit mode

8.8 years ago

aiirtime • 0

Hello, it could be I miss understand something, but where is the Problem solved? I have a similar issue it doesn't depend on the order but on the strange output. I recognized that the coverageBed usage/description changed at the point of the perspective.

Earlier versions "coverageBed" main task description was:

Returns the depth and breadth of coverage of features from A on the intervals in B

which is represented by your first output Chico Fernandez:

1    39845744    39846147    1    4
1    39845744    39846147    2    4
1    39845744    39846147    3    4

and at the version 2.26 "coverageBed" main task description is:

Returns the depth and breadth of coverage of features from B on the intervals in A

you see the perspective is changed for more speed I thing, but the Problem is your output also changed like the second output from Chico Fernandez:

1    10001    10077    HWUSI-EAS1693_0012:7:41:10492:2084#0/1    0    +    10001    10077    0,0,0    1    76,    0,    1    0

1    10001    10077    HWUSI-EAS1693_0012:7:41:10492:2084#0/1    0    +    10001    10077    0,0,0    1    76,    0,    2    0

1    10001    10077    HWUSI-EAS1693_0012:7:41:10492:2084#0/1    0    +    10001    10077    0,0,0    1    76,    0,    3    0

I thought ok, than I have to change only the parameters -a and -b so: I try to force coverageBed from version 2.26 to do the old fashion by changing -a and -b This didn't worked because the tool is crushing. It was the RAM (The bam file with 10Gb and as -b parameter couldn't be load into the RAM with 64Gb, internal stuff makes the file bigger? could it be?)

Is there a possibility to use coverageBed like before? Instead of using the old version?

Best airtime

ADD COMMENT • link 8.8 years ago by aiirtime • 0

Ram · Accepted Answer · 2016-02-02

What version of bedtools do you use? As of version 2.24 the output should be sorted if input is sorted. From https://github.com/arq5x/bedtools2/blob/master/docs/content/history.rst#version-2250-3-sept-2015:

The coverage tool now takes advantage of pre-sorted intervals via the -sorted option. This allows the coverage tool to be much faster, use far less memory, and report coverage for intervals in their original order in the input file.

Alternatively, just use gnu sort as

coverageBed -abam bam_file -b name_of_bed -d | sort -k1,1 -k2,2n > out.bed