Segmentation fault (core dumped) error with bigWigAverageOverBed
3.9 years ago
camelest ▴ 50

I'm hoping someone can help with an error from bigWigAverageOverBed. My system is Ubuntu 18.04.4 LTS and bigWigAverageOverBed is v357.

I get the error with a simple command:

bigWigAverageOverBed input.bw INPUT.bed output.tab

which returns

processing chromosomes Segmentation fault (core dumped)

I have three INPUT.bed files, which I generated with awk according to their overlaps with another reference BED file. Only one of the three gives the error above. Since that file is relatively large (36M), I tried splitting it, but the error persists.

Any help would be really appreciated.

bigWigAverageOverBed RNA-Seq • 3.1k views

I had a similar error (segmentation fault) in bigWigAverageOverBed, caused by a small number of entries in the BED file whose end position was beyond the length of their chromosome. Once these were removed, the file processed fine regardless of size (up to 16M at least). Rather than just splitting the file, I suggest slicing it from the top (e.g. using head, each time with a bigger -n) to see whether the problem only starts at a certain point in the BED file.
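
The slicing idea can be sketched as a small loop. This is only an illustration: the helper name first_failing_prefix, the step size, and the wrapper function are hypothetical, and the command under test is passed in as arguments so the loop itself assumes nothing about bigWigAverageOverBed options.

```shell
# Sketch: grow a prefix of a BED file until a command starts failing.
# first_failing_prefix is a hypothetical helper, not part of any UCSC tool.
# Usage: first_failing_prefix BED STEP COMMAND [ARGS...]
# The path of the prefix file is appended as the last argument of COMMAND.
# Prints the first prefix length (in lines) at which COMMAND fails;
# returns 1 if COMMAND succeeds on the whole file.
first_failing_prefix() {
    local bed=$1 step=$2
    shift 2
    local total n tmp
    total=$(( $(wc -l < "$bed") ))
    tmp=$(mktemp)
    n=$step
    while :; do
        [ "$n" -gt "$total" ] && n=$total
        head -n "$n" "$bed" > "$tmp"
        if ! "$@" "$tmp" > /dev/null 2>&1; then
            echo "$n"
            rm -f "$tmp"
            return 0
        fi
        [ "$n" -eq "$total" ] && break
        n=$((n + step))
    done
    rm -f "$tmp"
    return 1
}

# Example wrapper (assumed file names from this thread):
# run_bw() { bigWigAverageOverBed input.bw "$1" prefix_test.tab; }
# first_failing_prefix INPUT.bed 100000 run_bw
```

A crash counts as a failure because the shell reports a non-zero exit status for it. Once a failing prefix length is known, the offending records lie in the last step-sized chunk of that prefix, which can then be inspected with head and tail.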


I'm sorry for the late reply. I thought I had replied but just realized it wasn't successfully posted. In conclusion, chatul's point was correct. When I removed the regions extending beyond the chromosome ends, the error went away. Thank you so much for the help.
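
For anyone landing here with the same crash, a filter along these lines might do the removal. It is a sketch under assumptions: filter_in_bounds is a hypothetical helper name, and the first argument is assumed to be a two-column file of chromosome names and lengths (often called chrom.sizes) for the same assembly as the bigWig.

```shell
# Sketch: keep only BED records that lie within their chromosome.
# Assumes the first file has two whitespace-separated columns: name, length.
# filter_in_bounds is a hypothetical helper name.
# Usage: filter_in_bounds CHROM_SIZES BED > kept.bed 2> dropped.bed
filter_in_bounds() {
    awk 'NR == FNR { size[$1] = $2 + 0; next }   # first file: load lengths
         ($1 in size) && ($2 + 0) >= 0 && ($3 + 0) <= size[$1] { print; next }
         { print > "/dev/stderr" }               # out of range or unknown chromosome
    ' "$1" "$2"
}
```

Kept records go to standard output and dropped ones to standard error, so the dropped set can be reviewed before the filtered file is used.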


What is the output of the following commands?

file INPUT.bed input.bw

awk '{printf("%d,%s\n",int($3)-int($2),$0);}' INPUT.bed | sort -t, -k1,1n | head

awk '{printf("%d,%s\n",int($3)-int($2),$0);}' INPUT.bed | sort -t, -k1,1n | tail
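
The two awk commands above prefix each record with its length (end minus start) and show the shortest and longest records. A related check, sketched here with a hypothetical helper name report_bad_intervals, lists records whose coordinates are not non-negative integers or whose end is not greater than the start. Whether such records matter for a given tool is an assumption to verify, not something established in this thread.

```shell
# Sketch: report BED lines with malformed or empty intervals.
# report_bad_intervals is a hypothetical helper name.
# Flags lines where start or end is not a non-negative integer,
# or where end <= start (zero-length or reversed interval).
report_bad_intervals() {
    awk '$2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || ($3 + 0) <= ($2 + 0) {
             printf("line %d: %s\n", NR, $0)
         }' "$1"
}

# Example (file name from this thread):
# report_bad_intervals INPUT.bed | head
```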

Thank you for your input. These are the results.

file INPUT.bed input.bw
INPUT.bed: ASCII text
input.bw: data

awk '{printf("%d,%s\n",int($3)-int($2),$0);}' INPUT.bed | sort -t, -k1,1n | head

1,chr1  100000358   100000359   chr1:100000358-100000359,+  0   +
1,chr1  10002693    10002694    chr1:10002693-10002694,+    0   +
1,chr1  10002731    10002732    chr1:10002731-10002732,-    0   -
1,chr1  10002877    10002878    chr1:10002877-10002878,+    0   +
1,chr1  10002963    10002964    chr1:10002963-10002964,+    0   +
1,chr1  10003111    10003112    chr1:10003111-10003112,+    0   +
1,chr1  10003414    10003415    chr1:10003414-10003415,+    0   +
1,chr1  10003546    10003547    chr1:10003546-10003547,-    0   -
1,chr1  10003591    10003592    chr1:10003591-10003592,+    0   +
1,chr1  10003596    10003597    chr1:10003596-10003597,-    0   -

awk '{printf("%d,%s\n",int($3)-int($2),$0);}' INPUT.bed | sort -t, -k1,1n | tail

320,chr3    149374579   149374899   chr3:149374579-149374899,+  +
325,chr3    24575115    24575440    chr3:24575115-24575440,+    +
331,chr16   2318409 2318740 chr16:2318409-2318740,+ 0   +
334,chr11   134337126   134337460   chr11:134337126-134337460,+ +
346,chr2    170219168   170219514   chr2:170219168-170219514,-  -
358,chr9    69065046    69065404    chr9:69065046-69065404,-    -
364,chr8    24813374    24813738    chr8:24813374-24813738,-    -
393,chr12   48336640    48337033    chr12:48336640-48337033,-   -
435,chr15   80696258    80696693    chr15:80696258-80696693,-   -
561,chr11   65266877    65267438    chr11:65266877-65267438,+   +

That looks like a pretty unusual BED file: the chromosome names do not look correctly formatted, and it isn't sorted in a typical way. What do the chromosome names in the bigWig file look like?


Thank you for your help. So it seems that there is something wrong with the chromosome column of my BED file. I ran bigWigToBedGraph, and the result looks like this.

bigWigToBedGraph input.bw output.bedGraph

chr1    629903  629904  1
chr1    629909  629910  1
chr1    629916  629917  1
chr1    629919  629920  1
chr1    629929  629930  4
chr1    629932  629933  1

Chromosome names have to match up to do operations. In other words, 1,chr1 will not match up with chr1, for example. Adjust your awk statement accordingly, so that you're not adding commas, numbers or other extraneous stuff to the chromosome field of INPUT.bed.
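
One way to check whether the names match is to list the chromosome names used in the BED file that do not occur in the bigWig. The sketch below assumes a reference file whose first column holds the bigWig's chromosome names (for example, the first column of the bedGraph produced by bigWigToBedGraph above); missing_chroms is a hypothetical helper name.

```shell
# Sketch: list chromosome names present in a BED file but absent from a
# reference list. Only column 1 of each file is used.
# missing_chroms is a hypothetical helper name.
# Usage: missing_chroms REFERENCE_NAMES BED
missing_chroms() {
    awk 'NR == FNR { known[$1] = 1; next }            # first file: known names
         !($1 in known) && !seen[$1]++ { print $1 }   # report each unknown name once
    ' "$1" "$2"
}

# Example (file names from this thread):
# cut -f1 output.bedGraph | sort -u > bw_names.txt
# missing_chroms bw_names.txt INPUT.bed
```

An empty result means every chromosome name in the BED file also appears in the reference list.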


Thank you for your comment. I think something like "1,chr1" is the output of the command Pierre Lindenbaum suggested: the 1 is the result of int($3)-int($2) in printf, if I understand correctly (sorry, I'm pretty new to this area).

The command below gives me the usual chromosome names.

awk '{print $1}' INPUT.bed | sort | uniq

chr1
chr10
chr11
chr11_gl000202_random
chr12
chr13
chr14
chr15
chr16
chr17

Do you have any other ideas why this doesn't work? Thank you so much for the help.


You're not the only one with this problem. It looks like a problem with memory management or with the OS: https://www.google.com/search?client=q=bigWigAverageOverBed+segmentation+fault


Thank you for your input. I also came across some posts stating that. But if it's a memory problem, shouldn't splitting the file solve it?


You could try splitting the BED file by chromosome:

$ sort-bed INPUT.bed > INPUT.sorted.bed
$ for CHR in $(bedextract --list-chr INPUT.sorted.bed); do bedextract "${CHR}" INPUT.sorted.bed > "INPUT.sorted.${CHR}.bed"; done

Then run your bigWigAverageOverBed step on each per-chromosome file.

If you want to track memory usage, you can run top while running your per-chromosome process and press Shift-M to sort processes by memory.

Presumably, chr1 would be your largest file and so use the most memory.
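
If BEDOPS is not installed, a similar per-chromosome split could be done with awk alone. This is a sketch: split_by_chrom and the output naming scheme are hypothetical, and the commented loop reuses the command syntax from the original question.

```shell
# Sketch: split a BED file into one file per chromosome using awk only.
# split_by_chrom and the PREFIX.<chrom>.bed naming are hypothetical.
# Usage: split_by_chrom BED PREFIX
split_by_chrom() {
    # Each output file is truncated on first use, then appended to.
    awk -v p="$2" '{ print > (p "." $1 ".bed") }' "$1"
}

# Example (file names from this thread):
# split_by_chrom INPUT.bed INPUT
# for f in INPUT.chr*.bed; do
#     bigWigAverageOverBed input.bw "$f" "${f%.bed}.tab" || echo "failed on $f"
# done
```

Running the per-chromosome files one by one also narrows down which chromosome holds the records that trigger the crash.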
