Bedtools: Merging Many Bed Files
3.2 years ago
J ▴ 20

I am using the algorithm CookHLA for my research. As part of its preparation, I need to feed it a bed file representing at least 100 of my samples.

I have made the bed files for 500 samples using samtools and bedtools in a pipeline:

samtools view -@ CPUs -b --reference $REF_FILE $FILE_NAME.cram chr6:29000000-34000000 | bedtools bamtobed -i stdin > ${FILE_NAME}.bed

I then sorted these bed files with the following:

for FILE in *.bed; do
    echo "$FILE"
    # Strip the ".bed" extension to build the output name
    FILE_NAME=${FILE%.bed}
    bedtools sort -i "$FILE" > "$FILE_NAME.sorted.bed"
done

Now I want to merge all these sorted bed files into one. I have looked into the mergeBed and intersect tools offered by bedtools. When running these, I unfortunately get the following error:

Error: Sorted input specified, but the file cat_beds.bed has the following out of order record
chr6    28999852    29000003    A00266:357:HFKFMDSXY:4:1449:1127:5791/1 60

Clearly there is a sequence outside of the bounds I specified through samtools. Does anyone have advice?

bedtools CookHLA • 2.7k views

As an addendum, I got that output from a pipeline where I used cat to combine the sorted bed files and then attempted mergeBed.

3.2 years ago

> I got that output after a pipeline where I used cat to combine the sorted bed files

The input to mergeBed should be sorted. Concatenating files that were each sorted individually does not produce a sorted file, so I think you should do something like this:

cat *.bed | sortBed | mergeBed > merged.bed

where *.bed are the output files from bamtobed (so there is no need to sort these files individually).
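To see why the concatenated file trips the sorted-input check, here is a minimal sketch on toy data. The file names and coordinates are made up for illustration, and plain `sort -k1,1 -k2,2n` stands in for sortBed so that the example runs without bedtools installed; in the real pipeline you would keep sortBed and mergeBed.

```shell
# Hypothetical toy data: two BED files, each sorted on its own.
tmp=$(mktemp -d)
printf 'chr6\t29000100\t29000250\nchr6\t29000400\t29000550\n' > "$tmp/a.sorted.bed"
printf 'chr6\t28999852\t29000003\nchr6\t29000200\t29000350\n' > "$tmp/b.sorted.bed"

# cat only appends one file after the other; it does not interleave records.
cat "$tmp"/*.sorted.bed > "$tmp/cat_beds.bed"

# sort -c checks the order without sorting and fails on the first
# out-of-order record.
if LC_ALL=C sort -c -k1,1 -k2,2n "$tmp/cat_beds.bed" 2>/dev/null; then
    cat_is_sorted=yes
else
    cat_is_sorted=no
fi

# Sorting the concatenated records once gives a globally sorted file.
LC_ALL=C sort -k1,1 -k2,2n "$tmp/cat_beds.bed" > "$tmp/all.sorted.bed"
first_start=$(head -n 1 "$tmp/all.sorted.bed" | cut -f2)

echo "concatenation sorted: $cat_is_sorted"
echo "first start after sorting: $first_start"
```

The record from the second file starts before the last record of the first file, which is the same situation the error message reports for cat_beds.bed.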


Thanks, I got to that solution as well after looking at it further.

3.2 years ago

Here's another option to merge N files at once:

$ bedops --merge fileA.bed fileB.bed ... fileN.bed > answer.bed

If you have unsorted files and sufficient memory:

$ bedops --merge <(sort-bed fileA.bed) <(sort-bed fileB.bed) ... <(sort-bed fileN.bed) > answer.bed

Or sort them in a loop, depending on system memory, and merge at the end:

$ for fn in *.bed; do sort-bed "${fn}" > "${fn%.bed}.sorted.bed"; done; bedops --merge *.sorted.bed > answer.bed
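For readers who want to see what the sort-then-merge step does to the intervals, the following is a rough sketch using only standard tools. It is not how BEDOPS is implemented: GNU `sort` stands in for sort-bed, `sort -m` combines the per-file sorted outputs without re-sorting them, and a small awk program stands in for `bedops --merge` by collapsing overlapping or adjoining intervals. The toy files and coordinates are assumptions, and only the first three BED columns are handled.

```shell
# Hypothetical toy data: two unsorted three-column BED files.
tmp=$(mktemp -d)
printf 'chr6\t29000400\t29000550\nchr6\t29000100\t29000250\n' > "$tmp/fileA.bed"
printf 'chr6\t29000200\t29000350\nchr6\t28999852\t29000003\n' > "$tmp/fileB.bed"

# Sort each file on its own (stand-in for sort-bed in the loop above).
for fn in "$tmp"/file?.bed; do
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$fn" > "${fn%.bed}.sorted.bed"
done

# sort -m interleaves the already-sorted files; awk then collapses
# overlapping or adjoining intervals on the same chromosome.
merged=$(LC_ALL=C sort -m -k1,1 -k2,2n -k3,3n "$tmp"/*.sorted.bed |
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 1 { chrom = $1; start = $2; stop = $3; next }
         $1 == chrom && $2 <= stop { if ($3 > stop) stop = $3; next }
         { print chrom, start, stop; chrom = $1; start = $2; stop = $3 }
         END { if (NR > 0) print chrom, start, stop }')

printf '%s\n' "$merged"
```

Because each input to `sort -m` is already in order, only one record per file needs to be compared at a time, which is why sorting files separately and merging at the end can help when memory is limited.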