Question

How to correctly use bedtools merge?

22 months ago

Amisha • 0

I have 10 BED files downloaded from the GEO database. I want to use bedtools merge to combine overlapping or "book-ended" features in an interval file into single features. After running bedtools merge, however, the output file is only 4 KB, whereas the 10 input files together were approximately 1.6 GB. Am I using bedtools merge correctly, or is this an error?

Written 22 months ago by Amisha • updated 22 months ago by Ram

Sorry, but this is impossible to answer without looking at your data. If your features together cover virtually the whole genome, merging can produce features as large as a chromosome, leaving very few features to write to the output.

To test, you could run it with the parameters -c 4 -o collapse, which concatenates the names of all features that were merged into one output feature. This lets you see which features were combined (assuming the feature names are in the 4th column of your BED files).
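A complementary check on the first point is to summarize the merged output itself: how many features remain per chromosome and how many bases they cover. The following is only a minimal sketch; the file name merged.bed and its contents are made-up toy data standing in for real bedtools merge output, and a tab-separated three-column layout is assumed.

```shell
# Toy stand-in for the output of bedtools merge (hypothetical data).
printf 'chr1\t0\t1000000\nchr2\t100\t500\nchr2\t900\t5500\n' > merged.bed

# Per chromosome: number of merged features and total bases covered.
# BED coordinates are 0-based, half-open, so the length is end - start.
summary=$(awk 'BEGIN{FS=OFS="\t"} {n[$1]++; bp[$1]+=$3-$2} END{for (c in n) print c, n[c], bp[c]}' merged.bed | sort -k1,1)

printf '%s\n' "$summary"
```

If the covered bases per chromosome come close to the chromosome lengths, a very small output file would be the expected result rather than an error.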

Reply 22 months ago by Matthias Zepper

This is what my data looks like after combining all 10 files with the cat command

Reply 22 months ago by Amisha • updated 22 months ago by Ram

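To learn which features have been merged, create a version of your file in which every feature has a unique name (e.g. p53_243240). The command below sorts the file by chromosome and start position and appends the name as a new last column, which is column 4 if the file contains only the three standard BED columns: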

sort -k1,1 -k2,2n < merge_p53.bed | awk 'BEGIN{OFS="\t"}{print $0,"p53_"NR}' > named.bed

Now merge this file as described before. The names of all combined features are retained, so you can check which merges were performed:

bedtools merge -c 4 -o collapse -i named.bed > final.bed
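With many features per merged interval, the collapsed name lists become hard to read, so it can help to count the names instead. This is a sketch under assumptions: the final.bed written here is hypothetical toy data in the layout the command above would produce (chromosome, start, end, comma-separated names), not real output.

```shell
# Toy stand-in for final.bed (hypothetical data).
printf 'chr1\t100\t900\tp53_1,p53_2,p53_3\nchr1\t2000\t2500\tp53_4\n' > final.bed

# Replace the collapsed name list with the number of names it contains,
# then list the intervals that absorbed the most input features first.
# awk's split() returns the number of pieces it produced.
counts=$(awk 'BEGIN{FS=OFS="\t"} {print $1, $2, $3, split($4, names, ",")}' final.bed | sort -k4,4nr)

printf '%s\n' "$counts"
```

bedtools merge can also report this number directly with -c 4 -o count; the awk version is shown only because it works on an already collapsed file.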

Reply 22 months ago by Matthias Zepper