Merge BED files but keep optional columns
8.5 years ago

I have two BED files that I need to merge. I have done so using the following command, and it has worked fine.

 bedtools merge -c 1 -o count -i ~/Temp_output/peak_count_analysis/Klf3/MACS2/Klf3_ChIP_summits200.bed -i ~/Temp_output/peak_count_analysis/Klf1/MACS2/Klf1_K1ER_pool_summits200.bed > ~/Temp_output/peak_count_analysis/merged/mergedsummits200.bed 

However, this loses a lot of information that is useful for downstream analysis.

Is there a way to merge the files and keep the information for those regions? If so, can the merged regions combine that information into a single column?

Example:

File 1:

chr1    5251857 5252058 Klf1_K1ER_pool_peak_1   13.22945
chr1    9770501 9770702 Klf1_K1ER_pool_peak_2   6.61350
chr1    9773611 9773812 Klf1_K1ER_pool_peak_3   2.72345
chr1    9774350 9774551 Klf1_K1ER_pool_peak_4   40.70829
chr1    9815269 9815470 Klf1_K1ER_pool_peak_5   22.47497
...

File 2:

chr1    6204622 6204823 Klf3_ChIP_peak_1    0.88333
chr1    7078830 7079031 Klf3_ChIP_peak_2    19.91139
chr1    7388243 7388444 Klf3_ChIP_peak_3    15.39874
chr1    9690724 9690925 Klf3_ChIP_peak_4    7.17301
chr1    9738376 9738577 Klf3_ChIP_peak_5    8.30267
...

Hopeful outcome:

chr1    6204622 6204823 2   Klf3_ChIP_peak_1; Klf1_K1ER_pool_peak_5
chr1    9770501 9770702 1   Klf1_K1ER_pool_peak_2
...
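A minimal sketch of getting the semicolon style shown in the desired output, assuming the merged file has the layout that `mergeBed -c 4,4,5 -o count,collapse,collapse` produces (chrom, start, end, count, comma-separated names, ...). bedtools' `collapse` joins values with commas by default, so column 5 can simply be rewritten afterwards. The function name `semicolon_names` and the file names are hypothetical. Recent bedtools versions also document a `-delim` option for `merge` that may make this step unnecessary; check `bedtools merge -h` for your version.

```shell
# Hypothetical helper: turn the comma-separated names in column 5 of a
# merged BED (chrom, start, end, count, names, ...) into "; "-separated text.
semicolon_names() {
  awk 'BEGIN { FS = OFS = "\t" } { gsub(/,/, "; ", $5); print }' "$1"
}

# Example row using the coordinates and names from the desired output above.
tmpdir=$(mktemp -d)
printf 'chr1\t6204622\t6204823\t2\tKlf3_ChIP_peak_1,Klf1_K1ER_pool_peak_5\n' \
  > "$tmpdir/merged_example.bed"
semicolon_names "$tmpdir/merged_example.bed"
```

If the file is later converted to GTF, keeping the commas may be safer, since semicolons inside attribute values can confuse simple GTF parsers.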

I eventually need to convert the merged file to a GTF, so I would like to retain the peak information.
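Since the end goal is a GTF, here is a minimal sketch of that last step, assuming the merged file keeps the count in column 4 and the collapsed peak names in column 5. The key detail is the coordinate shift: BED starts are 0-based while GTF starts are 1-based, and the BED end (exclusive) already equals the GTF end (inclusive). The function name, the `merged_peaks` source, the `peak` feature type and the `merged_peak_N` IDs are hypothetical choices; some downstream tools expect a particular feature type (featureCounts, for example, counts `exon` features by default), so adjust them to whatever the next tool needs.

```shell
# Hypothetical converter from a merged BED (chrom, start, end, count,
# names, ...) to GTF. Source "merged_peaks", feature "peak" and the
# "merged_peak_N" IDs are arbitrary choices, not a standard.
bed_to_gtf() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         id = "merged_peak_" NR
         attrs = sprintf("gene_id \"%s\"; transcript_id \"%s\"; peak_count \"%s\"; peak_names \"%s\";", id, id, $4, $5)
         # BED starts are 0-based and GTF starts 1-based; the BED end
         # (exclusive) already equals the GTF end (inclusive).
         print $1, "merged_peaks", "peak", $2 + 1, $3, ".", ".", ".", attrs
       }' "$1"
}

tmpdir=$(mktemp -d)
printf 'chr1\t5251857\t5252058\t2\tKlf1_K1ER_pool_peak_1,Klf1_K1ER_pool_peak_6\t13.22945,13.22945\n' \
  > "$tmpdir/merged.bed"
bed_to_gtf "$tmpdir/merged.bed"
```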

Thanks in advance.

Tags: ChIP-Seq, bedtools, merge
8.5 years ago

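Just give mergeBed more columns to summarize via the -c and -o options. Because merge works on a single input sorted by chromosome and then by start, sort the two files together first. For example: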

cat file1.bed 
chr1    5251857 5252058 Klf1_K1ER_pool_peak_1   13.22945
chr1    9770501 9770702 Klf1_K1ER_pool_peak_2   6.61350
chr1    9773611 9773812 Klf1_K1ER_pool_peak_3   2.72345
chr1    9774350 9774551 Klf1_K1ER_pool_peak_4   40.70829
chr1    9815269 9815470 Klf1_K1ER_pool_peak_5   22.47497

cat file2.bed 
chr1    5251857 5252058 Klf1_K1ER_pool_peak_6   13.22945
chr1    9770501 9770702 Klf1_K1ER_pool_peak_7   6.61350
chr1    9773611 9773812 Klf1_K1ER_pool_peak_8   2.72345
chr1    9774350 9774551 Klf1_K1ER_pool_peak_9   40.70829
chr1    9815269 9815470 Klf1_K1ER_pool_peak_10  22.47497

Retain the information in columns 4 and 5 (count and collapse the names in column 4, collapse the scores in column 5):

sort -k1,1 -k2,2n file1.bed file2.bed \
| mergeBed -c 4,4,5 -o count,collapse,collapse
chr1    5251857 5252058 2   Klf1_K1ER_pool_peak_1,Klf1_K1ER_pool_peak_6 13.22945,13.22945
chr1    9770501 9770702 2   Klf1_K1ER_pool_peak_2,Klf1_K1ER_pool_peak_7 6.61350,6.61350
chr1    9773611 9773812 2   Klf1_K1ER_pool_peak_3,Klf1_K1ER_pool_peak_8 2.72345,2.72345
chr1    9774350 9774551 2   Klf1_K1ER_pool_peak_4,Klf1_K1ER_pool_peak_9 40.70829,40.70829
chr1    9815269 9815470 2   Klf1_K1ER_pool_peak_10,Klf1_K1ER_pool_peak_5    22.47497,22.47497
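To make the `count`/`collapse` behaviour concrete, here is a plain awk re-implementation of `mergeBed -c 4,4,5 -o count,collapse,collapse`. It is an illustrative stand-in, not bedtools itself: it assumes five-column BED input already sorted with `sort -k1,1 -k2,2n`, and it merges overlapping and book-ended intervals, mirroring bedtools' default `-d 0`. The function name `merge_collapse` is hypothetical.

```shell
# Illustrative awk stand-in for: mergeBed -c 4,4,5 -o count,collapse,collapse
# Input on stdin: BED (chrom, start, end, name, score) sorted by chrom, then start.
merge_collapse() {
  awk 'BEGIN { FS = OFS = "\t" }
       function flush() {
         if (n > 0) print chrom, s, e, n, names, scores
       }
       {
         if (n > 0 && $1 == chrom && $2 + 0 <= e) {
           # Overlapping or book-ended (start == current end): extend the region.
           if ($3 + 0 > e) e = $3 + 0
           names = names "," $4
           scores = scores "," $5
           n++
         } else {
           # New region: emit the previous one and start over.
           flush()
           chrom = $1; s = $2 + 0; e = $3 + 0
           names = $4; scores = $5; n = 1
         }
       }
       END { flush() }'
}

# Two rows from each of the example files in the answer above.
export LC_ALL=C   # reproducible tie-breaking in sort
tmpdir=$(mktemp -d)
printf 'chr1\t5251857\t5252058\tKlf1_K1ER_pool_peak_1\t13.22945\nchr1\t9815269\t9815470\tKlf1_K1ER_pool_peak_5\t22.47497\n' > "$tmpdir/file1.bed"
printf 'chr1\t5251857\t5252058\tKlf1_K1ER_pool_peak_6\t13.22945\nchr1\t9815269\t9815470\tKlf1_K1ER_pool_peak_10\t22.47497\n' > "$tmpdir/file2.bed"
sort -k1,1 -k2,2n "$tmpdir/file1.bed" "$tmpdir/file2.bed" | merge_collapse
```

This reproduces the first and last rows of the output above, including the `peak_10,peak_5` ordering, which simply follows the order in which `sort` breaks ties between lines with the same coordinates.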

Ah, that's what I thought I might need to do, but I wasn't sure how to do it. Thank you.


OK, I see what you are doing there.


Hi again,

For some reason, when I sort with the options you gave (specifically -k2,2n), bedtools merge cannot open the file.

I don't understand this.

To actually get it sorted in the correct (natural) order, I need the options -k1,1V -k2,2n (your options sort the chromosomes as chr1, chr10, chr11, ...). If I use just -k1,1V, bedtools can at least open the file, but it isn't sorted properly by start position.
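A small sketch with made-up coordinates shows what each of the sort commands in this thread does. `-k1,1 -k2,2n` and `-k1,1V -k2,2n` both order starts numerically within each chromosome; `-k1,1V` on its own leaves ties between lines on the same chromosome to a whole-line text comparison, so a start of `100` sorts before `20`. For bedtools merge, the lexicographic chromosome order (chr1, chr10, chr2, ...) is fine, because merge only needs each chromosome's records to be contiguous and sorted by start. Note that `-V` is a GNU `sort` extension.

```shell
# Made-up BED lines to compare the three sort commands from this thread.
export LC_ALL=C
tmpdir=$(mktemp -d)
printf 'chr10\t50\t60\nchr2\t5\t10\nchr1\t100\t200\nchr1\t20\t30\n' > "$tmpdir/unsorted.bed"

echo "sort -k1,1 -k2,2n:";  sort -k1,1 -k2,2n "$tmpdir/unsorted.bed"   # chr1 20, chr1 100, chr10 50, chr2 5
echo "sort -k1,1V -k2,2n:"; sort -k1,1V -k2,2n "$tmpdir/unsorted.bed"  # chr1 20, chr1 100, chr2 5, chr10 50
echo "sort -k1,1V:";        sort -k1,1V "$tmpdir/unsorted.bed"         # chr1 100, chr1 20, chr2 5, chr10 50
```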

Annoyingly, once I get it sorted properly, bedtools won't open it, and I can't tell what is wrong with the file in order to fix it!

Any help is greatly appreciated.
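The thread does not say what caused the error, so the following is only a hypothetical checklist of common problems worth ruling out before running bedtools merge: an empty file (for example, `sort ... file.bed > file.bed` truncates `file.bed` before `sort` reads it, whereas `sort -o file.bed file.bed` is safe), Windows (CRLF) line endings, blank lines (which `sort` moves to the top because an empty key sorts first), rows with fewer than three tab-separated fields, non-integer coordinates, and chromosomes that are not contiguous or starts that go backwards. The function name `check_bed` is made up.

```shell
# Hypothetical checker: prints one line per suspected problem in a BED file;
# a clean file prints nothing.
check_bed() {
  if [ ! -s "$1" ]; then
    echo "empty or missing file: $1"
    return 0
  fi
  awk 'BEGIN { FS = "\t" }
       /\r$/       { print "line " NR ": Windows (CRLF) line ending" }
       { sub(/\r$/, "") }   # strip the CR so the remaining checks still work
       /^[ \t]*$/  { print "line " NR ": blank line"; next }
       NF < 3      { print "line " NR ": fewer than 3 tab-separated fields"; next }
       $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print "line " NR ": non-integer start or end" }
       $1 == prev && $2 + 0 < last { print "line " NR ": start is smaller than the previous start" }
       $1 != prev && ($1 in seen)  { print "line " NR ": " $1 " is not contiguous" }
       { seen[$1] = 1; prev = $1; last = $2 + 0 }' "$1"
}

# Example: a leading blank line, CRLF endings and an out-of-order start.
tmpdir=$(mktemp -d)
printf '\nchr1\t100\t200\r\nchr1\t50\t80\r\n' > "$tmpdir/suspect.bed"
check_bed "$tmpdir/suspect.bed"
```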
