Concatenate information columns from multiple identical bed files.
2.3 years ago

I have multiple (n) BED files with identical intervals, each with one unique information column. I would like to combine them into a single BED file with n information columns.

file1.bed:

chr    start     stop     col.foo
chr1   1         10       foo
chr1   20        30       foo

file2.bed:

chr    start     stop     col.bar
chr1   1         10       bar
chr1   20        30       bar

desired_output.bed:

chr    start     stop     col.foo    col.bar
chr1   1         10       foo        bar
chr1   20        30       foo        bar

I tried using bedtools merge, but don't see an obvious solution. This only keeps the information column of one of the files:

bedtools merge -c 4 -o collapse -delim = "\t" -i file1.bed -i file2.bed
bedtools bed genomics • 1.0k views
2.3 years ago
Trivas ★ 1.8k

If you're comfortable using R, you can load the .bed files as tab-delimited data frames, then use dplyr's full_join with chr, start, and stop as the join columns.

For example:

full_join(df_file1, df_file2, by = c("chr","start","stop"))
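Outside R, a similar key-based join can be sketched with awk. This is only an illustration under stated assumptions: the files are tab-separated, each (chr, start, stop) key appears at most once per file, and missing values are filled with NA. The demo files and their values (bar1, bar2, baz) are made up; the second file is deliberately in a different row order and has one extra interval to show that row order does not matter here.

```shell
set -eu

# Hypothetical demo inputs, modeled on the question (tab-separated, with header).
joindir=$(mktemp -d)
printf 'chr\tstart\tstop\tcol.foo\nchr1\t1\t10\tfoo\nchr1\t20\t30\tfoo\n' > "$joindir/file1.bed"
printf 'chr\tstart\tstop\tcol.bar\nchr1\t20\t30\tbar2\nchr1\t1\t10\tbar1\nchr2\t5\t15\tbaz\n' > "$joindir/file2.bed"

awk 'BEGIN { FS = OFS = "\t" }
     { key = $1 FS $2 FS $3 }                     # join key: chr, start, stop
     NR == FNR {                                  # first file: remember value and row order
         left[key] = $4; order[++n] = key; next
     }
     {                                            # second file: remember value, append new keys
         right[key] = $4
         if (!(key in left)) order[++n] = key
     }
     END {
         for (i = 1; i <= n; i++) {
             k = order[i]
             print k, ((k in left) ? left[k] : "NA"), ((k in right) ? right[k] : "NA")
         }
     }' "$joindir/file1.bed" "$joindir/file2.bed" > "$joindir/joined.bed"

cat "$joindir/joined.bed"
```

Rows keep the order of the first file, and intervals found only in the second file are appended at the end. The header line is joined like any other row because its first three fields match across files.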
2.2 years ago
Malcolm.Cook ★ 1.5k

Since the files contain the same intervals in the same order, you can simply cut the information column from each file and paste those columns onto the first three locus columns from the first file.

Linux provides the cut and paste utilities for this.

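Something like the following, with one cut per input file so that the columns end up side by side:

paste <(cut -f 1-3 file1.bed) <(cut -f 4 file1.bed) <(cut -f 4 file2.bed)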
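For more than a couple of files, the cut-and-paste idea can be wrapped in a loop. The sketch below is a minimal illustration, assuming tab-separated files with the information in column 4; it also checks that every file has the same locus columns as the first one before pasting. The demo files are recreated from the question, and the working file names (loci.txt, combined.bed) are hypothetical. It uses temporary files instead of process substitution so it also runs under a plain POSIX sh.

```shell
set -eu

# Hypothetical demo inputs, recreated from the question (tab-separated, with header).
workdir=$(mktemp -d)
printf 'chr\tstart\tstop\tcol.foo\nchr1\t1\t10\tfoo\nchr1\t20\t30\tfoo\n' > "$workdir/file1.bed"
printf 'chr\tstart\tstop\tcol.bar\nchr1\t1\t10\tbar\nchr1\t20\t30\tbar\n' > "$workdir/file2.bed"

# Start from the three locus columns of the first file.
cut -f 1-3 "$workdir/file1.bed" > "$workdir/loci.txt"
cp "$workdir/loci.txt" "$workdir/combined.bed"

for f in "$workdir/file1.bed" "$workdir/file2.bed"; do
    # Refuse to paste if this file's locus columns differ from the first file's.
    cut -f 1-3 "$f" > "$workdir/loci_this.txt"
    if ! cmp -s "$workdir/loci.txt" "$workdir/loci_this.txt"; then
        echo "interval mismatch in $f" >&2
        exit 1
    fi
    # Append this file's 4th column as a new rightmost column.
    cut -f 4 "$f" | paste "$workdir/combined.bed" - > "$workdir/combined.tmp"
    mv "$workdir/combined.tmp" "$workdir/combined.bed"
done

cat "$workdir/combined.bed"
```

Listing the input files explicitly, rather than using a glob such as *.bed, keeps the column order predictable and avoids picking up an earlier output file that sits in the same directory.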
2.2 years ago

I ended up figuring out a way to use bedtools merge:

sort -k1,1 -k2,2n -s file*.bed | mergeBed -c 4 -o collapse -delim DELIM | sed 's/DELIM/\t/g'