I'm combining multiple .bed files. For instance:
#file1
chr1 1 2 2
chr1 10 11 3
chr1 50 51 4
#file2
chr1 1 2 10
chr1 10 11 8
chr10 2 3 8
#file3
chr1 1 2 1
chr1 50 51 2
chr10 2 3 9
All files have the same structure for the 4 columns: chr start end filename
I want to combine all of them (I have a lot of them!) so that each file gets its own column, named after the file, from column 4 onward:
chrm start end file1 file2 file3
chr1 1 2 2 10 1
chr1 10 11 3 8 NA
chr1 50 51 4 NA 2
chr10 2 3 NA 8 9
I have tried this approach, but the result has no column names and does not show the matching values side by side (the output has only 4 columns, with everything concatenated vertically).
I have also tried this approach, which seems to work a bit better but doesn't produce column names either, and for a large number of files it becomes very hard to assign them. It also seems to fail for chr10 to chr22, as indicated here.
Could someone please help me get the last output? I would really appreciate it!
Thanks a lot in advance
Thanks so much @ATpoint, I'm almost there. However, I get only 1 and NA in output.txt, and don't know how to intersect it with the initial input files. Also, is the order of columns 4 and so on the same as that specified in -i [file1.bed, file2.bed, file3.bed]? It doesn't seem that the output is sorted, so I just want to make sure.
With your dummy data I get exactly the output you gave as the desired one.
I really don't understand why, but now I can't even get the NA. I have used \t as delimiters in my input, and now I'm getting:
chr1 2 5 1 0 0
This line does not make sense, as it is not in your dummy input. Maybe you have left-over files from previous attempts to run the code snippet; make sure you clean up properly.
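For reference, the merge asked for in the question can also be sketched in plain Python. This is not the command suggested in the thread, only an illustration of the logic, and it rests on some assumptions: intervals are matched on identical chr/start/end (no overlap handling), a repeated interval within one file keeps the last value, and the helper names merge_bed and chrom_key are made up for this example. The three lists stand in for the real .bed files.

```python
import re

def chrom_key(chrom):
    """Sort chr2 before chr10 by comparing the numeric part as an integer."""
    m = re.fullmatch(r"chr(\d+)", chrom)
    # Non-numeric names (chrX, chrM, ...) sort after the numbered ones.
    return (0, int(m.group(1)), "") if m else (1, 0, chrom)

def merge_bed(tables, missing="NA"):
    """tables maps a column label to an iterable of BED lines (chr start end value)."""
    labels = list(tables)   # dict insertion order = column order in the output
    merged = {}             # (chr, start, end) -> {label: value}
    for label, lines in tables.items():
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue    # skip blank lines and "#file1"-style comments
            # split() accepts tabs or spaces as delimiters
            chrom, start, end, value = line.split()[:4]
            merged.setdefault((chrom, int(start), int(end)), {})[label] = value
    rows = [["chrm", "start", "end", *labels]]
    for key in sorted(merged, key=lambda k: (chrom_key(k[0]), k[1], k[2])):
        chrom, start, end = key
        values = [merged[key].get(label, missing) for label in labels]
        rows.append([chrom, str(start), str(end), *values])
    return rows

# Stand-ins for the three example files from the question.
file1 = ["chr1 1 2 2", "chr1 10 11 3", "chr1 50 51 4"]
file2 = ["chr1 1 2 10", "chr1 10 11 8", "chr10 2 3 8"]
file3 = ["chr1 1 2 1", "chr1 50 51 2", "chr10 2 3 9"]

table = merge_bed({"file1": file1, "file2": file2, "file3": file3})
for row in table:
    print("\t".join(row))
```

With real files, each list could be replaced by an open file handle, and the dictionary keys by the file names. The value columns follow the order in which the files are passed in, and rows are sorted by chromosome number, then start, then end.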