Hello, I posted a question here earlier, but I found out that bedtools intersect alone was not sufficient for me. I need to make a tab-separated table in the format shown below.
I have one original BED file and 125 cell line BED files. I have to check for overlaps and list the overlapping coordinates in a new file in the following format:
original bed file: an ordinary bed6 file
chromosome start end nametag score strand(+/-)
ex) chr1 1000001 1000095 RBP7 70.69 +
cellline bed file: also an ordinary bed6 or bed8 file
What I have to make:
I have to make a new tsv file that appends values to the right of each bed coordinate.
chromosome start end nametag score strand(+/-) cellline1 cellline2 ....etc..... cellline125 Total_cellline_overlap_count
ex) chr1 1000001 1000095 RBP7 70.69 + chr1:1000001-1000004 None ...etc.... chr1:1000003-1000015 60
Basically, for every line I have to record 1) whether the original coordinate overlaps anything in the given cell line and 2) if it does, the overlapping coordinates (otherwise None).
Therefore there are 6+125+1 columns in this file.
I found out that bedtools intersect does not provide this function directly. I could write Python to do it, but post-processing the files generated by bedtools intersect is very slow. I used the glob package to find the 125 files in my directory and the csv package to write my custom TSV file, but that was not enough to get the job done with bedtools intersect. Is there a better way to do this? Thank you very much.
All you need is the original bedtools intersect and the Linux `join` command. The algorithm goes like this:
I haven't used `join` in a long time and don't remember its syntax, but it should do what you want.
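For reference, here is a minimal pure-Python sketch of the table the question asks for. Note that it is a stand-in, not the bedtools intersect plus `join` pipeline suggested above: it computes the overlapping portions directly and assumes the BED files have already been parsed into tuples. The function names (`index_by_chrom`, `overlaps_for`, `build_table`, `to_tsv`) and the choice to comma-separate multiple overlaps within one cell line are my own assumptions, not something from the original post.

```python
from collections import defaultdict


def index_by_chrom(intervals):
    """Group (chrom, start, end) tuples by chromosome, sorted by start."""
    index = defaultdict(list)
    for chrom, start, end in intervals:
        index[chrom].append((start, end))
    for spans in index.values():
        spans.sort()
    return dict(index)


def overlaps_for(chrom, start, end, index):
    """Return the overlapping portions as 'chrom:start-end' strings."""
    hits = []
    for c_start, c_end in index.get(chrom, []):
        # BED intervals are half-open, so intervals that only touch do not overlap.
        if c_start < end and start < c_end:
            hits.append(f"{chrom}:{max(start, c_start)}-{min(end, c_end)}")
    return hits


def build_table(original, cell_lines):
    """Build header + rows.

    original:   list of BED6 tuples (chrom, start, end, name, score, strand)
    cell_lines: dict mapping a cell line name to a list of (chrom, start, end)
    """
    indexes = {name: index_by_chrom(ivs) for name, ivs in cell_lines.items()}
    header = ["chromosome", "start", "end", "nametag", "score", "strand"]
    header += list(indexes) + ["Total_cellline_overlap_count"]
    rows = [header]
    for chrom, start, end, name, score, strand in original:
        cells, total = [], 0
        for idx in indexes.values():
            hits = overlaps_for(chrom, start, end, idx)
            total += 1 if hits else 0  # count cell lines, not individual overlaps
            cells.append(",".join(hits) if hits else "None")
        rows.append([chrom, start, end, name, score, strand, *cells, total])
    return rows


def to_tsv(rows):
    """Render rows as tab-separated text."""
    return "\n".join("\t".join(str(v) for v in row) for row in rows)
```

Calling `build_table(original, cell_lines)` and passing the result to `to_tsv` gives the 6 + N + 1 column layout from the question. The inner loop scans every interval on the same chromosome, so for large files an interval tree or a sorted sweep would likely be faster; that optimisation is left out to keep the sketch short.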