Question

Find co-occurrences of factors across multiple files

4.6 years ago

tirichl ▴ 20

Hey,

I have several hundred files that look like this:

a.file
#genomic positions
1, 3, 4, 9, 10
#factors
a, b, d, g

Each file holds multiple genomic positions (numbers) and factors (characters). I want to investigate whether there are genomic positions that frequently co-occur with particular factors across all files, but I have no idea how to approach this. Can someone point me in the right direction? Is there a tool or library that might help? Thank you!

ChIP-Seq sequencing gene • 685 views

updated 4.6 years ago by jordi.planells ▴ 480 • written 4.6 years ago by tirichl ▴ 20

score 0 · Answer 1 · 2020-11-25

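bedtools intersect accepts multiple files after -b. Have you tried it? With the -c flag it reports, for each interval in the -a file, the number of overlapping intervals, summed over all -b files. Note that bedtools works on interval files such as BED, so your positions would first have to be written as BED intervals (chrom, start, end).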

bedtools intersect -c -a your_file -b factor1 factor2 factorN
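To make clear what the -c count means, here is a minimal Python sketch of the same overlap rule, assuming BED-style 0-based, half-open coordinates: two intervals overlap if they are on the same chromosome and share at least one base. The names Interval, overlaps, count_overlaps and read_bed are hypothetical, and this naive loop only illustrates the semantics; bedtools itself is far faster on real data.

```python
from collections import namedtuple

# Hypothetical interval type: BED-style 0-based, half-open [start, end).
Interval = namedtuple("Interval", "chrom start end")


def overlaps(a, b):
    """True if two BED intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_overlaps(a_intervals, *b_sets):
    """For each A interval, count overlapping intervals across all B sets.

    Illustrates the idea behind `bedtools intersect -c -a A -b B1 B2 ...`,
    where the counts from all -b files are summed. Naive O(len(A) * len(B)).
    """
    b_all = [b for b_intervals in b_sets for b in b_intervals]
    return [(a, sum(overlaps(a, b) for b in b_all)) for a in a_intervals]


def read_bed(path):
    """Read the first three columns of a BED file into Interval tuples."""
    out = []
    with open(path) as fh:
        for line in fh:
            # Skip blank, comment, track and browser lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            out.append(Interval(chrom, int(start), int(end)))
    return out
```

For example, count_overlaps(read_bed("your_file"), read_bed("factor1"), read_bed("factor2")) returns each interval of your_file paired with its total overlap count, like the extra column that -c appends.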

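Then pipe that output into awk to print only the lines with more than X overlaps. Because -c appends the count as the last column, $NF is used here; set X to your threshold (5 is just an example):

awk -v X=5 'BEGIN{FS=OFS="\t"} $NF > X'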
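If converting to BED is inconvenient, co-occurrences can also be counted directly from the file format shown in the question. This is a minimal Python sketch under several assumptions: the headers are literally "#genomic positions" and "#factors", values are comma-separated, positions only match when they are identical integers (no window, no chromosome), and "co-occur" means appearing in the same file. The function names and the "*.file" pattern are hypothetical.

```python
import glob
from collections import Counter
from itertools import product


def parse_file(text):
    """Return (positions, factors) from one file in the format shown above.

    Assumes a '#genomic positions' header followed by comma-separated
    integers and a '#factors' header followed by comma-separated names.
    Values may span several lines within a section; other lines are ignored.
    """
    positions, factors = set(), set()
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").strip().lower()
            if header.startswith("genomic positions"):
                section = "positions"
            elif header.startswith("factors"):
                section = "factors"
            else:
                section = None  # unknown section: skip its values
            continue
        items = [x.strip() for x in line.split(",") if x.strip()]
        if section == "positions":
            positions.update(int(x) for x in items)
        elif section == "factors":
            factors.update(items)
    return positions, factors


def count_cooccurrences(records):
    """Count in how many files each (position, factor) pair appears together."""
    counts = Counter()
    for positions, factors in records:
        counts.update(product(sorted(positions), sorted(factors)))
    return counts


def main(pattern="*.file", min_files=2):
    """Print pairs seen together in at least min_files files, most frequent first."""
    records = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            records.append(parse_file(fh.read()))
    counts = count_cooccurrences(records)
    for (pos, factor), n in counts.most_common():
        if n < min_files:
            break  # most_common() is sorted descending, so we can stop here
        print(f"{pos}\t{factor}\t{n}/{len(records)}")
```

Calling main("*.file", min_files=5) prints position, factor and "files together/total files". Raw counts favour positions and factors that are common overall, so you may want to compare each count with how often the position and the factor occur on their own.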

Hope it helps!