3.6 years ago
K
Hi,
I have 10 files, and I want to find all the coordinates that occur in every one of them so I can build a synteny plot. Is there a script or program I can use?
I know the comm and diff commands, but they only compare two files.
Thank you
Couldn't you use a bash for loop with comm -12 to output the lines shared by each file and the previous comm result? Any line common to all the files will survive every comparison, so whatever is left at the end is what all the files share. This assumes all the files are sorted (you can add a sort step); note that comm expects plain lexicographic sort order, not the numeric order produced by sort -n.
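A minimal sketch of that loop in bash, assuming one coordinate per line (e.g. `53021-53613`) and that "common" means an identical line. `common_to_all` is a hypothetical helper name, not an existing tool. Each file is re-sorted with `LC_ALL=C sort -u`, so comm sees the order it expects even if the input was sorted numerically.

```bash
# Hypothetical helper: print the lines present in every file given as an argument.
# Assumes one coordinate per line and treats "common" as an exact string match.
common_to_all() {
  local result next sorted f
  result=$(mktemp) && next=$(mktemp) && sorted=$(mktemp) || return 1
  # comm requires both inputs sorted in the same collation order, so use
  # plain lexicographic sort (not sort -n) in the C locale, dropping duplicates.
  LC_ALL=C sort -u "$1" > "$result"
  shift
  for f in "$@"; do
    LC_ALL=C sort -u "$f" > "$sorted"
    # -12 hides lines unique to either input, leaving only the shared lines,
    # which become the running intersection for the next file.
    LC_ALL=C comm -12 "$result" "$sorted" > "$next"
    mv "$next" "$result"
  done
  cat "$result"
  rm -f "$result" "$next" "$sorted"
}
```

For example, `common_to_all file*.txt` (hypothetical file names) prints the coordinates found in all of the matched files. It finds identical intervals only; overlapping but non-identical intervals are not reported.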
Otherwise, I would imagine a similar strategy with bedtools intersect might work; it would also catch intervals that overlap without being identical.
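bedtools needs BED input (chromosome, start, end), and the example files have no chromosome column. A sketch of the conversion step, assuming every file uses the same coordinate system so a single placeholder name (`ref`, an assumption) can go in the chromosome column; `to_bed` is a hypothetical helper.

```bash
# Hypothetical helper: turn "start-end" lines into 3-column, tab-separated BED.
# $1 = input file; $2 = optional chromosome/scaffold name (default "ref", a placeholder).
# Coordinates are copied unchanged; BED starts are 0-based, so subtract 1 from
# the start yourself if your coordinates are 1-based.
to_bed() {
  awk -v name="${2:-ref}" -F'-' '
    BEGIN { OFS = "\t" }
    NF == 2 { print name, $1, $2 }   # skip lines that are not exactly "start-end"
  ' "$1"
}
```

After `to_bed file1.txt > file1.bed` and `to_bed file2.txt > file2.bed`, `bedtools intersect -a file1.bed -b file2.bed -u` reports each file1 interval that overlaps anything in file2; adding `-f 1.0 -r` requires a reciprocal 100% overlap, i.e. identical extents. The running-intersection loop from the comm approach carries over unchanged.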
I am looking for something like a Venn diagram analysis, where I can see the coordinates unique to each scaffold file and those common to each subset of files. The online tools I found are limited to 3 input files, but I want to see the intersections among all 10 scaffold files.
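One way to get Venn/UpSet-style counts for any number of files is to record, for each distinct coordinate, exactly which files contain it, then count how many coordinates share each combination. A sketch with awk, assuming one coordinate per line and exact matching; `membership` and `combination_counts` are hypothetical helper names.

```bash
# Hypothetical helper: for each distinct coordinate, print it followed by a tab
# and the comma-separated list of files (in argument order) that contain it.
membership() {
  awk '
    # Skip blank lines; count each (coordinate, file) pair only once.
    NF && !seen[$0 SUBSEP FILENAME]++ {
      if ($0 in files) files[$0] = files[$0] "," FILENAME
      else files[$0] = FILENAME
    }
    END { for (c in files) print c "\t" files[c] }
  ' "$@" | LC_ALL=C sort
}

# Number of coordinates for each exact combination of files: "f1" alone means
# unique to f1, "f1,f2" means shared by f1 and f2 but by no other file.
combination_counts() {
  membership "$@" | cut -f2 | LC_ALL=C sort | uniq -c
}
```

With ten files, `combination_counts file*.txt` lists every non-empty subset that actually occurs, which is the input an UpSet plot needs; a 10-way Venn diagram is usually unreadable.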
If your files are TSV files and you can use one of them as a reference, you can try:
Download tsv-utils from here. Columns 2 and 3 are the columns common to all the files, Filter.tsv is the reference file, and the remaining files are the ones to be joined. Always post representative example data if you want forum members to understand your question better.
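If tsv-utils isn't installed, a plain-awk filter gives a similar result. This is a swapped-in technique, not tsv-utils itself; `filter_by_reference` is a hypothetical name, and it assumes tab-separated files without a header line.

```bash
# Hypothetical awk stand-in for a tsv-utils filter join (not tsv-utils itself):
# print rows of the data files whose columns 2 and 3 also occur together in the
# reference file. Assumes tab-separated input with no header line.
filter_by_reference() {
  # $1 = reference file, remaining arguments = data files to filter
  awk -F'\t' '
    NR == FNR { keep[$2 FS $3]; next }   # reference file: remember each (col 2, col 3) key
    ($2 FS $3) in keep                   # data files: print rows whose key was seen
  ' "$@"
}
```

Usage: `filter_by_reference Filter.tsv file2.tsv file3.tsv`. It only filters rows and does not append columns from the reference. The `NR == FNR` idiom misbehaves if the reference file is empty.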
Sorted files (files 1-3 of 10 shown):

file 1:
53021-53613
437126-437761
838835-839317
1228237-1233121
1782778-1782914

file 2:
23181-23640
53021-53613
70544-71129
985194-988644
1017828-1018850

file 3:
41052-42706
44618-46770
53136-55912
55909-59236
70402-71600
70544-71129
1228237-1233121
Common between:
file 1 & file 2: 53021-53613
file 1 & file 3: 1228237-1233121
file 2 & file 3: 70544-71129
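The pairwise "common between" listing can be generated for all files at once. A sketch, assuming exact line matches as in the example; `pairwise_common` is a hypothetical helper, and with 10 files it makes 45 comparisons.

```bash
# Hypothetical helper: for every unordered pair of files, print the coordinates
# the two files share (exact line matches), one "A & B: coordinate" line each.
pairwise_common() {
  local a b i j tmp_a tmp_b c
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
  i=0
  for a in "$@"; do
    i=$((i + 1))
    LC_ALL=C sort -u "$a" > "$tmp_a"
    j=0
    for b in "$@"; do
      j=$((j + 1))
      # Only compare a with files listed after it, so each pair appears once.
      [ "$j" -le "$i" ] && continue
      LC_ALL=C sort -u "$b" > "$tmp_b"
      LC_ALL=C comm -12 "$tmp_a" "$tmp_b" | while IFS= read -r c; do
        printf '%s & %s: %s\n' "$a" "$b" "$c"
      done
    done
  done
  rm -f "$tmp_a" "$tmp_b"
}
```

With the three example files saved as `f1`, `f2` and `f3`, `pairwise_common f1 f2 f3` prints `f1 & f2: 53021-53613`, `f1 & f3: 1228237-1233121` and `f2 & f3: 70544-71129`, the same three pairs as in the example. Note that 53136-55912 in file 3 overlaps 53021-53613 but is not reported, because only identical lines count.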