I need two lists: the lines that do not overlap with any other line, and the lines that do. Each line can contain any number of genes, separated by commas. I would like to do this in awk, but I'm not very familiar with all of its commands.
L1 ycjM,ycjN,ycjO,ycjP,ycjQ,ycjR,ycjS,ycjT,ycjU,ycjV,ymjB
L2 ydaS,ydaT, ydaU,ydaV,ydaW,rzpR
L3 ompn
L4 ycjX,ycjF
L5 ycjX,ycjF,tyrR
.................
Non-overlapping lines: L1 L2 L3
Overlapping lines: L4 L5
Thank you :)
I modified it to this code, and it worked :)
How can I tell it to treat tabs, commas, and spaces as separators? I tried '[, \t]', but I guess this one doesn't handle white space.
Perhaps use tr or sed to strip out tabs and spaces, e.g.:
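A minimal sketch of that idea, not the code from the earlier post. It assumes the input holds only the comma-separated gene lists, one per line, with no label column, so each line is reported as L plus its line number. The function name classify_lines is made up for illustration.

```shell
# Hypothetical helper: reads gene lists on stdin, prints the two summaries.
classify_lines() {
  # Delete spaces and tabs first, so the comma is the only separator left.
  tr -d ' \t' | awk -F',' '
    {
      for (i = 1; i <= NF; i++) {
        g = $i
        if (g == "") continue            # ignore empty fields (e.g. a trailing comma)
        if (!(g in first)) {
          first[g] = NR                  # remember the first line that has this gene
        } else if (first[g] != NR) {
          overlap[first[g]] = 1          # gene already seen on another line:
          overlap[NR] = 1                # mark both lines as overlapping
        }
      }
    }
    END {
      non = "Non-overlapping lines:"
      ov  = "Overlapping lines:"
      for (r = 1; r <= NR; r++) {
        if (r in overlap) ov = ov " L" r
        else non = non " L" r
      }
      print non
      print ov
    }'
}
```

Call it as `classify_lines < genes.txt`, where genes.txt is a placeholder file name. If you would rather skip tr, passing -F'[, \t]+' to awk should also work: the trailing + makes a run of separators such as ", " count as a single one, and the empty-field check above covers a line that starts with white space.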