Dear Community,
I have a file like this:
CHR POS AACHANGE EFFECT
chr1 11779914 R88W _
chr1 11779915 R88Q _
chr1 19694500 R478 H
chr1 19694501 R478H H
chr1 23875429 Q63 R
chr1 23875430 Q63R R
chr1 84574386 R10 L
chr1 84574387 R10L L
chr1 89186388 E551G R
chr1 89186389 E551K R
chr19 57506385 A97E E
chr19 57506386 A97 E
As you can see, the rows are paired by chromosomal position: the first and second entries (11779914 and 11779915), the third and fourth entries (19694500 and 19694501), and so on. I want to compare columns three and four within each pair. If any entry in column four matches any entry in column three, then both lines of the pair should be removed. So I want output something like this:
CHR POS AACHANGE EFFECT
chr1 11779914 R88W _
chr1 11779915 R88Q _
chr1 89186388 E551G R
chr1 89186389 E551K R
Since entries three and four of column four contain H, which matches column three (entry four of column three, R478H), both lines (entries three and four) are removed.
Any command-line solution, Python script, etc. is appreciated. Solutions for comparing columns across two different files are easy to find, but I could not find one for paired rows within the same file.
Thanks,
Waqas.
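For anyone reading along, here is a minimal Python sketch of the filtering described above. It is not necessarily the approach used in the reply that follows; it rests on two assumptions that the question does not state outright: (1) pairs are simply consecutive data rows (rows 1-2, 3-4, ...), and (2) a "match" means the EFFECT letter is the last character of an AACHANGE value in the same pair (so H matches R478H). The function name `filter_pairs` is made up for illustration.

```python
def filter_pairs(lines):
    """Keep only row pairs where no EFFECT value matches an AACHANGE value."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    header, rows = lines[0], [ln.split() for ln in lines[1:]]
    kept = [header]
    # Assumption: consecutive data rows form the pairs (1-2, 3-4, ...).
    # A trailing unpaired row, if any, is simply checked against itself.
    for i in range(0, len(rows), 2):
        pair = rows[i:i + 2]
        changes = [r[2] for r in pair]   # column three (AACHANGE)
        effects = [r[3] for r in pair]   # column four (EFFECT)
        # Assumption: "match" = EFFECT is the final character of AACHANGE.
        matched = any(c.endswith(e) for c in changes for e in effects)
        if not matched:
            kept.extend(" ".join(r) for r in pair)
    return kept


sample = """CHR POS AACHANGE EFFECT
chr1 11779914 R88W _
chr1 11779915 R88Q _
chr1 19694500 R478 H
chr1 19694501 R478H H
chr1 23875429 Q63 R
chr1 23875430 Q63R R
chr1 84574386 R10 L
chr1 84574387 R10L L
chr1 89186388 E551G R
chr1 89186389 E551K R
chr19 57506385 A97E E
chr19 57506386 A97 E""".splitlines()

result = filter_pairs(sample)
print("\n".join(result))
```

On the example data this leaves the header plus the 11779914/11779915 and 89186388/89186389 pairs. To run it on a real file, pass the file's lines instead of `sample`. If a match should instead mean "EFFECT appears anywhere in AACHANGE", swap `c.endswith(e)` for `e in c`; on the sample above both tests give the same result.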
Hi Kevin,
You understood the situation exactly. I tried it with the dummy file and then with my original file, and got exactly the output I wanted. I cross-checked manually as well.
Many thanks,
Waqas.
Great - happy to help Waqas.
Kevin