I want to see which rows of ArabidopsisParalogue.txt have both of their genes present in archloro.txt. I tried grep -f, but that only finds the genes the two files have in common, not rows where both genes on the same line match. Any ideas on how to do this?
Basically, I want to see how many and which rows in ArabidopsisParalogue.txt match 2 genes from archloro.txt.
head archloro.txt
AT1G01080
AT1G01090
AT1G01100
AT1G01110
AT1G01250
AT1G01550
AT1G01690
AT1G01730
AT1G01860
AT1G01950
AT1G02060
AT1G21400
AT1G24180
AT1G59900
AT1G17480
AT5G09300
AT5G34780
AT5G50250
head ArabidopsisParalogue.txt
AT1G01080 AT5G50250
AT1G01090 AT1G21400
AT1G01090 AT1G24180
AT1G01090 AT1G59900
AT1G01090 AT5G09300
AT1G01090 AT5G34780
AT1G01100 AT4G00810
AT1G01100 AT5G24510
AT1G01100 AT5G47700
AT1G01110 AT1G17480
Expected output
AT1G01080 AT5G50250
AT1G01090 AT1G21400
AT1G01090 AT1G24180
AT1G01090 AT1G59900
AT1G01090 AT5G09300
AT1G01090 AT5G34780
AT1G01100 AT1G17480
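One way to approach this, sketched with awk rather than grep -f: load the gene list into an array, then print a row of the pairs file only when both of its columns are in that array. The function name both_in_list and the scratch files genes.txt and pairs.txt are made up for illustration, and the sketch assumes whitespace-separated columns and Unix line endings (a file saved with Windows line endings would carry a trailing carriage return on the last column and fail to match).

```shell
# Hypothetical helper: print rows of a two-column file where BOTH IDs
# appear in a one-column gene list.
# Usage: both_in_list gene_list.txt pairs.txt
both_in_list() {
    awk 'NR == FNR { genes[$1]; next }     # first file: remember each gene ID
         ($1 in genes) && ($2 in genes)    # second file: print row if both IDs are known
        ' "$1" "$2"
}

# Small demo using a few of the IDs shown above, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' AT1G01080 AT1G01090 AT1G21400 AT5G50250 > "$tmp/genes.txt"
printf '%s\n' 'AT1G01080 AT5G50250' 'AT1G01090 AT1G21400' 'AT1G01100 AT4G00810' > "$tmp/pairs.txt"
both_in_list "$tmp/genes.txt" "$tmp/pairs.txt"
```

On the real files this would be called as both_in_list archloro.txt ArabidopsisParalogue.txt, and piping the result to wc -l gives the number of matching rows.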
It's not clear to me from your question and example data how a gene ID from archloro.txt can match both IDs in one row of ArabidopsisParalogue.txt, when the IDs in the latter file are not identical. Could you clarify the question with some expected output?
Due to formatting issues, I could include head ArabidopsisParalogue.txt. There are two columns of genes.
These commands are not working for me!
Unless you show data from both files, one can't give any suggestions.
BTW, please move this answer to a comment, as it is not an answer.
The last line of your expected output is incorrect because AT1G17480 is not present in archloro.txt (in the data that you've shown), and AT5G34780 is present twice in the first file.
Yep (and good to see you again).
Your solution also works, it seems. bweil2, if you could devote a few moments more to looking at the other 2 new answers to see if they work (and Accept them also, if so), then that would be great.
Thanks, and you always seem to be in good spirits, @kevin.