Dear Biostars,
I have a text file containing several rows and columns like this:
"Gene Name" "Gene Id" "description" "GO"
A 1 phosphatase GO:001256
B 2 synthesize GO:013154
C 3 methylase GO:000054
D 4 kinase GO:001254
E 5 oxygenase GO:001354
F 6 synthesize GO:001254
In addition, I have another text file containing just one column and several rows, like this:
Gene Name
A
D
C
B
I need to extract the rows of file 1 that contain gene names listed in file 2.
Does anybody have any idea how to do that?
PS: I know how to do this in Excel, but it does not work with a very large number of rows.
Thank you
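One possible way to do this on the command line is with awk. This is only a minimal sketch and not necessarily the command discussed in the replies below. The file names genes_table.txt and wanted_genes.txt are hypothetical, the sample rows mirror the question, and the sketch assumes whitespace-separated columns with the gene name in the first column and a header on the first line of each file.

```shell
# Recreate the two example files (hypothetical file names).
printf '%s\n' \
  '"Gene Name" "Gene Id" "description" "GO"' \
  'A 1 phosphatase GO:001256' \
  'B 2 synthesize GO:013154' \
  'C 3 methylase GO:000054' \
  'D 4 kinase GO:001254' \
  'E 5 oxygenase GO:001354' \
  'F 6 synthesize GO:001254' > genes_table.txt
printf '%s\n' 'Gene Name' 'A' 'D' 'C' 'B' > wanted_genes.txt

# While reading the first file (NR == FNR), remember each gene name,
# skipping its header line. While reading the second file, print the
# header line and every row whose first column is a remembered name.
awk 'NR == FNR { if (FNR > 1) wanted[$1] = 1; next }
     FNR == 1 || ($1 in wanted)' wanted_genes.txt genes_table.txt > selected_rows.txt

cat selected_rows.txt
```

With the sample data, selected_rows.txt keeps the header and the rows for A, B, C and D, in the order they appear in the first file. Because awk reads the files line by line and only keeps the list of names in memory, this should scale to tables far larger than a spreadsheet can handle.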
Dear Lindenbaum,
The command worked perfectly. Thank you very much.
Thanks a lot, this is also very useful for my problem. Just one question: is it possible to include the header as well? Adding --header is not working.
It should work. Check the order of the input files, and check that the header is the very first line of both files.
Actually, you are right, it works, but the header ends up in the middle instead of at the top... Is there a way to keep it at the top? Thanks
Use sed to change the header into something that will be at the top after sorting. Something like 's/^chromosome/00000chromosome/'
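A minimal sketch of that trick, assuming a header line that starts with "chromosome" as in the expression above. The file name regions.txt and its contents are made up for illustration. The second command is an alternative that leaves the header untouched and sorts only the remaining lines.

```shell
# Hypothetical example file: a header followed by unsorted rows.
printf '%s\n' \
  'chromosome start end' \
  'chr2 100 200' \
  'chr1 50 80' > regions.txt

# Prefix the header with zeros so it sorts first, sort, then restore it.
# LC_ALL=C forces plain byte order, where digits come before letters.
sed 's/^chromosome/00000chromosome/' regions.txt \
  | LC_ALL=C sort \
  | sed 's/^00000chromosome/chromosome/' > sorted.txt

# Alternative: print the first line as is and sort everything after it.
{ head -n 1 regions.txt; tail -n +2 regions.txt | LC_ALL=C sort; } > sorted_alt.txt

cat sorted.txt
```

Both output files start with the header, followed by the chr1 row and then the chr2 row. The sed approach fits when the sort happens in the middle of a longer pipeline; the head/tail approach is simpler when the data is in a regular file.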