How to extract gene IDs from a tabular file based on a list of locus tags?
6.3 years ago
majeedaasim ▴ 60

I have a file like:

GeneID    Locus tag    Protein name
839580    AT1G01010    NAC domain containing protein 1
839569    AT1G01020    ARV1 family protein
839569    AT1G01020    ARV1 family protein
839569    AT1G01020    ARV1 family protein

I also have a list of locus tags (around 5000) that I want to extract from the full file.

For example, if I want to extract the gene ID and protein name for

AT1G01010

AT1G01020

I should get

839580 NAC domain containing protein 1

839569 ARV1 family protein

R • 2.0k views
6.3 years ago
Prakash ★ 2.2k

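If both the annotation table and the list of locus tags are loaded as data frames, you can simply use the merge function. The call below expects the locus tag column to be named "Locus" in both data frames. Because all.x = TRUE keeps every row of file1, file1 should be the list of locus tags; any tag missing from the annotation table is then returned with NA values.

merge(file1, file2, by.x = "Locus", by.y = "Locus", all.x = TRUE)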
6.3 years ago

One approach using Unix tools:

$ grep -Fwf locusTags.txt geneAnnotations.txt > answer.txt
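Note that grep prints every matching line in full, including the locus tag column and any repeated rows. If only the gene ID and protein name are wanted, one line per distinct row, an awk lookup is one possible alternative. The following is a minimal sketch, not a tested pipeline for the real data: it assumes the annotation file is tab-separated with the locus tag in the second column, and it recreates the example from the question as small sample files (the sample contents are illustrative only; the file names are the ones used in the grep command).

```shell
# Hypothetical sample inputs mirroring the example in the question.
# Assumption: geneAnnotations.txt is tab-separated and has a header line.
printf 'GeneID\tLocus tag\tProtein name\n' > geneAnnotations.txt
printf '839580\tAT1G01010\tNAC domain containing protein 1\n' >> geneAnnotations.txt
printf '839569\tAT1G01020\tARV1 family protein\n' >> geneAnnotations.txt
printf '839569\tAT1G01020\tARV1 family protein\n' >> geneAnnotations.txt
printf 'AT1G01010\nAT1G01020\n' > locusTags.txt

# While reading the first file (NR == FNR), remember each wanted locus tag.
# While reading the second file, print GeneID and protein name for rows whose
# second column is a wanted tag, skipping rows that were already printed.
awk -F'\t' '
    NR == FNR { wanted[$1]; next }
    ($2 in wanted) && !seen[$0]++ { print $1 "\t" $3 }
' locusTags.txt geneAnnotations.txt > answer.txt

cat answer.txt
```

Because the lookup is an exact match on the second column, a tag that happens to appear inside a protein name does not produce a hit, which can happen with grep -w.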