Getting annotation for a list of genes
8.7 years ago
zizigolu ★ 4.3k

Hi. As you know, the cuffdiff gene_exp.diff file contains a column named gene, like below:

NAC001
DCL1
AT1G01073
IQD18
AT1G01115
GIF2
AT1G01180
MIR165A
AT1G01210,FKGP
AT1G01225,AT1G01230
AT1G01240

I have more than 16000 rows in this column. How can I get the annotation for each gene? I tried DAVID and NetAffx, but they make a mess of the list and I would have to look everything up manually. Do you know of any solution? Thank you.

RNA-Seq gene • 3.2k views

What do you mean by "Annotation for each gene" ?


My adviser asked me to find the full annotation of each gene, but how can I do that when the genes are repeated many times?

8.7 years ago
Ram 44k

"Annotation" is a broad term. To automate a process (so you can do something once and have a computer repeat it for everything else), you need to be precise about what kind of information you need for each gene and where you need it from. You'd then find an API that the source of that information offers and use it to automate your query.

Your takeaway is: your first step needs to be figuring out what information you need. For example, in my analysis, all I needed was the name of each gene, so I used NCBI's eutils along with R.

On a side note, you've got to start thinking for yourself. The "My adviser would like me to" attitude will stifle your growth as a scientist.


Thank you, Ram. In the cuffdiff output for samples C1 and C2 I have 32394 gene names, for example DCL1. I converted these names to AGI in the plant BioMart because my advisor asked me for the AGI. But because some of these 32394 genes are repeated, BioMart gave me only 20562 AGIs. Now I am confused about how to match 32394 gene names with 20562 AGIs; that is impossible to do manually. What I actually need is for BioMart to convert all of these 32394 gene names to AGIs and ignore the repeats.


Take a look at this thread: How To Convert Arabidopsis Gene Id To Ncbi Geneid, Ncbi-Gi Or Uniprot?

In general, you'd solve this problem by ensuring your tool output includes your input. If you were entering gene names as input, your output should have at least two attributes/columns: the gene name you gave as input as well as the corresponding AGI. This is assuming you're not passing in duplicates in the input. If you are passing duplicates, you should extract uniques because otherwise, you're just repeating an analysis whose result doesn't change.
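To make that concrete, here is a minimal Python sketch of the "extract uniques, then keep the input next to the output" idea. It assumes a tab-delimited gene_exp.diff with a column named gene, splits comma-joined entries such as AT1G01210,FKGP, and treats "-" as an empty placeholder (an assumption). The function names and the small `lookup` dictionary are hypothetical stand-ins for whatever your conversion tool returns.

```python
import csv
import io


def unique_gene_names(diff_text):
    """Return unique names from the 'gene' column, in first-seen order."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for row in reader:
        # Entries like "AT1G01210,FKGP" hold several names; split them up.
        for name in row["gene"].split(","):
            name = name.strip()
            if name and name != "-":  # "-" assumed to mean "no name"
                seen.setdefault(name, None)
    return list(seen)


def pair_with_input(names, lookup):
    """Keep the input name beside its result so nothing has to be matched by hand."""
    return [(name, lookup.get(name, "NA")) for name in names]


# Tiny made-up stand-in for gene_exp.diff (only a few columns shown).
example = (
    "test_id\tgene\tsample_1\tsample_2\n"
    "XLOC_000001\tNAC001\tC1\tC2\n"
    "XLOC_000002\tDCL1\tC1\tC2\n"
    "XLOC_000001\tNAC001\tC1\tC3\n"
    "XLOC_000009\tAT1G01210,FKGP\tC1\tC2\n"
)

# Hypothetical conversion result: gene name -> AGI.
lookup = {"NAC001": "AT1G01010", "DCL1": "AT1G01040"}

names = unique_gene_names(example)
pairs = pair_with_input(names, lookup)
for name, agi in pairs:
    print(name, agi, sep="\t")
```

The two-column result is what lets you go back from 20562 AGIs to the original gene names: every AGI still carries the name it came from.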


Thank you. The BioMart output has only one column (AGI). If the tool collapses the repeats, the output is useless because I can't match 20562 AGIs with 32394 gene names.


Why not? I'm assuming BioMart has an R API, which means you can do with it as you please!


Sorry, how can I get what I need from BioMart? Could you tell me, please?


You've got to Google this stuff. 5 minutes of Googling gives me this: https://support.bioconductor.org/p/75125/#75126

I've never used BioMart or worked on plants; if I can find out how to search for plant genes using BioMart within minutes, you should be able to do that too. It might take longer initially, but as you keep at it and get better at it, Google searches will become synonymous with your thought processes, in that you'll get efficient at them.

Please google first. Try it now: Google "biomart arabidopsis r" and see how many relevant results show up.


thank you dear Ram..

8.7 years ago
GenoMax 147k

Why not look them up in the Arabidopsis GTF file (these look like Arabidopsis IDs), which you must already have?
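As a rough illustration of that lookup, the Python sketch below builds a gene name to gene ID table from GTF lines. It assumes the ninth GTF column carries gene_id and gene_name attributes in the usual `key "value";` form; whether your particular GTF has a gene_name for every gene is something to check. The function name and the two example lines (including their coordinates) are made up for the demonstration.

```python
import re

# Matches attribute pairs such as: gene_id "AT1G01010";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_name_to_id(gtf_lines):
    """Map each gene_name to the set of gene_id values seen with it."""
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        # Fall back to the ID itself when the record has no gene_name.
        name = attrs.get("gene_name", gene_id)
        mapping.setdefault(name, set()).add(gene_id)
    return mapping


# Made-up example records in GTF layout (illustrative coordinates).
gtf_example = [
    '1\tprotein_coding\texon\t3631\t5899\t.\t+\t.\t'
    'gene_id "AT1G01010"; transcript_id "AT1G01010.1"; gene_name "NAC001";\n',
    '1\tprotein_coding\texon\t23146\t33153\t.\t+\t.\t'
    'gene_id "AT1G01040"; transcript_id "AT1G01040.1"; gene_name "DCL1";\n',
    '1\tprotein_coding\texon\t44677\t44787\t.\t+\t.\t'
    'gene_id "AT1G01073"; transcript_id "AT1G01073.1";\n',
]

name_map = gene_name_to_id(gtf_example)
for name, ids in sorted(name_map.items()):
    print(name, ",".join(sorted(ids)), sep="\t")
```

Storing a set per name means a name that maps to more than one ID is reported rather than silently overwritten.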


I used the iGenomes Ensembl GTF, but as you know I have, for example, 162006 genes in the cuffdiff output, and some of them are repeated rather than unique because cuffdiff compares several samples. When I convert the genes to AGI I get only 24256 genes, and I can't match the results one by one manually. I have pasted three rows of my cuffdiff output:

test_id      gene_id      gene    locus          sample_1  sample_2  status  value_1  value_2  log2(fold_change)  test_stat  p_value  q_value   significant
XLOC_000001  XLOC_000001  NAC001  1:3630-5899    Pri-2h    unPri-2h  OK      8.16533  8.82461  0.112022           0.249649   0.797    0.999241  no
XLOC_000002  XLOC_000002  DCL1    1:23145-33153  Pri-2h    unPri-2h  OK      12.6595  15.1209  0.256324           0.0853285  0.73045  0.999241  no

If you were in my place, how would you get the annotation for each gene, given that I have 24256 against 162006?


Look up bedtools or even grep. If this is the solution, your problem is automation, not information sourcing.
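If it is an automation problem, one way to sketch it in Python is to append the converted ID to every row of the cuffdiff table, so repeated genes simply get the same value each time and the row counts never have to match. This assumes a tab-delimited table with a gene column and that AGI codes look like AT1G01010 (AT, a chromosome from 1 to 5 or C or M, then G and five digits). The function names, the "agi" column name, and the `lookup` dictionary are hypothetical.

```python
import csv
import io
import re

# Assumed AGI pattern: AT + chromosome (1-5, C, M) + G + five digits.
AGI_RE = re.compile(r"^AT[1-5CM]G\d{5}$")


def resolve(name, lookup):
    """Return the AGI for a gene name; names that are already AGIs pass through."""
    if AGI_RE.match(name):
        return name
    return lookup.get(name, "NA")


def annotate_rows(diff_text, lookup):
    """Append an 'agi' column to every row, repeated genes included."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + ["agi"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        ids = []
        # A cell may hold several comma-joined names; resolve each one.
        for name in row["gene"].split(","):
            agi = resolve(name.strip(), lookup)
            if agi not in ids:
                ids.append(agi)
        row["agi"] = ",".join(ids)
        writer.writerow(row)
    return out.getvalue()


# Made-up miniature of gene_exp.diff; NAC001 appears in two comparisons.
example_diff = (
    "test_id\tgene\tsample_1\tsample_2\n"
    "XLOC_000001\tNAC001\tC1\tC2\n"
    "XLOC_000001\tNAC001\tC1\tC3\n"
    "XLOC_000003\tAT1G01073\tC1\tC2\n"
    "XLOC_000009\tAT1G01210,FKGP\tC1\tC2\n"
)
lookup = {"NAC001": "AT1G01010"}  # hypothetical name -> AGI table

annotated = annotate_rows(example_diff, lookup)
print(annotated, end="")
```

Names with no entry in the lookup come out as NA, which makes the genes that still need attention easy to filter.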

8.7 years ago
Daniel ★ 4.0k

Fereshteh, very specifically for this problem, I would recommend using ThaleMine, from Araport (https://apps.araport.org/thalemine/begin.do). Araport is the main Arabidopsis annotation portal today and has more up-to-date gene models than the earlier TAIR database. The site has a very powerful database query interface and runs a lot of statistical comparisons on the gene lists that you input.

I recommend this specifically as an Arabidopsis tool, but it will also let you avoid a lot of the command-line steps you're struggling with.

One thing to note is that you might want to use their latest GTF for your mapping (rather than TAIR, if that is what you used originally), but in the first instance it shouldn't matter until you're up and running.

As a side note, I think I swap the "gene_name" for the "gene_id" in my GTF to avoid the XLOC problem. I can't remember right now whether that's exactly right, but it can be done.
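One possible reading of that swap, sketched in Python: overwrite each record's gene_name value with its gene_id, so that any tool reporting gene_name ends up showing the ID. This is only an interpretation of the idea above, not a confirmed recipe, and it assumes the standard `key "value";` attribute layout. The function name and the example record are hypothetical.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')
GENE_NAME_RE = re.compile(r'gene_name "[^"]*"')


def name_from_id(gtf_line):
    """Return the GTF line with gene_name overwritten by the gene_id value."""
    line = gtf_line.rstrip("\n")
    fields = line.split("\t")
    if line.startswith("#") or len(fields) < 9:
        return line  # leave comments and incomplete records untouched
    match = GENE_ID_RE.search(fields[8])
    if match:
        gene_id = match.group(1)
        # A function is used as the replacement so the ID is inserted literally.
        fields[8] = GENE_NAME_RE.sub(
            lambda _m: 'gene_name "%s"' % gene_id, fields[8]
        )
    return "\t".join(fields)


# Made-up example record (illustrative coordinates).
record = (
    '1\tprotein_coding\texon\t3631\t5899\t.\t+\t.\t'
    'gene_id "AT1G01010"; transcript_id "AT1G01010.1"; gene_name "NAC001";'
)

rewritten = name_from_id(record)
print(rewritten)
```

Records without a gene_name attribute are returned unchanged, so this step does not add names where none existed.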


Thank you, I went through the link you proposed. I have more than 33000 AGI IDs from my RNA-seq output whose annotation I need, but I could not upload them all together at this link. Is there any way to get the annotation (description) for all of these IDs together?
