Question

How do I annotate RNA-seq count tables that were processed via STAR->HTSeq-count using Illumina iGenomes UCSC hg19?

1

11.5 years ago

ndejay ▴ 30

Hi,

I am new to the world of NGS data processing. I performed read mapping with STAR and counting with HTSeq-Count using the GTF and chromosome files provided by the TopHat/Cufflinks group (modified Illumina iGenomes UCSC hg19, as found at http://cufflinks.cbcb.umd.edu/igenomes.html). I am now analyzing the resulting counts for differential expression in R with DESeq. The data frame I currently have has samples as columns and gene names as rows (identical to those in the GeneCards annotation database, I believe), but I would like to annotate them further in R. Is there a way to do this? Here is an example of what I have:

             XXXX0001      XXXX0002
A1BG       202.900518  3.744657e+01
A1BG-AS1   210.380899  19.96663e+01
A1CF         6.422366  9.354143e-01
A2M        112.642157  5.831635e+04

I want to be able to retrieve information about each gene (e.g. A1BG -> Alpha-1-B Glycoprotein), as seen at http://www.genecards.org/cgi-bin/carddisp.pl?gene=A1BG. My guess is that once I determine which annotation convention these gene names follow, I should be able to convert easily between nomenclatures (Entrez, HGNC, etc.). How do I get started?

Do let me know if I omitted any relevant information and thanks in advance.

rna-seq annotation hg19 ucsc r • 5.2k views

ADD COMMENT • link updated 11.5 years ago by Sean Davis 27k • written 11.5 years ago by ndejay ▴ 30

1

Follow a Bioconductor annotation tutorial and you will know :-) Look at the org.Hs.eg.db package. You want to convert gene symbols (A1BG) to gene names (Alpha-1-B Glycoprotein).
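
For anyone landing here later, a minimal sketch of that lookup, assuming the row names are HGNC symbols and that AnnotationDbi and org.Hs.eg.db are installed from Bioconductor. The `counts` data frame is a toy stand-in built from a few rows of the question, and the column names `entrez_id` and `gene_name` are just illustrative choices:

```r
# Assumes: BiocManager::install(c("AnnotationDbi", "org.Hs.eg.db"))
library(AnnotationDbi)
library(org.Hs.eg.db)

# Toy stand-in for the count table in the question (genes as row names)
counts <- data.frame(
  XXXX0001 = c(202.900518, 6.422366, 112.642157),
  XXXX0002 = c(37.44657, 0.9354143, 58316.35),
  row.names = c("A1BG", "A1CF", "A2M")
)

# Look up Entrez IDs and full gene names, keyed on the gene symbol
annot <- AnnotationDbi::select(
  org.Hs.eg.db,
  keys    = rownames(counts),
  keytype = "SYMBOL",
  columns = c("ENTREZID", "GENENAME")
)

# select() may return several rows per symbol; keep the first hit for each
annot <- annot[!duplicated(annot$SYMBOL), ]

# Attach the annotation in the same row order as the count table
idx <- match(rownames(counts), annot$SYMBOL)
annotated <- counts
annotated$entrez_id <- annot$ENTREZID[idx]
annotated$gene_name <- annot$GENENAME[idx]

print(annotated)
```

Symbols that the package does not recognise (for example outdated aliases) come back as `NA`; retrying those with `keytype = "ALIAS"` may recover some of them.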

ADD REPLY • link 11.5 years ago by Irsan ★ 7.8k

0

You are correct! I absolutely overlooked that! Thank you very much @Irsan @SeanDavis

ADD REPLY • link 11.5 years ago by ndejay ▴ 30

score 0 · Answer 1 · 2013-11-27

0

11.5 years ago

Sean Davis 27k

Your gene names appear to be HGNC symbols. From there, you could use the org.Hs.eg.db package (as suggested by Irsan above) or biomaRt.
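
A rough sketch of the biomaRt route, assuming network access to Ensembl and that the GRCh37 archive is an acceptable match for hg19 when all you need is symbol-level annotation. The `symbols` vector is taken from the example rows in the question; attribute names can change between Ensembl releases, so check `listAttributes(mart)` if one is rejected:

```r
# Assumes: BiocManager::install("biomaRt") and a working internet connection
library(biomaRt)

# GRCh = 37 selects the Ensembl GRCh37 archive (the assembly behind hg19)
mart <- useEnsembl(
  biomart = "ensembl",
  dataset = "hsapiens_gene_ensembl",
  GRCh    = 37
)

# Gene symbols from the example rows in the question
symbols <- c("A1BG", "A1BG-AS1", "A1CF", "A2M")

# Query Ensembl for the gene ID and free-text description of each symbol
annot <- getBM(
  attributes = c("hgnc_symbol", "ensembl_gene_id", "description"),
  filters    = "hgnc_symbol",
  values     = symbols,
  mart       = mart
)

print(annot)
```

org.Hs.eg.db works offline once installed, while biomaRt queries the Ensembl servers each time, so results from the latter depend on the release being served.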

ADD COMMENT • link 11.5 years ago by Sean Davis 27k