Hi,
I am new to the world of NGS data processing. I performed read mapping with STAR and counting with HTSeq-Count, using the GTF and chromosome files provided by the TopHat/Cufflinks group (modified Illumina iGenomes UCSC hg19, as found on http://cufflinks.cbcb.umd.edu/igenomes.html). I am now analyzing the counts with the R differential expression package DESeq. The data frame I currently have has samples as columns and gene names as rows (the names appear identical to those in the GeneCards annotation database, I believe), and I would like to annotate the genes further in R. Is there a way to do this? Here is an example of what I have:
          XXXX0001    XXXX0002
A1BG      202.900518  3.744657e+01
A1BG-AS1  210.380899  19.96663e+01
A1CF      6.422366    9.354143e-01
A2M       112.642157  5.831635e+04
I want to be able to retrieve information about each gene (e.g. A1BG -> Alpha-1-B Glycoprotein), as seen at http://www.genecards.org/cgi-bin/carddisp.pl?gene=A1BG. My guess is that once I can determine which annotation convention these gene names follow, I should be able to convert easily between nomenclatures (Entrez, HGNC, etc.). How do I go about breaking the ice?
Do let me know if I omitted any relevant information, and thanks in advance.
Follow a Bioconductor annotation tutorial and you will know :-) Look at the org.Hs.eg.db package. You want to convert gene symbols (A1BG) to gene names (Alpha-1-B Glycoprotein).
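For illustration, here is a minimal sketch of that symbol-to-name lookup. It assumes the AnnotationDbi and org.Hs.eg.db packages are installed from Bioconductor, and that the row names of the table are gene symbols, which is how they look in the question. The `counts` data frame is a toy stand-in built from the four example rows, and the names `gene_name`, `entrez_id` and `annotated` are made up for this example.

```r
# Assumes both packages are installed, e.g. BiocManager::install("org.Hs.eg.db").
library(AnnotationDbi)
library(org.Hs.eg.db)

# Toy stand-in for the table in the question: samples as columns,
# gene symbols as row names.
counts <- data.frame(
  XXXX0001 = c(202.900518, 210.380899, 6.422366, 112.642157),
  XXXX0002 = c(3.744657e+01, 19.96663e+01, 9.354143e-01, 5.831635e+04),
  row.names = c("A1BG", "A1BG-AS1", "A1CF", "A2M")
)

symbols <- rownames(counts)

# mapIds() returns one value per key, in the same order as the keys.
# multiVals = "first" keeps the first hit when a symbol maps to several
# records; symbols with no match come back as NA.
gene_name <- mapIds(org.Hs.eg.db, keys = symbols, column = "GENENAME",
                    keytype = "SYMBOL", multiVals = "first")
entrez_id <- mapIds(org.Hs.eg.db, keys = symbols, column = "ENTREZID",
                    keytype = "SYMBOL", multiVals = "first")

# Attach the annotation as extra columns next to the counts.
annotated <- counts
annotated$gene_name <- unname(gene_name)
annotated$entrez_id <- unname(entrez_id)

print(annotated)
```

Calling `columns(org.Hs.eg.db)` lists the other fields that can be requested in the same way, and `keytypes(org.Hs.eg.db)` lists the identifier types accepted as keys. Symbols that are outdated or not present in the installed annotation release will map to NA, so it is worth checking for those before relying on the result.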
You are correct! I absolutely overlooked that! Thank you very much @Irsan @SeanDavis