Good morning,
Which gene symbols should I use?
After generating a count table from the alignment against hg38 using the GENCODE release 24 annotation (https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/gencode.v24.primary_assembly.annotation.gtf.gz), I already had the gene symbols in the count table. I removed them for the DESeq2 analysis, and then merged them back into the results table generated by DESeq2's results() function for further analysis (gene set functional analysis).
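A minimal base R sketch of that merge-back step, assuming the DESeq2 results were converted with as.data.frame(results(dds)) and kept the GENCODE gene_id values as row names. The names res_df and gene_annot and the ENSG_EXAMPLE IDs are hypothetical placeholders. Joining on gene_id rather than on the symbol keeps the join unambiguous, because a symbol is not guaranteed to be unique across gene_ids.

```r
# Hypothetical stand-in for as.data.frame(results(dds)); row names are GENCODE gene_ids
res_df <- data.frame(
  log2FoldChange = c(1.2, -0.8, 0.3),
  padj = c(0.01, 0.20, 0.90),
  row.names = c("ENSG00000163946.13", "ENSG_EXAMPLE_1.1", "ENSG_EXAMPLE_2.4")
)

# Hypothetical gene_id/gene_name columns saved from the count table before DESeq2
gene_annot <- data.frame(
  gene_id = c("ENSG00000163946.13", "ENSG_EXAMPLE_1.1", "ENSG_EXAMPLE_2.4"),
  gene_name = c("FAM208A", "EXAMPLE1", "EXAMPLE2"),
  stringsAsFactors = FALSE
)

# Move the row names into an explicit key column
res_df$gene_id <- rownames(res_df)

# Left join: keep every DESeq2 result row and attach the GTF symbol
res_annot <- merge(res_df, gene_annot, by = "gene_id", all.x = TRUE, sort = FALSE)
res_annot
```

With all.x = TRUE, any result row whose gene_id is missing from gene_annot is kept with gene_name set to NA instead of being silently dropped.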
My question:
Should I use the gene symbols that come with the GTF file, or do I need to map them to gene annotation using org.Hs.eg.db? I especially need this step when using the clusterProfiler enrichment functions.
Note: when I compared the symbols in my list with the symbols in org.Hs.eg.db, there were differences:
library(org.Hs.eg.db)
gene_names <- unique(dfRes01_2D$gene_name)  # gene names taken from my count table
missing_genes <- gene_names[!(gene_names %in% keys(org.Hs.eg.db, keytype = "SYMBOL"))]
missing_genes  # about 11530 symbols
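Because symbols are renamed between annotation releases, a sturdier check is to map the stable Ensembl gene IDs, with the GENCODE version suffix removed, to current symbols. This is a sketch assuming AnnotationDbi and org.Hs.eg.db are installed; gencode_ids is a hypothetical placeholder for the gene_id column of the count table. clusterProfiler's enrichGO() also takes a keyType argument, so unversioned Ensembl IDs could likely be passed to it directly instead of symbols.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Hypothetical placeholder: versioned GENCODE gene_ids from the count table
gencode_ids <- c("ENSG00000163946.13")

# org.Hs.eg.db stores unversioned Ensembl gene IDs, so drop the ".13"-style suffix
ensembl_ids <- sub("\\.[0-9]+$", "", gencode_ids)

# Look up the current symbol for each Ensembl ID; multiVals = "first" keeps one
# symbol per ID, and IDs unknown to org.Hs.eg.db come back as NA
current_symbols <- mapIds(org.Hs.eg.db,
                          keys = ensembl_ids,
                          column = "SYMBOL",
                          keytype = "ENSEMBL",
                          multiVals = "first")
current_symbols
```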
Thanks in advance
Please show us some examples of the differences.
Hello, the following are some of them (there are 11530 in total):
Is that how they appear in the GTF? And what do they look like in the other source?
Yes, these are what I find in the GTF file that I downloaded from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/, and they do not exist in org.Hs.eg.db, nor in release 43 (the latest GENCODE human GTF).
Note:
For example, the gene_id for the symbol 'FAM208A' in the GTF file (release 24) is 'ENSG00000163946.13'. When I look up the symbol for that gene_id (without the version suffix .13) in org.Hs.eg.db, I get TASOR.
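The FAM208A/TASOR case suggests a way to separate renamed genes from genuinely missing ones: keep the release 24 symbol and the current symbol side by side, keyed on the unversioned Ensembl ID. This base R sketch uses a hypothetical current_lookup vector standing in for the named vector that AnnotationDbi::mapIds() returns; gtf_annot and the ENSG_EXAMPLE ID are also placeholders.

```r
# Hypothetical input: release 24 gene_ids and symbols from the count table
gtf_annot <- data.frame(
  gene_id   = c("ENSG00000163946.13", "ENSG_EXAMPLE_1.2"),
  gene_name = c("FAM208A", "EXAMPLE1"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup of current symbols keyed by unversioned Ensembl ID
current_lookup <- c(ENSG00000163946 = "TASOR", ENSG_EXAMPLE_1 = "EXAMPLE1")

# Strip the GENCODE version suffix to get the stable Ensembl ID
gtf_annot$ensembl <- sub("\\.[0-9]+$", "", gtf_annot$gene_id)

# Attach the current symbol; IDs absent from the lookup become NA
gtf_annot$current_symbol <- unname(current_lookup[gtf_annot$ensembl])

# Flag genes whose release 24 symbol differs from the current one
gtf_annot$renamed <- !is.na(gtf_annot$current_symbol) &
  gtf_annot$gene_name != gtf_annot$current_symbol
gtf_annot
```

Rows with renamed == TRUE can use current_symbol for enrichment, while rows with an NA current_symbol are the IDs that have no entry in the lookup at all.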
GENCODE release 24 is from 2015. Not sure why you chose to use a GTF file that is several years old; the current release is 43 as of today. If you check the current assignment of ENSG00000163946, it is the TASOR gene: http://www.ensembl.org/Multi/Search/Results?q=ENSG00000163946;site=ensembl_all;page=1, and org.Hs.eg.db reflects that.
You are right; in the release 43 GTF the ID is ENSG00000163946.14. In our case, the previous studies (papers) were based on release 24, which is why I am required to use release 24. But once I started the functional analysis (GO analysis) with the clusterProfiler package, I ran into these gaps. Now I do not know what to do.
Thank you
There is no reason to use old annotation unless you are just trying to reproduce old results.
If you are doing a fresh analysis beyond what is published, then you should stick with the current genome build and annotations; otherwise you are going to run into these kinds of issues.
Thank you for your help and quick reply. However, I also see gaps between the release 43 GTF and org.Hs.eg.db. Again, thank you so much.
Below are some of the genes that are in the release 43 GTF but not in org.Hs.eg.db:
Below are some that are in org.Hs.eg.db but not in the release 43 GTF:
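Some symbol-level mismatch in both directions is expected even with current releases, since org.Hs.eg.db is organised around NCBI Gene (Entrez) IDs while GENCODE is Ensembl-based, and the two are updated on different schedules. A minimal base R sketch for counting the mismatch in both directions; the vectors here are hypothetical placeholders, which in practice would be the unique gene_name values from the release 43 GTF and keys(org.Hs.eg.db, keytype = "SYMBOL").

```r
# Hypothetical placeholder symbol sets from the two sources
gtf43_symbols <- c("TASOR", "GENE_ONLY_IN_GTF", "SHARED_GENE")
db_symbols    <- c("TASOR", "SHARED_GENE", "GENE_ONLY_IN_DB")

# Symbols present in one source but absent from the other
in_gtf_not_db <- setdiff(gtf43_symbols, db_symbols)
in_db_not_gtf <- setdiff(db_symbols, gtf43_symbols)

# Summarise the overlap in both directions
overlap_summary <- c(shared   = length(intersect(gtf43_symbols, db_symbols)),
                     gtf_only = length(in_gtf_not_db),
                     db_only  = length(in_db_not_gtf))
overlap_summary
```

Running the same comparison on unversioned Ensembl IDs instead of symbols would usually show how much of the gap is due to naming rather than to genes missing from one resource.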