gene annotation
18 months ago
ahmad • 0

Good morning,

Which gene symbols should I use?

I generated a count table from alignments against GRCh38 using the GENCODE release 24 annotation (https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/gencode.v24.primary_assembly.annotation.gtf.gz), so the count table already contains the gene symbols. I removed them for the DESeq2 analysis, and afterwards I merged them back into the result table produced by DESeq2's results() function for further analysis (gene set functional analysis).
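A minimal base-R sketch of this remove-and-merge-back step, assuming the count table has `gene_id` and `gene_name` columns followed by one column per sample (hypothetical layout and toy counts). A plain data frame stands in for `as.data.frame(results(dds))` so the snippet runs without DESeq2; the idea is to join on the unique Ensembl ID rather than on the symbol.

```r
# Toy count table (hypothetical values): Ensembl ID, symbol, then samples
cts <- data.frame(
  gene_id   = c("ENSG00000163946.13", "ENSG00000002079.14", "ENSG00000105501.13"),
  gene_name = c("FAM208A", "MYH16", "SIGLEC5"),
  s1 = c(10L, 0L, 5L),
  s2 = c(12L, 1L, 7L),
  stringsAsFactors = FALSE
)

# Keep the annotation aside; DESeq2 only needs the integer count matrix
annot <- cts[, c("gene_id", "gene_name")]
count_mat <- as.matrix(cts[, c("s1", "s2")])
rownames(count_mat) <- cts$gene_id
# count_mat is what DESeqDataSetFromMatrix(countData = ...) would receive

# Stand-in for as.data.frame(results(dds)); rownames are the gene_ids
res_df <- data.frame(
  log2FoldChange = c(0.5, -1.2, 2.0),
  padj           = c(0.30, 0.01, 0.04),
  row.names      = rownames(count_mat)
)
res_df$gene_id <- rownames(res_df)

# Join on the unique Ensembl ID, not on the symbol
res_annot <- merge(res_df, annot, by = "gene_id", all.x = TRUE, sort = FALSE)
```

Joining on `gene_id` avoids the duplicated rows that a join on `gene_name` can produce when two genes share a symbol.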

My question:

Should I use the gene symbols that come with the GTF file, or do I need to map the genes with an annotation package such as org.Hs.eg.db? I especially need this step when using the clusterProfiler enrichment functions.

Note: when I compared the symbols in my list with the symbols in org.Hs.eg.db, there were differences:

library(org.Hs.eg.db)  # also attaches AnnotationDbi, which provides keys()
gene_names <- unique(dfRes01_2D$gene_name)  # gene names taken from my count table
missing_genes <- gene_names[!(gene_names %in% keys(org.Hs.eg.db, keytype = "SYMBOL"))]
length(missing_genes)  # about 11530 genes
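Matching on symbols counts every renamed gene as missing. One alternative is to look up the Ensembl IDs, which the GTF and org.Hs.eg.db share, and ask for their current symbols. A sketch using `AnnotationDbi::mapIds`, assuming the count table's `gene_id` column holds versioned GENCODE IDs; the three IDs below are taken from this thread for illustration only.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# GENCODE gene_ids carry a version suffix (".13"); org.Hs.eg.db keys do not
gene_ids <- c("ENSG00000163946.13", "ENSG00000002079.14", "ENSG00000105501.13")
ens <- sub("\\.[0-9]+$", "", gene_ids)

# Current symbol for each Ensembl ID; multiVals = "first" keeps a single
# symbol when one ID maps to several Entrez genes
current_symbol <- mapIds(org.Hs.eg.db, keys = ens, keytype = "ENSEMBL",
                         column = "SYMBOL", multiVals = "first")

# IDs that org.Hs.eg.db cannot annotate come back as NA
unmapped <- names(current_symbol)[is.na(current_symbol)]
```

The length of `unmapped` for the full gene list is the number of genes that are genuinely absent from org.Hs.eg.db, as opposed to merely renamed.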

Thanks in advance

gtf gene-annotation • 1.3k views

Please show us some examples of the differences.


Hello, the following are some of them:

"PROSC"          "C14orf37"       "RP1-206D15.6"
"CTD-2368P22.1"  "RP5-1039K5.12"  "HIST1H2AB"
"RP11-292E2.2"   "RP11-67L2.2"    "FAM208A"
"ERBB2IP"        "LHFP"           "HN1L"
"SGOL1"          "TARS"           "CTD-21"

There are 11530 of them in total.


So these are in the GTF? What do they look like in the other source?


Yes, these are what I find in the GTF file that I downloaded from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/. They do not exist in org.Hs.eg.db, nor in release 43 (the latest GENCODE human GTF).

Note: for example, the gene_id for the symbol 'FAM208A' in the GTF file (release 24) is 'ENSG00000163946.13'. When I look up the symbol for that gene_id (without the version suffix .13) in org.Hs.eg.db, I get TASOR.
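Cases like FAM208A/TASOR can be flagged systematically by placing the release 24 symbol next to the current symbol for the same version-stripped Ensembl ID (for instance as returned by `mapIds`). A base-R sketch; the small table and the named vector of current symbols are hypothetical inputs built from this thread.

```r
# Hypothetical inputs: release 24 GTF symbols, and current symbols for the
# same version-stripped Ensembl IDs (NA = not annotated in org.Hs.eg.db)
old <- data.frame(
  ens        = c("ENSG00000163946", "ENSG00000002079"),
  gtf_symbol = c("FAM208A", "MYH16"),
  stringsAsFactors = FALSE
)
current_symbol <- c(ENSG00000163946 = "TASOR", ENSG00000002079 = NA)

# Align current symbols to the GTF rows by Ensembl ID
old$current_symbol <- unname(current_symbol[old$ens])

# Renamed: a current symbol exists and differs from the GTF symbol
old$renamed <- !is.na(old$current_symbol) & old$current_symbol != old$gtf_symbol
```

Rows with `renamed == TRUE` are not missing genes, only outdated names; rows with an NA current symbol are the ones that actually lack annotation.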


Gencode release 24 is from 2015. Not sure why you chose to use a GTF file that is several years old. Current release is 43 as of today.

If you check the current assignment of ENSG00000163946, it belongs to the TASOR gene: http://www.ensembl.org/Multi/Search/Results?q=ENSG00000163946;site=ensembl_all;page=1. That is what org.Hs.eg.db is showing.


You are right: in the release 43 GTF this gene is ENSG00000163946.14. In our case, the previous studies (papers) were based on release 24, and that is why I am required to use release 24. But once I started the functional analysis (GO analysis) with the clusterProfiler package, I ran into these gaps. Now I do not know what to do.
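If release 24 must be kept, one option is to run the enrichment on Ensembl IDs instead of symbols, so symbol renames never enter the GO step. A sketch assuming clusterProfiler's `enrichGO` with `keyType = "ENSEMBL"`; the `padj` values are random toy data standing in for `as.data.frame(results(dds))`, and the ".1" version suffix is a placeholder. Release 24 IDs that Ensembl has since retired will still drop out; this only removes the symbol mismatches.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Toy stand-in for as.data.frame(results(dds)) (hypothetical values):
# rownames are versioned gene_ids, as in a GENCODE-based count table
set.seed(1)
ids <- head(unique(keys(org.Hs.eg.db, keytype = "ENSEMBL")), 2000)
res_df <- data.frame(padj = runif(length(ids)),
                     row.names = paste0(ids, ".1"))

# Strip the version suffix so the IDs match org.Hs.eg.db keys
ens <- sub("\\.[0-9]+$", "", rownames(res_df))

# Significant genes, and the universe of all genes that were tested
sig  <- ens[!is.na(res_df$padj) & res_df$padj < 0.05]
univ <- ens

# readable = TRUE reports gene symbols taken from org.Hs.eg.db itself
ego <- enrichGO(gene = sig, OrgDb = org.Hs.eg.db, keyType = "ENSEMBL",
                ont = "BP", universe = univ, readable = TRUE)
```

Passing the tested genes as `universe` keeps the background consistent with what DESeq2 actually measured.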

Thank you


In our case, the previous studies (papers) were based on release 24, and that is why I am required to use release 24

There is no reason to use old annotation unless you are just trying to reproduce old results.

If you are doing fresh analysis beyond what is published then you should stick with current genome build/annotations otherwise you are going to run into these kind of issues.


Thank you for your help and quick reply. However, I also see a gap between the release 43 GTF and org.Hs.eg.db.

Again, thank you so much.

Below are some of the genes that are in the release 43 GTF but not in org.Hs.eg.db:

gene_id             ENSEMBL_NoVersion  gene_name
ENSG00000002079.14  ENSG00000002079    MYH16
ENSG00000064489.23  ENSG00000064489    BORCS8-MEF2B
ENSG00000073905.8   ENSG00000073905    VDAC1P1
ENSG00000083622.8   ENSG00000083622    ENSG00000083622
ENSG00000093100.13  ENSG00000093100    ENSG00000093100
ENSG00000100068.15  ENSG00000100068    LRP5L
ENSG00000100101.15  ENSG00000100101    ENSG00000100101
ENSG00000103200.6   ENSG00000103200    ENSG00000103200
ENSG00000103832.10  ENSG00000103832    GOLGA8UP
ENSG00000105501.13  ENSG00000105501    SIGLEC5
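Several rows in this table have a `gene_name` that simply repeats the Ensembl ID, so there is no symbol to match against org.Hs.eg.db at all. A base-R sketch that flags such rows; the three-row data frame is copied from the table above, and the column names are assumed to match the poster's table.

```r
# Rows copied from the release 43 table above
gtf43 <- data.frame(
  gene_id   = c("ENSG00000002079.14", "ENSG00000083622.8", "ENSG00000093100.13"),
  gene_name = c("MYH16", "ENSG00000083622", "ENSG00000093100"),
  stringsAsFactors = FALSE
)

# Version-stripped ID, comparable to org.Hs.eg.db ENSEMBL keys
gtf43$ens <- sub("\\.[0-9]+$", "", gtf43$gene_id)

# A gene_name that merely repeats the Ensembl ID carries no symbol
gtf43$no_symbol <- gtf43$gene_name == gtf43$ens
```

Excluding the `no_symbol` rows before a symbol-based comparison separates "unnamed in the GTF" from "unknown to org.Hs.eg.db".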

And below are some that are in org.Hs.eg.db but not in the release 43 GTF:

ENSEMBL          SYMBOL
ENSG00000231129  ABCF1
ENSG00000236342  ABCF1
ENSG00000206490  ABCF1
ENSG00000276016  ABR
ENSG00000278741  ABR
ENSG00000275176  ACACA
ENSG00000283539  ACR
ENSG00000282844  ACTN4
ENSG00000280759  AP2A2
ENSG00000281385  AP2A2
ENSG00000231268  AGER
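Both directions of the gap can be measured on version-stripped Ensembl IDs with base-R set operations, which avoids counting renamed genes as missing. A sketch with hypothetical input vectors built from IDs in this thread; in practice `gtf_ids` would come from the GTF `gene_id` column and `org_ids` from `keys(org.Hs.eg.db, keytype = "ENSEMBL")`.

```r
# Hypothetical inputs: version-stripped Ensembl IDs from each source
gtf_ids <- c("ENSG00000002079", "ENSG00000163946", "ENSG00000105501")
org_ids <- c("ENSG00000163946", "ENSG00000231129", "ENSG00000276016")

only_in_gtf <- setdiff(gtf_ids, org_ids)    # in the GTF, not annotated by org.Hs.eg.db
only_in_org <- setdiff(org_ids, gtf_ids)    # in org.Hs.eg.db, absent from the GTF
shared      <- intersect(gtf_ids, org_ids)  # usable for ID-based mapping
```

For annotating results, `only_in_gtf` restricted to the genes that were actually tested is the list that matters.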
