Estimated gene counts with tximport
Asked by bioinfo, 3.0 years ago

Hello,

I am analyzing bulk RNA-seq data. I used kallisto to pseudoalign my reads to the transcriptome, and then tximport, with transcript-to-gene information from Ensembl, to summarize the transcript-level counts to genes and attach gene names. I am comparing my current results with data that were processed 4 years ago, and I noticed that the old estimated gene counts table has ~50,000 genes, while mine now has about half that. Is it possible to see which version of the gene annotation I am using? Could the difference in the total number of genes be due to an update of the Ensembl dataset I am using?

I access the Ensembl dataset with the code below:

mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "uswest.ensembl.org", ensemblRedirect = FALSE)
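The number of genes in the tximport output depends on the transcript-to-gene table as much as on kallisto: transcripts whose IDs are not found in tx2gene are dropped (tximport reports them as missing), and the remaining counts are summed per gene. The base-R sketch below only approximates that collapse on a toy table; it is not tximport's own code, and the transcript IDs, gene names, and counts are made up for illustration.

```r
# Toy transcript-level counts standing in for kallisto estimates (made-up values)
tx_counts <- c(ENST0001 = 10, ENST0002 = 5, ENST0003 = 7, ENST0004 = 3)

# Hypothetical transcript-to-gene table; ENST0004 is deliberately left out
tx2gene <- data.frame(
  tx   = c("ENST0001", "ENST0002", "ENST0003"),
  gene = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

summarize_to_gene <- function(tx_counts, tx2gene) {
  # Transcripts absent from tx2gene cannot be assigned to a gene, so drop them
  keep <- names(tx_counts) %in% tx2gene$tx
  if (any(!keep)) {
    message("transcripts missing from tx2gene: ", sum(!keep))
  }
  kept <- tx_counts[keep]
  gene <- tx2gene$gene[match(names(kept), tx2gene$tx)]
  # rowsum() adds up the counts of all transcripts mapped to the same gene
  rowsum(kept, gene)[, 1]
}

gene_counts <- summarize_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

Here GENE_A gets 15 (its two isoforms summed), GENE_B gets 7, and ENST0004 contributes nothing, so a tx2gene table that covers fewer genes directly shrinks the gene count table.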

I also noticed that the estimated gene counts from 4 years ago contain thousands of gene names similar to AC253536.2 (they all start with AC), but my current run does not output any gene names like this. Does anyone know why they were removed?

Thank you

Tags: RNA-seq, ensembl, tximport, kallisto
Answer by Ben Moore, 3.0 years ago

Ensembl retired clone-based gene names (such as AC253536.2) in early 2021, so recent releases no longer use them. More information can be found in the following blog post: https://www.ensembl.info/2021/03/15/retirement-of-clone-based-gene-names/
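To estimate how much of the old table consists of such names, one can flag gene names that look clone-based. The sketch below assumes these names are one or two capital letters, five or six digits, a dot, and a version number (e.g. AC253536.2); this regex is a heuristic, not an official Ensembl definition, so the flagged names are worth eyeballing. The old_genes vector is a made-up example.

```r
# Heuristic pattern for clone-based names such as AC253536.2 or AL627309.1
clone_pattern <- "^[A-Z]{1,2}[0-9]{5,6}\\.[0-9]+$"

is_clone_based <- function(gene_names) {
  grepl(clone_pattern, gene_names)
}

# Made-up gene names standing in for the row names of the old counts table
old_genes <- c("TP53", "AC253536.2", "GAPDH", "AL627309.1", "MT-CO1", "Z69666.2")

flags <- is_clone_based(old_genes)
cat("clone-based-looking names:", sum(flags), "of", length(old_genes), "\n")
print(old_genes[flags])
```

With a real table you would pass its gene names (for example rownames(old_counts), where old_counts is a hypothetical name) instead of the toy vector.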

Answer, 3.0 years ago

The only way to find out which version of Ensembl you used for the quantification with kallisto is to know the source of the transcript reference FASTA file that you used for your analysis.

However, if your old analysis had ~50k genes and now you have roughly half that (~20k), it seems likely that the old analysis used a gene set that included both coding and non-coding genes, while your new analysis used only protein-coding genes.
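One way to check this is to tabulate gene biotypes in each reference FASTA. Ensembl transcript FASTA headers carry a gene_biotype:... field, and Ensembl distributes non-coding RNAs in a separate ncrna FASTA from the cdna.all file, so which files went into the kallisto index matters. The sketch below assumes that header layout and uses three made-up header lines instead of reading a real file; header_field is a hypothetical helper, not part of any package.

```r
# Made-up headers in the Ensembl style; real ones would come from the FASTA,
# e.g. headers <- grep("^>", readLines("transcripts.fa"), value = TRUE)
headers <- c(
  ">ENST00000000001.1 cdna chromosome:GRCh38:1:100:200:1 gene:ENSG00000000001.1 gene_biotype:protein_coding transcript_biotype:protein_coding",
  ">ENST00000000002.1 cdna chromosome:GRCh38:1:300:400:1 gene:ENSG00000000001.1 gene_biotype:protein_coding transcript_biotype:retained_intron",
  ">ENST00000000003.1 ncrna chromosome:GRCh38:2:500:600:-1 gene:ENSG00000000002.1 gene_biotype:lncRNA transcript_biotype:lncRNA"
)

# Extract the value of a "key:value" field from each header; NA if absent
header_field <- function(headers, key) {
  pattern <- paste0(".*\\b", key, ":(\\S+).*")
  out <- sub(pattern, "\\1", headers, perl = TRUE)
  out[!grepl(paste0("\\b", key, ":"), headers, perl = TRUE)] <- NA
  out
}

genes <- header_field(headers, "gene")
biotypes <- header_field(headers, "gene_biotype")

# Count each gene once (genes can have several transcripts), then tabulate
gene_biotype <- biotypes[!duplicated(genes)]
biotype_table <- table(gene_biotype)
print(biotype_table)
```

Run on the old and new references, a table dominated by protein_coding in one and containing lncRNA and other non-coding biotypes in the other would support the explanation above.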

