Question

Gene ID problems when mapping and counting

0

4.2 years ago

hellocita ▴ 40

Hi everyone! Please help me. I am new to this field and have a question about mapping and counting. I mapped mouse RNA-seq data to the genome (mm10) using STAR. To save time, I did not build the STAR index myself but used the mm10 genome index built by my colleague. For counting, I used the GTF file downloaded from http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.knownGene.gtf, which seems to use Ensembl IDs. However, the gene IDs in the counting result look like UniProt IDs, not Ensembl IDs or gene symbols. Did I do something wrong in the procedure? If not, how can I get gene symbols from these IDs?

Here is some information:

head mm10.knownGene.gtf 

chr1    knownGene   transcript  3073253 3074322 .   +   .   gene_id "ENSMUST00000193812.1"; transcript_id "ENSMUST00000193812.1"; 
chr1    knownGene   exon    3073253 3074322 .   +   .   gene_id "ENSMUST00000193812.1"; transcript_id "ENSMUST00000193812.1"; exon_number "1"; exon_id "ENSMUST00000193812.1.1";
chr1    knownGene   transcript  3102016 3102125 .   +   .   gene_id "ENSMUST00000082908.1"; transcript_id "ENSMUST00000082908.1"; 

head genes
A0A023T778
A0A075B5I2
A0A075B5J3
A0A075B5J4
A0A075B5K6
A0A075B5L1
A0A075B5L2
A0A075B5L3
A0A075B5L7
A0A075B5L8
A0A075B5M4
A0A075B5P0
A0A075B5P1
A0A075B5P4
A0A075B5P6
A0A075B5P8
A0A075B5P9
A0A075B5Q0

Please help me!

RNA-Seq • 1.8k views

score 0 · Answer 1 · 2021-03-19

0

4.2 years ago

rpolicastro 13k

To avoid problems, you should use a consistent set of assembly/annotation/transcriptome files throughout the analysis. I would recommend that you go back and build the STAR index using an annotation file and an assembly file from the same source and version.
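As a rough illustration of what "consistent" means in practice, here is a minimal Python sketch that compares the sequence names in a genome FASTA with those in a GTF, and then assembles (without running) a STAR index-building command. The file names, the 100 bp read length, and the thread count are placeholders I made up for the example, not values from this thread; `--sjdbOverhang` is conventionally set to read length minus one.

```python
import shlex


def fasta_contigs(fasta_path):
    """Return the set of sequence names found in FASTA header lines."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # The name is the first whitespace-delimited token after '>'.
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_contigs(gtf_path):
    """Return the set of sequence names used in the first column of a GTF."""
    names = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split()[0])
    return names


def star_index_command(fasta_path, gtf_path, index_dir, read_length=100, threads=8):
    """Build (but do not run) a STAR genomeGenerate command as an argument list."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(index_dir),
        "--genomeFastaFiles", str(fasta_path),
        "--sjdbGTFfile", str(gtf_path),
        "--sjdbOverhang", str(read_length - 1),
        "--runThreadN", str(threads),
    ]


if __name__ == "__main__":
    # Placeholder paths: substitute a FASTA and GTF from one source and release.
    cmd = star_index_command("mm10.fa", "mm10.knownGene.gtf", "star_index_mm10")
    print(shlex.join(cmd))
```

If `gtf_contigs(gtf) - fasta_contigs(fasta)` is not empty (for example `1` in one file versus `chr1` in the other), the two files probably do not come from the same source, and features on those sequences would not be counted.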

Alternatively, if you just need counts, you could skip mapping and feature counting and quantify directly from the FASTQ files using Salmon, which will be quicker and generally provide better results than alignment + feature counting.
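For orientation, a minimal sketch of the two Salmon steps (index the transcriptome, then quantify the reads), again only assembling the command lines in Python rather than running them. All paths are hypothetical, and whether the data are paired-end or single-end is an assumption the thread does not settle, so the helper accepts either. `-l A` asks Salmon to infer the library type automatically.

```python
import shlex


def salmon_index_command(transcripts_fasta, index_dir):
    """Build a 'salmon index' command for a transcriptome FASTA."""
    return ["salmon", "index", "-t", str(transcripts_fasta), "-i", str(index_dir)]


def salmon_quant_command(index_dir, out_dir, reads_1, reads_2=None, threads=8):
    """Build a 'salmon quant' command for paired-end or single-end reads."""
    cmd = ["salmon", "quant", "-i", str(index_dir), "-l", "A"]
    if reads_2 is None:
        # Single-end (unmated) reads are passed with -r.
        cmd += ["-r", str(reads_1)]
    else:
        # Paired-end reads are passed with -1 and -2.
        cmd += ["-1", str(reads_1), "-2", str(reads_2)]
    cmd += ["-p", str(threads), "-o", str(out_dir)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(salmon_index_command("transcripts.fa", "salmon_index")))
    print(shlex.join(salmon_quant_command(
        "salmon_index", "quant/sample1",
        "sample1_R1.fastq.gz", "sample1_R2.fastq.gz")))
```

The transcriptome FASTA used for the index should come from the same source and release as the annotation used later for gene-level summarization.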

0

Thank you rpolicastro!! I would like to perform GO enrichment analysis downstream. Can I do that after using Salmon for counting?

4.2 years ago by hellocita ▴ 40

4

A pipeline we use often is: Salmon -> tximeta -> DESeq2 / edgeR / limma -> goseq.

Tximeta will also automatically handle gene IDs for you (gene-level summarization and ID mapping).
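To make "gene-level summarization" concrete, here is a simplified Python sketch of the idea, not tximeta's actual implementation (which is an R package and also handles transcript lengths and metadata). It builds a transcript-to-gene map from GTF attribute fields and sums transcript-level counts per gene. The first GTF line is the exon record from the question; the `geneA`/`txA*` records and the counts are made-up placeholders.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(attr_field))


def tx2gene_from_gtf_lines(lines):
    """Map each transcript_id to its gene_id, using tab-delimited GTF lines."""
    mapping = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = parse_attributes(fields[8])
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level counts per gene; unmapped transcripts are skipped."""
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is not None:
            gene_counts[gene] += count
    return dict(gene_counts)


if __name__ == "__main__":
    gtf_lines = [
        # Record from the question: gene_id repeats the transcript ID.
        'chr1\tknownGene\texon\t3073253\t3074322\t.\t+\t.\t'
        'gene_id "ENSMUST00000193812.1"; transcript_id "ENSMUST00000193812.1";',
        # Made-up records where two transcripts share one gene.
        'chr1\texample\texon\t1\t100\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";',
        'chr1\texample\texon\t1\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA2";',
    ]
    tx2gene = tx2gene_from_gtf_lines(gtf_lines)
    print(tx2gene)
    print(summarize_to_gene({"txA1": 10.0, "txA2": 5.5}, tx2gene))
```

Note that in the UCSC knownGene GTF lines shown in the question, `gene_id` carries the same `ENSMUST...` transcript identifier as `transcript_id`, so summarizing with that file alone would not collapse transcripts into genes; an annotation with real gene identifiers is needed for that step.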

4.2 years ago by Michael Love ★ 2.6k