Question

RNAseq counts: transcript to gene

8 months ago

MAPK2 ▴ 50

Hi All,

I have RNAseq counts that I want to use for a differential expression analysis. However, there are approximately 1500 duplicated gene symbols with counts for each transcript (geneID). Can I simply collapse those to create unique rows for each geneSymbol?

For example,

library(dplyr)

# Sum the counts of all rows that share a geneSymbol
counts <- counts %>%
  select(-c("geneID", "bioType", "annotationLevel")) %>%
  group_by(geneSymbol) %>%
  summarise(across(everything(), \(x) sum(x, na.rm = TRUE)))

Thanks for your advice.

rnaseq • 344 views

updated 8 months ago by Ram 44k • written 8 months ago by MAPK2 ▴ 50

score 2 · Accepted Answer · 2024-03-05

No. Restrict yourself to the canonical chromosomes and you will run into this issue far less often. You cannot collapse counts that map to different loci into the same "gene" just because HGNC and Ensembl name things differently. Ensembl gene IDs are the more specific identifier, so ideally you should pick the entries you want to keep instead of aggregating anything.
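
As a rough illustration of that advice, here is a minimal base R sketch on a toy table. The `chrom` column, the placeholder IDs, and the `chr`-prefixed chromosome names are all assumptions made for the example; the table in the question only shows geneID, geneSymbol, bioType and annotationLevel, so the chromosome would have to come from your annotation. It lists the duplicated symbols for inspection, keeps only rows on canonical chromosomes, and then keys the count matrix by Ensembl ID rather than by symbol.

```r
# Toy table: the `chrom` column and every value here are made up for illustration.
counts <- data.frame(
  geneID     = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  geneSymbol = c("GENE1", "GENE1", "GENE2", "GENE3"),
  chrom      = c("chr1", "chr1_alt_scaffold", "chr2", "chrX"),
  sample1    = c(10, 3, 25, 7),
  sample2    = c(12, 0, 30, 9),
  stringsAsFactors = FALSE
)

# 1. List the symbols that occur on more than one row and look at those rows.
dup_symbols <- unique(counts$geneSymbol[duplicated(counts$geneSymbol)])
print(counts[counts$geneSymbol %in% dup_symbols, ])

# 2. Keep only canonical chromosomes (assumed naming: chr1..chr22, chrX, chrY, chrM).
canonical <- paste0("chr", c(1:22, "X", "Y", "M"))
counts_canonical <- counts[counts$chrom %in% canonical, ]

# 3. Build the count matrix with the Ensembl ID as row name, not the symbol.
count_matrix <- as.matrix(counts_canonical[, c("sample1", "sample2")])
rownames(count_matrix) <- counts_canonical$geneID

# 4. Check whether any symbol is still duplicated after filtering.
remaining_dups <- anyDuplicated(counts_canonical$geneSymbol) > 0
```

If `remaining_dups` is still TRUE on real data, the remaining rows are worth inspecting one by one and choosing between, rather than summing.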