RNAseq counts: transcript to gene
8 months ago
MAPK2

Hi All,

I have RNA-seq counts that I want to use for a differential expression analysis. However, about 1,500 gene symbols are duplicated: the same geneSymbol appears on several rows, each with its own geneID and its own counts. Can I simply collapse these rows, summing their counts, so that each geneSymbol has a single row?

For example,

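library(dplyr)

# Drop the annotation columns, then sum the sample counts of all rows sharing a symbol
counts <- counts %>%
  select(-c("geneID", "bioType", "annotationLevel")) %>%
  group_by(geneSymbol) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))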

Thanks for your advice.


8 months ago
Ram

No. If you restrict yourself to genes on the canonical chromosomes, you will run into this issue far less often, although not never. You cannot collapse counts that map to different loci into the same "gene" just because HGNC and Ensembl name things differently. Ensembl IDs are more specific than gene symbols, so ideally you should pick the entries you want to keep instead of aggregating anything.
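A minimal base-R sketch of that advice, under stated assumptions: the toy table and its values are hypothetical, and it assumes the annotation has a `chromosome` column (not among the columns shown in the question) using Ensembl-style names without a "chr" prefix. It drops rows on non-canonical sequences instead of summing them, lists any symbols that are still duplicated so they can be resolved by hand, and keeps the Ensembl geneID as the unique row key.

```r
# Hypothetical toy table; `chromosome` is an ASSUMED annotation column,
# not one of the columns listed in the question.
counts <- data.frame(
  geneID     = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  geneSymbol = c("GENE1", "GENE1", "GENE2", "GENE3"),
  chromosome = c("1", "CHR_HSCHR1_ALT", "2", "X"),  # made-up alt-contig name
  sample1    = c(10L, 3L, 5L, 0L),
  sample2    = c(12L, 4L, 6L, 1L),
  stringsAsFactors = FALSE
)

# Assumed Ensembl-style primary-assembly names (no "chr" prefix)
canonical <- c(as.character(1:22), "X", "Y", "MT")

# 1. Drop rows on patches / alternate haplotypes instead of summing them
filtered <- counts[counts$chromosome %in% canonical, ]

# 2. Symbols still duplicated after filtering need a manual decision
dup_symbols <- unique(filtered$geneSymbol[duplicated(filtered$geneSymbol)])
print(dup_symbols)

# 3. Keep the Ensembl geneID as the unique row key of the count matrix
sample_cols <- c("sample1", "sample2")
count_mat <- as.matrix(filtered[, sample_cols])
rownames(count_mat) <- filtered$geneID
```

Symbols can still be attached afterwards as a separate annotation for reporting; the point is that the counts themselves stay keyed by geneID rather than being merged across loci.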
