Handling of genes with multiple entries in ENSEMBL
4 weeks ago

Hi all,

I am sure this has been asked before, but I couldn't find the related posts. I just ran RNA-Seq alignment and quantification with the nf-core pipeline. I picked the star_salmon method, which aligns reads with STAR and then quantifies with Salmon from those alignments. I am also using the GTF from GENCODE release 46. However, in the output there are multiple genes with the same name (e.g. CRLF2). They have different Ensembl IDs, and I think they are either located on different sex chromosomes or are gene duplicates.

My question is: is there a proper way of handling them when doing DGE? For now I have left them as they are and run the analysis anyway, but should I combine them? Is there a known list of genes with multiple entries in Ensembl that could be used to combine them?
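In the absence of a ready-made list, one way to see which names are affected is to scan the GTF itself for gene names that map to more than one gene ID. The following is only a minimal sketch: the sample lines and IDs are made up for illustration, the function name `duplicated_gene_names` is hypothetical, and it assumes GENCODE-style attributes (`gene_id`, `gene_name`) on `gene` rows.

```python
import re
from collections import defaultdict

# Made-up GTF lines for illustration only; these IDs are NOT real annotation entries.
SAMPLE_GTF = """\
chrX\tHAVANA\tgene\t100\t200\t.\t-\t.\tgene_id "ENSG00000999991.1"; gene_type "protein_coding"; gene_name "GENE_A";
chrY\tHAVANA\tgene\t100\t200\t.\t-\t.\tgene_id "ENSG00000999991.1_PAR_Y"; gene_type "protein_coding"; gene_name "GENE_A";
chr1\tHAVANA\tgene\t500\t900\t.\t+\t.\tgene_id "ENSG00000999992.3"; gene_type "protein_coding"; gene_name "GENE_B";
chr1\tHAVANA\ttranscript\t500\t900\t.\t+\t.\tgene_id "ENSG00000999992.3"; transcript_id "ENST00000999992.1"; gene_name "GENE_B";
"""

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def duplicated_gene_names(gtf_lines):
    """Return {gene_name: [(gene_id, chromosome), ...]} for names with more than one gene_id."""
    name_to_ids = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level rows are needed
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            name_to_ids[attrs["gene_name"]].add((attrs["gene_id"], fields[0]))
    return {name: sorted(ids) for name, ids in name_to_ids.items() if len(ids) > 1}


dups = duplicated_gene_names(SAMPLE_GTF.splitlines())
for name, entries in dups.items():
    print(name, entries)
```

On a real annotation, the same function could be fed the lines of the (decompressed) GTF file; the chromosome column then gives a quick hint as to whether a duplicated name is an X/Y pair or something else.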

rna-seq ensembl • 297 views

Thanks a lot! The answers make sense. It seems there are people on both sides of whether to merge them or not. For my purpose, and from a logical point of view, combining them makes more sense to me. For example, I can see genes from the PARs of the sex chromosomes splitting their expression between the two copies, which does not make sense given that there are both males and females in the cohort. I am wondering whether it would be legitimate to combine them using tximport with a tx2gene table that puts all copies of a gene under the same name. Or does Salmon adjust the counts for multimapped reads in some way that means I shouldn't do it this way?
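To make the idea concrete, here is a minimal sketch of what "putting all copies under the same name" amounts to: transcript-level estimated counts are summed per gene label, with the label taken from a mapping keyed on gene name instead of Ensembl gene ID. tximport itself is an R package, so this Python snippet only mirrors the summing step and does not reproduce its length offsets or `countsFromAbundance` options. All transcript names, gene names, and counts are hypothetical, as is the helper `summarize_to_gene`.

```python
from collections import defaultdict

# Hypothetical transcript-level estimated counts for one sample.
tx_counts = {
    "TX_A_X": 60.0,      # copy annotated on chrX
    "TX_A_PAR_Y": 40.0,  # identical PAR copy annotated on chrY
    "TX_B": 25.0,
}

# Hypothetical tx2gene mapping keyed on gene NAME, so both copies share one label.
tx2gene = {
    "TX_A_X": "GENE_A",
    "TX_A_PAR_Y": "GENE_A",
    "TX_B": "GENE_B",
}


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level counts per gene label; transcripts missing from the mapping are skipped."""
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue
        gene_counts[gene] += count
    return dict(gene_counts)


gene_counts = summarize_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

Under the assumption that reads mapping equally well to both copies are simply divided between them during quantification, summing in this way would recover the combined total for the shared name. Whether that assumption holds for a given dataset is exactly the open question above.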
