Question

Handling of genes with multiple entries in ENSEMBL

5 days ago

marionette.kent • 0

Hi all,

I am sure this has been asked before, but I couldn't find the related posts. I just ran RNA-seq alignment and quantification with the nf-core pipeline. I picked the star_salmon method, which aligns reads with STAR and then quantifies with Salmon in alignment-based mode. I am using the GTF from GENCODE release 46. In the output, several genes share the same name (e.g. CRLF2) but have different Ensembl IDs. I think they are either copies on the two sex chromosomes or gene duplicates.

My question is: is there a proper way to handle them when doing DGE? For now I have left them as they are and run the analysis anyway, but should I combine them? Is there a known list of genes with multiple Ensembl entries that could be used to combine them?
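To make the question concrete, here is a minimal sketch of how such a list could be derived from the annotation itself rather than looked up. It assumes a GENCODE-style GTF whose `gene` records carry `gene_id` and `gene_name` attributes; the function name `duplicated_gene_names` and the IDs in the demo are hypothetical placeholders, not real Ensembl IDs. As far as I can tell, GENCODE marks the chrY PAR copies with a `_PAR_Y` suffix on the ID, which the toy data imitates.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def duplicated_gene_names(gtf_lines):
    """Return {gene_name: sorted gene_ids} for names mapped to more than one gene_id."""
    ids_by_name = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Only look at gene-level records (feature type is column 3)
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            ids_by_name[attrs["gene_name"]].add(attrs["gene_id"])
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


if __name__ == "__main__":
    # Toy records with placeholder IDs (assumption: not real annotation lines)
    toy = [
        'chrX\tHAVANA\tgene\t1\t100\t.\t-\t.\tgene_id "GENE_ID_1"; gene_name "CRLF2"; level 2;',
        'chrY\tHAVANA\tgene\t1\t100\t.\t-\t.\tgene_id "GENE_ID_1_PAR_Y"; gene_name "CRLF2"; level 2;',
        'chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "GENE_ID_2"; gene_name "UNIQUE1"; level 2;',
    ]
    for name, ids in duplicated_gene_names(toy).items():
        print(name, ids)
```

For a real annotation, the same function could be fed an open file handle for the GTF instead of the toy list.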

rna-seq ensembl • 242 views

I am sure this has been asked before, but I couldn't find the related posts

Here are those threads:

How to deal with the case that one gene symbol matches multiple ensembl ids?
Different Ensembl Ids point to the same gene symbol.
Why am I getting different ensembl gene ids for a given gene symbol?

ADD REPLY • link 5 days ago by GenoMax 147k

Thanks a lot! The answers make sense. It seems people are split on whether to merge them or not. For my purpose, and from a logical point of view, combining them makes more sense. For example, I can see genes from the pseudoautosomal regions (PAR) of the sex chromosomes splitting their expression between the two copies, which does not make sense given that there are both males and females in the cohort. Would it be legitimate to combine them using tximport with a tx2gene table that maps all copies of a gene to the same name? Or does Salmon adjust the counts for multi-mapped reads in a way that would make this approach invalid?
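To show what I mean by combining, here is a rough Python sketch of the summing step only. tximport itself is an R package, so this is not its API, and it leaves out tximport's length and abundance handling. The function name `sum_by_gene_name`, the transcript IDs, and the counts are all made up for illustration. It assumes the per-transcript estimated counts can simply be added, which is exactly the point I am unsure about.

```python
from collections import defaultdict


def sum_by_gene_name(tx_counts, tx2name):
    """Sum per-transcript estimated counts into per-gene-name totals.

    tx_counts: {transcript_id: {sample: estimated_count}}
    tx2name:   {transcript_id: gene_name}  (gene name in place of gene ID)
    """
    gene_counts = defaultdict(lambda: defaultdict(float))
    for tx_id, per_sample in tx_counts.items():
        name = tx2name.get(tx_id)
        if name is None:
            continue  # transcript absent from the mapping is skipped in this sketch
        for sample, value in per_sample.items():
            gene_counts[name][sample] += value
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {name: dict(samples) for name, samples in gene_counts.items()}


if __name__ == "__main__":
    # Toy values (assumption: invented numbers, not real quantification output)
    tx_counts = {
        "TX_CHRX_COPY": {"sample_1": 10.0, "sample_2": 4.0},
        "TX_CHRY_COPY": {"sample_1": 0.0, "sample_2": 6.0},
        "TX_OTHER": {"sample_1": 3.0, "sample_2": 1.0},
    }
    tx2name = {
        "TX_CHRX_COPY": "CRLF2",
        "TX_CHRY_COPY": "CRLF2",
        "TX_OTHER": "OTHER",
    }
    print(sum_by_gene_name(tx_counts, tx2name))
```

In tximport terms, the equivalent would be a two-column tx2gene table whose second column holds the shared gene name instead of the Ensembl gene ID.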

ADD REPLY • link 5 days ago by marionette.kent • 0