If we have genes with the same HUGO ID but different Ensembl IDs, does it make sense to add up their raw counts (for RNA expression or single-cell analysis)? Does it make sense to treat them as isoforms?
Hi,
Only the last one is from the chromosome 22 assembly. The first two are from the exception contigs (haplotype variant contigs), so there is a reason they were assigned different 'ENSG' IDs. I would always use 'ENSG' IDs for reference and indexing when working with the Ensembl annotation. In my opinion, a gene-level analysis should always be summarized by 'ENSG' ID, and a transcript-level (isoform-level) analysis by 'ENST' ID.
Note:
ENSG: genes; ENST: transcript variants/isoforms; ENSE: exons; ENSP: proteins
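To make the difference between the two levels of summarization concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the record layout, the `summarize` helper, and the IDs (`ENSG_A`, `ENST_A1`, `GENE1`, ...) are made-up placeholders, not real Ensembl accessions or real counts. It sums raw counts per gene ID and per transcript ID, and separately flags gene symbols that map to more than one gene ID so they can be inspected rather than merged blindly.

```python
from collections import defaultdict

# Hypothetical raw counts: (gene_id, transcript_id, symbol, raw_count).
# All IDs are placeholders, not real Ensembl accessions.
records = [
    ("ENSG_A", "ENST_A1", "GENE1", 10),
    ("ENSG_A", "ENST_A2", "GENE1", 5),
    ("ENSG_B", "ENST_B1", "GENE1", 3),  # same symbol, different gene ID
    ("ENSG_C", "ENST_C1", "GENE2", 7),
]


def summarize(rows, key_index):
    """Sum raw counts grouped by the column at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)


# Gene-level analysis: group by gene ID (column 0).
gene_counts = summarize(records, 0)

# Transcript-level analysis: group by transcript ID (column 1).
transcript_counts = summarize(records, 1)

# Flag symbols that map to more than one gene ID.
symbol_to_genes = defaultdict(set)
for gene_id, _transcript_id, symbol, _count in records:
    symbol_to_genes[symbol].add(gene_id)
ambiguous = {s: sorted(g) for s, g in symbol_to_genes.items() if len(g) > 1}

print(gene_counts)
print(transcript_counts)
print(ambiguous)
```

Note that grouping by gene ID keeps `ENSG_A` and `ENSG_B` separate even though they share a symbol; grouping by symbol instead would merge them, which is the choice the question is asking about.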
Please provide some examples.
Maybe I can give an example. I have RNA-seq samples from human, and the differential expression results for the gene NDUFA6 look like below:
As you can see, there are different Ensembl IDs for the same gene name (two alternative sequence alignments, and the last one is the reference gene on the Ensembl website). Usually I do not get such different fold changes (FCs) for the alternative versions of the same gene, but here it gets tricky. Should I merge all alignments with the same gene name into one gene expression value (for all such cases) and run the DE analysis again? Or how should I interpret these results?