Hi, I am doing an RNA-Seq differential expression analysis, and I wonder whether non-protein-coding genes, such as lncRNAs or pseudogenes, should be removed before the analysis. The purpose is to reveal expression differences between the control and observed groups, and to relate those differences to known biological pathways/functions.
More information: The data I used are RNA-seq data (polyA-enriched RNA, sequenced on Illumina HiSeq). I mapped the reads to the evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83), downloaded from GENCODE. Finally, I got a list of differentially expressed genes (DEGs). I am now trying to convert these DEGs from Ensembl IDs to HGNC symbols and look up their biological functions. However, I found that some of the genes, such as ENSG00000270000 and ENSG00000257155, are lncRNAs and do not have an HGNC symbol. So they are not protein-coding genes.
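To make the ID conversion step concrete, here is a minimal sketch of how the biotype and gene name of each DEG could be looked up directly from the GENCODE GTF instead of relying on HGNC symbols alone. It assumes the GTF `gene` records carry `gene_id`, `gene_type` and `gene_name` attributes, as GENCODE files normally do. The function names `parse_gene_annotations` and `split_by_biotype` are hypothetical helpers I made up for illustration, not part of any library.

```python
import re

# Matches quoted GTF attributes such as: gene_type "protein_coding"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gene_annotations(gtf_lines):
    """Map unversioned gene ID -> (gene_type, gene_name) from GTF 'gene' records.

    Assumes GENCODE-style attributes (gene_id, gene_type, gene_name).
    """
    genes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene-level records only
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id", "").split(".")[0]  # drop version suffix
        if gene_id:
            genes[gene_id] = (attrs.get("gene_type"), attrs.get("gene_name"))
    return genes


def split_by_biotype(deg_ids, genes):
    """Split DEG IDs into protein-coding and everything else (incl. unknown IDs)."""
    coding, other = [], []
    for gid in deg_ids:
        gid = gid.split(".")[0]
        gene_type, _ = genes.get(gid, (None, None))
        (coding if gene_type == "protein_coding" else other).append(gid)
    return coding, other
```

With a real annotation this would be called as `parse_gene_annotations(open(path_to_gtf))`, where `path_to_gtf` points to the decompressed GENCODE GTF. Splitting the DEG list this way keeps the lncRNAs and pseudogenes visible rather than silently dropping them when no HGNC symbol is found.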
I wonder if I have done something wrong :(
Thanks for your answer
See also: If RNA-seq includes miRNA genes then why are there miRNA specific pipelines?