Should non-protein-coding RNAs (e.g. lncRNAs) be removed in RNA-Seq differential expression analysis?
5.7 years ago
hellocita ▴ 40

Hi, I am doing an RNA-Seq differential expression analysis, and I wonder whether non-protein-coding genes, such as lncRNAs or pseudogenes, should be removed before the analysis. The purpose is to reveal expression differences between the control and observation groups and to relate those differences to known biological pathways/functions.

More information: the data I used are RNA-seq data (polyA-enriched RNA sequenced on an Illumina HiSeq). I mapped the reads to the evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83), downloaded from GENCODE. I then obtained the differentially expressed genes (DEGs) and tried to convert them from Ensembl IDs to HGNC symbols to look up their biological functions. However, I found that some of these genes, such as ENSG00000270000 and ENSG00000257155, are lncRNAs: they have no HGNC symbol and are not protein-coding genes.
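One way to see which DEGs are non-coding, without going through a symbol lookup, is to read the biotype straight from the same GENCODE GTF used for mapping; GENCODE stores it in the `gene_type` attribute of each `gene` record. Below is a minimal sketch, assuming an uncompressed GTF iterated line by line. The function names and the toy records (`ENSG_TOY1`, `ENSG_TOY2`) are hypothetical, and the set of biotypes to keep is a parameter, not a recommendation from this thread.

```python
import re
from typing import Dict, Iterable, List, Sequence, Tuple

# Captures key "value" pairs from the GTF attribute column (column 9);
# unquoted attributes such as `level 2` are ignored.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_types_from_gtf(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map unversioned gene_id -> (gene_type, gene_name) using 'gene' records only."""
    genes: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skip transcript, exon, ... records
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs["gene_id"].split(".")[0]  # drop version suffix, e.g. ".3"
        genes[gene_id] = (attrs.get("gene_type", "NA"), attrs.get("gene_name", "NA"))
    return genes


def split_by_biotype(
    deg_ids: Sequence[str],
    genes: Dict[str, Tuple[str, str]],
    keep: Tuple[str, ...] = ("protein_coding",),
) -> Tuple[List[str], List[str]]:
    """Split DEG IDs into (biotype in `keep`, everything else incl. unknown IDs)."""
    kept: List[str] = []
    other: List[str] = []
    for gid in deg_ids:
        gene_type = genes.get(gid.split(".")[0], ("NA", "NA"))[0]
        (kept if gene_type in keep else other).append(gid)
    return kept, other


# Hypothetical toy records in GENCODE-style GTF layout (not real genes).
toy_gtf = [
    "##description: toy example",
    'chr1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG_TOY1.3"; gene_type "protein_coding"; gene_name "GENEA"; level 2;',
    'chr1\tTOY\tgene\t2000\t2500\t.\t-\t.\tgene_id "ENSG_TOY2.1"; gene_type "lincRNA"; gene_name "LINCA"; level 2;',
    'chr1\tTOY\ttranscript\t100\t900\t.\t+\t.\tgene_id "ENSG_TOY1.3"; transcript_id "ENST_TOY1.1";',
]
genes = gene_types_from_gtf(toy_gtf)
coding, noncoding = split_by_biotype(["ENSG_TOY1", "ENSG_TOY2.1"], genes)
print(coding, noncoding)  # ['ENSG_TOY1'] ['ENSG_TOY2.1']
```

Splitting rather than silently dropping keeps the non-coding DEGs available for inspection, which leaves the decision on whether to remove them open.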

I wonder if I have done something wrong?

Thanks for your answer

5.7 years ago
michael.ante ★ 3.9k

Hi Hellocita,

Both genes you mentioned have an A-rich region at the 3' end of the cDNA (e.g. ENSG00000257155 / ENST00000548096). Therefore, the polyA capture/enrichment can yield reads from these transcripts.
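To check a transcript for such A-rich stretches yourself (for example the cDNA sequence of ENST00000548096, obtained separately), a sliding-window A count is a simple screen. This is a minimal sketch: the window size of 10 nt and the threshold of 7 adenines are arbitrary assumptions, not values from this thread, and the toy sequence is hypothetical.

```python
from typing import List, Tuple


def a_rich_windows(seq: str, window: int = 10, min_a: int = 7) -> List[Tuple[int, int]]:
    """Return (0-based start, A count) for every window with >= min_a adenines."""
    seq = seq.upper()
    hits: List[Tuple[int, int]] = []
    if len(seq) < window:
        return hits
    count = seq[:window].count("A")
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one base: remove the base leaving, add the base entering.
            count -= seq[start - 1] == "A"
            count += seq[start + window - 1] == "A"
        if count >= min_a:
            hits.append((start, count))
    return hits


def merge_windows(hits: List[Tuple[int, int]], window: int = 10) -> List[Tuple[int, int]]:
    """Merge overlapping or touching hit windows into half-open (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    for start, _ in hits:
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Hypothetical toy sequence with an internal run of 8 A's (not a real transcript).
toy = "CCCCC" + "A" * 8 + "CCCCC"
hits = a_rich_windows(toy)
print(hits)                 # [(2, 7), (3, 8), (4, 8), (5, 8), (6, 7)]
print(merge_windows(hits))  # [(2, 16)]
```

A region flagged this way only shows that oligo-dT could plausibly anneal there; whether it actually did would need support from the read coverage.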

I guess you did nothing wrong.

Regarding whether to keep these genes in your analysis: you can run the DE analysis both with and without them and see how strongly they influence the estimated variance/overdispersion. These genes seem to be detected because of off-target effects, which may follow different statistical processes than those governing the polyadenylated genes.
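A quick way to get a first impression, before re-running the full DE tools, is to check whether the non-coding genes are far more overdispersed than the coding ones among replicates of a single condition. DESeq2 and edgeR both share information across genes when estimating dispersions (e.g. by shrinking gene-wise estimates toward a fitted dispersion-mean trend), so a group of genes with a different variance structure can affect the estimates for all genes. The sketch below uses a simple method-of-moments negative binomial estimate, which is not the estimator either package uses; the counts are hypothetical and assumed to be already library-size normalized.

```python
from statistics import mean, median, variance
from typing import Dict, List


def moment_dispersion(counts: List[float]) -> float:
    """Method-of-moments NB dispersion from var = mu + alpha * mu**2,
    i.e. alpha = (var - mu) / mu**2, clipped at 0 for Poisson-like genes."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return max(0.0, (variance(counts) - mu) / mu ** 2)


def median_dispersion_by_type(
    norm_counts: Dict[str, List[float]], gene_type: Dict[str, str]
) -> Dict[str, float]:
    """Median per-gene dispersion for each gene type (replicates of ONE condition)."""
    by_type: Dict[str, List[float]] = {}
    for gene, counts in norm_counts.items():
        by_type.setdefault(gene_type[gene], []).append(moment_dispersion(counts))
    return {t: median(vals) for t, vals in by_type.items()}


# Hypothetical normalized counts for four replicates of a single condition.
norm_counts = {
    "G1": [100, 110, 90, 100],
    "G2": [200, 220, 180, 200],
    "L1": [50, 150, 20, 180],
    "L2": [10, 30, 5, 35],
}
gene_type = {"G1": "protein_coding", "G2": "protein_coding", "L1": "lncRNA", "L2": "lncRNA"}
result = median_dispersion_by_type(norm_counts, gene_type)
print(result)  # approximately {'protein_coding': 0.00083, 'lncRNA': 0.5375}
```

With real data you would compute this per condition and compare the distributions, not just the medians.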

Cheers,

Michael


Hi Michael, I still do not fully understand how off-target effects relate to the non-coding RNAs I got, since off-target effects are mostly discussed in the context of siRNA. Do you mean that these genes were called DEGs because of off-target effects during the experiment, and that one should therefore use different statistics to double-check them?


Hi Hellocita,

I mean that the non-protein-coding genes (especially the two you mentioned in your question) are off-targets of the polyA enrichment. Your oligo-dT primer has a certain length and might bind to intrinsic A-rich regions of certain transcripts. The enrichment of these non-target genes might follow a different statistical process than the enrichment of the polyadenylated genes.

Therefore, I'd double-check the results of the DE analysis.
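Following the earlier suggestion of running the analysis both with and without the non-coding genes, one simple check is to compare the calls for the protein-coding genes, since only those are tested in both runs. A minimal sketch, assuming each run's DEGs have already been exported as a set of gene IDs; the function names and the gene sets are hypothetical.

```python
from typing import Dict, Set


def compare_deg_calls(
    run_all: Set[str], run_coding_only: Set[str], coding: Set[str]
) -> Dict[str, Set[str]]:
    """Compare DEG calls on the protein-coding genes that both runs tested."""
    a = run_all & coding          # drop non-coding DEGs; they exist in one run only
    b = run_coding_only & coding
    return {"both": a & b, "only_all_genes_run": a - b, "only_coding_run": b - a}


def jaccard(x: Set[str], y: Set[str]) -> float:
    """|x & y| / |x | y|, defined as 1.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# Hypothetical DEG sets from the two runs.
coding = {"G1", "G2", "G3", "G4"}
run_all = {"G1", "G2", "G3", "L1"}  # L1: a non-coding DEG
run_coding_only = {"G1", "G2", "G4"}
diff = compare_deg_calls(run_all, run_coding_only, coding)
print(sorted(diff["both"]), diff["only_all_genes_run"], diff["only_coding_run"])
print(jaccard(run_all & coding, run_coding_only & coding))  # 0.5
```

A high overlap suggests the non-coding genes barely affect the coding results; genes that appear in only one run are the ones worth a closer look.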

Cheers,

Michael


I see, thank you Michael!
