Hello!
I apologise if this is a basic question, but could someone here clue me in on the role of ERCC spike-ins in RNA-Seq?
I've been given a few sets of RNA-Seq data to align to a reference genome and use for differential gene expression analysis. I was going to do this by mapping to the reference rather than by de novo assembly.
When I BLASTed the over-represented sequences reported by FastQC, I noticed that one sample had an over-represented sequence that comes from the ERCC spike-in. I've tried to understand the role of the spike-in in differential gene expression analysis but I'm struggling a bit.
My questions are:
1) Is it normal for it to show up as an over-represented sequence in 1 sample only?
2) Do I need to remove it for mapping and differential expression analysis?
3) If I need to remove it, what's the best way of going about it?
Thank you very much in advance,
Gill
Hi Devon,
I'm using A. thaliana, which I assume very much counts as "common".
Thank you so much for your help,
Gill
I'd think so :)
Hi @Devon
I just noticed I have these genes in my raw read counts file.
From what I've read, the ER- series of probes likely corresponds to specific transcripts within the ERCC RNA spike-in mix. I have also noticed that one of them is among my differentially expressed genes, so should I remove them before any quantitative procedure?
Thank you
I'm not familiar with the ER- entries.
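That said, if you do decide to set spike-in rows aside before a differential expression run, a minimal Python sketch might look like the one below. It is only an illustration under a few assumptions: the count table is tab-separated with the feature ID in the first column, and spike-in IDs share a known prefix. The `ERCC-` default, the function name `split_spike_ins`, and the toy counts are all made up for the example, so adjust the prefix to whatever the IDs actually look like in your file.

```python
import csv
import io


def split_spike_ins(counts_text, prefixes=("ERCC-",), delimiter="\t"):
    """Split a count table into endogenous rows and spike-in rows.

    Assumptions (not taken from the thread): the first line is a header,
    the first column holds the feature ID, and spike-in IDs start with
    one of `prefixes`.
    """
    reader = csv.reader(io.StringIO(counts_text), delimiter=delimiter)
    header = next(reader)
    endogenous, spike_ins = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0].startswith(tuple(prefixes)):
            spike_ins.append(row)
        else:
            endogenous.append(row)
    return header, endogenous, spike_ins


# Toy table with invented counts, purely for illustration.
example = (
    "gene_id\tsample1\tsample2\n"
    "AT1G01010\t120\t98\n"
    "ERCC-00002\t5000\t4800\n"
    "AT1G01020\t33\t41\n"
)

header, genes, spikes = split_spike_ins(example)
print("endogenous:", [row[0] for row in genes])
print("spike-ins:", [row[0] for row in spikes])
```

Keeping the spike-in rows in a separate list rather than discarding them means they are still available if you later want to inspect them as controls.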