Can anyone please help me understand how to perform ERCC spike-in analysis and interpret the results? I know there are packages for this. I tried erccdashboard but keep getting a label error, so I was wondering whether I could do it in Python without any ERCC analysis package. Basically, I want to understand the steps involved in analyzing ERCC spike-ins.
Here is the experimental design: a time course of treatment vs. control with 6 biological replicates (no technical replicates) and a baseline at time 0. Treated and control RNA samples were collected at 2, 4, 6, and 8 hours. The same amount (100 ng input; 2 ul of a 1:1000 dilution) of ERCC Mix 1 was spiked into every sample.
I appended the ERCC FASTA and GTF files to the human .fa and .gtf files respectively, aligned the reads with STAR, and obtained read counts with the --quantMode GeneCounts option.
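For reference, here is a minimal sketch of how the spike-in rows could be separated from the endogenous genes in one of STAR's per-sample ReadsPerGene.out.tab files. It assumes the spike-in gene IDs in the appended GTF start with "ERCC-" and that the unstranded count column is the appropriate one for the library. The toy table is invented for illustration, and `split_counts` is a hypothetical helper name.

```python
import csv
import io

# Toy stand-in for one ReadsPerGene.out.tab file (all values invented).
# Columns: gene ID, unstranded counts, strand-1 counts, strand-2 counts.
# The first four rows are STAR's summary lines, not genes.
TOY_TABLE = (
    "N_unmapped\t1000\t1000\t1000\n"
    "N_multimapping\t500\t500\t500\n"
    "N_noFeature\t200\t900\t300\n"
    "N_ambiguous\t50\t10\t20\n"
    "ENSG00000000003\t52000\t400\t51600\n"
    "ENSG00000000005\t0\t0\t0\n"
    "ERCC-00002\t4500\t30\t4470\n"
    "ERCC-00003\t310\t5\t305\n"
)

def split_counts(handle, column=1, spike_prefix="ERCC-"):
    """Return (spike_in_counts, endogenous_counts) as dicts of gene ID -> count.

    column=1 selects the unstranded counts (an assumption; use 2 or 3
    for stranded libraries, depending on the protocol).
    """
    spike, endo = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # skip blank lines and the N_* summary rows
        gene_id, count = row[0], int(row[column])
        target = spike if gene_id.startswith(spike_prefix) else endo
        target[gene_id] = count
    return spike, endo

spike, endo = split_counts(io.StringIO(TOY_TABLE))
spike_total = sum(spike.values())
spike_fraction = spike_total / (spike_total + sum(endo.values()))
print(spike)
print(f"fraction of counted reads from spike-ins: {spike_fraction:.3f}")
```

With real data, `io.StringIO(TOY_TABLE)` would be replaced by an open file handle for each sample. Because every sample received the same amount of spike-in, the per-sample spike-in fraction is itself a quick thing to compare across samples.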
I think the first step is to plot log read counts against log input concentration for the ERCC transcripts? Should I also compute ratios between samples? Since every sample received the same amount of ERCC Mix 1, the expected fold change between any two samples is 1 (log fold change of 0) for every ERCC transcript, provided the experiment was done well. So the ratios do not tell me much about how to use ERCC for quality control, beyond checking whether the sequencing worked.
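A minimal sketch of that first step for a single sample, using NumPy only. The concentrations and counts below are made-up placeholders, not real ERCC Mix 1 values. The actual concentrations would come from the manufacturer's table, matched to the counts by ERCC ID.

```python
import numpy as np

# Hypothetical values for illustration only: replace with the Mix 1
# concentrations from the manufacturer's table and your own counts,
# matched by ERCC ID.
conc = np.array([0.5, 2.0, 8.0, 32.0, 128.0, 512.0])  # arbitrary units
counts = np.array([0, 3, 14, 60, 250, 1010])          # one sample

# Undetected spike-ins cannot be log-transformed, so they are dropped
# from the fit (but are worth reporting, since they hint at the
# detection limit).
detected = counts > 0
x = np.log2(conc[detected])
y = np.log2(counts[detected])

# Straight-line fit on the log-log scale.
slope, intercept = np.polyfit(x, y, 1)
pred = slope * x + intercept
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"spike-ins detected: {detected.sum()} of {len(counts)}")
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
```

The `x` and `y` arrays can be passed straight to a scatter plot. A slope near 1 with a high R^2 would suggest that counts scale roughly proportionally with input over the detected range, and repeating the fit per sample shows whether any sample stands out.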
What do I do next to find out if I should exclude any of my RNA-seq counts below or above certain thresholds?
Thanks
Thank you, Charles, for your reply! I had read that post of yours earlier, and I am taking those tips into consideration for my DEG analysis. Since I already have ERCC spike-in data, I also wanted to compute DEGs taking ERCC into account and see how the final results differ.
I think that is fair. As long as you test the effect with and without the extra normalization, and check whether there are any fundamental changes in functional enrichment that would or would not be reasonable given what you know about the rest of your experiment, I think that is what matters most.
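As a rough illustration of the "with and without" comparison, the sketch below estimates per-sample size factors once from the spike-ins only and once from the endogenous genes, using a median-of-ratios estimator. This is a plain NumPy re-implementation of the general idea, not the code of any particular package. The count matrices are invented toy values, and `median_of_ratios` is a hypothetical helper name.

```python
import numpy as np

def median_of_ratios(counts):
    """Size factor per sample from a genes x samples count matrix.

    Each gene's counts are divided by that gene's geometric mean across
    samples. The size factor is the median of those ratios per sample.
    Genes with a zero in any sample are skipped.
    """
    counts = np.asarray(counts, dtype=float)
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy matrices (rows = transcripts, columns = 3 samples), values invented.
ercc = np.array([[400, 800, 400],
                 [50, 100, 50],
                 [10, 20, 10]])
endo = np.array([[120, 130, 240],
                 [30, 28, 60],
                 [900, 950, 1800],
                 [15, 14, 31]])

sf_ercc = median_of_ratios(ercc)   # normalization driven by spike-ins
sf_endo = median_of_ratios(endo)   # normalization driven by endogenous genes

print("spike-in size factors:  ", np.round(sf_ercc, 3))
print("endogenous size factors:", np.round(sf_endo, 3))
```

Running the downstream analysis once with each set of factors, and comparing the resulting gene lists and enrichment results, is one way to see whether the extra normalization changes anything fundamental.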
Good luck!