Hey all,
Without going into too much background: I just joined a lab as a bioinformatics intern while completing my master's degree in the field. The lab has data from an RNA-seq experiment they outsourced, but the only data they have back is preprocessed output from the sequencing company: the reads were filtered, aligned, and run through RSEM. I currently have RSEM output for each of the four samples, consisting of: gene ID, transcript ID(s), length, expected count, and FPKM. Some of you may recognize this situation from my question yesterday, where I was encouraged to get the raw reads from the company. The issue I wasn't even thinking of at the time is that this isn't a dedicated bioinformatics lab, so there is no server I can use for the computationally heavy steps of an NGS analysis, and I don't think my laptop has the power or storage for them.
However, I found this article (https://biowize.wordpress.com/2014/03/04/understanding-rsem-raw-read-counts-vs-expected-counts/) arguing that expected read counts can be better than raw read counts for differential expression analysis with EBSeq. It's just one person's opinion, and it's from 2014, so it may be wrong or outdated, but since I have the expected counts I thought I'd give it a try.
I have just a couple of questions about running EBSeq that I can't find answers to:
1: Not all genes are represented in every RSEM output file I have (about 80% are shared across all four samples). Should I remove the genes that aren't present in every sample before running EBSeq? It runs when I do, but I'm not sure that's correct.
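For what it's worth, the intersection step itself is simple to sketch. A minimal, pure-Python version is below; it assumes the standard RSEM `*.genes.results` layout (tab-separated, with `gene_id` and `expected_count` header columns), and the function names are just illustrative:

```python
import csv

def read_expected_counts(path):
    """Read gene_id -> expected_count from an RSEM *.genes.results file.
    Assumes the standard RSEM header with 'gene_id' and 'expected_count' columns."""
    counts = {}
    with open(path) as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            counts[row["gene_id"]] = float(row["expected_count"])
    return counts

def intersect_counts(samples):
    """Keep only genes reported in every sample.
    Returns (gene_ids, matrix), where matrix[i][j] is the expected count
    of gene i in sample j, in the same order as the input samples."""
    shared = set.intersection(*(set(s) for s in samples))
    gene_ids = sorted(shared)
    matrix = [[s[g] for s in samples] for g in gene_ids]
    return gene_ids, matrix

# Tiny illustration with made-up counts: gene "B" is dropped because
# it is missing from the second sample.
s1 = {"A": 10.0, "B": 5.0, "C": 0.0}
s2 = {"A": 8.0, "C": 2.0}
genes, mat = intersect_counts([s1, s2])
```

Note this only answers the mechanical part; whether dropping the non-shared 20% biases the analysis is a separate statistical question (a gene absent from one sample's file is not necessarily unexpressed there).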
2: How do I know which normalization factor to use when running EBSeq? This is more of a conceptual question than a technical one.
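In case it helps anyone thinking about question 2: the EBSeq vignette's usual default, MedianNorm, implements the DESeq-style median-of-ratios size factors. A rough Python sketch of that idea (not EBSeq's actual code, just the concept) looks like this:

```python
import math

def median(xs):
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def median_of_ratios(matrix):
    """Median-of-ratios size factors, one per sample (column).
    For each gene (row) with no zero counts, divide each sample's count
    by the gene's geometric mean across samples; a sample's size factor
    is the median of those ratios over all such genes."""
    n_samples = len(matrix[0])
    factors = []
    for j in range(n_samples):
        ratios = []
        for row in matrix:
            if all(c > 0 for c in row):
                geo_mean = math.exp(sum(math.log(c) for c in row) / len(row))
                ratios.append(row[j] / geo_mean)
        factors.append(median(ratios))
    return factors

# Toy matrix: the second sample was sequenced twice as deeply,
# so its size factor comes out larger by a factor of 2.
counts = [[10.0, 20.0],
          [30.0, 60.0],
          [5.0, 10.0]]
factors = median_of_ratios(counts)
```

The intuition is that most genes are not differentially expressed, so the median ratio captures sequencing-depth differences while ignoring the genuinely changing genes.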
Thanks!
I've never used EBSeq, so I can't advise there.
However, starting from the raw reads, your laptop will be fine with a ( Salmon | Kallisto ) + Sleuth workflow, or a ( Salmon | Kallisto ) + tximport + ( DESeq2 | edgeR ) workflow.