Best pipeline for DE study
7.5 years ago

Hello,

I want to examine the differential expression of a specific gene, DEC1, between RNA-seq samples from healthy participants and multiple sclerosis patients.

What is the best pipeline for this purpose?

I am thinking STAR --> Cufflinks --> DESeq2?

Thank you,

RNA-Seq rna-seq • 2.5k views
Why are you doing RNA-seq for a single-gene study?

Skip Cufflinks, it's a waste of time for you. Either use featureCounts or have STAR compute the gene counts directly.
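For the second option, STAR writes a per-gene count table (`ReadsPerGene.out.tab`) when run with `--quantMode GeneCounts`. The snippet below is only a minimal sketch of how that table could be read in Python to pull out one gene; the function name `read_star_gene_counts`, the gene IDs and all numbers are made up for illustration, and which strand column is correct depends on your library prep.

```python
import io

# Column layout of STAR's ReadsPerGene.out.tab (written with --quantMode GeneCounts):
# gene ID, unstranded counts, counts for strand 1, counts for strand 2.
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_gene_counts(handle, strandedness="unstranded"):
    """Return {gene_id: count} from an open ReadsPerGene.out.tab handle."""
    column = STRAND_COLUMN[strandedness]
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            # Skip blank lines and the summary rows (N_unmapped, N_multimapping, ...).
            continue
        counts[fields[0]] = int(fields[column])
    return counts


# Made-up example content: gene IDs and numbers are placeholders, not real data.
EXAMPLE = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t30\t400\t35\n"
    "N_ambiguous\t5\t1\t2\n"
    "geneA\t120\t3\t117\n"
    "geneB\t40\t1\t39\n"
)

example_counts = read_star_gene_counts(io.StringIO(EXAMPLE), strandedness="reverse")
print(example_counts["geneB"])
```

The resulting per-sample dictionaries would still need to be combined into a gene-by-sample count matrix before going into DESeq2 or edgeR.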

The OP might need to rephrase the question. If you want to know whether DEC1 is differentially expressed in healthy participants vs. MS patients, you could simply search GEO or published papers with such a design, obtain their list of DE genes, and check for the gene. Alternatively, if you want to knock down or knock out that gene and then look at the resulting transcriptional changes, there are plenty of pipelines available for that. Please try to rephrase the question. If you just want to see from your own data whether DEC1 is a DEG between healthy participants and MS patients, you already have enough answers.
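Checking a published DE list for one gene can be as simple as the sketch below. It assumes the list was exported as a CSV with DESeq2-style columns (`log2FoldChange`, `padj`) plus a `gene` column; that column name, the helper `lookup_gene` and every value in the example table are assumptions for illustration, not real results.

```python
import csv
import io


def _to_float(text):
    """Convert a table cell to float; DESeq2 writes NA for filtered genes."""
    return None if text in ("", "NA") else float(text)


def lookup_gene(handle, gene, alpha=0.05):
    """Find `gene` in a DESeq2-style results table and summarise its row."""
    for row in csv.DictReader(handle):
        if row["gene"] == gene:
            padj = _to_float(row["padj"])
            return {
                "log2FoldChange": _to_float(row["log2FoldChange"]),
                "padj": padj,
                "significant": padj is not None and padj < alpha,
            }
    return None  # gene is not present in the table


# Made-up table: placeholder gene names and numbers, not real data.
EXAMPLE_TABLE = (
    "gene,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj\n"
    "geneA,250.0,0.10,0.20,0.50,0.617,0.80\n"
    "geneB,85.0,-1.20,0.30,-4.00,0.00006,0.003\n"
    "geneC,3.0,2.00,1.50,1.33,0.18,NA\n"
)

hit = lookup_gene(io.StringIO(EXAMPLE_TABLE), "geneB")
print(hit)
```

Replace `geneB` with whatever identifier the table uses for your gene of interest (symbol or Ensembl ID), and keep in mind that a `padj` of NA means the gene was not tested, not that it is unchanged.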

7.5 years ago
John Ma ▴ 310

I don't know what you mean by "best," but the easiest pipeline is probably STAR -> Salmon -> edgeR/DESeq2 through tximport.
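As a rough sketch of the first two steps, the Python below only builds the command lines (it does not run anything). The flags are standard STAR and Salmon options, but every path, the thread count and the helper names `star_command` and `salmon_command` are placeholders I made up. The tximport and DESeq2/edgeR steps are R and are not shown.

```python
import shlex


def star_command(genome_dir, fastq_1, fastq_2, out_prefix, threads=8):
    """STAR run that also writes alignments in transcript coordinates."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_1, fastq_2,
        "--readFilesCommand", "zcat",  # assumes gzipped FASTQ input
        "--outSAMtype", "BAM", "SortedByCoordinate",  # genome BAM
        "--quantMode", "TranscriptomeSAM",  # adds Aligned.toTranscriptome.out.bam
        "--outFileNamePrefix", out_prefix,
    ]


def salmon_command(transcripts_fasta, out_prefix, out_dir, threads=8):
    """Salmon in alignment-based mode on STAR's transcriptome BAM."""
    return [
        "salmon", "quant",
        "-t", transcripts_fasta,  # must match the annotation used for the STAR index
        "-l", "A",  # let Salmon infer the library type
        "-a", out_prefix + "Aligned.toTranscriptome.out.bam",
        "-p", str(threads),
        "-o", out_dir,
    ]


# Placeholder paths for illustration only.
star = star_command("star_index", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1_")
salmon = salmon_command("transcripts.fa", "s1_", "s1_salmon")
print(shlex.join(star))
print(shlex.join(salmon))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the paths point at real files. The `quant.sf` files Salmon writes are what tximport reads on the R side.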

Sorry, I am a bit confused: why do you still need the STAR aligner if you use Salmon for quantification, whether in alignment-based or alignment-free mode?

The reason I use the alignment mode is that I would have to map the RNA-seq data anyway for variant calling with GATK.

Yes, that is fine. If you also want to do variant calling with your RNA-seq data, then the STAR aligner is fine, since GATK has a workflow that uses it.

Salmon has slightly more accurate output if you align with STAR first (Rob mentioned that in the original paper).

Now I get it. I missed this part, probably in the paper, but it is clearly mentioned in Rob's blog. However, since the difference is not large unless the STAR mapping was done with specific parameters, quasi-mapping mode is fine as well. It largely boils down to a test of alignment rate. But if the user is happy to use both, then obviously they can. These days I prefer not to generate BAM files unless required, although one can now also make .cram files, which I reckon will occupy less space. Below is the text from Rob's blog.

"That’s a great question. The answer, unfortunately, isn’t 100% trivial. We’ve analyzed a lot of data with Salmon, and I can tell you that generally, we see the following: (1) the differences between using BAMs and using quasi-mapping is usually small — by design (of quasi-mapping) they yield very similar results (2) when there is a non-trivial difference (rarely), and we know the “truth”, quasi-mapping usually does better than alignment; yet (3) there is a small number of scenarios where we’ve seen alignment produce similar but slightly-better results. My general recommendation would be to go with Salmon’s built-in mapping (which is now quasi-mapping by default) unless you have a compelling reason not to (e.g. the alignments were created with special, very specific parameters, or you need to use a feature that is currently only supported in alignment-based mode, like producing a sampled one-alignment-per-read .BAM file)."

7.5 years ago

After your alignment, you could give my DEA.R script a try. It performs counting (using featureCounts) and differential expression analysis (using DESeq2, edgeR and limma-voom). Please let me know if you need help.

7.5 years ago
Joe 21k

This paper (https://www.ncbi.nlm.nih.gov/pubmed/27022035) describes a number of software packages for DGE analysis. DESeq2, edgeR and Cuffdiff are quite popular, but we have found quite striking differences between software packages, so which one is 'correct', or at least best for your situation, can depend on the experimental setup.

I have to echo @russhh's words though: if you're only interested in one gene (and you already know the gene of interest!), why are you doing RNA-seq and not just qRT-PCR, for instance?

7.5 years ago
hns ▴ 150

Hi Almsned, this is a very open-ended question. What is the source of your data? Was it sequenced on an Illumina HiSeq/MiSeq? Also, if you want to use DESeq2, it might be worth counting reads with HTSeq (htseq-count) after aligning with STAR.

Personally, I would see htseq-count as a bit passé now, especially because it's slow and requires BAM sorting.
