Question

Best pipeline for DE study

2

7.5 years ago

almsned.fahad ▴ 20

Hello,

I want to examine the differential expression of a specific gene, DEC1, between RNA-seq samples from healthy participants and multiple sclerosis patients.

What is the best pipeline for this purpose?

I am thinking STAR --> Cufflinks --> DESeq2?

Thank you,

RNA-Seq rna-seq • 2.5k views

ADD COMMENT • link updated 7.5 years ago by WouterDeCoster 47k • written 7.5 years ago by almsned.fahad ▴ 20

3

Why are you doing RNA-seq for a single-gene study?

ADD REPLY • link 7.5 years ago by russhh 5.7k

2

Skip Cufflinks; it's a waste of time for you. Use either featureCounts or have STAR compute the counts directly.

ADD REPLY • link 7.5 years ago by Devon Ryan 104k
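
For readers who let STAR do the counting: when STAR is run with `--quantMode GeneCounts`, it writes a tab-separated `ReadsPerGene.out.tab` file with four summary rows followed by one row per gene (gene ID, then unstranded, forward-stranded and reverse-stranded counts). Below is a minimal Python sketch for pulling counts out of that file. The function name, the strandedness labels and the `GENE_A`/`GENE_B` IDs are placeholders made up for illustration; real gene IDs come from the `gene_id` attribute of the GTF used to build the index.

```python
from typing import Dict, Iterable

# Summary rows STAR writes at the top of ReadsPerGene.out.tab.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}

# Column index to read for each library type (labels are this sketch's own).
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_gene_counts(
    lines: Iterable[str], strandedness: str = "unstranded"
) -> Dict[str, int]:
    """Parse ReadsPerGene.out.tab lines into a {gene_id: count} dict."""
    column = STRAND_COLUMN[strandedness]
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines and the four summary rows.
        if len(fields) < 4 or fields[0] in SUMMARY_ROWS:
            continue
        counts[fields[0]] = int(fields[column])
    return counts


def gene_count_from_file(path: str, gene_id: str, strandedness: str = "unstranded") -> int:
    """Return the count for one gene ID from a ReadsPerGene.out.tab file."""
    with open(path) as handle:
        return read_star_gene_counts(handle, strandedness)[gene_id]
```

One such lookup per sample gives the raw counts for the gene of interest, but the differential expression test itself should still be run on the full count matrix (for example in DESeq2 or edgeR), since normalization and dispersion estimates rely on all genes.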

0

The OP might need to rephrase the question. If you want to see whether the DEC1 gene is differentially expressed in healthy participants vs. MS patients, you could simply dig into GEO or any papers with such a design, obtain their list of DE genes, and check for the gene. Alternatively, if you want to do something along the lines of a knockdown or knockout of that gene and then look at the downstream transcriptional changes, there are a lot of pipelines available for that. Please try to rephrase the question. If you just want to see from your own data whether DEC1 is a DEG between healthy participants and MS patients, you already have enough answers.

ADD REPLY • link 7.5 years ago by ivivek_ngs ★ 5.2k
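
Checking a published DE list for one gene, as suggested above, can be as small as a CSV filter. The sketch below assumes the table has a header row and a gene-symbol column; the column name `gene`, the function name and the example rows are hypothetical, since supplementary tables differ from paper to paper.

```python
import csv
import io
from typing import Dict, List, TextIO


def find_gene_rows(
    handle: TextIO, gene: str, gene_column: str = "gene", delimiter: str = ","
) -> List[Dict[str, str]]:
    """Return every row of a DE results table whose gene column matches `gene`."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames is None or gene_column not in reader.fieldnames:
        raise KeyError(f"column {gene_column!r} not found in table header")
    wanted = gene.strip().upper()
    # Case-insensitive match, because symbol capitalization varies between tables.
    return [
        row for row in reader
        if (row[gene_column] or "").strip().upper() == wanted
    ]


# Toy table with placeholder genes and invented values, only to show the call.
toy_table = "gene,log2FoldChange,padj\nGENE_A,0.5,0.04\nGENE_B,-1.2,0.20\n"
matches = find_gene_rows(io.StringIO(toy_table), "gene_a")
print(matches)
```

For a real table, open the downloaded file and pass `"DEC1"` as the gene. An empty result means the gene is absent from that list, which may only reflect that the authors filtered their table; it does not by itself show the gene is unchanged.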

score 3 · Answer 1 · 2017-06-12

3

7.5 years ago

John Ma ▴ 310

I don't know what you mean by "best," but the easiest pipeline is probably STAR -> Salmon -> edgeR/DESeq2 via tximport.

ADD COMMENT • link 7.5 years ago by John Ma ▴ 310

1

Sorry, I'm a bit confused: why do you need the STAR aligner at all if you use Salmon for quantification, which works either with or without alignments?

ADD REPLY • link 7.5 years ago by ivivek_ngs ★ 5.2k

2

The reason I use the alignment mode is that I have to map the RNA-seq data anyway for variant calling with GATK.

ADD REPLY • link 7.5 years ago by John Ma ▴ 310
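
A rough sketch of how the STAR -> Salmon (alignment mode) step could be wired together, written as Python functions that only build the command lines. All paths, the helper names and the thread count are placeholders, and the library type `A` (auto-detect) is an assumption; set it explicitly if your Salmon version does not support auto-detection in alignment mode. The key point is that Salmon's alignment mode needs alignments against the transcriptome, which STAR emits as `Aligned.toTranscriptome.out.bam` when run with `--quantMode TranscriptomeSAM`, while the coordinate-sorted genome BAM remains available for variant calling.

```python
from typing import List, Sequence


def star_command(
    genome_dir: str,
    fastq_files: Sequence[str],
    prefix: str,
    threads: int = 4,
    gzipped: bool = True,
) -> List[str]:
    """Build a STAR call that writes a genome BAM plus a transcriptome BAM."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastq_files,
        # Genome-coordinate BAM, e.g. for downstream variant calling.
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # TranscriptomeSAM feeds Salmon; GeneCounts also writes per-gene counts.
        "--quantMode", "TranscriptomeSAM", "GeneCounts",
        "--outFileNamePrefix", prefix,
    ]
    if gzipped:
        # Tell STAR how to decompress .fastq.gz input.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


def salmon_command(
    transcripts_fasta: str,
    prefix: str,
    out_dir: str,
    lib_type: str = "A",
    threads: int = 4,
) -> List[str]:
    """Build a Salmon alignment-mode call on STAR's transcriptome BAM."""
    return [
        "salmon", "quant",
        "-t", transcripts_fasta,  # must match the annotation STAR was indexed with
        "-l", lib_type,
        "-a", prefix + "Aligned.toTranscriptome.out.bam",
        "-o", out_dir,
        "-p", str(threads),
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. The resulting `quant.sf` files are what tximport reads before handing the data to edgeR or DESeq2.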

0

Yes, that is fine. If you also want to do variant calling with your RNA-seq data, then the STAR aligner is fine, since GATK has a workflow for that.

ADD REPLY • link 7.5 years ago by ivivek_ngs ★ 5.2k

1

Salmon has slightly more accurate output if you align with STAR first (Rob mentioned that in the original paper).

ADD REPLY • link 7.5 years ago by Devon Ryan 104k

1

Now I get it. I missed this part, probably in the paper, but it is clearly mentioned in Rob's blog. However, since the difference is not that large unless the mapping with STAR is done with specific parameters, quasi-mapping mode is fine as well. It largely boils down to a test of alignment rate. Anyway, if the user is fine with using both, they obviously can. These days I prefer not to generate BAM files unless required, although one can now also make .cram files, which I reckon occupy less space. Below is the text from Rob's blog.

"That’s a great question. The answer, unfortunately, isn’t 100% trivial. We’ve analyzed a lot of data with Salmon, and I can tell you that generally, we see the following: (1) the differences between using BAMs and using quasi-mapping is usually small — by design (of quasi-mapping) they yield very similar results (2) when there is a non-trivial difference (rarely), and we know the “truth”, quasi-mapping usually does better than alignment; yet (3) there is a small number of scenarios where we’ve seen alignment produce similar but slightly-better results. My general recommendation would be to go with Salmon’s built-in mapping (which is now quasi-mapping by default) unless you have a compelling reason not to (e.g. the alignments were created with special, very specific parameters, or you need to use a feature that is currently only supported in alignment-based mode, like producing a sampled one-alignment-per-read .BAM file)."

ADD REPLY • link 7.5 years ago by ivivek_ngs ★ 5.2k

score 2 · Answer 2 · 2017-06-12

2

7.5 years ago

WouterDeCoster 47k

After your alignment, you could give my DEA.R script a try. It performs counting (using featureCounts) and differential expression analysis (using DESeq2, edgeR and limma-voom). Please let me know if you need help.

ADD COMMENT • link 7.5 years ago by WouterDeCoster 47k

score 1 · Answer 3 · 2017-06-12

This paper (https://www.ncbi.nlm.nih.gov/pubmed/27022035) describes a number of software packages for DGE analysis. DESeq2, edgeR and Cuffdiff are quite popular, but we have found quite striking differences between software packages, so which one is 'correct', or at least best for your situation, can depend on the experimental setup.

I have to echo @russhh's words though: if you're only interested in one gene (and you already know the gene of interest!), why are you doing RNA-seq and not just qRT-PCR, for instance?

score 0 · Answer 4 · 2017-06-12

0

7.5 years ago

hns ▴ 150

Hi Almsned, this is a very open-ended question. What is the source of your data? Was it run on an Illumina HiSeq/MiSeq? Also, if you want to use DESeq2, it might be worth using HTSeq for counting after aligning with STAR.

ADD COMMENT • link 7.5 years ago by hns ▴ 150

0

I would personally see htseq-count as a bit passé now, especially because it's slow and requires BAM sorting.

ADD REPLY • link 7.5 years ago by John Ma ▴ 310