Question

miRNA differential expression

6.4 years ago

georgians ▴ 50

Hello,

I am doing miRNA differential expression analysis from read counts, and I have a few questions:

1. Which one is better, DESeq2 or edgeR?
2. I have 452 samples in total across 4 conditions, and each condition has a different number of samples. How should I analyze this type of data?
3. I want to check miRNA expression for each individual condition, i.e. which miRNAs are up- or down-regulated in which condition.

Please help me in this regard.

Thanks...

RNA-Seq miRNA DE DESeq2 • 10.0k views

ADD COMMENT • link 6.4 years ago by georgians ▴ 50

Hello all, thank you for your kind replies and suggestions. Please focus on my second problem: I have 452 samples in total across 4 conditions, and each condition has a different number of samples. How should I analyze this type of data? I checked the DESeq2 and edgeR tutorials, but most of them only cover DE analysis for a control versus a treatment condition. In my case, I have 4 conditions with different sample numbers, e.g.:

Condition 1: 139 samples
Condition 2: 109 samples
Condition 3: 89 samples
Condition 4: 80 samples

I hope you can understand my problem. Thanks!

ADD REPLY • link 6.4 years ago by georgians ▴ 50

First, be sure to add your emphasis to the second issue as a comment rather than an answer to your post.

DESeq2 does accommodate groups with different sample numbers, so I am not sure why this is a problem. It seems to me that the issue you are pointing to is not the sample number but rather the fact that you have 4 conditions. How you analyze this will depend on your experimental question. You could analyze all the groups together (accounting for the multiple groups in the design formula of DESeq2) and use contrasts to extract the comparisons you are interested in (more on this here). Alternatively, if your conditions represent something like different time points, you should consider reading the relevant section of the DESeq2 vignette.
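To make the "all groups together, then contrasts" idea concrete, here is a minimal Python sketch that only enumerates the pairwise comparisons implied by 4 conditions. It is not DESeq2 code; the condition labels are placeholders, and the group sizes are the ones quoted in the question.

```python
from itertools import combinations

# Group sizes as quoted in the question; the labels are placeholders.
group_sizes = {
    "condition1": 139,
    "condition2": 109,
    "condition3": 89,
    "condition4": 80,
}

def pairwise_contrasts(levels):
    """Return every unordered pair of condition levels as a tuple."""
    return list(combinations(levels, 2))

contrasts = pairwise_contrasts(list(group_sizes))

for first, second in contrasts:
    # Unequal group sizes do not change which comparisons exist.
    print(f"{first} (n={group_sizes[first]}) vs "
          f"{second} (n={group_sizes[second]})")
```

Four conditions give six pairwise comparisons. In practice you would fit one model on all samples and request only the contrasts that match your biological question.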

ADD REPLY • link 6.4 years ago by lshepard ▴ 480

score 6 · Answer 1 · 2019-03-05

6.4 years ago

ATpoint 89k

Please browse this forum using the search function and scan the literature for miRNA pipelines and comparisons between edgeR and DESeq2. Short answer: both are valid and established, and there is no simple "better" or "worse". Be sure to use proper alignment settings. Use something like bowtie and align against a database such as miRBase, not against the genome, and be sure to properly trim your reads. As said, read previous posts; this has been discussed many times before.

ADD COMMENT • link 6.4 years ago by ATpoint 89k

"align against a database such as miRBase, not against the genome"

Any specific reason for this? We can align to the reference genome (e.g. with HISAT2) and then extract only the miRNA counts (e.g. with featureCounts) based on GTF annotation files (e.g. from Ensembl).

ADD REPLY • link 5.1 years ago by Arindam Ghosh ▴ 550

miRNAs are short (~20 bp), so aligning against the genome will give plenty of multimappers just by chance. That is at least what I took from the miRNA questions I have read here on Biostars; I am not a miRNA expert myself. Aligning against a dedicated database might therefore be better.

ADD REPLY • link 5.1 years ago by ATpoint 89k

That's reasonable logic. I tried aligning with HISAT2 to the Ensembl reference genome and observed an overall alignment rate of ~60-70%, with 40-50% unique alignments.

On a similar note, how do you deal with mRNA reads multi-mapping to different positions in the reference genome? Count all or ignore all? We can always avoid this by aligning only to known transcripts.

How about letting the reads align wherever they can and then counting only the genes we require? Like mRNA from certain genes may map to pseudogenes, but while counting if we consider only genes and not pseudogenes may make sense.

ADD REPLY • link 5.1 years ago by Arindam Ghosh ▴ 550

mRNA reads are typically discarded if they multimap. Tools like salmon which perform pseudo- or selective alignment against the transcriptome have a more elaborate strategy to deal with multimappers but I do not recall the principle. Check the salmon (Patro et al) paper for details if you want.

but while counting if we consider only genes and not pseudogenes may make sense.

You cannot cherry-pick during alignment. If you only count non-pseudogenes, but the counts in reality come from the pseudogene while the non-pseudogene may truly have a count of 0, then you get many false positives. If you have multimapping then it is what it is: you cannot confidently say where the reads come from.
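A toy Python sketch of the "discard multimappers" rule mentioned above, under the assumption that each feature name stands for one distinct mapping location. The read IDs, feature names and alignment list are invented for illustration; real counting tools work on BAM files and annotations.

```python
from collections import Counter, defaultdict

# Hypothetical alignments as (read_id, feature) pairs; read2 multimaps.
alignments = [
    ("read1", "geneA"),
    ("read2", "geneA"),
    ("read2", "pseudogeneA"),
    ("read3", "pseudogeneA"),
    ("read4", "geneB"),
]

def count_unique_reads(pairs):
    """Count reads per feature, dropping reads that hit more than one feature."""
    hits = defaultdict(set)
    for read_id, feature in pairs:
        hits[read_id].add(feature)

    counts = Counter()
    for features in hits.values():
        if len(features) == 1:  # uniquely mapped read
            counts[next(iter(features))] += 1
    return counts

unique_counts = count_unique_reads(alignments)
print(dict(unique_counts))
```

Here read2 is dropped entirely rather than assigned to geneA, because nothing in the data says which of the two locations it came from.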

ADD REPLY • link 5.1 years ago by ATpoint 89k

score 3 · Answer 2 · 2019-03-06

I agree with ATpoint: there is no easy answer as to which of them is better. Two additional comments:

1) It is highly important to do pre-processing for miRNA-seq reads (removing adapters, low-quality sequences, size enrichment).

2) As for the alignment, there are several aligners and methods that can be used (I prefer aligning to the genome and then taking only reads that mapped to miRNAs). Consider reading this

score 2 · Answer 3 · 2019-03-06

For point 1: As ATpoint said, one method is not better than the other; each has its own merits. As tough as it is when first starting out, you need to read through the assumptions of the methods and make a judgement call as to which is most appropriate for your data. It's helpful to read through the papers describing the methods, as well as their various vignettes (such as those on Bioconductor) and general RNA-seq workflows (e.g. http://master.bioconductor.org/packages/release/workflows/html/rnaseqGene.html). I personally like to use DESeq2, and the DESeq2 vignette is particularly helpful and easy to follow.

For the other two points, the various vignettes will tell you how to do this. You will need to get an estimate of the read counts for transcripts/genes, then compare these among your conditions. Depending on your thresholds for differential expression, you will get a list of differentially expressed genes. Those that are upregulated in a condition will have positive fold changes relative to the contrast group; those that are downregulated will have negative fold changes relative to the contrast group. Again, this will all become clearer after reading vignettes and following tutorials.
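The sign convention for fold changes can be illustrated with a small Python sketch. This is not how DESeq2 or edgeR compute their estimates (they normalize, model dispersion and test for significance); the miRNA names, mean counts, pseudocount and threshold below are all made-up assumptions.

```python
import math

def log2_fold_change(mean_condition, mean_reference, pseudocount=1.0):
    """log2 ratio of condition over reference; the pseudocount avoids division by zero."""
    return math.log2((mean_condition + pseudocount) / (mean_reference + pseudocount))

def direction(lfc, threshold=1.0):
    """Label a log2 fold change relative to the reference (contrast) group."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"

# Hypothetical mean counts: (condition of interest, reference group).
toy_means = {
    "miR-A": (400.0, 100.0),
    "miR-B": (50.0, 210.0),
    "miR-C": (120.0, 110.0),
}

calls = {
    name: direction(log2_fold_change(cond, ref))
    for name, (cond, ref) in toy_means.items()
}
print(calls)
```

A positive value means higher expression in the condition than in the reference group, a negative value means lower. Swapping the two groups flips every sign.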

score 1 · Answer 4 · 2019-03-06

6.4 years ago

nsmi8446 ▴ 170

This site might be helpful for you, it gives an accessible introduction to this type of analysis:

https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/srna/tutorial.html

ADD COMMENT • link 6.4 years ago by nsmi8446 ▴ 170

score 1 · Answer 5 · 2019-03-06

Unless your target organism has a very poorly sequenced genome, I would strongly suggest mapping your small RNA-seq reads to the reference genome and then counting aligned reads using the miRNA positions from a GFF or GTF file.
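As a rough illustration of counting by annotation position, the Python sketch below tallies reads that overlap miRNA intervals. The coordinates, names and reads are invented, strand is ignored, and real pipelines would use a dedicated counting tool on BAM and GFF/GTF files instead.

```python
from collections import Counter

# Hypothetical miRNA annotation: (chromosome, start, end, name),
# 1-based inclusive coordinates as in GFF/GTF.
mirna_features = [
    ("chr1", 100, 121, "miR-A"),
    ("chr1", 500, 522, "miR-B"),
    ("chr2", 100, 121, "miR-C"),
]

# Hypothetical aligned reads: (chromosome, start, end).
aligned_reads = [
    ("chr1", 100, 120),
    ("chr1", 101, 121),
    ("chr1", 300, 320),  # falls outside every annotated miRNA
    ("chr2", 105, 125),
]

def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def count_reads(features, reads):
    """Count reads overlapping each feature; features with no reads stay at 0."""
    counts = Counter({name: 0 for _, _, _, name in features})
    for chrom, start, end in reads:
        for f_chrom, f_start, f_end, name in features:
            if chrom == f_chrom and overlaps(start, end, f_start, f_end):
                counts[name] += 1
    return counts

mirna_counts = count_reads(mirna_features, aligned_reads)
print(dict(mirna_counts))
```

The resulting per-miRNA table is the kind of count matrix column that DESeq2 or edgeR expects as input.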

As for which software to use for differential expression: edgeR and DESeq2 are both good options and should give you similar results.

Be careful with the processing and alignment steps. As ATpoint and biobiu suggested, it is better to take some time to properly clean the small RNA-seq data before starting further steps. I use the bowtie aligner with the following options: bowtie -n 1 -l 10 -m 100 -k 1 --best --strata, i.e. allowing one mismatch in a 10-nucleotide alignment seed, removing reads with more than 100 putative mapping sites, and reporting a single alignment from the best stratum. Mismatches can range from 0 to 2-3 at most, I would say.