Question

How to take input TCGA miRNA data in Limma R package for differential expression

1

Entering edit mode

7.3 years ago

skjobs1234 ▴ 40

I have set of miRNA data from TCGA. I want differential expression analysis using limma R package. Please guide me in details how to proceed. How to take input?

next-gen RNA-Seq R rna-seq • 4.7k views

ADD COMMENT • link updated 4.9 years ago by Barry Digby ★ 1.3k • written 7.3 years ago by skjobs1234 ▴ 40

1

Entering edit mode

Which type of data did you get? - the open-access raw counts or the aligned BAM files?

ADD REPLY • link 7.3 years ago by Kevin Blighe 88k

0

Entering edit mode

Raw data from TCGA database

ADD REPLY • link 7.3 years ago by skjobs1234 ▴ 40

0

Entering edit mode

Okay, I recently conducted an analysis of this exact data from the TCGA but I used DESeq2 to do the normalisation and differential expression.

First, I got a list of all TXT files containing the counts and read them into R using a loop. I then had a data-frame of raw counts. With that, it was easy to use DESeq2.

Do you have to use Limma or was it just a suggestion? I have only used Limma for microarray analyses, but I'm aware that the developers recently upgraded so that it can be used for RNA-seq

ADD REPLY • link 7.3 years ago by Kevin Blighe 88k

0

Entering edit mode

No.. I didn't used Limma right now. I gone through the Limma tutorial "theory" only. I want microarray analyses to see the differential expression of genes. Please suggest. Which tools will be better to used to see the differential expression DEseq2, limma and other tools.

ADD REPLY • link 7.3 years ago by skjobs1234 ▴ 40

0

Entering edit mode

If it's TCGA micro-RNA microarray data, then use limma. If it's TCGA micro-RNA-seq, then use DESeq2 or Limma-Voom. It looks like you have microarray data in which case I imagine that you have the normalised counts.

To run limma, you can use the following code:

We assume that you have a counts matrix, countsMatrix, and metadata in an object called metadata. The order of rows in metadata should match the order of colnames in countsMatrix.

In metadata, we have a column 'Condition', which gives details of the condition applied to each sample, e.g., control, treatmentA, treatmentB, etc.

    require(limma)

    #Set up the comparison matrix
    design <- paste(metadata$Condition, sep="")

    design <- factor(design)

    comparisonmodel <- model.matrix(~0+design)

    colnames(comparisonmodel) <- levels(design)

    #Create a limma object
    project.fitmodel <- lmFit(countsMatrix, comparisonmodel)
    project.fitmodel.eBayes <- eBayes(project.fitmodel)

    names(project.fitmodel.eBayes)

    #Make individual contrasts, fit the model, and adjust by empirical Bayes
    #Control Vs treatmentA
    limControlVtreatA <- makeContrasts(Q1="Control-treatmentA", levels=comparisonmodel)
    limControlVtreatA.fitmodel <- contrasts.fit(project.fitmodel.eBayes, limControlVtreatA)
    limControlVtreatA.fitmodel.eBayes <- eBayes(limControlVtreatA.fitmodel)

    #Control Vs treatmentB
    limControlVtreatB <- makeContrasts(Q2="Control-treatmentB", levels=comparisonmodel)
    limControlVtreatB.fitmodel <- contrasts.fit(project.fitmodel.eBayes, limControlVtreatB)
    limControlVtreatB.fitmodel.eBayes <- eBayes(limControlVtreatB.fitmodel)

    #Obtain statistics for each comparison and output results

    #output all stats; no P value adjusment
    topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="none", number=99999, p.value=1)

    #output all stats; adjust by Benjamini-Hochberg
    topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="BH", number=99999, p.value=1)

    #output only adjust P values < 0.05
    topTable(limControlVtreatA.fitmodel.eBayes, coef="Q1", adjust="BH", number=99999, p.value=0.05)

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

0

Entering edit mode

I have data set miRNA from TCGA. Data set look like this, miRNA_ID, read_count , reads_per_million_miRNA_mapped, and crossed-mapped

miRNA_ID    read_count  reads_per_million_miRNA_mapped  cross-mapped 
hsa-let-7a-1    74726   8647.065140                          N
hsa-let-7a-2    75087   8688.838961                          N 
hsa-let-7a-3    75246   8707.237957                          N 
hsa-let-7b      77259   8940.176186                          N 
hsa-let-7c       1480   171.261093                           N

So, My objective is to different fold differential analysis of miRNA data set. Please guide which tool(s) is better for this data analysis. I seen some paper using limma and edger. So please guide how to take input and proceed this data set. Thanks

ADD REPLY • link 7.0 years ago by skjobs1234 ▴ 40

2

Entering edit mode

Hi, @skjobs1234 did you find an appropriate answer for your query? Even I'm performing the similar analysis.

ADD REPLY • link 6.3 years ago by pradyumna Jayaram ▴ 210

1

Entering edit mode

Take read_count and use that as input for Limma-Voom. Presumably you have separate read_count values for different samples?

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

1

Entering edit mode

By the way, it looks like you have micro-RNA-seq data, not microarray.

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

2

Entering edit mode

@kevin Yes it is miRNA-seq data from TCGA

ADD REPLY • link 6.3 years ago by pradyumna Jayaram ▴ 210

1

Entering edit mode

...and you just have a single column of values? Sorry for the late follow-up - I missed your response.

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

2

Entering edit mode

It seems like the data is for single sample. the same read count is available for multiple samples and can be merged. I'm Guessing to go with DESeq2 or EdgeR for differential analysis. can limma be used for same? which one would be more effitient in identifying differential expression?

ADD REPLY • link 6.3 years ago by pradyumna Jayaram ▴ 210

2

Entering edit mode

Yes, you should merge them together into a data-frame, i.e., the read_count columns. Try cbind() or just data.frame(). EdgeR and DESeq2 are more common for differential expression analysis in RNA-seq. Limma was the supreme method for decades for microarrays and the equivalent method for RNA-seq data is Limma-Voom.

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

score 2 · Answer 1 · 2020-01-28

Just to add on to what Kevin was getting at,

At this link -- https://github.com/BarryDigby/TCGA_Biolinks/blob/master/TCGA_Biolinks.Rmd

You can download miRNA / gene counts from TCGA via Biolinks and perform DEA with DESeq2. I used PRAD in this example.

There is some code for handling messy metadata in there too, but be careful that it applies to your data as TCGA changes over time.