Question

Differentially expressed genes from expression table (RNAseq)

0

5.5 years ago

english.server ▴ 300

I am learning to analyze RNA-seq data from GEO. I am aware that raw counts can be processed with edgeR or DESeq2 to obtain DEGs. However, while looking at the supplementary data for GSE130883, I found an expression table that looks like this:

    ID                  e1      e2      e3        
    ENSMUSG00000069049  3.9853  3.98668 3.98668
    ENSMUSG00000069045  2.86804 2.83166 2.80527
    ENSMUSG00000068457  1.96894 1.99508 1.87452
    ENSMUSG00000056673  2.2292  2.14263 2.02953
    ENSMUSG00000025332  2.54212 2.56631 2.56794

Is there a way to obtain DEGs from these values? Is it possible to use limma or a t-test, or are there dedicated routines?

Thank you.

DEG expression table RNA-seq GEO • 1.7k views

ADD COMMENT • link updated 5.5 years ago by ATpoint 85k • written 5.5 years ago by english.server ▴ 300

1

It is just a matrix of numbers. Can you read the related manuscript to find out exactly what these numbers represent? Then we can advise better.

Manuscript: Sex-Dependent Sensory Phenotypes and Related Transcriptomic Expression Profiles Are Differentially Affected by Angelman Syndrome.

ADD REPLY • link 5.5 years ago by Kevin Blighe 88k

0

Thank you for your response. Firstly, I did not show all columns of the data in my original post. The column headings (12 in total) are as follows:

WT_M_6GCCAATL007SAll_PE
WT_M_5ACAGTGL007SAll_PE
WT_M_10TAGCTTL007SAll_PE
WT_F_8ACTTGAL007SAll_PE
WT_F_7CAGATCL007SAll_PE
WT_F_2CGATGTL007SAll_PE
AS_M_9GATCAGL007SAll_PE
AS_M_3TTAGGCL007SAll_PE
AS_M_12CTTGTAL007SAll_PE
AS_F_4TGACCAL007SAll_PE
AS_F_1ATCACGL007SAll_PE
AS_F_11GGCTACL007SAll_PE

where

WT = wild type
AS = Angelman syndrome
M/F = male/female

The study examines the effect of sex on the transcriptomic expression profiles of Angelman syndrome mice (the ENSMUSG identifiers in the table are mouse Ensembl gene IDs).

I hope I got the point of your question.
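For anyone following along, the column headings above can be split into a small design table before any testing. This is only a sketch: the pattern below assumes the names follow `<genotype>_<sex>_<sample number><index barcode>L<lane>...`, which is a guess from the headings rather than something confirmed by the GEO record, and `parse_sample` is a hypothetical helper name.

```python
import re
from collections import Counter

# Column headings exactly as listed in the post above.
SAMPLES = [
    "WT_M_6GCCAATL007SAll_PE", "WT_M_5ACAGTGL007SAll_PE", "WT_M_10TAGCTTL007SAll_PE",
    "WT_F_8ACTTGAL007SAll_PE", "WT_F_7CAGATCL007SAll_PE", "WT_F_2CGATGTL007SAll_PE",
    "AS_M_9GATCAGL007SAll_PE", "AS_M_3TTAGGCL007SAll_PE", "AS_M_12CTTGTAL007SAll_PE",
    "AS_F_4TGACCAL007SAll_PE", "AS_F_1ATCACGL007SAll_PE", "AS_F_11GGCTACL007SAll_PE",
]

# Assumed layout: genotype, sex, sample number, barcode, then "L" + lane digits.
NAME_RE = re.compile(r"^(WT|AS)_([MF])_(\d+)([ACGT]+)L(\d+)")

def parse_sample(name):
    """Split one column heading into genotype, sex and group (hypothetical helper)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected sample name: {name}")
    genotype, sex, number, barcode, _lane = match.groups()
    return {
        "sample": name,
        "genotype": genotype,
        "sex": sex,
        "group": f"{genotype}_{sex}",
        "number": int(number),
        "barcode": barcode,
    }

design = [parse_sample(name) for name in SAMPLES]
group_sizes = Counter(row["group"] for row in design)

for group, size in sorted(group_sizes.items()):
    print(group, size)
```

Each of the four genotype-by-sex groups should come out with 3 samples, which matches the 3 replicates per group mentioned in the reply below.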

ADD REPLY • link updated 5.5 years ago by Kevin Blighe 88k • written 5.5 years ago by english.server ▴ 300

1

Okay, it is good that they have 3 replicates per group. I assume that these expression values are the normalised and transformed counts? In that case, they should be suitable for any downstream analysis that you want to perform, e.g., clustering, 'machine learning' stuff, etc. You can also justify the use of ANOVA, t-test, limma, etc.

Just try to confirm how this data was produced, though. It must be stated somewhere in the Methods or Supplementary Methods.

I would also check the distribution of the data via hist() and boxplot() in R.
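If R is not at hand, a rough equivalent of that distribution check can be sketched with the Python standard library. The values are the five rows shown in the question (columns e1 to e3); `five_number_summary` and `looks_like_raw_counts` are hypothetical helper names, and the integer check is only a heuristic, not a substitute for reading the Methods.

```python
import statistics

# Rows copied from the excerpt in the question; list positions are e1, e2, e3.
table = {
    "ENSMUSG00000069049": [3.9853, 3.98668, 3.98668],
    "ENSMUSG00000069045": [2.86804, 2.83166, 2.80527],
    "ENSMUSG00000068457": [1.96894, 1.99508, 1.87452],
    "ENSMUSG00000056673": [2.2292, 2.14263, 2.02953],
    "ENSMUSG00000025332": [2.54212, 2.56631, 2.56794],
}

def column(data, j):
    """Return all values for sample column j."""
    return [row[j] for row in data.values()]

def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a boxplot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def looks_like_raw_counts(values):
    """Heuristic: raw counts are non-negative whole numbers."""
    return all(v >= 0 and float(v).is_integer() for v in values)

for j, name in enumerate(["e1", "e2", "e3"]):
    values = column(table, j)
    print(name, five_number_summary(values), "raw counts?", looks_like_raw_counts(values))
```

On the full table, similar medians and spreads across all 12 samples would be consistent with normalised values, and non-integer values rule out untouched raw counts.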

ADD REPLY • link 5.5 years ago by Kevin Blighe 88k

0

Thank you for your response. Is there a "preferred" method of obtaining DEGs from data that are normalised and transformed?

ADD REPLY • link 5.5 years ago by english.server ▴ 300

1

Not that I am aware of. Once the main program (DESeq2, edgeR, etc.) normalises and transforms the data, it is basically saying: 'Do whatever you want with this data'. If you still want to err on the side of caution, then use non-parametric tests (Kruskal-Wallis ANOVA, Mann-Whitney U test, Wilcoxon signed-rank test, Spearman correlation, etc.).
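As an illustration of the non-parametric route, here is a minimal sketch of an exact two-sided Mann-Whitney U (Wilcoxon rank-sum) test for one gene, written with the standard library only. It enumerates every way of relabelling the samples, which is feasible for 6 vs 6 (924 relabellings). The function names are hypothetical and the expression values in the usage lines are made up for illustration, not taken from GSE130883.

```python
from itertools import combinations

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def exact_rank_sum_pvalue(x, y):
    """Two-sided exact p-value from the rank sum of x, by full enumeration."""
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    n_x = len(x)
    expected = n_x * (len(pooled) + 1) / 2  # rank sum of x under the null
    observed = abs(sum(ranks[:n_x]) - expected)
    total = 0
    as_extreme = 0
    for idx in combinations(range(len(pooled)), n_x):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-9:
            as_extreme += 1
    return as_extreme / total

# Made-up values for one gene: 6 WT samples vs 6 AS samples.
wt = [2.10, 2.25, 2.31, 2.18, 2.22, 2.40]
angelman = [2.95, 3.02, 2.88, 3.10, 2.91, 3.05]
print(exact_rank_sum_pvalue(wt, angelman))
```

One caveat worth keeping in mind: with only 3 vs 3 samples (e.g. a single sex), the smallest two-sided p-value such a test can return is 2/20 = 0.1, so rank-based tests have very little power at that sample size, and any per-gene p-values still need multiple-testing correction.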

If you are aiming to perform differential expression comparisons, then could you not obtain the original raw data and re-process it (and perform the comparisons within DESeq2, edgeR, etc.)?

ADD REPLY • link 5.5 years ago by Kevin Blighe 88k

0

Thanks for your time and response. Raw data is available in my case (from SRA), but I don't know how to analyze it.

ADD REPLY • link 5.5 years ago by english.server ▴ 300

1

I see. In that case, please use the supplementary data.

ADD REPLY • link 5.5 years ago by Kevin Blighe 88k

score 1 · Answer 1 · 2019-06-02

1

5.5 years ago

ATpoint 85k

Not answering your question directly, but you can always download the raw data from the ENA (see "Fast download of FASTQ files from the European Nucleotide Archive (ENA)"), run a computationally inexpensive quantification pipeline such as salmon/tximport on them, and then proceed with raw counts using any of the established tools. Both salmon (for quantification of FASTQ files against a reference transcriptome in FASTA format) and tximport (to read the transcript counts into R and summarize them to the gene level) have very good documentation.
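To give a feel for what "summarize them to the gene level" means, here is a minimal Python sketch that sums the `NumReads` column of a salmon `quant.sf` file per gene using a transcript-to-gene map. It is not tximport: tximport is an R package and also handles things like transcript-length offsets, so treat this only as an illustration of the idea. The transcript and gene IDs below are hypothetical, and `gene_level_counts` is a made-up helper name.

```python
import csv
import io
from collections import defaultdict

def gene_level_counts(quant_sf_text, tx2gene):
    """Sum salmon NumReads per gene; transcripts missing from tx2gene are skipped."""
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    totals = defaultdict(float)
    for row in reader:
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue
        totals[gene] += float(row["NumReads"])
    return dict(totals)

# Toy quant.sf content with the standard salmon column names and hypothetical IDs.
quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "txA1\t1500\t1300.0\t10.0\t120.0\n"
    "txA2\t900\t700.0\t4.0\t30.0\n"
    "txB1\t2000\t1800.0\t1.0\t7.5\n"
    "txC1\t800\t600.0\t0.5\t2.0\n"
)

# Hypothetical transcript-to-gene map (in practice derived from the annotation).
tx2gene = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB"}

print(gene_level_counts(quant_sf, tx2gene))
```

For a real analysis, the documented salmon and tximport workflows are the safer path, since the resulting gene-level counts then feed directly into DESeq2 or edgeR.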

ADD COMMENT • link 5.5 years ago by ATpoint 85k

0

Thank you for your response. Is salmon/tximport able to analyze human RNA-seq data on a laptop with 4 GB of RAM?

ADD REPLY • link 5.5 years ago by english.server ▴ 300

1

I think that should be OK, but I haven't tested exporting the quantifications to a genome alignment .bam file for visualization (and you may need a genome alignment if testing re-analysis with other programs).

However, checking the robustness of gene-level (or transcript-level) assignments between biological replicates is already a good start (and something that you can do on your laptop).

ADD REPLY • link 5.5 years ago by Charles Warden 8.3k