Question

Interpreting 2 RNA-seq data with different lengths

8.7 years ago

morovatunc ▴ 560

Hi,

I am doing a course project in which I was asked to analyse RNA-seq data. For this analysis I picked two datasets from the GEO database. The datasets differ in read length: FastQC reports a sequence length of 50 for Dataset A and 202 for Dataset B.

So I would like to know:

My aim is to identify differentially expressed genes. Does it make sense to compare genes across these two datasets?
Should I use other software to evaluate sequence length? Also, forgive my ignorance, but does "sequence length" mean read length?

Thank you for your time, Best,

Tunc.

RNA-Seq DEseq • 1.9k views

score 0 · Answer 1 · 2016-03-23

8.7 years ago

andrew.j.skelton73 6.6k

This all depends on a few factors. Do you want to compare across GEO entries? That's considerably more problematic. What software do you want to use to analyse the RNA-seq data? Analysing across GEO entries requires that you have the same kinds of samples in both entries, prepped in the same way, and that you account for the differences between the entries in your model design.

Thank you for your response;

I am planning to use DESeq.

I will basically analyse dataset A on its own and dataset B on its own. Then I will compare how well the genes correlate across the two datasets.

For example, if gene X is overexpressed in dataset A, we can check whether gene X shows the same trend in the second dataset.

Also, these datasets are biologically related.

ADD REPLY • link 8.7 years ago by morovatunc ▴ 560

That seems like a reasonable approach. Make sure you use DESeq2, which supersedes DESeq. Performing the differential expression tests independently on each dataset, and then looking at the intersection of the two result lists, means you can be more confident in the genes that turn up in both.
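
To illustrate the intersection step, here is a minimal Python sketch. It assumes you have already run the differential expression test separately on each dataset and exported, per gene, a log2 fold change and an adjusted p-value. The gene names, the numbers, the 0.05 cutoff and the helper names `significant` and `concordant_hits` are all made up for illustration; none of this is part of DESeq2 itself.

```python
# Hypothetical per-dataset results: gene -> (log2 fold change, adjusted p-value).
# In practice these would come from the result tables exported after testing
# each dataset separately; the values below are invented for illustration.
results_a = {
    "GENE1": (2.1, 0.001),
    "GENE2": (-1.5, 0.020),
    "GENE3": (0.3, 0.600),
    "GENE4": (1.8, 0.004),
}
results_b = {
    "GENE1": (1.7, 0.003),
    "GENE2": (1.2, 0.010),
    "GENE3": (0.1, 0.900),
    "GENE5": (-2.0, 0.002),
}


def significant(results, alpha=0.05):
    """Return {gene: log2FC} for genes with adjusted p-value below alpha."""
    return {gene: lfc for gene, (lfc, padj) in results.items() if padj < alpha}


def concordant_hits(res_a, res_b, alpha=0.05):
    """Genes significant in both datasets whose fold changes share a sign."""
    sig_a = significant(res_a, alpha)
    sig_b = significant(res_b, alpha)
    shared = sorted(set(sig_a) & set(sig_b))
    # Keep only genes that move in the same direction in both datasets.
    return [gene for gene in shared if sig_a[gene] * sig_b[gene] > 0]


hits = concordant_hits(results_a, results_b)
print(hits)
```

Requiring the same sign of fold change, and not just significance in both lists, is a design choice here: a gene that is up in one dataset and down in the other is in the intersection but does not show "the same trend".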