Question

DESeq2 input from GDAC firehose


3.5 years ago

JB lee • 0

Hi guys, I hope you are fine.

My English is not very good, so if anything in my question is unclear, please feel free to ask. I'm a beginner in bioinformatics, and I want to practice differentially expressed gene (DEG) analysis in R.

The RNA-seq data I used was downloaded from the Broad GDAC Firehose. There are two types of non-normalized data: one is "illuminahiseq_rnaseq-gene_expression (MD5)" and the other is "illuminahiseq_rnaseqv2-RSEM_genes (MD5)". (I decided to download these two because, as far as I know, DESeq2 expects non-normalized data.)

Both files have a raw count column, but their values differ. I wonder whether A's raw_counts is a true raw count and B's raw_count is an expected count estimated by RSEM.

Below are a few rows from each file.

A) illuminahiseq_rnaseq-gene_expression (MD5)

gene                       raw_counts  median_length_normalized  RPKM
AADACL3|126767_calculated  36          0.6686                    0.0539

B) illuminahiseq_rnaseqv2-RSEM_genes (MD5)

gene_id         raw_count  scaled_estimate  transcript_id
A1BG|1          247.2      2.27E-06         uc002qsd.3,uc002qsf.1
A1CF|29974      0          0                uc001jjh.2,uc001jji.2,uc001jjj.2,uc001jjk.1,uc009xov.2,uc010qhn.1,uc010qho.1
AADACL3|126767  16         7.38E-08         uc001aug.1,uc009vnn.1
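
One quick way to probe the difference is to test whether each raw count column holds only whole numbers. The sketch below uses just the values quoted in this post. The vector names and the helper `is_integer_valued()` are made up for illustration. Fractional values would be consistent with RSEM expected counts, although a handful of rows cannot prove how either file was produced.

```r
# Values copied from the example rows in this post; the vector names are hypothetical.
counts_a <- c("AADACL3|126767_calculated" = 36)
counts_b <- c("A1BG|1" = 247.2, "A1CF|29974" = 0, "AADACL3|126767" = 16)

# Hypothetical helper: TRUE when every value is a whole number,
# which is what plain read counts look like.
is_integer_valued <- function(x) all(x == round(x))

is_integer_valued(counts_a)
is_integer_valued(counts_b)

# Names of the genes in B whose value has a fractional part.
fractional_b <- names(counts_b)[counts_b != round(counts_b)]
fractional_b
```

On a full file, the same check could be run on the whole raw count column after reading it with `read.delim()`.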

Which of these data types would you recommend using?

I guess I should use DESeqDataSetFromTximport() with the RSEM raw counts and DESeqDataSet() with the other data. Is that right?
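
For concreteness, here is a minimal sketch of one possible matrix-based route, assuming the counts are passed to `DESeqDataSetFromMatrix()`, which requires non-negative integers, so fractional expected counts are rounded first. The first column reuses the B values quoted above. The second column, the sample names, and the `condition` labels are invented purely for illustration. This is not a statement of which constructor is the right one for Firehose data.

```r
# Toy matrix: column 1 holds the B values quoted in this post;
# column 2 and both sample names are invented for illustration only.
expected <- matrix(
  c(247.2, 0, 16,
    198.7, 3, 21.4),
  ncol = 2,
  dimnames = list(c("A1BG|1", "A1CF|29974", "AADACL3|126767"),
                  c("sample_1", "sample_2"))
)

# DESeqDataSetFromMatrix() rejects non-integer values, so round first
# and store the matrix as integers.
counts <- round(expected)
storage.mode(counts) <- "integer"

# Hypothetical sample table; a real analysis needs replicates in each group.
coldata <- data.frame(condition = factor(c("normal", "tumor")),
                      row.names = colnames(counts))

# Build the DESeq2 object only if the package is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = coldata,
                                        design = ~ condition)
}
```

Rounding discards the fractional part of the RSEM estimates. `DESeqDataSetFromTximport()` avoids that, but it needs the list returned by `tximport()`, which is normally built from per-sample RSEM output files that include gene length columns.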

Thank you.

DESeq2 counts raw • 632 views
