Hi guys, I hope you are fine.
My English is not very good, so if anything in my question is unclear, please feel free to ask. I'm a beginner in bioinformatics, and I want to practice differentially expressed gene (DEG) analysis in R.
The RNA-seq data I used were downloaded from Broad GDAC Firehose. There are two types of non-normalized data: one is "illuminahiseq_rnaseq-gene_expression (MD5)", and the other is "illuminahiseq_rnaseqv2-RSEM_genes (MD5)". (I decided to download these two because, as far as I know, DEG analysis prefers non-normalized data.)
Both files have a raw count column, but the values in them are different. I wonder whether A's raw_counts is a true raw count and B's raw_count is an expected count estimated by RSEM.
Below are a few rows from each file.
A) illuminahiseq_rnaseq-gene_expression (MD5)
gene                       raw_counts  median_length_normalized  RPKM
AADACL3|126767_calculated  36          0.6686                    0.0539

B) illuminahiseq_rnaseqv2-RSEM_genes (MD5)
gene_id         raw_count  scaled_estimate  transcript_id
A1BG|1          247.2      2.27E-06         uc002qsd.3,uc002qsf.1
A1CF|29974      0          0                uc001jjh.2,uc001jji.2,uc001jjj.2,uc001jjk.1,uc009xov.2,uc010qhn.1,uc010qho.1
AADACL3|126767  16         7.38E-08         uc001aug.1,uc009vnn.1

Which of these two kinds of data would you prefer to use?
I guess I should use DESeqDataSetFromTximport() with the RSEM raw counts, and DESeqDataSet() with the other data. Is that right?
Thank you.