GDC & GTEx RNA sequencing normalization problem
8.0 years ago
Ted

Dear all,

I've encountered a problem with normalizing RNA sequencing count files and would like to ask for your advice.

Our goal is to identify differentially expressed genes in pancreatic cancer. Here are our data preprocessing steps:

  • Downloaded the pancreatic cancer HTSeq raw count data (177 cancer and 4 normal samples) from the GDC Data Portal.

  • Processed all the GTEx SRA data with the GDC RNA sequencing pipeline (fastq-dump -> STAR 2-pass -> fixmate -> HTSeq).

  • Performed TMM normalization on the GDC cancer, GDC normal, and GTEx normal count data.

  • Applied the voom transformation to the normalized counts.
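For readers less familiar with these two steps, here is a minimal pure-Python sketch of the idea behind TMM scaling factors and voom's log-CPM transform. It is not edgeR's `calcNormFactors` or limma's `voom`: trimming is done by simple sorting rather than edgeR's rank rule, the reference sample is chosen by the caller (edgeR picks one automatically), and voom's mean-variance precision weights are omitted. The function names and the one-list-per-sample layout are assumptions for illustration.

```python
import math


def tmm_factor(sample, ref, logratio_trim=0.3, sum_trim=0.05):
    """TMM scaling factor of `sample` relative to `ref` (raw counts, same gene order)."""
    n_s, n_r = sum(sample), sum(ref)
    m_vals, a_vals, w_vals = [], [], []
    for y_s, y_r in zip(sample, ref):
        if y_s == 0 or y_r == 0:
            continue  # log-ratios are undefined for zero counts
        p_s, p_r = y_s / n_s, y_r / n_r
        m_vals.append(math.log2(p_s / p_r))        # M: log fold change
        a_vals.append(0.5 * math.log2(p_s * p_r))  # A: average log abundance
        # Weight = inverse of the approximate (delta-method) variance of M.
        w_vals.append(1.0 / ((n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)))
    n = len(m_vals)

    def kept(values, trim):
        # Drop the `trim` fraction of genes from each end of the sorted values.
        order = sorted(range(n), key=lambda i: values[i])
        cut = int(n * trim)
        return set(order[cut:n - cut])

    keep = kept(m_vals, logratio_trim) & kept(a_vals, sum_trim)
    den = sum(w_vals[i] for i in keep)
    if den == 0:
        return 1.0  # nothing left after trimming; leave the library unscaled
    num = sum(w_vals[i] * m_vals[i] for i in keep)
    return 2 ** (num / den)


def tmm_factors(counts, ref_index=0):
    """Factors for every sample; `counts` is a list of samples (lists of counts)."""
    ref = counts[ref_index]
    factors = [tmm_factor(s, ref) for s in counts]
    # Rescale so the factors multiply to 1, following edgeR's convention.
    geo = math.exp(sum(math.log(f) for f in factors) / len(factors))
    return [f / geo for f in factors]


def log_cpm(sample, norm_factor):
    """voom-style log2-CPM using the effective (TMM-scaled) library size."""
    lib = sum(sample) * norm_factor
    return [math.log2((y + 0.5) / (lib + 1) * 1e6) for y in sample]
```

With one gene strongly up in a sample, TMM shrinks that sample's effective library size, so genes that did not change get matching log-CPM values across samples instead of all appearing down-regulated.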

To check how well the data are normalized, we performed PCA on the transformed data. We also plotted the densities of the per-gene means and medians across samples for the GDC cases and the GTEx normals, as well as the distributions of the gene mean and median ratios.
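The PCA check can be sketched in pure Python as below, assuming the values are already on the log scale (e.g. voom's log-CPM) and stored as one list per sample. It computes only the first principal component, by power iteration on the sample-by-sample Gram matrix of gene-centred data; in R one would normally just call `prcomp` on the transposed genes-by-samples matrix. The name `pc1_scores` is hypothetical.

```python
import math


def pc1_scores(expr, iters=500):
    """Each sample's score on the first principal component (sign is arbitrary).

    expr: list of samples, each a list of log-scale expression values per gene.
    """
    n, g = len(expr), len(expr[0])
    # Centre each gene across samples, as PCA requires.
    means = [sum(s[j] for s in expr) / n for j in range(g)]
    x = [[s[j] - means[j] for j in range(g)] for s in expr]
    # Gram matrix X X^T (samples x samples); its top eigenvector is PC1 in sample space.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    # Power iteration. The all-ones vector lies in the null space of the centred
    # Gram matrix, so start from 1, 2, ..., n instead.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return [0.0] * n  # no variation in the data
        v = [t / norm for t in w]
    # Rayleigh quotient gives the eigenvalue; scores = eigenvector * sqrt(eigenvalue).
    lam = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n)) for i in range(n))
    return [t * math.sqrt(max(lam, 0.0)) for t in v]
```

If the PC1 scores split cleanly by data source (GDC versus GTEx) rather than by tissue state, that is the pattern described in the next paragraph.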

[Figure: PCA of the transformed data]

As you can see, the GDC cancer and normal samples are fairly mixed together, while the GTEx normals stand apart. The first peaks of the mean/median density plots are slightly mismatched between GDC and GTEx, and the ratio is also shifted away from 1.
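One way to quantify the ratio check, reading it as a per-gene comparison between the two sources (an interpretation, not necessarily the exact plot above), is to take the log2 ratio of each gene's median (or mean) normalized expression in the GDC samples versus the GTEx samples. If normalization puts both sources on the same scale and most genes are not differentially expressed (the assumption TMM itself relies on), the bulk of these log2 ratios should sit near 0, i.e. a ratio of 1. The sketch assumes linear-scale normalized values such as CPM (not log-CPM), one list per sample; `between_group_log2_ratios` is a hypothetical name.

```python
import math
from statistics import median


def between_group_log2_ratios(group_a, group_b, stat=median):
    """Per-gene log2(stat(group_a) / stat(group_b)).

    group_a, group_b: lists of samples, each a list of linear-scale normalized
    expression values in the same gene order.
    stat: per-gene summary, e.g. statistics.median or statistics.mean.
    """
    ratios = []
    for j in range(len(group_a[0])):
        a = stat([s[j] for s in group_a])
        b = stat([s[j] for s in group_b])
        if a > 0 and b > 0:  # skip genes whose summary is zero (ratio undefined)
            ratios.append(math.log2(a / b))
    return ratios
```

For example, `median(between_group_log2_ratios(gdc_cpm, gtex_cpm))` (with hypothetical inputs `gdc_cpm` and `gtex_cpm`) far from 0 would indicate a systematic scale offset between the sources.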

My question is: do these observations indicate that TMM normalization is not suitable in this case, and that a large proportion of genes will be identified as differentially expressed if we proceed with the DE analysis?

Thank you very much for your help!


Hi Ted,

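Clearly, the GTEx normals form a separate cluster, which looks like a batch effect to me. I am not sure that TMM does any batch correction; as far as I know, it only rescales library sizes to account for composition differences. I think you should correct for batch before doing any comparison.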

For differential expression analysis (from counts), check this post for further links:

A: RNA sequencing data batch effect removal

For FPKM-based analysis, you can use ComBat from the sva Bioconductor package, just to run a PCA for initial results.
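ComBat fits an empirical Bayes location/scale model per gene. As a rough illustration of the underlying idea only, the sketch below does a naive per-gene location/scale adjustment with no shrinkage and no biological covariates; it is not sva's ComBat, and the function name is hypothetical. Input is assumed to be one gene's log-scale values across samples plus a batch label per sample.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def location_scale_adjust(values, batches):
    """Naive per-gene batch adjustment (NOT ComBat: no empirical Bayes, no covariates).

    values: one gene's log-scale expression across samples.
    batches: batch label for each sample, in the same order.
    """
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    b_mean = {b: mean(vs) for b, vs in groups.items()}
    b_sd = {b: sqrt(sum((v - b_mean[b]) ** 2 for v in vs) / len(vs))
            for b, vs in groups.items()}
    # Common target: overall mean and pooled within-batch standard deviation.
    pooled_sd = sqrt(sum((v - b_mean[b]) ** 2 for v, b in zip(values, batches)) / len(values))
    grand_mean = mean(values)
    out = []
    for v, b in zip(values, batches):
        # Standardize within the sample's batch, then map onto the common scale.
        z = (v - b_mean[b]) / b_sd[b] if b_sd[b] > 0 else 0.0
        out.append(z * pooled_sd + grand_mean)
    return out
```

Applied gene by gene, this removes every mean difference between batches. Because all GTEx samples here are normal, batch and condition largely overlap, so an adjustment that ignores condition (like this one) would also erase genuine tumor-versus-normal differences; ComBat's `mod` argument exists to protect such covariates.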
