Hi everyone, I'm hoping to integrate public RNA-seq datasets with some of my own RNA-seq datasets. However, the relevant public datasets have quite different sequencing depths and read lengths from mine and from each other. For instance, my datasets are ~50M reads/sample at 2x150bp, while others are ~30M reads/sample at 2x100bp and ~60M reads/sample at 2x76bp.
I downloaded the raw fastq files, processed them all the same way (trimgalore -> STAR -> salmon -> tximport into DESeq2), and have been using the raw counts in downstream analyses. However, I'm starting to realize that this isn't going to work and that I need to incorporate some sort of normalization. My end goal is to be able to compare all of these samples on a single PCA plot.
What's the best way of doing this? I have 15 samples with either 2 or 3 replicates split across 5 DESeq objects. Here's what I've considered so far:
- Use DESeq2 to normalize read counts based on `sizeFactors` and make a large dataframe of normalized counts prior to `vst`. (First sketch below.)
- Use `avgLength` in my DESeq object to calculate a rough TPM, then make a large dataframe of TPM values. (Second sketch below.) My only caveat is that I include the following before exporting my DESeq object, which might not matter since it only removes genes with zero counts in every sample:

  ```r
  keep   <- rowSums(counts(ddsTxi)) >= 1
  ddsTxi <- ddsTxi[keep, ]
  ```
- Regenerate the count matrices with tximport, but include `countsFromAbundance = "scaledTPM"` in the `tximport()` call. (Third sketch below.) I'm not really sure if `scaledTPM` is a better option than `lengthScaledTPM` for what I want to do, so I will gladly take some guidance there as well.

Any other options?
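To make the options concrete, here's roughly what I mean by each. First, the `sizeFactors`/`vst` route; `ddsTxi1`..`ddsTxi5` are placeholders for my five DESeq objects, and I'm assuming I'd need to rebuild a single DESeqDataSet so the size factors get estimated across all 15 samples rather than within each object:

```r
library(DESeq2)

## Placeholder names for my five existing DESeq objects
dds_list <- list(ds1 = ddsTxi1, ds2 = ddsTxi2, ds3 = ddsTxi3,
                 ds4 = ddsTxi4, ds5 = ddsTxi5)

## Combine raw counts (assumes all objects share the same gene rows
## in the same order, and have unique sample names)
counts_all <- do.call(cbind, lapply(dds_list, counts))
coldata <- data.frame(
  dataset   = rep(names(dds_list), times = sapply(dds_list, ncol)),
  row.names = colnames(counts_all)
)

## One object, so size factors reflect all samples at once
dds_all <- DESeqDataSetFromMatrix(counts_all, colData = coldata,
                                  design = ~ dataset)
dds_all <- estimateSizeFactors(dds_all)

norm_counts <- counts(dds_all, normalized = TRUE)  # normalized counts
vsd <- vst(dds_all, blind = TRUE)                  # variance-stabilize
plotPCA(vsd, intgroup = "dataset")
```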
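Second, the rough-TPM route. If I'm reading the docs right, the average transcript lengths live in the `avgTxLength` assay when the object comes from `DESeqDataSetFromTximport` (it's `txi$length` in the raw tximport output):

```r
## Rough per-sample TPM from counts and average transcript lengths
cts <- counts(ddsTxi)
len <- assays(ddsTxi)[["avgTxLength"]]   # gene x sample length matrix

rate <- cts / len                          # length-corrected rate
tpm  <- t(t(rate) / colSums(rate)) * 1e6   # rescale each sample to 1e6
```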
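Third, the tximport route; `files` and `tx2gene` below are placeholders for each dataset's salmon quant files and a transcript-to-gene map. My current understanding is that `lengthScaledTPM` also corrects for average transcript length, which seems relevant given the different read lengths, but I'd welcome correction on that:

```r
library(tximport)

## One call per dataset, then cbind the resulting count matrices
txi <- tximport(files, type = "salmon", tx2gene = tx2gene,
                countsFromAbundance = "lengthScaledTPM")
head(txi$counts)   # counts regenerated from abundance
```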
Kind of kicking myself, but any of these options should be fairly quick to incorporate.
As a very first step toward making the datasets comparable, you need to remove batch effects. You can use the `ComBat` approach from the `sva` R package to do so.
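A minimal sketch, reusing the hypothetical combined matrix and `dataset` labels from your first sketch above. `ComBat_seq` (available in recent `sva` releases) works on raw counts, while classic `ComBat` expects roughly Gaussian values such as vst-transformed data:

```r
library(sva)
library(DESeq2)

batch <- coldata$dataset   # which study each sample came from

## Count-level correction with ComBat_seq
adjusted_counts <- ComBat_seq(counts_all, batch = batch)

## Or correct on the vst scale with classic ComBat, then run PCA
vsd_mat    <- assay(vst(dds_all, blind = TRUE))
combat_mat <- ComBat(dat = vsd_mat, batch = batch)
pca <- prcomp(t(combat_mat))   # samples as rows for PCA
```

If the studies don't all contain the same biological groups, pass that design to `ComBat_seq`'s `group` argument (or `ComBat`'s `mod`) so the correction doesn't strip real signal.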