Quantifying abundances of transcripts using kallisto
1
0
Entering edit mode
4.0 years ago
Kumar ▴ 170

Hi, I am analyzing RNA-Seq data. I have 190 PE samples (control: male, female, mutant: male, female) and generated featureCount matrix. For the differential gene expression, I used deseq2 with adjusted 0.05 p-value.

On the other hand, I used kallisto for quantifying abundances of transcripts with the same input data.

My question is that I am getting very less number of transcripts in differential expression after kallisto analysis in comparison to gene analysis results. However, the number of DE should be more in abundances of transcripts rather than a gene-based method.

Please give your opinion:

Here are the results:

Results: Deseq2 gene expression using STAR aligner

adjusted p-value < 0.05
LFC > 0 (up)       : 606, 1.2%
LFC < 0 (down)     : 169, 0.34%
outliers [1]       : 7, 0.014%
low counts [2]     : 40617, 81%

(mean count < 6)

Deseq2 result with kallisto:

out of 138 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 124, 90%
LFC < 0 (down)     : 14, 10%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
rna-seq kallisto next-gen • 1.8k views
ADD COMMENT
0
Entering edit mode
4.0 years ago

1) It is not clear to me that DESeq is intended for finding DE transcripts. Genes, sure, but not transcripts

2) Kallisto is smarter about assigning ambiguously assigned reads, I'd be using its gene counts, not FeatureCounts. Then you don't have to run two totally separate pipelines.

ADD COMMENT
0
Entering edit mode

Hi, I am new in this analysis. I ran the kallisto and got all .tsv files then I merged these all files to make a single matrix file. It looks like this: Therefore, you mean I do not need to do differential expression analysis with this data, if so how I can use these data for RNA-seq analysis. Thank you!

#

                      270   515 574
    ENST00000415118.1   0   0   0
    ENST00000434970.2   0   0   0
    ENST00000448914.1   0   0   0
    ENST00000604642.1   0   0   0
    ENST00000603326.1   0   0   0
ADD REPLY
0
Entering edit mode

You should use tximport first (or tximeta)

ADD REPLY
0
Entering edit mode

Yes! I am using tximport.

Here is my script: However, the issue here is that my samples number are not equal so I am not sure how to improve the script instead of using "each=93". Please suggest.

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts)
sampleTable <- data.frame(condition = factor(rep(c("mutant", "control"), each=93)))
rownames(sampleTable) <- colnames(txi.kallisto$counts)
dds <- DESeqDataSetFromTximport(txi.kallisto, sampleTable, ~condition)

dds <- DESeq(dds)

Since I was not sure about the data structure, so I prepared a single .csv matrix file of all samples' count files using cut and paste commands and I am using the following script: Please suggest if this way is correct.

dat <- read.table("/outfile-kallisto.txt", header=TRUE)
dat <- as.matrix(dat)
head(dat)
metadata <- read.table("/metadata-kallisto.txt", header=TRUE)
dds <- DESeqDataSetFromMatrix(countData = dat,
                          colData = metadata,
                          design= ~ Group + Gender)

dds <- DESeq(dds)
ADD REPLY
0
Entering edit mode

You need to stop and figure out what that line with the "each=93" is doing, then you will know how to modify it for your own needs. Or, making a metadata table outside R is fine too.

Are you completely sure that txOUT = true is what you want to use with DESeq?

ADD REPLY
0
Entering edit mode

I figured out the updates. Here is my updated script: However, the genes in differential gene expression are very few numbers.

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts)
sampleMetaData <- data.frame(Group=c(rep(c("mutant"), 94), rep(c("control"), 93)), Gender=rep(c(rep(c("F"), 
130), rep(c("M"), 57))))

rownames(sampleMetaData) <- colnames(txi.kallisto$counts)
dds <- DESeqDataSetFromTximport(txi.kallisto, sampleMetaData, design= ~ Group + Gender)
res <- results(dds)
dds <- DESeq(dds)

results

out of 171018 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 975, 0.57%
LFC < 0 (down)     : 12, 0.007%
outliers [1]       : 0, 0%
low counts [2]     : 73970, 43%
(mean count < 2)
ADD REPLY

Login before adding your answer.

Traffic: 2742 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6