results

Question

Quantifying abundances of transcripts using kallisto

0

Entering edit mode

4.0 years ago

Kumar ▴ 170

Hi, I am analyzing RNA-Seq data. I have 190 PE samples (control: male, female, mutant: male, female) and generated featureCount matrix. For the differential gene expression, I used deseq2 with adjusted 0.05 p-value.

On the other hand, I used kallisto for quantifying abundances of transcripts with the same input data.

My question is that I am getting very less number of transcripts in differential expression after kallisto analysis in comparison to gene analysis results. However, the number of DE should be more in abundances of transcripts rather than a gene-based method.

Please give your opinion:

Here are the results:

Results: Deseq2 gene expression using STAR aligner

adjusted p-value < 0.05
LFC > 0 (up)       : 606, 1.2%
LFC < 0 (down)     : 169, 0.34%
outliers [1]       : 7, 0.014%
low counts [2]     : 40617, 81%

(mean count < 6)

Deseq2 result with kallisto:

out of 138 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 124, 90%
LFC < 0 (down)     : 14, 10%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)

rna-seq kallisto next-gen • 1.8k views

ADD COMMENT • link updated 4.0 years ago by swbarnes2 14k • written 4.0 years ago by Kumar ▴ 170

score 0 · Answer 1 · 2020-11-20

0

Entering edit mode

4.0 years ago

swbarnes2 14k

1) It is not clear to me that DESeq is intended for finding DE transcripts. Genes, sure, but not transcripts

2) Kallisto is smarter about assigning ambiguously assigned reads, I'd be using its gene counts, not FeatureCounts. Then you don't have to run two totally separate pipelines.

ADD COMMENT • link 4.0 years ago by swbarnes2 14k

0

Entering edit mode

Hi, I am new in this analysis. I ran the kallisto and got all .tsv files then I merged these all files to make a single matrix file. It looks like this: Therefore, you mean I do not need to do differential expression analysis with this data, if so how I can use these data for RNA-seq analysis. Thank you!

#

                      270   515 574
    ENST00000415118.1   0   0   0
    ENST00000434970.2   0   0   0
    ENST00000448914.1   0   0   0
    ENST00000604642.1   0   0   0
    ENST00000603326.1   0   0   0

ADD REPLY • link 4.0 years ago by Kumar ▴ 170

0

Entering edit mode

You should use tximport first (or tximeta)

ADD REPLY • link 4.0 years ago by eridanus ▴ 40

0

Entering edit mode

Yes! I am using tximport.

Here is my script: However, the issue here is that my samples number are not equal so I am not sure how to improve the script instead of using "each=93". Please suggest.

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts)
sampleTable <- data.frame(condition = factor(rep(c("mutant", "control"), each=93)))
rownames(sampleTable) <- colnames(txi.kallisto$counts)
dds <- DESeqDataSetFromTximport(txi.kallisto, sampleTable, ~condition)

dds <- DESeq(dds)

Since I was not sure about the data structure, so I prepared a single .csv matrix file of all samples' count files using cut and paste commands and I am using the following script: Please suggest if this way is correct.

dat <- read.table("/outfile-kallisto.txt", header=TRUE)
dat <- as.matrix(dat)
head(dat)
metadata <- read.table("/metadata-kallisto.txt", header=TRUE)
dds <- DESeqDataSetFromMatrix(countData = dat,
                          colData = metadata,
                          design= ~ Group + Gender)

dds <- DESeq(dds)

ADD REPLY • link 4.0 years ago by Kumar ▴ 170

0

Entering edit mode

You need to stop and figure out what that line with the "each=93" is doing, then you will know how to modify it for your own needs. Or, making a metadata table outside R is fine too.

Are you completely sure that txOUT = true is what you want to use with DESeq?

ADD REPLY • link 4.0 years ago by swbarnes2 14k

0

Entering edit mode

I figured out the updates. Here is my updated script: However, the genes in differential gene expression are very few numbers.

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts)
sampleMetaData <- data.frame(Group=c(rep(c("mutant"), 94), rep(c("control"), 93)), Gender=rep(c(rep(c("F"), 
130), rep(c("M"), 57))))

rownames(sampleMetaData) <- colnames(txi.kallisto$counts)
dds <- DESeqDataSetFromTximport(txi.kallisto, sampleMetaData, design= ~ Group + Gender)
res <- results(dds)
dds <- DESeq(dds)

results

out of 171018 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 975, 0.57%
LFC < 0 (down)     : 12, 0.007%
outliers [1]       : 0, 0%
low counts [2]     : 73970, 43%
(mean count < 2)

ADD REPLY • link 4.0 years ago by Kumar ▴ 170