differential expression at transcript level with paired samples
2
3
Entering edit mode
9.7 years ago
mjg ▴ 30

Dear all,

I want to do differential expression at transcript level using an RNA-seq dataset that consists of paired samples (patient 1 control/treatment, patient2 control/treatment, etc.).

I have not found methods other than Ballgown and DEXSeq that are able to handle paired designs. I have also tried the option of obtaining transcript counts and use them in edgeR. However I am concerned about the fact that edgeR is not particularly adapted to do analysis on transcript counts.

Does anyone has any advice on how to go on this problem? Or if there is a way to assess my results with the latter approach I described?

Many thanks,
Maria

RNA-Seq isoform paired-samples • 4.0k views
ADD COMMENT
0
Entering edit mode

Adding on the answer of Sean Davis, both salmon and kallisto can perform bootstrapping to estimate the uncertainty of the transcript level abundances. edgeR now has functions (catchSalmon and catchKallisto) to use these bootstraps to assign an overdispersion value to each trancript. Larger mapping uncertainty means higher overdispersion, so the mapping uncertainty is taken into account during the differential analysis. Having that said, edgeR is able to perform transcript level analysis, check its manual. I think these methods were introduced to edgeR a few years ago, but not when OP asked this question.

ADD REPLY
5
Entering edit mode
9.7 years ago

Perhaps not the solution you were looking for, but software like kallisto and Salmon (and others) generate estimates of raw counts per transcript. You could use those estimates in edgeR or DESeq2.

ADD COMMENT
0
Entering edit mode

Thanks Sean, I will consider these on my analysis.

ADD REPLY
1
Entering edit mode

If you go that route, Martin Morgan has provided code to read in kallisto output:

.require <-
function(pkg)
{
withCallingHandlers({
suppressPackageStartupMessages({
require(pkg, character.only=TRUE, quietly=TRUE)
})
}, warning=function(w) {
invokeRestart("muffleWarning")
}) || {
msg <- sprintf('install %s with
source("http://bioconductor.org/biocLite.R"); biocLite("%s")',
pkg, pkg)
stop(paste(strwrap(msg, exdent=2), collapse="\n"))
}
}
.open <- function(files, h5) {
lapply(files, function(file) {
if (h5 && H5Fis_hdf5(file))
H5Fopen(file)
else file(file, open="rt")
})
}
.close <- function(cons)
UseMethod(".close")
.close.list <- function(cons)
for (con in cons)
.close(con)
.close.connection <- function(cons)
close(cons)
.close.H5IdComponent <- function(cons)
H5Fclose(cons)
.colData <- function(jsonfile) {
json <- fromJSON(jsonfile)
do.call("data.frame", c(json, stringsAsFactors=FALSE))
}
.KALLISTO_COLCLASSES <-
c("character", "integer" , "numeric", "numeric", "numeric")
.KALLISTO_ROWDATA <- "length"
KALLISTO_ASSAYS <- c("est_counts", "tpm", "eff_length")
.read <- function(con)
UseMethod(".read")
.read.connection <- function(con)
read.delim(con, header=TRUE, colClasses=.KALLISTO_COLCLASSES, row.names=1)
.read.H5IdComponent <- function(con) {
eff_length <- h5read(con, "/aux/eff_lengths")
est_counts <- h5read(con, "/est_counts")
tpm0 <- est_counts / eff_length
data.frame(row.names=h5read(con, "/aux/ids"),
length=h5read(con, "/aux/lengths"),
eff_length=eff_length,
est_counts=est_counts,
tpm=tpm0 / (sum(tpm0) / 1e6))
}
readKallisto <-
function(files,
json=file.path(dirname(files), "run_info.json"),
h5=any(grepl("\.h5$", files)),
what=KALLISTO_ASSAYS,
as=c("matrix", "list", "SummarizedExperiment"))
{
as <- match.arg(as)
if (missing(what))
what <- what[1]
else {
whats <- eval(formals()[["what"]])
if (!all(what %in% KALLISTO_ASSAYS))
stop("'what' must be in ",
paste(sQuote(KALLISTO_ASSAYS), collapse=", "),
call.=FALSE)
}
stopifnot(is.character(files))
test <- file.exists(files)
if (!all(test))
stop("file(s) do not exist:\n ",
paste(files[!test], collapse="\n "))
if (is.null(names(files)))
names(files) <- basename(dirname(files))
if (as != "matrix") {
.require("jsonlite")
stopifnot(length(files) == length(json))
if (!is.null(names(json)))
stopifnot(identical(names(json), names(files)))
else
names(json) <- names(files)
test <- file.exists(json)
if (!all(test))
stop("json file(s) do not exist:\n ",
paste(json[!test], collapse="\n "))
}
if (h5)
.require("rhdf5")
if (as == "SummarizedExperiment") {
if (BiocInstaller::biocVersion() >= '3.2')
.require("SummarizedExperiment")
else .require("GenomicRanges")
}
cons <- .open(files, h5)
value <- .read(cons[[1]])
rowData <- value[, .KALLISTO_ROWDATA, drop=FALSE]
assay <- matrix(0, nrow(rowData), length(cons),
dimnames=list(rownames(rowData), names(cons)))
assays <- setNames(replicate(length(what), assay, FALSE), what)
for (w in what)
assays[[w]][,1] <- value[[w]]
for (i in seq_along(cons)[-1]) {
value <- .read(cons[[i]])
if (!identical(rowData, value[, .KALLISTO_ROWDATA, drop=FALSE]))
stop("rowData differs between files:\n ",
paste(files[c(1, i)], collapse="\n "))
for (w in what)
assays[[w]][,i] <- value[[w]]
}
.close(cons)
if (as != "matrix")
colData <- do.call("rbind", lapply(json, .colData))
switch(as, matrix={
if (length(assays) == 1L)
assays[[1]]
else assays
}, list={
c(setNames(list(colData, rowData), c("colData", "rowData")), assays)
}, SummarizedExperiment={
partition <-
PartitioningByEnd(integer(nrow(rowData)), names=rownames(rowData))
rowRanges <- relist(GRanges(), partition)
mcols(rowRanges) <- as(rowData, "DataFrame")
SummarizedExperiment(assays=assays, rowRanges=rowRanges,
colData=as(colData, "DataFrame"))
})
}
.readIds <- function(con, i) {
if (!missing(i))
h5read(con, "/aux/ids", list(i))
else h5read(con, "/aux/ids")
}
readBootstrap <-
function(file, i, j)
{
.require("rhdf5")
stopifnot(length(file) == 1L, is.character(file))
stopifnot(file.exists(file))
stopifnot(H5Fis_hdf5(file))
con <- H5Fopen(file)
on.exit(H5Fclose(con))
nboot <- as.integer(h5read(con, "/aux/num_bootstrap"))
if (nboot == 0L)
stop("file contains no bootstraps:\n ", file)
if (!missing(i) && is.character(i)) {
idx <- match(i, .readIds(con))
if (anyNA(i))
stop("unknown target id(s)", i[is.na(idx)])
i <- idx
}
if (!missing(j) && is.numeric(j)) {
if (any((j < 1L) || any(j > nboot)))
stop("'j' must be >0 and <=", nboot)
j <- paste0("bs", as.integer(j) - 1L)
}
m <- if (missing(i) && missing(j)) {
simplify2array(h5read(con, "/bootstrap"))
} else if (missing(i)) {
query <- setNames(sprintf("/bootstrap/%s", j), j)
simplify2array(lapply(query, h5read, file=con))
} else if (missing(j)) {
group <- H5Gopen(con, "/bootstrap")
name <- h5ls(group)$name
H5Gclose(group)
query <- setNames(sprintf("/bootstrap/%s", name), name)
simplify2array(lapply(query, h5read, file=con, index=list(i)))
} else {
query <- setNames(sprintf("/bootstrap/%s", j), j)
simplify2array(lapply(query, h5read, file=con, index=list(i)))
}
rownames(m) <- .readIds(con, i)
if (missing(j)) {
o <- order(as.integer(sub("bs", "", colnames(m), fixed=TRUE)))
m[,o]
} else m
}
view raw readKallisto.R hosted with ❤ by GitHub
source("readKallisto.R")
files <- dir("~/a/kallisto/example", "abundance.tsv", full=TRUE,
recursive=TRUE)
stopifnot(all(file.exists(files)))
str(readKallisto(files, as="matrix"))
str(readKallisto(files, as="list"))
(se <- readKallisto(files, as="SummarizedExperiment"))
str(readKallisto(files, what="eff_length"))
readKallisto(files, what="eff_length", as="SummarizedExperiment")
files <- sub(".tsv", ".h5", files, fixed=TRUE)
str(readKallisto(files))
str(readKallisto(files, what="tpm"))
readKallisto(files, what="tpm", as="SummarizedExperiment")
readKallisto(files, what=c("tpm", "eff_length"), as="SummarizedExperiment")
xx <- readBootstrap(files[1])
ridx <- head(which(rowSums(xx) != 0), 3)
cidx <- c(1:5, 96:100)
readBootstrap(files[1])[ridx, cidx]
readBootstrap(files[1], i=c(ridx, rev(ridx)), j=cidx)
view raw script.R hosted with ❤ by GitHub

ADD REPLY
3
Entering edit mode
9.6 years ago
Lior Pachter ▴ 720

The combination of kallisto and sleuth will allow you to do a transcript level analysis while taking into account your paired design.

ADD COMMENT

Login before adding your answer.

Traffic: 1759 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6