I'm not sure what is going on with my DEXSeq object
0
0
Entering edit mode
23 days ago

I am having difficulty getting my DEXSeq model to behave. I am using a RangedSummarizedExperiment object to load the counts into the DEXSeq object, but there have been all types of issues.

Approach 1:

I have 14 BAM files aligned using STAR and am using a custom gtf that includes a transgene. In order to speed up the process I tried using summarizeOverlaps on each BAM file separately and then merging everything into a count_matrix to load into DEXSeq. This seemed to have worked okay, but maybe now I am paying the price. Here is the code of how I am loading everything into DEXSeq

library(GenomicFeatures)  # for the exonicParts() function
library(txdbmaker)        # for the makeTxDbFromGFF() function
library(GenomicRanges)

## Flatten GTF
gtf <- makeTxDbFromGFF(file = "custom.gtf")

flattened_gtf <- exonicParts(gtf, linked.to.single.gene.only = TRUE)

# read exon count data
all_files <- dir("overlaps", pattern = "*.txt$", full.names = TRUE)

counts_list <- lapply(X = all_files, read.table, header = TRUE)
names(counts_list) <- paste0("S", str_extract(string = basename(all_files), pattern = "[[:digit:]]+"))

# merge exon count data
counts_matrix <- do.call(cbind, counts_list)
colnames(counts_matrix) <- gsub(pattern = "X", replacement = "S", str_extract(string = colnames(counts_matrix), pattern = ".*(?=_Aligned)"))

# load metadata
meta <- read.table(file = "meta.txt", sep = "\t", header = TRUE)

# arrange metadata to match order of counts_matrix
new_order <- match(colnames(counts_matrix), meta$Pedigree)
meta <- meta[new_order,]

if( all(match(colnames(counts_matrix), meta$Pedigree) == seq_len(length(meta$Pedigree))) ){ print("Correct Order") }

# transform into a summarized experiment
counts_matrix <- as.matrix(counts_matrix)

col_data <- DataFrame(meta)
rownames(col_data) <- col_data$Pedigree

se <- SummarizedExperiment(assays = list(counts = counts_matrix), rowRanges= flattened_gtf, colData = col_data)

# perform DEXSeq analysis
# load a BCPARAM object to allow multi-threading
param <- MulticoreParam(workers = 8)
register(param)

dxd <- DEXSeqDataSetFromSE(se, design = ~ Pedigree + exon + Condition:exon)

I keep getting the following error message when using this approach:

Error in SummarizedExperiment(nCountData, colData = colData, rowRanges = featureRanges) : the rownames of the supplied assay(s) must be NULL or identical to those of the RangedSummarizedExperiment object (or derivative) to construct

Approach 2:

I try to just use summarizeOverlaps to generate the summarizedExperiment

library(GenomicFeatures)  # for the exonicParts() function
library(txdbmaker)        # for the makeTxDbFromGFF() function
library(GenomicRanges)

## Flatten GTF
gtf <- makeTxDbFromGFF(file = "custom.gtf")

flattened_gtf <- exonicParts(gtf, linked.to.single.gene.only = TRUE)

library(Rsamtools)
library(GenomicAlignments)
library(BiocParallel) # speed everything up (hopefully)

param <- MulticoreParam(8)
register(param)

BAMS <- file.path("STAR", dir("STAR")[grep(dir(BAMS), pattern = "*.bam")])
BAMLIST <- BamFileList(BAMS, yieldSize = 10000)

se = summarizeOverlaps(
  flattenedGTF, BAMLIST, singleEnd=FALSE, mode = Union,
  fragments=FALSE, ignore.strand=FALSE, BPPARAM = param)

dxd <- DEXSeqDataSetFromSE(se, design = ~ Pedigree + exon + Condition:exon)

And I get the same error message. I am at a loss here. Is there anyone able to help me troubleshoot a reason why this isn't working?

GenomicRanges DEXSeq Help RNA-Seq SummarizedExperiment • 141 views
ADD COMMENT

Login before adding your answer.

Traffic: 2400 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6