Question

I'm not sure what is going on with my DEXSeq object

0

Entering edit mode

23 days ago

cftattersfield • 0

I am having difficulty getting my DEXSeq model to behave. I am using a RangedSummarizedExperiment object to load the counts into the DEXSeq object, but there have been all types of issues.

Approach 1:

I have 14 BAM files aligned using STAR and am using a custom gtf that includes a transgene. In order to speed up the process I tried using summarizeOverlaps on each BAM file separately and then merging everything into a count_matrix to load into DEXSeq. This seemed to have worked okay, but maybe now I am paying the price. Here is the code of how I am loading everything into DEXSeq

library(GenomicFeatures)  # for the exonicParts() function
library(txdbmaker)        # for the makeTxDbFromGFF() function
library(GenomicRanges)

## Flatten GTF
gtf <- makeTxDbFromGFF(file = "custom.gtf")

flattened_gtf <- exonicParts(gtf, linked.to.single.gene.only = TRUE)

# read exon count data
all_files <- dir("overlaps", pattern = "*.txt$", full.names = TRUE)

counts_list <- lapply(X = all_files, read.table, header = TRUE)
names(counts_list) <- paste0("S", str_extract(string = basename(all_files), pattern = "[[:digit:]]+"))

# merge exon count data
counts_matrix <- do.call(cbind, counts_list)
colnames(counts_matrix) <- gsub(pattern = "X", replacement = "S", str_extract(string = colnames(counts_matrix), pattern = ".*(?=_Aligned)"))

# load metadata
meta <- read.table(file = "meta.txt", sep = "\t", header = TRUE)

# arrange metadata to match order of counts_matrix
new_order <- match(colnames(counts_matrix), meta$Pedigree)
meta <- meta[new_order,]

if( all(match(colnames(counts_matrix), meta$Pedigree) == seq_len(length(meta$Pedigree))) ){ print("Correct Order") }

# transform into a summarized experiment
counts_matrix <- as.matrix(counts_matrix)

col_data <- DataFrame(meta)
rownames(col_data) <- col_data$Pedigree

se <- SummarizedExperiment(assays = list(counts = counts_matrix), rowRanges= flattened_gtf, colData = col_data)

# perform DEXSeq analysis
# load a BCPARAM object to allow multi-threading
param <- MulticoreParam(workers = 8)
register(param)

dxd <- DEXSeqDataSetFromSE(se, design = ~ Pedigree + exon + Condition:exon)

I keep getting the following error message when using this approach:

Error in SummarizedExperiment(nCountData, colData = colData, rowRanges = featureRanges) : the rownames of the supplied assay(s) must be NULL or identical to those of the RangedSummarizedExperiment object (or derivative) to construct

Approach 2:

I try to just use summarizeOverlaps to generate the summarizedExperiment

library(GenomicFeatures)  # for the exonicParts() function
library(txdbmaker)        # for the makeTxDbFromGFF() function
library(GenomicRanges)

## Flatten GTF
gtf <- makeTxDbFromGFF(file = "custom.gtf")

flattened_gtf <- exonicParts(gtf, linked.to.single.gene.only = TRUE)

library(Rsamtools)
library(GenomicAlignments)
library(BiocParallel) # speed everything up (hopefully)

param <- MulticoreParam(8)
register(param)

BAMS <- file.path("STAR", dir("STAR")[grep(dir(BAMS), pattern = "*.bam")])
BAMLIST <- BamFileList(BAMS, yieldSize = 10000)

se = summarizeOverlaps(
  flattenedGTF, BAMLIST, singleEnd=FALSE, mode = Union,
  fragments=FALSE, ignore.strand=FALSE, BPPARAM = param)

dxd <- DEXSeqDataSetFromSE(se, design = ~ Pedigree + exon + Condition:exon)

And I get the same error message. I am at a loss here. Is there anyone able to help me troubleshoot a reason why this isn't working?

GenomicRanges DEXSeq Help RNA-Seq SummarizedExperiment • 141 views

ADD COMMENT • link 23 days ago by cftattersfield • 0