Hi everyone!
I'm trying to use the cageR package to analyze some CAGE-seq data (paired-end), however, the clusterCTSS function doesn't seem to work for me.
Here is the code that I used:
cage_bam <- new("CAGEset", genomeName = "BSgenome.Hsapiens.UCSC.hg19", inputFiles = bam_file, inputFilesType = "bamPairedEnd", sampleLabels = c('CTR','A','B','C'))
getCTSS(cage_bam, mappingQualityThreshold=20, sequencingQualityThreshold = 20)
#also tried this without these thresholds, but it still doesn't work :(
cage_bam_ctss <- CTSStagCount(cage_bam)
normalizeTagCount(cage_bam, method = "simpleTpm")
#also tried this without this normalization step, still doesn't work :(
clusterCTSS(object = cage_bam, method = "paraclu")
I got the following error message:
Filtering CTSSs below threshold...
Clustering...
-> CTR
-> A
Error in `[.data.frame`(data, , c("chr", "pos", "strand", s)) :
undefined columns selected
I tried changing the method to distclu, but then I got this error message:
Filtering CTSSs below threshold...
Clustering...
Error in .validate_names(colnames, ans_colnames, "assay colnames()", "colData rownames()") :
assay colnames() must be NULL or identical to colData rownames()
If anyone has any experience with this, or knows what I could try to get this to work, your help would be really appreciated!!!
EDITED to add what cage_bam looks like after running get_ctss:
cage_bam
S4 Object of class CAGEset
=======================================
Input data information
=======================================
Reference genome (organism): BSgenome.Hsapiens.UCSC.hg19
Input file type: bamPairedEnd
Input file names: /Users/odancu/Downloads/CAGE/BK_CTR-ready.bam, /Users/odancu/Downloads/CAGE/BK_A-ready.bam, /Users/odancu/Downloads/CAGE/BK_B-ready.bam, /Users/odancu/Downloads/CAGE/C-ready.bam
Sample labels: CTR, A, B, C
=======================================
CTSS information
=======================================
CTSS chromosome: chr1, chr1, chr1, ...
CTSS position: 15797, 17556, 29322, ...
CTSS strand: +, -, +, ...
Tag count:
-> CTR: 0, 0, 0, ...
-> A: 0, 0, 0, ...
-> B: 0, 1, 0, ...
-> C: 1, 0, 1, ...
Normalized tpm:
=======================================
Tag cluster (TC) information
=======================================
CTSS clustering method:
Number of TCs per sample:
=======================================
Consensus cluster information
=======================================
Number of consensus clusters:
Consensus cluster chromosome:
Consensus cluster start:
Consensus cluster end:
Consensus cluster strand:
Normalized tpm:
=======================================
Expression profiling
=======================================
Expression clustering method:
Expression clusters for consensus clusters:
=======================================
Promoter shifting
=======================================
GroupX:
GroupY:
Shifting scores:
KS p-values (FDR adjusted):
UPDATE:
I tried applying clusterCTSS
to the zebrafish sample dataset provided with the package, following the vignette, and Paraclu still has issues (albeit different ones), but distclu works well.
clusterCTSS(object = ce, method = "paraclu")
Filtering out CTSSs below threshold...
Clustering...
-> Zf.30p.dome
Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) :
<RangedSummarizedExperiment>[,j] index out of bounds: chr pos strand
clusterCTSS(object = ce, method = "distclu")
Filtering out CTSSs below threshold...
Clustering...
-> Zf.30p.dome
-> Zf.high
-> Zf.prim6.rep1
-> Zf.prim6.rep2
-> Zf.unfertilized.egg
My original code generated a CAGEset object, but I made a CAGEexp object and tried clusterCTSS again, and while paraclu gave the following error message:
clusterCTSS(object = cage_bam, method = "paraclu")
Filtering out CTSSs below threshold...
Clustering...
-> CTR
Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) :
<RangedSummarizedExperiment>[,j] index out of bounds: chr pos strand
Distclu worked well. I think the issue might be with how clusterCTSS handles CAGEset objects, or how paraclu handles inputs. While this now works, it's somewhat less than ideal, given that later analyses that I want to do on this clustered data, like running the function getExpressionProfiles
, don't work on CAGEexp objects. If anyone had success using clusterCTSS on CAGEset objects, please let me know!
Hi, could you show what
cage_bam
looks like after runninggetCTSS
? I believe the fieldschr
,pos
andstrand
are populated as part of it so the issue might be coming from this step. Are your input files publicly available?Hi lu.ne! No, my input files aren't publicly available. I've edited my post above to update what it looks like right after running
getCTSS
, prior to tpm normalization.It looks OK, I'm afraid I can't really reproduce your error... Are you using
CAGEr_1.26.0
(it should be the last version)?It looks like it's
CAGEr_1.24.0
. I'll try to update the version and see if the issue goes away. Thanks! :)If it does not work you could try changing the values of
mappingQualityThreshold
andsequencingQualityThreshold
, it might be that too many things are filtered. Otherwise, if you can reproduce the error with a small dummy dataset and share it here I'm sure we could help.