Entering edit mode
17 days ago
Gayatri
•
0
Hello,
I am trying to call super-enhancers using ROSE algorithm with mm39 genome for ChIP-seq data and encountering the following error:
MAKING TSS COLLECTION
Traceback (most recent call last):
File "/path/to/file/ROSE/bin/ROSE_geneMapper.py", line 299, in <module>
main()
File "path/to/file/ROSE/bin/ROSE_geneMapper.py", line 282, in main
enhancerToGeneTable,geneToEnhancerTable,withGenesTable = mapEnhancerToGene(annotFile,enhancerFile, uniqueGenes=True, byRefseq=options.refseq, subtractInput=options.control, transcribedFile=transcribedFile)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/file/ROSE/bin/ROSE_geneMapper.py", line 140, in mapEnhancerToGene
closestGene = allEnhancerGenes[distList.index(min(distList))]
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Command used:
ROSE_main.py -g MM39 -i /path/to/file/treated1.bed -r /path/to/file/treated1.sorted.bam -c /path/to/file/treated1_input.sorted.bam -o . -s 12500 -t 2500
I have tried to make a refseq file for mm39 genome as specified in the instructions here. Hence any help in why the error might be occuring is highly appreciated.
Did you update the source code in all the required spots as well?
I did update the source code wherever required, changing it to MM39, adding mm39 refseq file similar to mm10 and others. The main problem is ROSE algorithm worked for control samples but is failing on the treated samples.
Are you sure they were aligned to mm39 as well?
The original ROSE implementation has a fair number of problems. I'm frankly surprised it runs at all.
While incomplete, I have been building an R implementation of ROSE here. It does not match the results from the original implementation exactly due to minor differences in how coverage and calculated and a presumed bug in the original implementation that impacts the unstitching process of regions that overlap 3 or more TSSes of unique genes.
It is not well-tested, but it has a number of advantages over the original implementation, including much easier ways to change annotations used and exposing a number of hidden parameters. It also allows some monotonic transformations of signal that make the SE calls much more robust from sample to sample and not driven nearly so strongly by outliers.
It's also much, much faster (a few minutes per sample). Feel free to test it (and open issues if you run into them) if you'd like. I'd recommend installing from the dev branch -
devtools::install_github("j-andrews7/ECLIPSE@dev")
if you do.I think they aligned to mm39 as each time the message 'USING MM39 AS THE GENOME' is displayed while running each part of the algorithm. But I am not sure how to check whether proper alignment is occurring. And thank you Dr. Jared, I will definitely try ECLIPSE and get back to you as soon as I get some output over here.
I tried ECLIPSE twice, once without specifying the org.db, then with specifying org.db and got the following error:
Code:
Output:
I then tried the code for the sample for which the ROSE algorithm worked, and got a similar error for the same:
I am not sure how to input TxDb and OrgDb objects, for OrgDb, I specified OrgDb of Mouse. Should I remove the random, unnamed chromosomes? Or is it something else that might be raising the error?
Oh, thanks for giving it a shot. You need to provide both TxDb and OrgDb objects, e.g.
I've also shown the addition of a few parameters here that match what you were trying to do with the original ROSE calls. I have updated the vignette and README to better show how to provide the annotation data.
Thank you Dr. Jared. I used the code and it is working on both samples, but the warning is persistent in both outputs, as shown below in the output for treated sample run. Also, I noticed that only 3 SEs were called when I included TxDb with a cut-off of 12665.775570056, but when I provided
annotate = FALSE
, 12 SEs were called for the treated with a cut-off of 14933.3727976734, while for control sample, more than 30 SEs were called with a cut-off of 2589.83706832244.In addition, is there a way I can export the results for future use? And is it possible to find any instructions/manual sort of thing with detailed usage for doing various trial analyses with the parameters since this is relatively new for me? I am grateful for the ECLIPSE package Dr. Jared, thank you for developing it as well as suggesting its usage.
The warnings can be safely ignored, they are due to the TxDb containing many seqlevels not found in your peaks object. It is on the list to address, as it is a lot of extra noise.
The annotation does not impact the number of regions returned or the cutoff for SE classification. I would, however, expect differences between your initial run and the command I provided even with
annotate = FALSE
, as ECLIPSE won't perform the unstitching process that ROSE does by default (though on second thought, perhaps it should, even though the logic behind that process seems dubious). ROSE uses in-built (very old) annotations - ECLIPSE forces the user to provide the annotations via the TxDb, which while slightly less convenient, is much more flexible.The vignette is the most comprehensive workflow we have so far. It includes some rough test code for differential analyses and viz that are not yet included in the package. While it doesn't have exhaustive trials of the various parameters (as we're still adding them), it does mention a few of them and how they impact the results. We will perform more exhaustive comparisons when we get a chance. Again, this is still in development and in an alpha state.
You have a few options. Can save the GRanges output as an RDS via
saveRDS
or convert to a dataframe and save as a text file, e.g.:Thank you so much for your support and suggests Dr. Jared. I will further explore ECLIPSE and open issues on github if I run into any errors. Once again thank you for your time.
Regards, Gayatri.