Calling super enhancers
11 weeks ago
Oburah • 0

While running the ROSE algorithm from the Young lab, I encountered the error below. Any solutions?

(ROSE) hesborn@hesborn-Latitude-5430:~/Desktop/ROZE/rose$ python2.7 ROSE_main.py -g Hg38 -i /home/hesborn/Desktop/Data/TGFB.gff -r /home/hesborn/Desktop/Data/possorted_bam.bam -o CREAN -s 12500 -t 2500




folder CREAN/ does not exist
folder CREAN/gff/ does not exist
folder CREAN/mappedGFF/ does not exist
USING /home/hesborn/Desktop/Data/TGFB.gff AS THE INPUT GFF
USING Hg38 AS THE GENOME
MAKING START DICT
LOADING IN GFF REGIONS
SKIPPING THIS LINE
['']
CHECKING INPUT TO MAKE SURE EACH REGION HAS A UNIQUE IDENTIFIER
REFERENCE COLLECTION PASSES QC
STITCHING REGIONS TOGETHER
PERFORMING REGION STITCHING
SKIPPING THIS LINE
['']
REMOVED 0 LOCI BECAUSE THEY WERE CONTAINED BY A TSS
REMOVED 0 STITCHED LOCI BECAUSE THEY OVERLAPPED MULTIPLE TSSs
ADDED BACK 0 ORIGINAL LOCI
MAKING GFF FROM STITCHED COLLECTION
WRITING STITCHED GFF TO DISK AS CREAN/gff/TGFB_12KB_STITCHED_TSS_DISTAL.gff
OUTPUT WILL BE WRITTEN TO  CREAN/TGFB_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt
python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b /home/hesborn/Desktop/Data/possorted_bam.bam -i CREAN/gff/TGFB_12KB_STITCHED_TSS_DISTAL.gff -o CREAN/mappedGFF/TGFB_12KB_STITCHED_TSS_DISTAL_possorted_bam.bam_MAPPED.gff &
python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b /home/hesborn/Desktop/Data/possorted_bam.bam -i /home/hesborn/Desktop/Data/TGFB.gff -o CREAN/mappedGFF/TGFB_possorted_bam.bam_MAPPED.gff &
PAUSING TO MAP
{'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'CREAN/mappedGFF/TGFB_12KB_STITCHED_TSS_DISTAL_possorted_bam.bam_MAPPED.gff', 'bam': '/home/hesborn/Desktop/Data/possorted_bam.bam', 'rpm': True, 'input': 'CREAN/gff/TGFB_12KB_STITCHED_TSS_DISTAL.gff'}
[]
mapping to GFF and making a matrix with fixed bin number
{'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'CREAN/mappedGFF/TGFB_possorted_bam.bam_MAPPED.gff', 'bam': '/home/hesborn/Desktop/Data/possorted_bam.bam', 'rpm': True, 'input': '/home/hesborn/Desktop/Data/TGFB.gff'}
[]
mapping to GFF and making a matrix with fixed bin number
WAITING FOR MAPPING TO COMPLETE. ELAPSED TIME (MIN):
0
using a MMR value of 502.2775
using a MMR value of 502.2775
has chr
has chr
Number lines processed
0
Number lines processed
0
30
60
90
120
150
180
210
240
270
300
330
360
390
420
450
480
510
540
570
600
630
660
690
ERROR: OPERATION TIME OUT. MAPPING OUTPUT NOT DETECTED

I get this error when running ROSE on ATAC-seq data. Any help?

11 weeks ago

The original ROSE implementation uses a crazy slow coverage calculation. It also has a hidden timeout check that aborts the run if mapping takes too long.

Since you're using ATAC, I'm guessing the stitched regions are smaller and more numerous than with something like H3K27ac, which means this will take a particularly long time. You can edit the linked line to give it as much time as it needs.
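For illustration only, here is the general shape of a poll-with-timeout loop like the one being discussed. This is a sketch, not ROSE's actual code: the names wait_for_outputs, poll_seconds and max_ticks are made up here, and the 5-minute polling interval is an assumption based on the log printing elapsed minutes in steps of 30.

```python
import os
import time


def wait_for_outputs(paths, poll_seconds=300, max_ticks=None, sleep=time.sleep):
    """Block until every path in `paths` exists.

    poll_seconds and max_ticks are hypothetical names, not ROSE parameters.
    max_ticks=None waits indefinitely, which is the equivalent of deleting
    the timeout block. Returns True once all outputs exist, False on timeout.
    """
    ticker = 0
    while True:
        # Done as soon as every expected mapping output is on disk.
        if all(os.path.exists(p) for p in paths):
            return True
        # Give up only if a limit was set and has been reached.
        if max_ticks is not None and ticker >= max_ticks:
            return False
        # Report progress every sixth poll (every 30 min at the default interval).
        if ticker % 6 == 0:
            print("elapsed (min): %d" % (ticker * poll_seconds // 60))
        ticker += 1
        sleep(poll_seconds)
```

Under these assumptions the limit in minutes is max_ticks * poll_seconds / 60, so raising the limit only postpones the error if the mapping itself needs longer than that.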


Thanks Jared. Even after editing the code, I still get the same operation time out error. Is there any other approach?


How did you change the code? Could also just delete that if ticker == 144: block entirely (lines 443-446).


This is how I edited the code. It's taking too much time.

        #CHANGE THIS PARAMETER TO ALLOW MORE TIME TO MAP
        if ticker == 702:

Then try increasing it further or deleting that block entirely.

        #CHANGE THIS PARAMETER TO ALLOW MORE TIME TO MAP
        if ticker == 5000:

Thanks Jared, I no longer get the timeout error, but it has been running for 3 days. Why is it so extremely slow?


For the reason I mentioned above, mostly. It uses an insanely slow coverage calculation loop: it calls samtools view for each region to pull reads from the BAM, manually extends the reads, creates a dict to count coverage at each base in the region, then sums the counts for each region.

For instances like ATAC where you may have 100k+ regions even after stitching, it's going to be very slow.
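To make the cost concrete, here is a rough Python sketch contrasting the per-base dict counting described above with a vectorized-style alternative built on sorted coordinates and prefix sums. It is neither ROSE's code nor the R/GRanges package. All function names are hypothetical, reads and regions are assumed to be half-open (start, end) intervals on a single chromosome, and extending each read in its direction of travel is an assumption about how the extension is applied.

```python
from bisect import bisect_left


def extend_read(start, end, strand, extension=200):
    """Extend a read downstream by `extension` bp (assumed behaviour)."""
    if strand == "-":
        return max(0, start - extension), end
    return start, end + extension


def prefix_sums(values):
    """Return [0, v0, v0+v1, ...] so any leading sum is one lookup."""
    out = [0]
    for v in values:
        out.append(out[-1] + v)
    return out


def naive_region_signal(reads, regions):
    """Slow pattern: a per-base dict for each region, then a sum."""
    totals = []
    for r_start, r_end in regions:
        depth = dict((pos, 0) for pos in range(r_start, r_end))
        for s, e in reads:
            for pos in range(max(s, r_start), min(e, r_end)):
                depth[pos] += 1
        totals.append(sum(depth.values()))
    return totals


def fast_region_signal(reads, regions):
    """Same totals, with two binary searches per region boundary."""
    starts = sorted(s for s, _ in reads)
    ends = sorted(e for _, e in reads)
    start_sums = prefix_sums(starts)
    end_sums = prefix_sums(ends)

    def covered_before(x):
        # Total read bases lying to the left of coordinate x.
        i = bisect_left(starts, x)  # number of reads starting before x
        j = bisect_left(ends, x)    # number of reads ending before x
        return (i * x - start_sums[i]) - (j * x - end_sums[j])

    return [covered_before(b) - covered_before(a) for a, b in regions]


raw_reads = [(100, 150, "+"), (120, 170, "-"), (400, 450, "+")]
reads = [extend_read(s, e, strand) for s, e, strand in raw_reads]
regions = [(90, 200), (300, 500)]
signal = fast_region_signal(reads, regions)
```

The naive version does work proportional to reads times bases for every region, while the fast one sorts once and then answers each region in logarithmic time. Interval libraries such as GRanges get their speed from the same kind of batch computation.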

I've replicated the process in R using GRanges, and it runs in ~3-5 minutes with near identical results. I have it in an R package, but it's not really ready for prime time as I haven't implemented the stitching exclusion intricacies of ROSE yet.


Hi Jared, is the R package available? If so, please share the link. ROSE does not seem to work on my end.
