10 weeks ago
Oburah
While running the ROSE algorithm from the Young lab, I encountered this error. Any solutions?
(ROSE) hesborn@hesborn-Latitude-5430:~/Desktop/ROZE/rose$ python2.7 ROSE_main.py -g Hg38 -i /home/hesborn/Desktop/Data/TGFB.gff -r /home/hesborn/Desktop/Data/possorted_bam.bam -o CREAN -s 12500 -t 2500
folder CREAN/ does not exist
folder CREAN/gff/ does not exist
folder CREAN/mappedGFF/ does not exist
USING /home/hesborn/Desktop/Data/TGFB.gff AS THE INPUT GFF
USING Hg38 AS THE GENOME
MAKING START DICT
LOADING IN GFF REGIONS
SKIPPING THIS LINE
['']
CHECKING INPUT TO MAKE SURE EACH REGION HAS A UNIQUE IDENTIFIER
REFERENCE COLLECTION PASSES QC
STITCHING REGIONS TOGETHER
PERFORMING REGION STITCHING
SKIPPING THIS LINE
['']
REMOVED 0 LOCI BECAUSE THEY WERE CONTAINED BY A TSS
REMOVED 0 STITCHED LOCI BECAUSE THEY OVERLAPPED MULTIPLE TSSs
ADDED BACK 0 ORIGINAL LOCI
MAKING GFF FROM STITCHED COLLECTION
WRITING STITCHED GFF TO DISK AS CREAN/gff/TGFB_12KB_STITCHED_TSS_DISTAL.gff
OUTPUT WILL BE WRITTEN TO CREAN/TGFB_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt
python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b /home/hesborn/Desktop/Data/possorted_bam.bam -i CREAN/gff/TGFB_12KB_STITCHED_TSS_DISTAL.gff -o CREAN/mappedGFF/TGFB_12KB_STITCHED_TSS_DISTAL_possorted_bam.bam_MAPPED.gff &
python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b /home/hesborn/Desktop/Data/possorted_bam.bam -i /home/hesborn/Desktop/Data/TGFB.gff -o CREAN/mappedGFF/TGFB_possorted_bam.bam_MAPPED.gff &
PAUSING TO MAP
{'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'CREAN/mappedGFF/TGFB_12KB_STITCHED_TSS_DISTAL_possorted_bam.bam_MAPPED.gff', 'bam': '/home/hesborn/Desktop/Data/possorted_bam.bam', 'rpm': True, 'input': 'CREAN/gff/TGFB_12KB_STITCHED_TSS_DISTAL.gff'}
[]
mapping to GFF and making a matrix with fixed bin number
{'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'CREAN/mappedGFF/TGFB_possorted_bam.bam_MAPPED.gff', 'bam': '/home/hesborn/Desktop/Data/possorted_bam.bam', 'rpm': True, 'input': '/home/hesborn/Desktop/Data/TGFB.gff'}
[]
mapping to GFF and making a matrix with fixed bin number
WAITING FOR MAPPING TO COMPLETE. ELAPSED TIME (MIN):
0
using a MMR value of 502.2775
using a MMR value of 502.2775
has chr
has chr
Number lines processed
0
Number lines processed
0
30
60
90
120
150
180
210
240
270
300
330
360
390
420
450
480
510
540
570
600
630
660
690
ERROR: OPERATION TIME OUT. MAPPING OUTPUT NOT DETECTED
I get this error while running ROSE on ATAC-seq data. Any help?
Thanks Jared. Even after editing the code, I still get the same operation time-out error. Is there another approach?
How did you change the code? You could also just delete that
if ticker == 144:
block entirely (lines 443-446).
This is how I edited the code; it's taking too much time.
Then try increasing it further or deleting that block entirely.
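The time-out comes from a loop that polls for the mapped GFF outputs at a fixed interval and gives up after a fixed number of checks. The log appears consistent with a check every 5 minutes and a 144-check limit (elapsed minutes printed in steps of 30 up to 690, i.e. roughly 12 hours). Below is a minimal sketch of such a loop with the limit exposed as a parameter. `wait_for_outputs` and its arguments are hypothetical names, not ROSE's actual code; passing `max_checks=None` is the equivalent of deleting the `if ticker == 144:` block.

```python
import os
import time


def wait_for_outputs(paths, interval_s=300, max_checks=144, report_every=6):
    """Poll until every path in `paths` exists.

    Hypothetical helper, not ROSE's code. Returns True once all outputs
    exist, or False after `max_checks` unsuccessful checks.
    max_checks=None disables the time-out entirely.
    """
    ticker = 0
    while True:
        if ticker % report_every == 0:
            # elapsed minutes, assuming interval_s is the polling period
            print(ticker * interval_s // 60)
        if all(os.path.exists(p) for p in paths):
            return True
        ticker += 1
        if max_checks is not None and ticker >= max_checks:
            print("ERROR: OPERATION TIME OUT. MAPPING OUTPUT NOT DETECTED")
            return False
        time.sleep(interval_s)
```

Raising or removing the limit only stops the main script from giving up early; it does not make the background mapping jobs any faster.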
Thanks Jared, I no longer get the time-out error, but it's been running for 3 days. Why is it so extremely slow?
Mostly for the reason I mentioned above. It uses an extremely slow coverage-calculation loop: it calls samtools view for each region to pull reads from the BAM, manually extends the reads, builds a dict for the region to count coverage at each base, and then sums those counts for each region. For cases like ATAC-seq, where you may have 100k+ regions even after stitching, it's going to be very slow.
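To illustrate the per-region pattern described above, here is a sketch that computes a region's summed coverage two ways: with a per-base dict, and with plain overlap arithmetic that gives the same total. Assumptions: reads are an in-memory list of hypothetical `(start, end, strand)` tuples in half-open coordinates (ROSE instead pulls them with samtools view per region), and "extension" is taken to mean lengthening each read 3' by 200 bp, mirroring `-e 200` in the command above. This is not ROSE's code.

```python
from collections import defaultdict


def extend_read(start, end, strand, extension=200):
    """Extend a read 3' by `extension` bp (assumed semantics of -e)."""
    if strand == "+":
        return start, end + extension
    return max(0, start - extension), end


def region_signal_dict(region, reads, extension=200):
    """Slow pattern: per-base coverage dict for the region, then a sum."""
    r_start, r_end = region
    coverage = defaultdict(int)
    for start, end, strand in reads:
        s, e = extend_read(start, end, strand, extension)
        # count every base of the extended read that falls inside the region
        for pos in range(max(s, r_start), min(e, r_end)):
            coverage[pos] += 1
    return sum(coverage.values())


def region_signal_overlap(region, reads, extension=200):
    """Same total without the dict: sum of per-read overlap lengths."""
    r_start, r_end = region
    total = 0
    for start, end, strand in reads:
        s, e = extend_read(start, end, strand, extension)
        total += max(0, min(e, r_end) - max(s, r_start))
    return total


reads = [(900, 950, "+"), (1400, 1450, "-"), (1600, 1650, "-")]
print(region_signal_dict((1000, 1500), reads))     # 500
print(region_signal_overlap((1000, 1500), reads))  # 500
```

The overlap version removes the per-base dict, but a separate samtools call for each of 100k+ regions would still be a large cost on its own.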
I've replicated the process in R using GRanges, and it runs in ~3-5 minutes with near-identical results. I have it in an R package, but it's not really ready for prime time because I haven't implemented the stitching/exclusion intricacies of ROSE yet.
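The R package itself isn't linked in this thread. As an illustration of the general idea that makes a GRanges-style approach fast (compute coverage once per chromosome, then look up every region), here is a Python sketch using a difference array and prefix sums. It is not Jared's implementation: `chrom_coverage_prefix` and `region_sums` are hypothetical names, and intervals are assumed to be already-extended reads in half-open coordinates.

```python
from itertools import accumulate


def chrom_coverage_prefix(intervals, chrom_len):
    """Prefix sums of per-base coverage for one chromosome.

    intervals: extended reads as half-open (start, end) pairs.
    Returns P where P[i] is the total coverage over bases [0, i).
    """
    diff = [0] * (chrom_len + 1)
    for s, e in intervals:
        s, e = max(0, s), min(chrom_len, e)
        if s < e:
            diff[s] += 1  # coverage rises where a read starts
            diff[e] -= 1  # and falls just past where it ends
    coverage = list(accumulate(diff[:chrom_len]))  # per-base depth
    return [0] + list(accumulate(coverage))


def region_sums(prefix, regions):
    """Summed coverage for many regions, O(1) each after one pass."""
    return [prefix[end] - prefix[start] for start, end in regions]


extended = [(900, 1150), (1200, 1450), (1400, 1650)]
P = chrom_coverage_prefix(extended, chrom_len=2000)
print(region_sums(P, [(1000, 1500), (0, 900), (1400, 1650)]))  # [500, 0, 300]
```

For real chromosome lengths a plain Python list is memory-heavy; a NumPy array or a run-length encoding (GRanges' `coverage()` returns run-length encoded vectors) would be the practical choice.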
Hi Jared, is the R package available? Please share the link; the ROSE algorithm doesn't seem to work on my end.