When running Cufflinks, the run hangs at the same position each time. My command is below. I have no explanation for this, so if anyone has any suggestions, please let me know!
cufflinks -p 20 -v -M /opt/databases/genomes/Ensembl_2/Ensembl_Mask.gtf --max-bundle-frags 10000000000 --GTF-guide /opt/databases/genomes/Ensembl_2/genome.gtf -o ./Cufflinks ./Tophat/accepted_hits.bam
EDIT: By position, I mean chromosomal position during the bundle inspection process of the Cufflinks pipeline.
EDIT2: I reduced the --max-bundle-frags switch to 1000000 and, after 19 hours, Cufflinks still hangs on the bundle inspection process at the same chromosomal position. This dataset is big (174 million reads), but I thought the bundle inspection process was meant to be pretty quick. Can anyone correct me on this? It seems that the bundle inspection does not use the --max-bundle-frags switch.
EDIT3: After checking the log more thoroughly, I came across this entry:
Inspecting bundle 14:50035707-50105467 with 6191576 reads
This clearly demonstrates that the bundle inspection process of Cufflinks is not using --max-bundle-frags. I've not seen anything that would suggest this is by design.
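For anyone who wants to check their own log for the same symptom, here is a minimal Python sketch that lists the bundles whose read count exceeds a chosen limit. It assumes the verbose (-v) output was saved to a file and that every bundle line looks like the single entry quoted above; I have only that one line to go on, so the pattern may need adjusting. The function name `large_bundles` and the sample lines other than the chromosome 14 entry are made up for illustration.

```python
import re

# Assumed log format, taken from the one line quoted in this post:
#   Inspecting bundle 14:50035707-50105467 with 6191576 reads
BUNDLE_RE = re.compile(r"Inspecting bundle (\S+):(\d+)-(\d+) with (\d+) reads")


def large_bundles(lines, threshold):
    """Yield (chrom, start, end, reads) for bundles with more than `threshold` reads."""
    for line in lines:
        match = BUNDLE_RE.search(line)
        if match is None:
            continue  # not a bundle inspection line
        chrom = match.group(1)
        start, end, reads = (int(match.group(i)) for i in (2, 3, 4))
        if reads > threshold:
            yield chrom, start, end, reads


# Illustrative input: the real entry from my log plus one invented small bundle.
sample_log = [
    "Inspecting bundle 14:50035707-50105467 with 6191576 reads",
    "Inspecting bundle 1:100-200 with 50 reads",
]

# 1000000 is the --max-bundle-frags value I tried in EDIT2.
for chrom, start, end, reads in large_bundles(sample_log, 1000000):
    print(f"{chrom}:{start}-{end}\t{reads}")
```

To run it on a real log, pass an open file handle instead of `sample_log`, since the function only needs an iterable of lines.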
UPDATE: I spoke with a Cufflinks dev. The CURRENT (Cufflinks 2.2.0) behaviour is that enabling --GTF-guide WILL OVERRIDE the --max-bundle-frags parameter. This means that if you want to do novel discovery, there is no option to set a skip parameter for large bundles. I've been told that it's on the to-do list for implementation.
For some reason, the title of this post made me think of a bioinformatics-themed children's book.
The Little Tophat That Could
The typo that led to a bioinformatics children's book? I'll take it.
What do you mean by "same position"? What position is that?