Hi Guys,
When I try to run cufflinks, with the command:
cufflinks --GTF /.../B0510_manual_reindexed_v2.gff --min-isoform-fraction 0.5 --pre-mrna-fraction 0.05 --max-intron-length 2000 --small-anchor-fraction 0.06 --min-intron-length 30 --overlap-radius 1 --3-overhang-tolerance 0 --intron-overhang-tolerance 0 --no-faux-reads -p 8 -o /.../cufflinks_out_V3/Apo12B/ /media/cinerea/BGI_RNAseq_V2/.../Apo12B/accepted_hits.bam
Cufflinks just skips a huge part (about 3.4 Mb) of a scaffold at the following step:
You are using Cufflinks v2.1.1, which is the most recent release.
[14:00:50] Loading reference annotation.
[14:00:50] Inspecting reads and determining fragment length distribution.
Processing Locus B0510_5C01:490546-492362 [ ] 0%
I tried to tweak the parameter --max-bundle-frags up and down, but this does not make any difference. In isoforms.fpkm_tracking the transcripts are marked with HIDATA. The reads seem fine at this locus.
What is wrong? Any ideas?
EDIT: I inspected the --verbose logs, and I see that exactly the part that is being skipped is taken by Cufflinks as one big bundle, with 1M reads on it. I lowered the --max-bundle-length flag, but this does not seem to have any effect at all.
EDIT2: It filters out the large bundle after the "processing" step, resulting in no output at all for the genes in that locus. Where does Cufflinks get its bundle sizes from? Can I adjust this?
It does not make any difference. When I adjust this parameter, it has no influence on the result. So the number of reads on that locus is not too high; I think the locus itself is too big. I want to know where Cufflinks gets its bundle sizes from, and whether I can change this.
OK. I think you need to increase the max bundle length then. Have you tried that? In the explanation above, you only wrote that you had lowered it. In a run I looked at yesterday I used 10 million for both the size and length flags and it worked in that case. I am not too sure how the bundles are defined, unfortunately.
Something really stupid was the cause of all this. Inside my GFF file there was a gene 3.5 Mb in size, so Cufflinks takes it as one bundle. Not a Cufflinks problem after all.
So that would have been solved by increasing --max-bundle-length to 10 million I guess.
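For anyone who runs into the same thing: since the cause here was a single 3.5 Mb gene in the annotation, it may be worth scanning the GFF for oversized features before running Cufflinks. Below is a minimal Python sketch of such a check. It is not part of Cufflinks; the function name find_long_features, the 1 Mb threshold, the feature types and the demo records are all made up for illustration, and it assumes a standard 9-column tab-separated GFF/GTF.

```python
import io


def find_long_features(gff_lines, min_length=1_000_000,
                       feature_types=("gene", "mRNA", "transcript")):
    """Yield (seqid, type, start, end, length, attributes) for every
    feature of the given types that spans at least min_length bases.

    Hypothetical helper for illustration; the threshold is arbitrary.
    """
    for line in gff_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF/GTF record has 9 tab-separated columns.
        if len(fields) < 9:
            continue
        seqid, _source, ftype, start, end = fields[:5]
        if feature_types and ftype not in feature_types:
            continue
        # GFF coordinates are 1-based and inclusive.
        length = int(end) - int(start) + 1
        if length >= min_length:
            yield seqid, ftype, int(start), int(end), length, fields[8]


# Made-up demo records (not taken from the annotation in this thread).
DEMO_GFF = (
    "##gff-version 3\n"
    "scaffold_demo\tmanual\tgene\t1000\t3501000\t.\t+\t.\tID=demo_gene_big\n"
    "scaffold_demo\tmanual\tgene\t5000\t6800\t.\t+\t.\tID=demo_gene_small\n"
    "scaffold_demo\tmanual\texon\t1000\t3501000\t.\t+\t.\tParent=demo_gene_big\n"
)

hits = list(find_long_features(io.StringIO(DEMO_GFF)))
for seqid, ftype, start, end, length, attrs in hits:
    print(f"{seqid}\t{ftype}\t{start}-{end}\t{length} bp\t{attrs}")
```

To run it on a real annotation, pass an open file handle instead of the demo string, e.g. `with open(path) as fh: hits = list(find_long_features(fh))`, where `path` points to your own GFF. Any feature it reports is a candidate for the kind of giant bundle described above.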