Hello,
I need to take a JGI annotation in GFF format and feed it to cufflinks. This is the format the annotation is in:
scaffold_1 JGI exon 1066 1206 . - . name "gm1.1_g"; transcriptId 113
scaffold_1 JGI CDS 1066 1206 . - 0 name "gm1.1_g"; proteinId 1; exonNumber 3
scaffold_1 JGI stop_codon 1066 1068 . - 0 name "gm1.1_g"
scaffold_1 JGI exon 1304 1600 . - . name "gm1.1_g"; transcriptId 113
scaffold_1 JGI CDS 1304 1600 . - 0 name "gm1.1_g"; proteinId 1; exonNumber 2
scaffold_1 JGI exon 1675 2220 . - . name "gm1.1_g"; transcriptId 113
scaffold_1 JGI CDS 1675 2220 . - 0 name "gm1.1_g"; proteinId 1; exonNumber 1
scaffold_1 JGI start_codon 2218 2220 . - 0 name "gm1.1_g"
scaffold_10 JGI exon 11 345 . + . name "gm1.12_g"; transcriptId 124
scaffold_10 JGI CDS 11 345 . + 0 name "gm1.12_g"; proteinId 12; exonNumber 1
scaffold_10 JGI start_codon 11 13 . + 0 name "gm1.12_g"
scaffold_10 JGI exon 470 512 . + . name "gm1.12_g"; transcriptId 124
scaffold_10 JGI CDS 470 512 . + 1 name "gm1.12_g"; proteinId 12; exonNumber 2
scaffold_10 JGI stop_codon 510 512 . + 0 name "gm1.12_g"
When I feed it to cufflinks, I get a segmentation fault with no error message, and I have no idea why. I tried converting it to GTF format, and this is what I got:
scaffold_1 JGI transcript 1066 1206 . - . gene_id "gm1.1_g";
scaffold_1 JGI transcript 1304 1600 . - . gene_id "gm1.1_g";
scaffold_1 JGI transcript 1675 2220 . - . gene_id "gm1.1_g";
scaffold_1 JGI transcript 769 891 . - . gene_id "fgenesh1_pg.1_";
scaffold_1 JGI transcript 942 1031 . - . gene_id "fgenesh1_pg.1_";
scaffold_1 JGI transcript 1304 1469 . - . gene_id "fgenesh1_pg.1_";
scaffold_1 JGI transcript 1802 2220 . - . gene_id "fgenesh1_pg.1_";
scaffold_1 JGI transcript 1244 1603 . - . gene_id "fgenesh1_kg.1_";
scaffold_1 JGI transcript 1675 1950 . - . gene_id "fgenesh1_kg.1_";
scaffold_1 JGI transcript 1588 2220 . - . gene_id "fgenesh1_kg.1_";
scaffold_1 JGI transcript 1588 2220 . - . gene_id "fgenesh1_kg.1_";
scaffold_2 JGI transcript 1099 2094 . - . gene_id "gm1.2_g";
scaffold_2 JGI transcript 4803 5465 . + . gene_id "gm1.3_g";
scaffold_2 JGI transcript 1099 2094 . - . gene_id "fgenesh1_pg.2_";
scaffold_2 JGI transcript 3422 3525 . - . gene_id "fgenesh1_pg.2_";
scaffold_2 JGI transcript 3594 3675 . - . gene_id "fgenesh1_pg.2_";
scaffold_2 JGI transcript 4142 4196 . - . gene_id "fgenesh1_pg.2_";
scaffold_2 JGI transcript 4788 4878 . - . gene_id "fgenesh1_pg.2_";
scaffold_2 JGI transcript 5241 5274 . - . gene_id "fgenesh1_pg.2_";
scaffold_2 JGI transcript 1099 2094 . - . gene_id "CE84695_65";
scaffold_2 JGI transcript 1443 1631 . - . gene_id "fgenesh1_kg.2_";
scaffold_2 JGI transcript 1644 2094 . - . gene_id "fgenesh1_kg.2_";
scaffold_2 JGI transcript 1443 1631 . - . gene_id "fgenesh1_kg.2_";
scaffold_2 JGI transcript 1644 2094 . - . gene_id "fgenesh1_kg.2_";
Cufflinks no longer crashes with the above file, but the problem is that it processes 0 loci:
[13:44:59] Loading reference annotation.
[13:45:00] Inspecting reads and determining fragment length distribution.
> Processed 0 loci. [*************************] 100%
> Map Properties:
> Normalized Map Mass: 17691349.00
> Raw Map Mass: 17691349.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[13:45:29] Estimating transcript abundances.
> Processed 0 loci. [*************************] 100%
Any idea how I can tweak this so that it works with cufflinks?
Thanks, Adrian
If you can make a small reproducible example of the segfault, then it'd be good to submit a bug report so that it simply gets fixed.
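As for the 0 loci: in the converted GTF posted in the question, every record has the feature type "transcript" and carries only a gene_id attribute. Cufflinks expects exon records that carry both gene_id and transcript_id, so that is a plausible cause, though I have not verified it against this data. Below is a minimal Python sketch of a converter that keeps the original feature types and rewrites only the attribute column. The function names are made up for illustration, and it rests on two assumptions: each JGI `name` is one gene model with a single transcript, and the `transcriptId` found on the exon lines can be reused for the CDS and codon lines of the same model.

```python
import re
import sys

NAME_RE = re.compile(r'name "([^"]+)"')
TID_RE = re.compile(r'transcriptId (\d+)')


def split_fields(line):
    """Split one GFF line into its 9 columns, or return None if it is not a record."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        # Fallback for whitespace-separated text, e.g. pasted from a forum post.
        fields = line.split(None, 8)
    return fields if len(fields) == 9 else None


def jgi_gff_to_gtf(lines):
    """Return GTF lines with gene_id and transcript_id built from JGI attributes."""
    records = [f for f in map(split_fields, lines) if f is not None]

    # First pass: only exon lines carry transcriptId, so remember it per model name.
    transcript_ids = {}
    for fields in records:
        name = NAME_RE.search(fields[8])
        tid = TID_RE.search(fields[8])
        if name and tid:
            transcript_ids[name.group(1)] = tid.group(1)

    # Second pass: rewrite the attribute column, keep the other 8 columns as-is.
    out = []
    for fields in records:
        name = NAME_RE.search(fields[8])
        if not name:
            continue  # no model name, nothing to build the IDs from
        gene = name.group(1)
        # Assumption: fall back to the model name when no transcriptId was seen.
        tid = transcript_ids.get(gene, gene)
        attrs = f'gene_id "{gene}"; transcript_id "{tid}";'
        out.append("\t".join(fields[:8] + [attrs]))
    return out


if __name__ == "__main__":
    # Usage (hypothetical file names): python jgi_to_gtf.py annotation.gff > annotation.gtf
    with open(sys.argv[1]) as handle:
        for gtf_line in jgi_gff_to_gtf(handle):
            print(gtf_line)
```

The output keeps the exon, CDS, start_codon and stop_codon records with their original coordinates. It is also worth checking that the scaffold names in the annotation match the reference names in your alignment file exactly.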