I ran Cufflinks on three bacterial RNA-seq samples and want to use cuffcompare to get the "union" of the transcripts and to map those transcripts to the annotated reference genome. I do get the unioned list of transcripts (which I will use with cuffdiff to look for differentially expressed transcripts), but the genes are not annotated. Can anyone suggest what the issue might be?
tl;dr: My cuffcompare output has XLOC_XXXXXX as gene ids instead of the reference gene names from the annotation file.
The command I have tried is:
cuffcompare -r 'genome.gff' 'A2_cuffout/transcripts.gtf' 'EA349_1cuffout/transcripts.gtf' 'EA349_2cuffout/transcripts.gtf'
The genome.gff file looks like the following:
head genome.gff
##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
##sequence-region NC_002505.1 1 2961149
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=243277
NC_002505.1 RefSeq region 1 2961149 . + . ID=id0;Dbxref=taxon:243277;Is_circular=true;biotype=El Tor;chromosome=I;gbkey=Src;genome=chromosome;mol_type=genomic DNA;old-name=Vibrio cholerae O1 biovar eltor str. N16961;serotype=O1;strain=N16961
NC_002505.1 RefSeq gene 235 402 . - . ID=gene0;Name=VC0001;Dbxref=GeneID:2614109;gbkey=Gene;locus_tag=VC0001
...
And the transcripts.gtf files look like:
head 'A2_cuffout/transcripts.gtf'
gi|15600771|ref|NC_002506.1| Cufflinks transcript 286 994 1000 . . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "4.5531694820"; frac "1.000000"; conf_lo "2.465004"; conf_hi "4.663521"; cov "19.278874";
gi|15600771|ref|NC_002506.1| Cufflinks exon 286 994 1000 . . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "4.5531694820"; frac "1.000000"; conf_lo "2.465004"; conf_hi "4.663521"; cov "19.278874";
...
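One thing visible in the two excerpts: the first column of genome.gff reads NC_002505.1, while the first column of transcripts.gtf reads gi|15600771|ref|NC_002506.1|. Only the head of each file is shown, so the full files may still share names. As far as I understand, cuffcompare pairs reference and query transcripts by sequence name and coordinates, so it seems worth checking whether the two files have any sequence names in common. Below is a minimal sketch of such a check, assuming both files are tab-delimited as the GFF/GTF formats specify; the function names are made up for illustration and are not part of Cufflinks.

```python
from collections import Counter


def seqnames(lines):
    """Count the values in column 1 of GFF/GTF records, skipping comments."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # header, pragma, or blank line
        counts[line.split("\t", 1)[0]] += 1
    return counts


def shared_seqnames(ref_lines, query_lines):
    """Return the set of sequence names present in both inputs."""
    return set(seqnames(ref_lines)) & set(seqnames(query_lines))


def compare_files(ref_path, query_path):
    """Print the sequence names of each file and their overlap."""
    with open(ref_path) as ref, open(query_path) as query:
        ref_names = seqnames(ref)
        query_names = seqnames(query)
    print("reference:", dict(ref_names))
    print("query:    ", dict(query_names))
    print("shared:   ", set(ref_names) & set(query_names))


# Example call (paths taken from the command above):
# compare_files("genome.gff", "A2_cuffout/transcripts.gtf")
```

An empty "shared" set would mean that no sequence name appears in both files.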
My combined.gtf output looks like:
gi|15600771|ref|NC_002506.1| Cufflinks exon 10 4310 . . . gene_id "XLOC_000001"; transcript_id "TCONS_00000457"; exon_number "1"; oId "CUFF.1.1"; class_code "."; tss_id "TSS1";
gi|15600771|ref|NC_002506.1| Cufflinks exon 4500 8484 . . . gene_id "XLOC_000002"; transcript_id "TCONS_00000005"; exon_number "1"; oId "CUFF.5.1"; class_code "u"; tss_id "TSS6";
So the gene id is XLOC_000001 instead of VC0001 or something similar.
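To see whether this holds for the whole file and not just the first two records, the class_code values in combined.gtf can be tallied. The following is a rough sketch, assuming tab-delimited records whose ninth column holds attributes in the `key "value";` form shown above; the helper names are hypothetical.

```python
import re
from collections import Counter

# Matches one attribute of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn a GTF attribute column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def class_code_counts(lines):
    """Tally the class_code attribute over all records in a GTF file."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        counts[attrs.get("class_code", "missing")] += 1
    return counts


# Example call (file name is whatever cuffcompare wrote for the combined GTF):
# with open("combined.gtf") as handle:
#     print(class_code_counts(handle))
```

If every record carries "u" or ".", as in the two lines shown, that would suggest no transcript was matched to a reference feature.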