I'm interested in finding novel genes and isoforms using Cufflinks, but I've run into a problem.
When I run cufflinks with the --GTF /<my annotation file> option, my genes.fpkm file contains about 25,279 entries with gene IDs, FPKM values, etc. Everything looks fine.
When I run with --GTF-guide, my genes.fpkm file contains 24,984 entries, and 8,275 of them have a gene_ID with the CUFF.* identifier, indicating a novel gene. I expected more entries than in the first case, with a few novel genes to complement the first dataset.
That seems like an awful lot of novel genes. Furthermore, many of the gene IDs found in the --GTF output are missing from the --GTF-guide output, as if Cufflinks decided in the second case to call them novel or to remove them altogether. Is this sort of thing usual, or am I justified in being confused by this output?
Thanks for any help!
-Jeremy
Did you specify the -M/--mask-file option in these runs? Can you share your input command?