I have analyzed some RNA-seq data with TopHat and Cufflinks, and I have several questions about the output. I ran TopHat and Cufflinks with default values, that is, I used the following commands:
$ /opt2/tools/tophat-2.0.13.Linux_x86_64/tophat -p 5 -G genes.gtf -o tophat_mut ./ucsc.hg19 2-K13-mut.fastq.gz
$ /opt/toolkit/cufflinks-2.2.1.Linux_x86_64/cufflinks -p 5 -u -g genes.gtf -o ./cufflinks ./tophat/accepted_hits.bam
But the output (i.e. transcripts.gtf) looked strange:
chr1 Cufflinks transcript 34611 36081 1 - . gene_id "FAM138A"; transcript_id "NR_026818_1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
chr1 Cufflinks exon 34611 35174 1 - . gene_id "FAM138A"; transcript_id "NR_026818_1"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr1 Cufflinks exon 140075 140566 1000 - . gene_id "CUFF.2"; transcript_id "NR_039983"; exon_number "3"; FPKM "0.0401424461"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.080304"; cov "0.037191";
chr1 Cufflinks transcript 323892 328581 1 + . gene_id "CUFF.1"; transcript_id "NR_028322_1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
1. Why was FPKM equal to 0? I found that many FPKM values were equal to 0, but some FPKM values were very high. Is this situation unusual? Why does it occur?
2. What does gene_id "CUFF.2" (or gene_id "CUFF.1") mean?
Thanks!
I have a similar question.
Cufflinks has the -F/--min-isoform-fraction option set to 10% by default, so it should suppress isoforms with FPKM=0. But when I run Cufflinks with the -g/--GTF-guide option, I usually get isoforms with FPKM=0 in the output while there are other expressed transcripts in the same gene.
Cufflinks just doesn't try to tweak the assembly of those isoforms. The FPKMs should still be output.
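To see how often this happens in a given run, one option is to scan transcripts.gtf for genes that have both zero-FPKM and expressed isoforms. Below is a minimal Python sketch, not part of Cufflinks. The function names parse_transcripts and genes_with_mixed_expression are made up for illustration, and the code assumes a tab-separated GTF whose "transcript" records carry gene_id, transcript_id and FPKM attributes, as in the excerpt above.

```python
import re
from collections import defaultdict

# Matches key "value"; pairs in the GTF attributes column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_transcripts(lines):
    """Yield (gene_id, transcript_id, fpkm) for each 'transcript' record.

    Assumes tab-separated GTF lines; exon records, comments and records
    without the three expected attributes are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if not {"gene_id", "transcript_id", "FPKM"} <= attrs.keys():
            continue
        yield attrs["gene_id"], attrs["transcript_id"], float(attrs["FPKM"])


def genes_with_mixed_expression(lines):
    """Return {gene_id: [(transcript_id, fpkm), ...]} for genes that have
    at least one isoform with FPKM == 0 and at least one with FPKM > 0."""
    by_gene = defaultdict(list)
    for gene_id, transcript_id, fpkm in parse_transcripts(lines):
        by_gene[gene_id].append((transcript_id, fpkm))
    return {
        gene: isoforms
        for gene, isoforms in by_gene.items()
        if any(f == 0.0 for _, f in isoforms)
        and any(f > 0.0 for _, f in isoforms)
    }


# Example use (the path is an assumption based on the -o ./cufflinks option):
#   with open("./cufflinks/transcripts.gtf") as handle:
#       for gene, isoforms in genes_with_mixed_expression(handle).items():
#           print(gene, isoforms)
```

Genes reported by this sketch are the ones where a zero-FPKM isoform sits next to an expressed one, which is the case described above for runs with -g/--GTF-guide.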