11.9 years ago
biorepine
Hi, I have seen many studies that assembled transcriptomes from RNA-Seq data using cufflinks. It usually reports many novel transcripts, but surprisingly most of them have an FPKM of 0. Has anyone else noticed this?
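For reference, FPKM is fragments per kilobase of transcript per million mapped fragments, so an FPKM of exactly 0 means no fragments were assigned to that transcript. Below is a minimal sketch of the textbook formula, assuming the raw transcript length; cufflinks itself uses an effective length corrected for the fragment-length distribution, which this sketch does not model.

```python
def fpkm(fragments: float, length_bp: int, total_mapped: int) -> float:
    """Textbook FPKM: fragments * 1e9 / (length in bp * total mapped fragments).

    Simplified: uses the raw transcript length, not cufflinks' effective length.
    """
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped)


# No assigned fragments gives FPKM 0, whatever the length or library size.
print(fpkm(0, 2_000, 20_000_000))    # 0.0
print(fpkm(500, 2_000, 20_000_000))  # 12.5
```

So a 0 FPKM is a statement about fragment assignment, not about the transcript's structure: the assembler can still report the transcript model even when quantification gives it no fragments.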
A good example:
chr5 Cufflinks transcript 19702133 19761803 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19702133 19702839 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19704533 19704567 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19705319 19705437 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19706257 19706391 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "4"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19706708 19706805 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "5"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19708313 19708453 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "6"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19709434 19709586 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "7"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19710326 19710331 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "8"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr5 Cufflinks exon 19712308 19712467 1 - . gene_id "XLOC_150252"; transcript_id "TCONS_00425102"; exon_number "9"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
.........
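To put a number on "most of them", one could count how many `transcript` records in a cufflinks GTF carry FPKM 0. This is a minimal sketch, assuming a tab-separated GTF (as the format specifies; the lines pasted above lost their tabs) with the `FPKM` attribute on `transcript` records as in the example. The function names are illustrative, not part of any tool.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> dict:
    """Turn 'gene_id "X"; FPKM "0.0";' into {'gene_id': 'X', 'FPKM': '0.0'}."""
    return dict(ATTR_RE.findall(field))


def zero_fpkm_summary(lines):
    """Return (n_transcripts, n_zero_fpkm) over 'transcript' records.

    Exon records are skipped; transcripts without an FPKM attribute are ignored.
    """
    total = zero = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        if "FPKM" not in attrs:
            continue
        total += 1
        if float(attrs["FPKM"]) == 0.0:
            zero += 1
    return total, zero
```

Usage would be something like `zero_fpkm_summary(open(path))` on the GTF in question, where `path` is whatever file you want to check.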
Which paper/dataset, and which versions of tophat and cufflinks, were used for the above analyses? Various older builds had bugs that caused cufflinks and cuffdiff to assign 0 FPKM; I'm not sure how many of them still persist in v2.2+. See e.g.
If the genes in question have alternative isoforms, could this also be linked to cufflinks modelling multiple isoforms per gene and assigning reads among them probabilistically?
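To illustrate how probabilistic read assignment can push an isoform to zero, here is a toy expectation-maximization (EM) sketch. It is not cufflinks' actual model: it ignores transcript lengths, fragment-length distributions and biases, and the isoform names and read counts are made up. The point is only that an isoform with no reads unique to it can lose all shared reads to a sibling isoform that does have unique support.

```python
def em_abundances(read_compat, isoforms, n_iter=500):
    """Toy EM for relative isoform abundances.

    read_compat: list of sets, each naming the isoforms a read is compatible with.
    Reads with an empty compatibility set are ignored.
    """
    reads = [c for c in read_compat if c]
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(n_iter):
        counts = {iso: 0.0 for iso in isoforms}
        # E-step: split each read among its compatible isoforms
        # in proportion to the current abundance estimates.
        for compat in reads:
            denom = sum(theta[i] for i in compat)
            for i in compat:
                counts[i] += theta[i] / denom
        # M-step: new abundance = expected fraction of reads.
        theta = {iso: counts[iso] / len(reads) for iso in isoforms}
    return theta


# Hypothetical gene: isoform B shares all its exons with A. 90 reads hit the
# shared exons and 10 reads span a junction found only in A.
reads = [{"A", "B"}] * 90 + [{"A"}] * 10
theta = em_abundances(reads, ["A", "B"])
print(round(theta["A"], 6), round(theta["B"], 6))  # 1.0 0.0
```

Each iteration multiplies B's share by 90/100, so B converges towards 0 even though 90% of the reads are compatible with it.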
Due to similar issues and the somewhat black-box nature of cufflinks (and its limited design-matrix setups), I've turned to other tools for abundance estimation (eXpress) and differential expression analysis (DESeq2, limma).
Cufflinks is used to assemble transcripts, and the FPKM values are used to filter the results. For this purpose, cufflinks is worth trying.
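As a rough illustration of that kind of filter, here is a minimal sketch that keeps every record (transcript and exon lines) of transcripts whose FPKM meets a cutoff. It assumes a tab-separated GTF with `FPKM` on the `transcript` records, as in the example above; the function name and the default `min_fpkm=1.0` are hypothetical choices for illustration, not a recommended threshold.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def filter_gtf_by_fpkm(lines, min_fpkm=1.0):
    """Keep all records of transcripts whose transcript-level FPKM >= min_fpkm.

    min_fpkm=1.0 is an arbitrary illustrative cutoff.
    """
    lines = list(lines)
    keep = set()
    # Pass 1: decide per transcript, using the 'transcript' records.
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid and "FPKM" in attrs and float(attrs["FPKM"]) >= min_fpkm:
            keep.add(tid)
    # Pass 2: emit every record (transcript or exon) of kept transcripts.
    kept = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        if dict(ATTR_RE.findall(cols[8])).get("transcript_id") in keep:
            kept.append(line)
    return kept
```

With any positive cutoff, the 0-FPKM transcripts from the question would be removed along with their exons, which is presumably why they rarely matter downstream when the FPKM filter is applied.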
I posted only a few lines because Biostars limits post length.
I have noticed this as well in my RNA-seq data and have yet to find an explanation.