10.7 years ago
GouthamAtla
I have aligned my data (paired-end RNA-Seq) to the genome (hg19, Ensembl) using tophat2 with a GTF file and default options. When I give the same reference GTF file and accepted_hits.bam to cufflinks (v2.1.1), the isoforms.fpkm_tracking file has FPKM values as small as 1.66072e-316 for some of the transcripts. Is this normal? Can we treat them as low-abundance transcripts and move on with downstream analysis?
The cufflinks command is:
cufflinks -o outdir -p 5 -G ref.gtf sample.bam
I agree. In fact, low-coverage genes tend to yield unreasonably high fold-change values, so I use log2(RPKM + 0.1) for analysis (although that is not to say that values below 0.1 are technically problematic).
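To illustrate the log2(RPKM + 0.1) transform, here is a minimal Python sketch. The function names and the second expression value (0.05) are hypothetical and chosen only for illustration; the tiny value is the one reported in the question. It is not how cufflinks itself computes anything, just a way to see why the offset keeps near-zero values from producing extreme fold changes.

```python
import math

PSEUDOCOUNT = 0.1  # offset used in log2(RPKM + 0.1)


def log2_expr(value, pseudocount=PSEUDOCOUNT):
    """Return log2(value + pseudocount) for an RPKM/FPKM value."""
    return math.log2(value + pseudocount)


def log2_fold_change(a, b, pseudocount=PSEUDOCOUNT):
    """Return the log2 fold change of a over b after adding the pseudocount."""
    return log2_expr(a, pseudocount) - log2_expr(b, pseudocount)


tiny = 1.66072e-316  # FPKM value from the question
low = 0.05           # hypothetical low-abundance value in another sample

# Without an offset, the difference in log2 values is enormous.
raw_lfc = math.log2(low) - math.log2(tiny)

# With the offset, the tiny value behaves like zero and the fold change is modest.
offset_lfc = log2_fold_change(low, tiny)

print("raw log2 fold change:   ", raw_lfc)
print("offset log2 fold change:", offset_lfc)
```

With the offset, a value like 1.66072e-316 is indistinguishable from zero, so such transcripts can be treated as not expressed rather than as a separate special case.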
Here is a paper with a more detailed explanation:
http://bioinfo.aizeonpublishers.net/content/2013/6/285-292.html
The paper also includes some benchmarks with other algorithms, which is emphasized more here:
http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html