Question

RNA-Seq Data analysis FPKM values mostly showing 0

8.3 years ago

simang5c ▴ 10

Hi Everyone!

I have human RNA-Seq data downloaded from NCBI that I am analysing. I aligned the reads with TopHat2, using the premade indices built from GRCh38.fa together with GRCh38.chr.gtf. After aligning and assembling with Cufflinks, I checked the transcripts.gtf file: most FPKM and coverage values were 0.0000, and only a few were greater than 0. However, when I ran the Cufflinks assembly without the '-g' option (which points to GRCh38.chr.gtf), the FPKM and coverage values showed up for all transcripts.

Could anyone explain to me what the probable reasons for this difference might be? Should I proceed with the analysis?

Also, when I checked the read distribution with read_distribution.py from RSeQC, it reported that no tags were assigned. The output is pasted below:

Total Reads 1914074
Total Tags 2097688
Total Assigned Tags 0
==========================================================
Group Total_bases Tag_count Tags/Kb
CDS_Exons 103371993 0 0.00
5'UTR_Exons 5217678 0 0.00
3'UTR_Exons 29324747 0 0.00
Introns 1500197093 0 0.00
TSS_up_1kb 33306654 0 0.00
TSS_up_5kb 148463534 0 0.00
TSS_up_10kb 265823549 0 0.00
TES_down_1kb 35215293 0 0.00
TES_down_5kb 152556214 0 0.00
TES_down_10kb 268614580 0 0.00
==========================================================

I don't really understand what's happening. It would be really kind if anyone could spare some time to give useful input and help me solve the issue.

Thanks in advance!

RNA-Seq assembly RSeQC • 2.3k views

Have you even looked at what the -g flag does?

-g/--GTF-guide <reference_annotation.(gtf/gff)>

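Tells Cufflinks to use the supplied reference annotation (a GFF file) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled.

In other words, with -g the transcripts.gtf lists every transcript in GRCh38.chr.gtf, whether or not any reads support it, so annotated transcripts that are not expressed in your sample appear with FPKM and coverage of 0. Without -g, only transcripts assembled from your reads are reported, which is why they all have values.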
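To see how many of those zero entries you actually have, something like the following could help. It is only a minimal sketch: it assumes the transcripts.gtf is tab-separated GTF with an FPKM "..." attribute on each transcript line (the usual Cufflinks layout), the function name summarize_fpkm is made up for illustration, and the sample lines are invented, not taken from the data in this thread.

```python
import re
from collections import Counter

# Cufflinks writes FPKM as a quoted attribute in column 9, e.g. FPKM "12.5";
FPKM_RE = re.compile(r'FPKM "([^"]+)"')


def summarize_fpkm(gtf_lines):
    """Count transcript records with zero vs. non-zero FPKM (hypothetical helper)."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        # GTF has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = FPKM_RE.search(fields[8])
        if match is None:
            continue  # no FPKM attribute on this line
        key = "zero" if float(match.group(1)) == 0.0 else "nonzero"
        counts[key] += 1
    return counts


# Invented example lines, for illustration only.
sample = [
    'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; FPKM "0.0000000000"; cov "0.000000";\n',
    'chr1\tCufflinks\texon\t100\t900\t1000\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; exon_number "1"; FPKM "0.0000000000";\n',
    'chr1\tCufflinks\ttranscript\t2000\t2900\t1000\t+\t.\t'
    'gene_id "G2"; transcript_id "T2"; FPKM "12.5"; cov "30.1";\n',
]
print(summarize_fpkm(sample))
```

On a real file you would pass an open file handle instead of the sample list, e.g. summarize_fpkm(open("transcripts.gtf")). A large "zero" count in the -g run and none in the run without -g would be consistent with the behaviour described in the manual.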

You should also know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the advisable toolset for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using kallisto or salmon.
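As for the RSeQC output with zero assigned tags: nothing in the post shows the cause, but one thing that may be worth ruling out is a mismatch in chromosome naming between the BAM file and the BED gene model passed to read_distribution.py (for example "chr1" in one and "1" in the other). The sketch below is an assumption-laden illustration, not a diagnosis: the function names are hypothetical, the example names are invented, and the BAM sequence names would have to come from the @SQ lines of the header (e.g. via samtools view -H).

```python
def bed_chromosomes(bed_lines):
    """Collect the chromosome names (first column) used in a BED file."""
    names = set()
    for line in bed_lines:
        stripped = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        names.add(stripped.split()[0])
    return names


def shared_chromosomes(bam_names, bed_lines):
    """Return chromosome names present in both the BAM header and the BED file."""
    return set(bam_names) & bed_chromosomes(bed_lines)


# Invented example: the BAM uses 'chr'-prefixed names, the BED does not.
bam_names = ["chr1", "chr2", "chrX"]
bed_lines = [
    "1\t11868\t14409\tT1\t0\t+\n",
    "2\t38813\t46588\tT2\t0\t-\n",
]
overlap = shared_chromosomes(bam_names, bed_lines)
print(overlap if overlap else "no chromosome names in common")
```

If the two sets share no names, no read can fall inside any annotated feature, which would produce a table of zeros like the one in the question. If they do overlap, the cause lies elsewhere.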