Hi!
I have used TopHat and Cufflinks to analyse RNA-seq data. I ran the following commands:
tophat2 -p 5 -o ./tophat --library-type fr-firststrand -G ./genes.gtf ./ucsc.hg19 \
2014-2194_141118_SN484_0322_AC5K6UACXX_6_1.fq.gz \
2014-2194_141118_SN484_0322_AC5K6UACXX_6_2.fq.gz
cufflinks -p 5 -u -g ./genes.gtf -o ./cufflinks ./tophat/accepted_hits.bam
Cufflinks generated a GTF file, transcripts.gtf, which looks like this:
chr1 Cufflinks transcript 34611 36081 1 - . gene_id "FAM138A"; transcript_id "NR_026818_1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
chr1 Cufflinks exon 34611 35174 1 - . gene_id "FAM138A"; transcript_id "NR_026818_1"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr1 Cufflinks exon 35277 35481 1 - . gene_id "FAM138A"; transcript_id "NR_026818_1"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr1 Cufflinks exon 35721 36081 1 - . gene_id "FAM138A"; transcript_id "NR_026818_1"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr1 Cufflinks transcript 134773 140566 1000 - . gene_id "CUFF.31"; transcript_id "NR_039983"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816"; full_read_support "yes";
chr1 Cufflinks exon 134773 139696 1000 - . gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "1"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816";
chr1 Cufflinks exon 139790 139847 1000 - . gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "2"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816";
chr1 Cufflinks exon 140075 140566 1000 - . gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "3"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816";
chr1 Cufflinks transcript 732240 735831 1000 - . gene_id "CUFF.53"; transcript_id "CUFF.53.1"; FPKM "0.2604531606"; frac "1.000000"; conf_lo "0.186299"; conf_hi "0.334607"; cov "2.247511"; full_read_support "yes";
chr1 Cufflinks exon 732240 735831 1000 - . gene_id "CUFF.53"; transcript_id "CUFF.53.1"; exon_number "1"; FPKM "0.2604531606"; frac "1.000000"; conf_lo "0.186299"; conf_hi "0.334607"; cov "2.247511";
chr1 Cufflinks transcript 749660 751452 1000 - . gene_id "CUFF.6"; transcript_id "CUFF.6.1"; FPKM "0.2508481299"; frac "1.000000"; conf_lo "0.145715"; conf_hi "0.355981"; cov "2.160173"; full_read_support "yes";
chr1 Cufflinks exon 749660 751452 1000 - . gene_id "CUFF.6"; transcript_id "CUFF.6.1"; exon_number "1"; FPKM "0.2508481299"; frac "1.000000"; conf_lo "0.145715"; conf_hi "0.355981"; cov "2.160173";
chr1 Cufflinks transcript 751542 752795 1000 - . gene_id "CUFF.8"; transcript_id "CUFF.8.1"; FPKM "0.4704529712"; frac "0.426471"; conf_lo "0.290214"; conf_hi "0.650692"; cov "4.324710"; full_read_support "yes";
chr1 Cufflinks exon 751542 752795 1000 - . gene_id "CUFF.8"; transcript_id "CUFF.8.1"; exon_number "1"; FPKM "0.4704529712"; frac "0.426471"; conf_lo "0.290214"; conf_hi "0.650692"; cov "4.324710";
chr1 Cufflinks transcript 755134 756272 1000 - . gene_id "CUFF.9"; transcript_id "CUFF.9.1"; FPKM "0.3258973591"; frac "0.264706"; conf_lo "0.168058"; conf_hi "0.483737"; cov "2.995861"; full_read_support "yes";
chr1 Cufflinks exon 755134 756272 1000 - . gene_id "CUFF.9"; transcript_id "CUFF.9.1"; exon_number "1"; FPKM "0.3258973591"; frac "0.264706"; conf_lo "0.168058"; conf_hi "0.483737"; cov "2.995861";
In transcripts.gtf, the IDs follow three different patterns. In some lines the gene_id has the form "CUFF.*" while the transcript_id is the same as in the annotation file (genes.gtf). In other lines both the gene_id and the transcript_id have the form "CUFF.*". In still others both IDs are the same as in the annotation file. Why? What does "CUFF.*" mean, and how should I interpret the Cufflinks results correctly?
In addition, I noticed that when the gene_id is "CUFF.*" and the transcript_id is the same as in the annotation file, for example:
chr1 Cufflinks exon 134773 139696 1000 - . gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "1"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816";
the start and end positions (i.e. 134773 and 139696, respectively) are the same as in the annotation file.
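To make the patterns above easier to check, here is a minimal Python sketch that tallies the transcript lines of transcripts.gtf by ID pattern. The function names (classify, tally) are my own, and the script assumes that any ID starting with "CUFF." was assigned by Cufflinks rather than taken from genes.gtf; that is my reading of the output, not something I have confirmed.

```python
import re
from collections import Counter

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def classify(line):
    """Label one GTF line by where its gene_id and transcript_id appear to come from.

    Assumption: IDs that start with "CUFF." were generated by Cufflinks.
    """
    attrs = dict(ATTR_RE.findall(line))
    gene_is_cuff = attrs.get("gene_id", "").startswith("CUFF.")
    tx_is_cuff = attrs.get("transcript_id", "").startswith("CUFF.")
    if gene_is_cuff and tx_is_cuff:
        return "CUFF gene_id, CUFF transcript_id"
    if gene_is_cuff:
        return "CUFF gene_id, annotated transcript_id"
    if tx_is_cuff:
        return "annotated gene_id, CUFF transcript_id"
    return "annotated gene_id, annotated transcript_id"


def tally(path):
    """Count the ID patterns over the 'transcript' lines of a GTF file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip comment lines
            fields = line.split(None, 3)  # the feature type is the third column
            if len(fields) >= 3 and fields[2] == "transcript":
                counts[classify(line)] += 1
    return counts
```

Calling tally("./cufflinks/transcripts.gtf") and printing the result would show how many transcripts fall into each pattern, which may help narrow down when the "CUFF.*" gene_id is paired with an annotated transcript_id.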
Thanks!