Hi:
I am working on Drosophila RNA-seq data, following the Nat Protoc paper [1].
After running Cufflinks, I got expression values (FPKM) that were associated with IDs like "CUFF.2".
Then I ran cuffmerge "to create a single merged transcriptome annotation" [1].
But what I got is a file, merged.gtf, with lines like:
2L Cufflinks exon 74903 75018 . + . gene_id "XLOC_000003"; transcript_id "TCONS_00000004"; exon_number "3"; gene_name "galectin"; oId "CUFF.2.2"; nearest_ref "FBtr0078101"; class_code "j"; tss_id "TSS3";
This file, merged.gtf, does not provide FPKM values at all.
Have I got the correct/expected result?
Should cuffmerge return expression values associated with each gene (such as a matrix with gene symbols as row names and FPKMs in each row)?
Thanks in advance!
Best
[1] Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 7(3): 562-578. doi:10.1038/nprot.2012.016
Thank you Devon!
Could you please also tell me how to generate a matrix with genes as rows and samples as columns, using gene symbols (or other identifiers) as row names?
That is, how do I translate "CUFF.2.2" into a symbol that makes biological sense?
Should I write scripts for this purpose?
Many Thanks!
cuffmerge (and cufflinks, for that matter) can be given a GTF or GFF annotation file that will contain things like gene names. You can get that from wherever you downloaded the reference fasta file. Regarding a matrix of expression values, I don't use cufflinks/cuffdiff enough to know of a simple method to do that, though I presume that cuffdiff would output such a file.
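If a small script turns out to be needed for the ID-to-symbol step, something along these lines might do as a starting point. It is only a sketch: it assumes the merged.gtf records carry both an oId and a gene_name attribute, as in the example line quoted earlier in the thread, and the function names are made up for illustration.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def parse_attributes(field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))

def oid_to_gene_name(gtf_lines):
    """Map each Cufflinks transcript ID (oId) to its gene_name, when both are present."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(columns[8])
        if "oId" in attrs and "gene_name" in attrs:
            mapping[attrs["oId"]] = attrs["gene_name"]
    return mapping

# The merged.gtf line from the first post, with tabs between the nine columns.
example = (
    '2L\tCufflinks\texon\t74903\t75018\t.\t+\t.\t'
    'gene_id "XLOC_000003"; transcript_id "TCONS_00000004"; exon_number "3"; '
    'gene_name "galectin"; oId "CUFF.2.2"; nearest_ref "FBtr0078101"; '
    'class_code "j"; tss_id "TSS3";'
)
print(oid_to_gene_name([example]))  # {'CUFF.2.2': 'galectin'}
```

For a real file, pass an open file handle instead of the one-element list. Records without a gene_name are simply left out of the mapping.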
Thank you Devon!
I have downloaded all the annotation files from http://ccb.jhu.edu/software/tophat/igenomes.shtml
and used these two files for TopHat:
The cuffmerge output file merged_asm/merged.gtf contains 3900 CUFF IDs, while the Cufflinks output transcripts.gtf contains > 10000 CUFF IDs.
Do you think there is something inconsistent, given that many of the CUFFs identified by Cufflinks were dropped by cuffmerge?
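One way to see exactly which IDs did not carry over is to compare the transcript_id values in transcripts.gtf with the oId values in merged.gtf. The sketch below assumes those two attribute names (both appear in the lines quoted in this thread); the function names and the toy records are invented for illustration. A lower count after merging is not necessarily an error, since cuffmerge collapses redundant transcripts and, as far as I can tell, keeps a single oId per merged transcript.

```python
import re

def attribute_values(gtf_lines, key):
    """Collect the distinct values of one GTF attribute, e.g. transcript_id or oId."""
    # \b keeps a key such as gene_id from matching inside a longer key name.
    pattern = re.compile(r'\b' + re.escape(key) + r' "([^"]*)";')
    values = set()
    for line in gtf_lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # skip comments, blank lines and malformed records
        match = pattern.search(columns[8])
        if match:
            values.add(match.group(1))
    return values

def ids_missing_from_merged(transcripts_lines, merged_lines):
    """Return Cufflinks transcript IDs that never show up as an oId after merging."""
    assembled = attribute_values(transcripts_lines, "transcript_id")
    carried_over = attribute_values(merged_lines, "oId")
    return assembled - carried_over

# Toy records, made up for illustration (not real Cufflinks output).
toy_transcripts = [
    '2L\tCufflinks\ttranscript\t100\t500\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    '2L\tCufflinks\ttranscript\t100\t500\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.2";',
]
toy_merged = [
    '2L\tCufflinks\texon\t100\t500\t.\t+\t.\tgene_id "XLOC_000001"; '
    'transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.1.1";',
]
print(sorted(ids_missing_from_merged(toy_transcripts, toy_merged)))  # ['CUFF.1.2']
```

With the real files, the two arguments would be open handles on transcripts.gtf and merged_asm/merged.gtf.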
Usually the files from igenomes work well together. Perhaps there's a problem in this case. Try to run cuffmerge again, ensuring that you specify the genes.gtf file. You might also look into the genes.gtf file to ensure that it contains gene names and IDs.
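For the check on genes.gtf, a quick tally of which attribute keys occur in the ninth column can show whether gene_id and gene_name are present on every record. This is a rough sketch with an invented function name and made-up toy records; it assumes the usual `key "value";` attribute layout.

```python
import re
from collections import Counter

# Captures the key of every key "value"; pair in the attribute column.
KEY_RE = re.compile(r'(\S+) "[^"]*";')

def attribute_key_counts(gtf_lines):
    """Return (number of records, Counter of how many records carry each attribute key)."""
    counts = Counter()
    records = 0
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # blank or malformed line
        records += 1
        # set() so a key repeated within one record is counted once
        counts.update(set(KEY_RE.findall(columns[8])))
    return records, counts

# Toy records, made up for illustration (not taken from the iGenomes genes.gtf).
toy_genes = [
    '2L\tunknown\texon\t10\t90\t.\t+\t.\tgene_id "geneA"; gene_name "geneA"; transcript_id "txA";',
    '2L\tunknown\texon\t200\t300\t.\t-\t.\tgene_id "geneB"; transcript_id "txB";',
]
records, counts = attribute_key_counts(toy_genes)
print(records, counts["gene_id"], counts["gene_name"])  # 2 2 1
```

If the count for gene_name is well below the number of records, that would be worth looking at before rerunning cuffmerge.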
Thank you!
Below are the first two lines of the genes.gtf file:
It seemed OK. But I found that the files under the BowtieIndex directory do not actually provide gene symbols or transcript IDs. Could this be the reason for showing CUFF.2.2 in the transcripts.gtf files?
I assume that the files in BowtieIndex are just fasta files, so they wouldn't contain those names. That's expected. Try rerunning cuffmerge with the gtf file and see if that fixes things. If not then I don't know what the problem is.
Thanks a lot!
When running cuffmerge I got warnings like these:
Have you seen anything like these?
Thanks!
That suggests a mismatch between the files. I don't use Drosophila for anything, so I've never used the files you're using.
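One common kind of mismatch is sequence names that differ between the annotation and the reference fasta (for example "2L" in one file and "chr2L" in the other). Whether that is what happened here is only a guess, since the warnings themselves are not shown, but it is cheap to check. A minimal sketch, with invented function names and toy input:

```python
def gtf_seqnames(gtf_lines):
    """Sequence names from the first column of a GTF file."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }

def fasta_seqnames(fasta_lines):
    """Sequence names from fasta headers (text after '>' up to the first whitespace)."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    }

# Toy input, made up for illustration.
toy_gtf = [
    '2L\tunknown\texon\t10\t90\t.\t+\t.\tgene_id "geneA";',
    'chr2R\tunknown\texon\t10\t90\t.\t+\t.\tgene_id "geneB";',
]
toy_fasta = [">2L some description", "ACGT", ">2R", "TTGA"]

# Names used in the annotation that have no counterpart in the fasta.
print(sorted(gtf_seqnames(toy_gtf) - fasta_seqnames(toy_fasta)))  # ['chr2R']
```

An empty result would mean every sequence name in the GTF also exists in the fasta, so the cause would have to be something else.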