Hi, everyone,
I am new to RNA-seq data analysis. The TopHat/Cufflinks workflow is widely used for analyzing RNA-seq data. I have installed these tools and now need to run them. From the TopHat and Cufflinks websites I learned the basics below, but I am not sure whether my commands are right. Could anyone offer a pipeline or give me some guidance, particularly on the command options? Thanks a lot!
Step 1: run TopHat on each sample to generate an output folder with BAM files (accepted_hits.bam)
tophat -o sample1 <index> sample1_1.fq sample1_2.fq
tophat -o sample2 <index> sample2_1.fq sample2_2.fq
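A minimal sketch of Step 1 as a loop, assuming a Bowtie index prefix called hg19, reads named <sample>_1.fq and <sample>_2.fq, and 4 threads (all hypothetical values, adjust to your data). By default it only prints the commands (DRY_RUN=1) so you can check them before launching; set DRY_RUN=0 to actually run TopHat. The -o flag is what keeps the second sample from overwriting the first one's tophat_out folder.

```shell
#!/usr/bin/env bash
# Sketch: one TopHat run per paired-end sample, each into its own folder.
# Assumed (hypothetical): Bowtie index prefix "hg19", reads named <sample>_1.fq / <sample>_2.fq.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; DRY_RUN=0 executes them
INDEX="${INDEX:-hg19}"     # hypothetical Bowtie index prefix
THREADS="${THREADS:-4}"    # assumed thread count

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

align_samples() {
  local sample
  for sample in "$@"; do
    # -o <sample> puts accepted_hits.bam in ./<sample>/ instead of the shared ./tophat_out
    run tophat -p "$THREADS" -o "$sample" "$INDEX" "${sample}_1.fq" "${sample}_2.fq"
  done
}

align_samples sample1 sample2
```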
Step 2: run Cufflinks on each alignment to generate .gtf files
cufflinks -o sample1 sample1/accepted_hits.bam
cufflinks -o sample2 sample2/accepted_hits.bam
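The same pattern for Step 2, assuming Step 1 wrote ./<sample>/accepted_hits.bam. Cufflinks writes into the current directory unless told otherwise, so without -o both runs would produce the same transcripts.gtf and the second would overwrite the first; the thread count is again an assumed value.

```shell
#!/usr/bin/env bash
# Sketch: assemble each sample's TopHat alignments with Cufflinks.
# Assumes Step 1 wrote ./<sample>/accepted_hits.bam (as with "tophat -o <sample>").
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; DRY_RUN=0 executes them
THREADS="${THREADS:-4}"    # assumed thread count

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

assemble_samples() {
  local sample
  for sample in "$@"; do
    # Cufflinks defaults to the current directory; -o keeps ./<sample>/transcripts.gtf separate
    run cufflinks -p "$THREADS" -o "$sample" "$sample/accepted_hits.bam"
  done
}

assemble_samples sample1 sample2
```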
Step 3: prepare a text file named assemblies.txt listing the following GTF files, one per line
sample1/transcripts.gtf
sample2/transcripts.gtf
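Step 3 can be scripted so the list always matches your sample names. Cufflinks calls its assembly transcripts.gtf (plural); the folder names are assumed to be the ones passed to -o in Step 2.

```shell
#!/usr/bin/env bash
# Sketch: write the cuffmerge input list, one GTF path per line.
set -euo pipefail

write_assembly_list() {
  local out="$1"
  shift
  # printf reuses the format for every remaining argument: one line per sample
  printf '%s/transcripts.gtf\n' "$@" > "$out"
}

write_assembly_list assemblies.txt sample1 sample2
cat assemblies.txt
```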
Step 4: run cuffmerge to generate merged.gtf (written to merged_asm/ by default)
cuffmerge assemblies.txt
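A sketch of Step 4 with the reference annotation (-g) and genome sequence (-s) supplied, which lets cuffmerge relate the merged transcripts to known genes. The file names genes.gtf and genome.fa are placeholders, not names the tools require; -o merged_asm matches cuffmerge's default output folder, so the result is merged_asm/merged.gtf.

```shell
#!/usr/bin/env bash
# Sketch: merge the per-sample assemblies with cuffmerge.
# Assumed (hypothetical) file names: genes.gtf = reference annotation, genome.fa = genome FASTA.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only; DRY_RUN=0 executes them
REF_GTF="${REF_GTF:-genes.gtf}"     # hypothetical annotation file
GENOME_FA="${GENOME_FA:-genome.fa}" # hypothetical genome FASTA
THREADS="${THREADS:-4}"             # assumed thread count

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

merge_assemblies() {
  # -g: reference annotation; -s: genome sequence; -o: output folder for merged.gtf
  run cuffmerge -g "$REF_GTF" -s "$GENOME_FA" -p "$THREADS" -o merged_asm "$1"
}

merge_assemblies assemblies.txt
```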
Step 5: discover novel transcripts
cuffcompare -r <reference.gtf> merged_asm/merged.gtf
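For Step 5, cuffcompare only classifies transcripts as known or novel when it is given a reference with -r; it then writes <prefix>.combined.gtf (prefix cuffcmp by default) in which each transcript carries a class_code attribute. This sketch pulls out transcripts with class code "u" (intergenic) or "j" (new isoform sharing a splice junction with a reference transcript), assuming the ninth GTF column contains attributes written as class_code "u"; and transcript_id "..."; on each line. Which class codes count as novel for your study is a judgment call.

```shell
#!/usr/bin/env bash
# Sketch: list candidate novel transcript IDs from a cuffcompare combined GTF.
# Assumes the 9th (attribute) column has class_code "x"; and transcript_id "id"; entries.
set -euo pipefail

novel_transcripts() {
  # class_code "u" = intergenic, "j" = potential novel isoform of a known gene
  awk -F'\t' '$9 ~ /class_code "[uj]"/ {
    if (match($9, /transcript_id "[^"]+"/))
      print substr($9, RSTART + 15, RLENGTH - 16)   # drop the transcript_id prefix and closing quote
  }' "$1" | sort -u
}

# Usage on the default cuffcompare output, if it exists in the current folder
if [ -f cuffcmp.combined.gtf ]; then
  novel_transcripts cuffcmp.combined.gtf
fi
```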
Step 6: compare gene expression between the two samples
cuffdiff merged_asm/merged.gtf sample1/accepted_hits.bam sample2/accepted_hits.bam
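A sketch of Step 6 with named conditions (-L) and an explicit output folder (-o); diff_out and the labels are hypothetical choices. It adds a filter that keeps only the genes cuffdiff flags as significant, assuming the usual gene_exp.diff layout: a tab-separated table whose 14th and last column, significant, is yes or no.

```shell
#!/usr/bin/env bash
# Sketch: run cuffdiff, then keep the header plus significant rows of gene_exp.diff.
# Assumed (hypothetical): output folder "diff_out", labels "sample1,sample2".
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; DRY_RUN=0 executes them
THREADS="${THREADS:-4}"    # assumed thread count

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

diff_expression() {
  # -L names the conditions in the output tables; -o collects them in one folder
  run cuffdiff -p "$THREADS" -o diff_out -L sample1,sample2 \
    merged_asm/merged.gtf sample1/accepted_hits.bam sample2/accepted_hits.bam
}

significant_genes() {
  # column 14 of gene_exp.diff is "significant" (yes/no); keep the header line too
  awk -F'\t' 'NR == 1 || $14 == "yes"' "$1"
}

diff_expression
if [ -f diff_out/gene_exp.diff ]; then
  significant_genes diff_out/gene_exp.diff
fi
```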
Here is my pipeline for the UCSC hg19 reference genome and the UCSC refGene GTF file:
I repeat this for all samples.
Finally, I run cuffdiff.
Then I load the results into R with CummeRbund, which is where I run into the issue explained above. Sorry if these are stupid questions.
Paul.
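For the "repeat for all samples" part of the pipeline above, a small helper can derive the sample names from the read files instead of typing them each time. This assumes (hypothetically) that every sample has a <sample>_1.fq / <sample>_2.fq pair in one directory; samples missing their mate file are skipped.

```shell
#!/usr/bin/env bash
# Sketch: derive sample names from paired read files named <sample>_1.fq / <sample>_2.fq.
set -euo pipefail

list_samples() {
  local dir="${1:-.}" f name
  for f in "$dir"/*_1.fq; do
    [ -e "$f" ] || continue          # no match: the glob stays literal, so skip it
    name="$(basename "$f" _1.fq)"
    # keep only samples whose mate file also exists
    if [ -e "$dir/${name}_2.fq" ]; then
      echo "$name"
    fi
  done
}

list_samples . | while IFS= read -r sample; do
  echo "would process: $sample"     # replace with the per-sample tophat/cufflinks commands
done
```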
I'm trying to accomplish the same task, but I'm new to this field. How do I get the refGene GTF file and the UCSC hg19 reference genome?
What organism?
For Homo sapiens too.
Have you done a bit of Googling?!
http://hgdownload.cse.ucsc.edu/downloads.html#human
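Once the files are found, TopHat also needs a Bowtie index built from the genome. This sketch only prints the download and indexing steps by default; the two URLs are an assumption about where the UCSC server linked above keeps the hg19 FASTA and the refGene GTF, so check them in a browser first. TopHat2 uses Bowtie2 by default, and the index prefix (hg19 here) is what goes in place of <index> in the tophat command.

```shell
#!/usr/bin/env bash
# Sketch: fetch UCSC hg19 and the refGene GTF, then build the Bowtie2 index TopHat needs.
# ASSUMPTION: BASE and the file paths reflect the UCSC server layout; verify before running.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; DRY_RUN=0 executes them
BASE="${BASE:-https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips}"   # assumed

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

fetch_reference() {
  run wget "$BASE/hg19.fa.gz"                  # whole-genome FASTA (assumed file name)
  run wget "$BASE/genes/hg19.refGene.gtf.gz"   # refGene annotation as GTF (assumed path)
  run gunzip hg19.fa.gz hg19.refGene.gtf.gz
  # builds hg19.*.bt2 files; "hg19" is the prefix to pass to tophat as <index>
  run bowtie2-build hg19.fa hg19
}

fetch_reference
```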
Yes, I did, but I was unable to find the .gtf file. Also, at first I thought it would be some specific kind of file other than the "normal" human reference genome. Now I understand what it takes. Thank you, and I'm sorry for my lack of knowledge at this point.
No worries. Hope it helped =)
You should really add some information on your data, in order to give proper guidance on what you need for your analysis.