Hello all,
I want to ask a basic question because this is my first time trying RNA-seq analysis. My task is basically to measure the transcription level of several genes. I have 8 RNA-seq datasets aligned with TopHat. There are 2 categories, normal and disease. The normal samples are numbered 34, 35, 36, and 37; the disease samples are numbered 41, 42, 43, and 44. I already have the aligned BAM files from TopHat, and I followed some workflows that use cufflinks to generate a GTF file for each dataset. I have done that and have also merged all 8 with cuffmerge. Basically, I am following the workflow diagram for cufflinks >=2.2 from here.
I downloaded the human GTF file from: http://genome.ucsc.edu/cgi-bin/hgTables?command=start
I'm a bit confused about the next step. After cuffmerge I get one GTF file, and the next step is cuffquant. What is the input to cuffquant? The diagram says "final transcriptome assembly" and "mapped reads", but which mapped reads? Are they the original accepted BAM files from TopHat?
Thank you for your answer.
Thank you for your answer. I just ran it, but with all of the BAM files in one command, which resulted in one cxb file. Is that right? Or do I need to run cuffquant separately for each accepted BAM file against the merged GTF file from cuffmerge?
My understanding is that this needs to be run per BAM file, so you'd then end up with multiple cxb files. Cuffdiff, in turn, can accept multiple cxb files.
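To make the per-BAM approach concrete, here is a minimal sketch in Python that only builds the command lines (it does not run them). The directory layout (tophat_34/accepted_hits.bam, merged_asm/merged.gtf, cuffquant_out/) and the helper names are assumptions for illustration; adjust them to your own paths. It relies on cuffquant writing abundances.cxb into its output directory and on cuffdiff taking one comma-separated list of replicates per condition.

```python
from pathlib import Path


def cuffquant_commands(merged_gtf, bams, out_root="cuffquant_out", threads=4):
    """Build one cuffquant command (as an argument list) per BAM file.

    `bams` maps a sample name to its BAM path. Each sample gets its own
    output directory so the resulting abundances.cxb files do not collide.
    """
    commands = {}
    for sample, bam in bams.items():
        out_dir = (Path(out_root) / sample).as_posix()
        commands[sample] = [
            "cuffquant", "-p", str(threads), "-o", out_dir,
            str(merged_gtf), str(bam),
        ]
    return commands


def cuffdiff_command(merged_gtf, groups, out_root="cuffquant_out",
                     out_dir="cuffdiff_out"):
    """Build a cuffdiff command from per-sample cxb files.

    `groups` maps a condition label to its list of sample names; the
    replicates of one condition are joined with commas.
    """
    cmd = ["cuffdiff", "-o", out_dir, "-L", ",".join(groups), str(merged_gtf)]
    for samples in groups.values():
        cxbs = [(Path(out_root) / s / "abundances.cxb").as_posix() for s in samples]
        cmd.append(",".join(cxbs))
    return cmd


if __name__ == "__main__":
    # Hypothetical layout: one TopHat output directory per sample.
    groups = {"normal": ["34", "35", "36", "37"],
              "disease": ["41", "42", "43", "44"]}
    bams = {s: f"tophat_{s}/accepted_hits.bam"
            for samples in groups.values() for s in samples}
    gtf = "merged_asm/merged.gtf"
    for cmd in cuffquant_commands(gtf, bams).values():
        print(" ".join(cmd))
    print(" ".join(cuffdiff_command(gtf, groups)))
```

Each printed line can be pasted into a shell, or passed to subprocess.run(cmd, check=True) once the paths are confirmed.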
Thank you. I will try it.
Hello,
I want to ask about cuffmerge. There are two parameters that confuse me: --ref-sequence and --ref-gtf. For --ref-sequence I use hg38, which I downloaded from Ensembl as one file per chromosome, and for --ref-gtf I use the human gene GTF that I downloaded from UCSC. Do you think this is right? I read a reply in another post on Biostars saying that --ref-gtf should be the GTF file from one of the samples, for example the GTF file from sample 34 in my case. Which one is correct? I am now running cuffquant with the merged GTF from the latter approach (I ran cuffmerge with --ref-gtf set to the sample 34 GTF) and am still waiting for the result. Thank you.
There are two issues here, actually. Firstly, the --ref-gtf parameter should take the reference annotation from Ensembl/UCSC/etc., not a GTF file from one of your samples. The second issue is that you should avoid mixing Ensembl and UCSC files. These two sources use slightly different names for each chromosome (UCSC uses names like "chr1", whereas Ensembl uses "1"). These differences mean that cuffmerge will be unable to tell that "1" in one of your samples and "chr1" in the GTF file are the same, meaning that the GTF file will likely get ignored. As a rule of thumb, always use only Ensembl or only UCSC files; that will prevent a lot of issues. Personally, I prefer the files from Ensembl, as they tend to be better managed.
Hello,
I redid all of my work with the GTF from Ensembl from here: ftp://ftp.ensembl.org/pub/release-79/gtf/homo_sapiens
I ran into a problem at the cuffnorm step. The error said: (8 transcripts) does not match GTF (7 transcripts).
I did all of the work with the same GTF, so I find this strange. Previously, when I used the GTF file from UCSC, cuffnorm ran successfully. Where do you think I made a mistake? Thank you.
I've never seen that error before. Perhaps you can ask the authors of the tool.
It seems that I used both the genome FASTA and the gene GTF from UCSC before; after I changed the GTF file to the one from Ensembl, the error appeared. Thank you for your help.
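A quick way to spot the "chr1" versus "1" naming mismatch mentioned earlier in the thread is to compare the sequence names in the GTF with those in the genome FASTA before running anything. The sketch below is only an illustration: the function names are made up, the records are toy data, and a mismatch found this way is not confirmed to be the cause of the cuffnorm error above.

```python
def gtf_seqnames(lines):
    """Collect the sequence names (first tab-separated column) of a GTF."""
    names = set()
    for line in lines:
        # Skip header/comment lines (Ensembl GTFs start with "#!" lines) and blanks.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(lines):
    """Collect sequence names from FASTA headers (text up to the first space)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_seqnames(gtf_names, fasta_names):
    """Report which sequence names are shared and which appear on one side only."""
    return {
        "shared": gtf_names & fasta_names,
        "only_in_gtf": gtf_names - fasta_names,
        "only_in_fasta": fasta_names - gtf_names,
    }


if __name__ == "__main__":
    # Toy records: an Ensembl-style GTF line and UCSC-style FASTA headers.
    toy_gtf = [
        "#!toy header line",
        '1\ttoy_source\texon\t100\t200\t.\t+\t.\tgene_id "toy_gene";',
    ]
    toy_fasta = [">chr1", "ACGTACGT", ">chr2", "ACGTACGT"]
    report = compare_seqnames(gtf_seqnames(toy_gtf), fasta_seqnames(toy_fasta))
    if not report["shared"]:
        print("No shared sequence names; the GTF and FASTA likely come from different sources.")
    print(report)
```

For real files, pass open file handles instead of lists, for example gtf_seqnames(open("annotation.gtf")), where annotation.gtf is a placeholder path.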
Did you ever discover a solution to this? I'm getting "reconstituted expression bundle (1 transcripts) does not match ( 2 transcripts):....."