I am currently dealing with a number of RNA-Seq runs. We are using MiSeq for sequencing. We have one Wild Type and a number of mutant samples. I am doing the analysis as follows:
- Use
tophat
(withbowtie2
) to align the reads to the genome. The genome is indexed properly, and I have used the index for some other analyses with small RNAs earlier. - Use
cufflinks
on theaccepted_hits.bam
file generated bytophat
to generate thetranscripts.gtf
file for each run. When doing this I use agtf
file to mask rDNA coordinates so that they do not interfere with further analysis. - I then create a
assembly_list.txt
file that contains full path to all thetranscripts.gtf
files created and that I want to analyse together. - I use the
cuffmerge
with theassembly_list.txt
file and provide the originalaccepted_hits.bam
file with appropriate labels to generate a "union" of genes across all the sampled under consideration. I provide the referencegtf
file to thecuffmerge
, along with the genome fasta file to correct for bias. - I then use the
cuffdiff
with themerged.gtf
file, and also provide the respectiveaccepted_hits.bam
files with proper label.
I then use cummeRbund
to analyse the cuffdiff
results.
Most strangely I do not see any new/un-annotated transcripts generated. Further more when I try annotation( genes( cuff ) )
in cummeRbund
, I get a table, but the column of class_codes
is populated with only NA
s.
What am I doing wrong?
UPDATE:
While going through the output files created, i realised that the merged.gtf
file actually contains the class_code
s for the transcripts identified. So clearly, it is the cummeRbund
that is not reading these codes correctly. Is this a bug in cummeRbund
, or is this a known issue.
UPDATE 2:
I also noticed something most bizzare with this analysis. After doing all the above mentioned steps, when i try to visualize the accepted_hits.bam
file on a genome browser, i see no reads mapping to the a gene that is supposed to be deleted from the sample. However, the cuffdiff
output shows the amount of transcripts present nearly equal to that in WT!!! This has me completely stumped.
I have the exact same issue as the first poster and tried to fix it with your method. It didn't seem to work. Any other suggestions?
I can also see that in my diff_out folder, the genes.fpkm_tracking file has the correct gene names associated with it.
Thanks in advanced.
That one answer is so perfect. most of the time the maesteros think that we (students of bioinformatics) know it all. We are biologists and we need the answers in precision..
@Wadunn83..
This is one complete answer... "
One thing is that you must show
readCufflinks()
where to find your merged.gtf. It does not find this automatically and it is NOT in your cuffdiff results. It is in your cuffmerge results folder. "