Hi everyone,
I'm attempting to use cufflinks to examine alternative splicing in my RNAseq data from drosophila. I'm new to the analysis and running into some problems, I'd really appreciate any help offered.
I aligned my data to the dm6 reference genome from UCSC, using the R/BioConductor platform, with qAlign function in QuasR. I had no problems using these bam files to count reads and analyse differential expression in other related BioC packages.
When I try to use cufflinks to analyse alternative splicing in these bam files, I read that the output files can be blank if the reference genome lacks some features, so I updated my reference genome to prepare a .gff file with the correct features using cuffcompare:
cuffcompare -s . -CG -r dmel-all-r6.05.gff dmel-all-r6.05.gff
This produces the file cuffcmp.combined.gtf, which I use with cuffdiff:
mkdir spliceresults cuffdiff -o ./spliceresults cuffcmp.combined.gtf \ sample1.bam,sample2.bam,...,sampleN.bam \ sample1B.bam,sample2B.bam,...,sampleNB.bam
This produces an output file with thousands of genes, all of them have the status 'notest' in the splicing.diff output file. I am aware that problems can be caused by mismatch of the reference and data, but am I right in thinking that I have got over that initial problem with the cuffcompare command? If so, and my files match correctly, why am I getting this results? I read something about not having enough reads in the analysis produces the 'notest' output, but I cannot see that this would be the case here. I have used these bam files for other parts of the RNA analysis and these have worked fine, and I also have 30 biological replicates within each treatment.
I'm sure I am missing something obvious, but my lack of experience with splicing analysis and cufflinks is letting me down. Thanks very much in advance for any help or advice anyone can give me.
Fiona
Your cuffcompare command looks a bit funny to me, unless I'm missing something. It looks like you are using the reference gene annotation file (dmel-all-r6.05.gff) both as the reference (-r) and input gff list (instead of using your assembled transcriptome from each experiment).
Hi, thanks for your reply. I guess it would make sense if there was something wrong with my reference file, and as this is the first time I used cuffcompare, it's possible my mistake is here. I was trying to follow instructions I found on a cufflinks helppage (http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html) to create a .gtf file with tss_id and p_id attributes. The instruction said:
So that is what I was trying to achieve with my cuffcompare command. Have I misunderstood? I'm afraid I can't see how, so if anyone could point out my mistake, I'd be really grateful.
My mistake, I guess that would work. It was different than how I had used cuffcompare, so couldn't predict what that ouput would be.
I'm going to take a step back and ask whether you've
Hi again, and a big thank you because you have fixed this issue for me.
Your point (3) did the job, the filtered .gff allows tests to run where the unfiltered version would not. As you suspected, cufflinks called gffread without prompt.
Best, Fiona
Fantastic news, glad to help.