Hi everyone!
I'm analysing results from a RNA-seq experiment. My organism is an annotated yeast that has some effect on vitis. I have around 15 millions of reads per sample obtained with illumina pair-end, 90 bp length.
I have 2 strains, c13 and c24 which were cultivated in a control condition and one treatment, the experiment was performed with three replicates.
In summary I have:
C13_control (R1,R2,R3), C13_treat (R1,R2,R3)
C24_control (R1,R2,R3), C24_treat (R1,R2,R3)
I want to check the effect of my treatment over the yeast with Tuxedo protocol. At this point I have done Tophat alignment with my 12 samples. with my fasta genome indexed as DA and the following pipeline:
tophat -G annotation.gff3 -o out_folder DA c*_control_R*.fastq c*_treatment_R*.fastq
Checking my results I get something curious. Reads from my c13_control sample aligned only in a 4%, while the same strain, but on treatment map a 60% of the reads. In the other hand, read from my c24 strain map in both cases a 60%.
If I ignore this result and I continue the protocol, when I observe my differential expression in c13 the fold change is huge (even 30 times) and almost all the genes are significant, because obviously is comparing high expression in the treatment over my poor mapping in control, so I have the feeling that my control is not being a control. c24 instead, show a fewer number of genes with fold changes lower that 8 times.
I'm trying to find an explanation for this behaviour or find another approach to raise this problem, so I would appreciate any comment!
Thanks guys!
I've updated your summary slightly, since I think you had the C13 samples listed twice originally.
Have you trimmed adapaters?
Can you do wildcarding like that with tophat? What do your `align_summary.txt` `tophat.log` and `run.log` files look like?
Thank you for replying:
Yes Devon, I trimmed adapters and QC.
Sorry if I confused you mbio.kyle, I just used
*
to show that I did the same pipeline for every sample. For example for my control/treatment samples c13 replicate1 I did:So far my tophat and run log files does not display any errors or warnings, my align summary are for example:
C13_control_R1:
C13_treatment_R1:
Thanks!
You should blast a few of the unmapped reads and see if you received the wrong samples.
Thanks Devon, I patiently did a BLAST against nr database with the 12 million reads that didn't map and I only got ~95 reads mappig a bacteria, which seems to be contation, but I still guessing what is the other huge percentage lost.
Maybe throw them at https://onecodex.com/ and see if that reveals any more contamination
:D, you just save me!
Have you run FastQC on your raw data? Do the read qualities look OK?
Hey! Yes, I did, actually the person that did this sequencing removed adapters and the quality is very good. But thank you!