Hi
N.B. Obviously, these results are what they are, and I expect they are true, but I wanted to see whether I've missed anything.
I've run a comparative RNA-seq analysis, as I've done many times before, using TopHat, HTSeq and edgeR. Mapping was successful (80%+) and everything appeared fine. I'm doing the analysis for colleagues and the dataset is from a species I'm not used to working with. I can't see anything different about it, but the results are less than successful.
The setup is as follows; the table shows the number of genes called by Fisher's exact test at FDR < 0.05 for each comparison:
vs | Control (x2) | Untreated (x3) | Untreated (excluding genes found in Control vs Untreated) |
Treatment1 (x3) | 0 | ~300 | ~90 |
Treatment2 (x3) | ~10 | ~500 | ~200 |
Treatment3 (x3) | ~10 | ~400 | ~150 |
There were only 2 transfection controls because one sample failed. 'Untreated' was a cell line that did not undergo transfection. All ~10 stand-out genes in the 'successful' treatments were snRNAs, e.g. U1 spliceosomal RNA, with FDRs as extreme as E-100. These also show up in the Untreated vs Treatment comparisons, but they are absent from Untreated vs Transfection Control, indicating that Treatments 2 & 3 (the two with hits against Control in the table) are the cause.
Long story short, my colleagues were expecting more genes and more mainstream pathways, and I'm unsure whether there could be something obscuring them with this snRNA business. Alternatively, are there any recommendations on analytical techniques to tease them out differently?
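One quick check on the snRNA question is how much of each library those few genes take up, since a handful of extremely abundant transcripts can skew library-size normalisation. Below is a minimal Python sketch, assuming HTSeq-count style output (tab-separated gene ID and count, with summary rows prefixed by `__`). The function name and the gene IDs in the example are made up for illustration.

```python
def snrna_fraction(htseq_lines, snrna_ids):
    """Fraction of gene-assigned reads that fall in the given snRNA genes.

    htseq_lines: iterable of 'gene_id<TAB>count' lines (HTSeq-count style).
    snrna_ids:   set of gene IDs to treat as snRNA (hypothetical IDs below).
    """
    total = 0
    snrna = 0
    for line in htseq_lines:
        line = line.strip()
        # Skip blank lines and summary rows such as __no_feature.
        if not line or line.startswith("__"):
            continue
        gene_id, count = line.split("\t")
        count = int(count)
        total += count
        if gene_id in snrna_ids:
            snrna += count
    return snrna / total if total else 0.0


# Toy example with invented IDs and counts, not real data.
example = ["geneA\t900", "geneB\t100", "U1\t1000", "__no_feature\t500"]
print(snrna_fraction(example, {"U1"}))
```

If that fraction differs a lot between the treated and control samples, it would be worth re-running the comparison with those genes removed before normalisation to see whether anything else surfaces.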
Thanks.
Out of curiosity, how did the FastQC reports look? Any indication of contamination or purification issues?
Nothing stood out. Top Illumina quality as expected, 75bp PE, GC looks fine, a few overrepresented k-mers but nothing weird, adapters minimal. I also just thought to check the RNA types, and there is practically no rRNA/tRNA etc. Thanks for looking.
In that case, the only advice I could give is to try DESeq2 as well, which would confirm that it's not a tool-based issue (not that I think it is; I'd say it just adds evidence to what you're saying). What was the overlap like between the three treatments, and with the Untreated comparison after exclusions? Does it fit with the hypothesis to a degree? Also, what was the species for this study?
As I'm doing this analysis for some colleagues, I'm not certain of the treatments beyond microRNA transfections and once I'd got the experimental design I kept out of the expectations. The species is bovine, which appears to be annotated sufficiently (something I considered).
I'm meeting with them this afternoon to discuss further, but I wanted to see whether there were any fundamentals I may have missed!
I honestly don't think you've missed anything by the sounds of things, and bovine is fairly well annotated, both for miRNAs and genomically. I've had experiments from the same species before.
Ok, so I think I might have found the problem. I hadn't thought to ask about coverage as the data had already been sequenced before I saw it, but exome coverage mapping (bedtools coverage -hist -abam accepted_hits.bam -b Bos_taurus.UMD3.1.83.bed | grep ^all) gives the following: http://i.imgur.com/ereXBWh.png
So I think lack of depth is the real culprit here...
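For anyone wanting a number rather than a plot, here is a rough Python sketch that summarises the `all` lines from the bedtools command above. It assumes the usual five-column histogram summary (the label `all`, depth, bases at that depth, total length of all features, fraction of bases). The function name and the example figures are invented for illustration and are not from this dataset.

```python
def fraction_at_or_above(hist_lines, min_depth):
    """Fraction of feature bases covered at depth >= min_depth.

    hist_lines: the 'all' summary lines from `bedtools coverage -hist`,
                assumed to be: all, depth, bases_at_depth, total_bases, fraction.
    """
    covered_bases = 0
    total_bases = 0
    for line in hist_lines:
        fields = line.split()
        # Ignore anything that is not a five-column 'all' summary row.
        if len(fields) != 5 or fields[0] != "all":
            continue
        depth = int(fields[1])
        bases = int(fields[2])
        total_bases = int(fields[3])  # same value on every 'all' row
        if depth >= min_depth:
            covered_bases += bases
    return covered_bases / total_bases if total_bases else 0.0


# Toy histogram (made-up numbers): 60% of bases at depth 0.
example = [
    "all\t0\t600\t1000\t0.6",
    "all\t1\t300\t1000\t0.3",
    "all\t10\t100\t1000\t0.1",
]
print(fraction_at_or_above(example, 1))
print(fraction_at_or_above(example, 10))
```

Comparing the fraction at, say, depth >= 1 and depth >= 10 across samples gives a quick feel for how much of the annotated exon space has enough reads to test at all.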