I am testing whether a treatment produces shorter transcripts than a solvent control in E. coli, using transcriptomics. My dataset has:
- two conditions: (A) treatment and (B) solvent
- 3 concentrations (1X, 2X, and 3X)
- 4 timepoints (t=0, t=30, t=60, and t=120 minutes)
I have triplicates for each group (e.g., there are 3 samples for SOLVENT/2X/t=30).
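For a sense of scale, the design can be enumerated directly. This sketch assumes the solvent control was also run at each of the three concentrations, as the SOLVENT/2X/t=30 example suggests; the labels are illustrative.

```python
from itertools import product

# Factors as described in the post; labels are illustrative
conditions = ["TREATMENT", "SOLVENT"]
concentrations = ["1X", "2X", "3X"]
timepoints_min = [0, 30, 60, 120]
replicates = [1, 2, 3]

# One group per condition/concentration/timepoint combination
groups = list(product(conditions, concentrations, timepoints_min))
# One library per group and replicate
libraries = list(product(conditions, concentrations, timepoints_min, replicates))
# Treatment-vs-solvent comparisons matched on concentration and timepoint
matched_comparisons = list(product(concentrations, timepoints_min))

print(len(groups), len(libraries), len(matched_comparisons))  # 24 72 12
```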
How can I use these transcriptomes to test whether transcripts in the treatment condition are shorter than in the solvent condition?
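Whatever per-sample length metric ends up being used, each matched concentration/timepoint comparison has only 3 vs 3 samples. A minimal sketch of an exact one-sided permutation test (a generic technique swapped in here, not something from the post; the function name and the values are hypothetical, not real data):

```python
from itertools import combinations
from statistics import mean


def one_sided_permutation_p(treatment, solvent):
    """Exact permutation p-value for H1: mean(treatment) < mean(solvent).

    Every way of splitting the pooled values into groups of the original sizes is
    enumerated; the p-value is the fraction of splits whose treatment-minus-solvent
    difference is at least as negative as the observed one.
    """
    pooled = list(treatment) + list(solvent)
    n_treat = len(treatment)
    observed = mean(treatment) - mean(solvent)
    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_treat):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        s = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(t) - mean(s) <= observed + 1e-12:  # small tolerance for float ties
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


# Hypothetical per-sample values for one concentration/timepoint group
treat = [0.61, 0.58, 0.63]
solv = [0.82, 0.79, 0.85]
p = one_sided_permutation_p(treat, solv)
print(p)  # 0.05, i.e. 1 of the 20 possible splits
```

With 3 vs 3 there are only 20 possible splits, so 0.05 is the smallest p-value a single group comparison can ever reach. Strong evidence therefore has to come from combining information across concentrations and timepoints (for example, a model with condition, concentration, and time as factors) rather than from any one triplicate comparison.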
I've considered two approaches:
1) An unsupervised approach where I do de novo transcriptome assembly and then compare the assembly stats.
Pros: I can get an unbiased view of how long the transcripts are in each sample
Cons: I can't directly compare transcripts between the treatment and solvent conditions because the assembled transcripts are unique to each sample. I could cluster these sequences, but I'm not sure this is the best approach.
2) A supervised approach where I map the reads directly to the E. coli gene models
Pros: I can have direct comparisons of genes between conditions
Cons: I have to account for both a POSITION dimension and a COVERAGE dimension, since reads span different regions of each gene.
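One way to fold the POSITION and COVERAGE dimensions into a single number per gene is a gene-body coverage profile: bin the per-base depth along each gene by relative position, normalize by the gene's own total, and compare the 3' end with the 5' end. If the treatment truncates transcripts, you would expect relatively less 3' coverage than in the matched solvent samples. This is a minimal sketch of that idea, not an established pipeline. It assumes per-base depths have already been extracted for each gene (for example with `samtools depth -a` over the gene's coordinates) and oriented 5'->3' (reversed for minus-strand genes); `coverage_profile`, `three_to_five_ratio`, `n_bins`, and `edge_bins` are hypothetical names.

```python
def coverage_profile(depths, n_bins=10):
    """Mean depth in n_bins equal-width bins along one gene, ordered 5'->3',
    rescaled to sum to 1 so that expression level and library depth cancel out."""
    length = len(depths)
    if length < n_bins:
        raise ValueError("gene is shorter than the number of bins")
    bin_means = []
    for i in range(n_bins):
        start = i * length // n_bins
        end = (i + 1) * length // n_bins
        window = depths[start:end]
        bin_means.append(sum(window) / len(window))
    total = sum(bin_means)
    if total == 0:
        return None  # no reads on this gene
    return [m / total for m in bin_means]


def three_to_five_ratio(depths, n_bins=10, edge_bins=3):
    """Share of normalized coverage in the 3'-most bins divided by the share in the
    5'-most bins; values below 1 mean coverage falls off toward the 3' end."""
    profile = coverage_profile(depths, n_bins)
    if profile is None:
        return None
    five_prime = sum(profile[:edge_bins])
    three_prime = sum(profile[-edge_bins:])
    return three_prime / five_prime if five_prime > 0 else None


# Toy per-base depths for a 1,000-nt gene (5'->3'); NOT real data
uniform = [20] * 1000                                # even coverage along the gene
decaying = [40 - i // 25 for i in range(1000)]       # coverage falls toward the 3' end
print(three_to_five_ratio(uniform))                  # 1.0
print(three_to_five_ratio([5 * d for d in uniform])) # 1.0: scaling depth cancels
print(three_to_five_ratio(decaying) < 1)             # True
```

Library preparation itself can make coverage along genes non-uniform, so the ratio is only meaningful relative to the matched solvent samples, not as an absolute measure of transcript length. A per-sample summary (such as the median ratio over well-covered genes) can then be compared across replicates.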
I'm not sure how read depth can affect either of these approaches.
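Depth matters most for the assembly route: deeper libraries tend to yield longer, less fragmented contigs, so differences in sequencing depth alone can shift assembly stats. A common precaution is to subsample every library to the same number of reads before assembling. This is a minimal in-memory sketch of that step; in practice you would subsample the FASTQ files themselves with a dedicated tool such as seqtk, and the function and sample names here are hypothetical.

```python
import random


def downsample_to_min_depth(libraries, seed=42):
    """Subsample every library, without replacement, to the size of the smallest one.

    `libraries` maps a sample name to a list of read IDs (use one ID per read pair
    for paired-end data, so mates stay together).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    target = min(len(reads) for reads in libraries.values())
    return {name: rng.sample(reads, target) for name, reads in libraries.items()}


# Hypothetical libraries of unequal depth (read IDs only)
libs = {
    "SOLVENT_2X_t30_rep1": [f"s1_read{i}" for i in range(1200)],
    "TREATMENT_2X_t30_rep1": [f"t1_read{i}" for i in range(800)],
}
sub = downsample_to_min_depth(libs)
print({name: len(reads) for name, reads in sub.items()})
# {'SOLVENT_2X_t30_rep1': 800, 'TREATMENT_2X_t30_rep1': 800}
```

For the mapping route, a per-gene normalized coverage profile removes the overall depth scaling, but genes with few reads still give noisy profiles, so it is worth requiring a minimum read count per gene before summarizing.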
Just to be clear: you do not have full-length transcripts, so you are proposing to do this based on differences in read depth? I'm not sure you can do this using assembly stats. Your samples were prepped independently and may be subject to biases. Were they all processed at the same time and sequenced together?
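The concern about assembly stats can be made concrete with N50, one of the most commonly reported ones. Below is a plain implementation with made-up contig lengths: splitting one transcript into two contigs (the kind of fragmentation that low coverage tends to cause in de novo assemblies) lowers N50 even though the underlying transcripts are identical.

```python
def n50(contig_lengths):
    """Sort contigs longest first; N50 is the length of the contig at which the
    running total first reaches half of all assembled bases."""
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs")


# Made-up contig lengths (nt) for the same four transcripts, assembled twice
intact = [100, 200, 300, 400]
fragmented = [100, 200, 300, 200, 200]  # the 400-nt transcript split in two
print(n50(intact), n50(fragmented))  # 300 200
```

So a difference in N50 between conditions does not, on its own, show that the treatment shortened the RNA; it may only reflect how fragmented each assembly is.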
In the supervised approach, I mapped reads to existing gene models. In the unsupervised approach, I have de novo transcriptome assemblies. Yes, all samples went through the same library prep in the same lab and were sequenced in the same run.
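Because mapping puts every sample on the same gene models, the supervised route allows paired, gene-by-gene comparisons. One simple way to summarize them is an exact one-sided sign test across genes. This is a generic technique swapped in here, not something from the thread; the function name, gene IDs, and values are hypothetical.

```python
from math import comb


def sign_test_lower(treatment, solvent):
    """One-sided exact sign test across genes.

    `treatment` and `solvent` map gene IDs to a per-gene metric (e.g. a 3'/5'
    coverage ratio averaged over replicates). Counts genes where treatment < solvent
    and returns (n_lower, n_informative, p), with p from Binomial(n, 0.5).
    Ties and genes missing from either condition are dropped.
    """
    lower = higher = 0
    for gene, t_value in treatment.items():
        s_value = solvent.get(gene)
        if s_value is None:
            continue
        if t_value < s_value:
            lower += 1
        elif t_value > s_value:
            higher += 1
    n = lower + higher
    p = sum(comb(n, k) for k in range(lower, n + 1)) / 2 ** n
    return lower, n, p


# Hypothetical per-gene ratios for one concentration/timepoint; NOT real data
treat = {"geneA": 0.55, "geneB": 0.70, "geneC": 0.62, "geneD": 0.48}
solv = {"geneA": 0.90, "geneB": 0.88, "geneC": 0.95, "geneD": 0.81}
print(sign_test_lower(treat, solv))  # (4, 4, 0.0625)
```

In E. coli many genes are co-transcribed in operons, so neighboring genes are not independent and this binomial p-value will tend to overstate the evidence. Treating each operon as one unit, or relying on a per-sample metric compared across replicates, is a safer basis for inference.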