Hi,
There are a few parts to this post. I am trying to determine the maximum (or optimal) number of cDNA libraries to load onto an Illumina NextSeq flowcell (~400M reads). I currently have 12 libraries: 3 species × 2 treatments, with 2 biological replicates per sample. That works out to 33.33M reads per library. However, I am interested in adding a third biological replicate for one of my species, which would bring my library count to 14 and drop the depth to 28.57M reads/library. I will be generating 6 unique assemblies (3 species, each with two treatments). My questions are:
Am I running the risk of missing some low-abundance transcripts by reducing my sequencing depth from ~33M to ~28.6M reads per library? Or is that too small a difference to worry about? Since I will be doing expression studies, I'd rather prioritize the number of replicates over sequencing depth.
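One rough way to frame the depth question is to treat each library as Poisson sampling of reads. A transcript that accounts for a fraction f of all reads then gets, on average, λ = depth × f reads. The sketch below compares the 12- and 14-library layouts under loudly stated assumptions: the ~400M reads split evenly across libraries, sampling is purely Poisson (no amplification or library-prep bias), and "detected" means at least 5 reads, a hypothetical threshold. Note that f depends on transcript length as well as expression level.

```python
import math

TOTAL_READS = 400e6  # ~400M reads per NextSeq run, ASSUMED to split evenly


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return 1.0 - cdf


def detection_prob(depth: float, fraction: float, min_reads: int = 5) -> float:
    """Chance a transcript contributing `fraction` of all reads gets >= min_reads reads.

    min_reads=5 is a hypothetical detection threshold, not a standard.
    """
    return prob_at_least(min_reads, depth * fraction)


depths = {n: TOTAL_READS / n for n in (12, 14)}  # reads per library
for fraction in (1e-6, 1e-7):
    for n_libs, depth in depths.items():
        p = detection_prob(depth, fraction)
        print(f"{n_libs} libs ({depth / 1e6:.2f}M), fraction {fraction:g}: P(>=5 reads) = {p:.3f}")
```

Under this model, well-expressed transcripts are detected at either depth; the ~15% drop in depth matters mainly for transcripts whose expected count sits near the threshold.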
Is there a way to optimize the assembly of these potentially low-abundance transcripts that may be of interest to me? I am aware of the various transcriptome assemblers. Would it help to run several assemblers at several k-mer lengths and then merge the resulting assemblies?
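Merging assemblies from several assemblers or k-mer lengths usually produces many redundant contigs, so a redundancy-reduction step typically follows. The toy sketch below shows the simplest version of that idea: drop any contig that is identical to, or an exact substring of, a longer contig on either strand. This is an illustrative stand-in, not what identity-based clustering tools such as CD-HIT-EST do (they also collapse near-identical sequences); the contig names and sequences are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    comp = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(comp)[::-1]


def collapse_contained(contigs: dict[str, str]) -> dict[str, str]:
    """Keep contigs not identical to / contained in a longer kept contig (either strand).

    Quadratic toy: fine for illustration, far too slow for a real assembly.
    """
    # Longest first, so a container is kept before the contigs it contains.
    ordered = sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True)
    kept: dict[str, str] = {}
    for name, seq in ordered:
        s = seq.upper()
        rc = revcomp(s)
        if any(s in k or rc in k for k in kept.values()):
            continue  # redundant with something already kept
        kept[name] = s
    return kept


# Hypothetical contigs pooled from two k-mer settings.
merged = {
    "k25_c1": "ATGCGTACGTTAGC",
    "k31_c7": "GCGTACGTTA",      # substring of k25_c1 -> dropped
    "k31_c9": "GCTAACGTACGCAT",  # reverse complement of k25_c1 -> dropped
    "k25_c4": "TTTTGGGGCCCC",    # unique -> kept
}
print(sorted(collapse_contained(merged)))  # ['k25_c1', 'k25_c4']
```

Exact containment is deliberately conservative: it never merges contigs that differ by even one base, which is the point where real tools need an identity threshold.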
I was planning on pooling the reads from all replicates to generate each de novo assembly. Are there any consequences or tradeoffs to doing this? My samples come from outbred, non-model species, and I've controlled as many variables as I can in the hope of reducing sequence polymorphism. Would it be better to first assemble one replicate, map the reads from the other replicate to it, and then merge the two? Or to assemble each replicate de novo and merge those assemblies?
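One concrete tradeoff of pooling outbred individuals shows up at the k-mer level. In a de Bruijn graph, a single SNP between two individuals usually gives each allele K k-mers that the other lacks (when the SNP is at least K−1 bases from either end and those k-mers don't recur elsewhere), forming a "bubble" the assembler must either collapse or split into separate contigs. The toy below counts those k-mers for a hypothetical 25-bp sequence and a hypothetical C→T SNP; it does not model how any particular assembler handles bubbles.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


K = 7
ref = "ATGGCTAGCTTACGGATCCATGAAC"  # hypothetical transcript, replicate 1
alt = ref[:12] + "T" + ref[13:]    # hypothetical C->T SNP at index 12, replicate 2

a, b = kmers(ref, K), kmers(alt, K)
print("shared k-mers:      ", len(a & b))  # 12
print("only in replicate 1:", len(a - b))  # 7 (= K)
print("only in replicate 2:", len(b - a))  # 7 (= K)
print("pooled total:       ", len(a | b))  # 26
```

Assembling each replicate separately avoids these bubbles within each assembly, but the same allelic variation then reappears as redundancy when the assemblies are merged.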
If you're still reading this, thank you!