Hi,
I'm doing some transcriptomics on a non-model organism (an earthworm) and am having issues with my assembly.
I've got HiSeq RNAseq data from pooled samples (around 20-25 monophyletic individuals) for each of 3 exposures. I've assembled the transcriptome of a single exposure group using Velvet and Oases, but I've got a massive haul, with an N50 of 82,694 >= 1465 bp.
I anticipate that the vast amount of variation within my sample will mean that there's an awful lot of very similar sequences in my data. What software is out there to help me achieve a consensus transcriptome?
I really would appreciate any pointers,
Craig
P.S. There is a draft reference genome for this species, but it's of a genetically distinct (14% according to mitochondrial COII markers and AFLP) alternative lineage.
Edit: Because I've pooled so many individuals, I'd like to reduce the number of contigs that occur as individual sequences due to SNPs, sequencing errors or whatever.
I'm aware that I need to redo the assembly to get rid of sequences that velvetg has attempted to scaffold with Ns. All parameters other than kmer length and insert length are at default values.
Hope that helps!
Just for clarification: You have an L50 of 14694? That is pretty huge. I am not getting what you are asking. An assembly is a consensus sequence.
It is not entirely clear what you are after. Are you asking for advice on achieving a better assembly?
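Since N50 and L50 seem to be getting mixed up in this thread, here is a minimal Python sketch of how the two are usually defined: N50 is the length of the contig at which the running total (longest first) reaches half the assembly size, and L50 is how many contigs it took to get there. The function name n50_l50 and the example lengths are made up for illustration; this is not how Velvet or Oases report their stats internally.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the cumulative sum, taken from the
         longest contig down, first reaches half of the total assembly size.
    L50: number of contigs needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2                  # half of the total assembled bases
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly: total 32 bp, so half is 16 bp.
n50, l50 = n50_l50([2, 2, 2, 3, 3, 4, 8, 8])
print(n50, l50)
```

On that reading, N50 is a length in bp and L50 is a contig count, so stating which of the two numbers in the question is which would make the assembly stats easier to interpret.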