Hello,
I am working with prokaryotic metagenomes (modern sediment samples) obtained by WGS: some were sequenced on an Illumina machine and others on the Nanopore MinION. My usual workflow is to trim and clean the data, align to a database, and then quantify. Should I also assemble before aligning?
1) I have always assumed that for the Nanopore data, since these are long reads (3k-5k bp in my case), assembly is not necessary. However, reading around, I have seen that there are assembly tools like Canu, Unicycler or Flye that can handle long reads. Would assembly make sense for WGS reads (my understanding is that it makes sense only when you know what you are sequencing)? Or could I use these assembly tools just to reduce the complexity of my data (Lapidus & Korobeynikov 2021, Crusoe et al. 2015)?
2) My Illumina reads are all paired-end, 150 bp each. Still, given that this is WGS, does it really make sense to assemble, or would that be risky and create wrong contigs?
Thanks in advance for clarifying this for me.
Thanks Mensur.
I usually align against the nr, nt or RefSeq databases. If I have any novel organisms I will never know, and at the moment I am not interested in investigating this. I just want to do comparative analyses of microbial composition. Since I do not know a priori what is in my samples, I want to make sure I do everything I can to end up with the most accurate compositional data I can get from each sample. So far, I have skipped the assembly step and gone straight to alignment using k-mer-based or score-based tools like Kaiju, Kraken or Bowtie2...
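As a rough illustration of the comparative step, here is a minimal Python sketch that assumes per-taxon read counts have already been extracted from the classifier output. The taxon names and counts are invented for the example, and the helper names `relative_abundance` and `bray_curtis` are hypothetical. Bray-Curtis dissimilarity is just one common choice for comparing compositions, not something prescribed by the tools mentioned above. Converting to relative abundance first matters here because Illumina and Nanopore runs usually yield very different numbers of reads.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    0 means identical composition, 1 means no shared taxa.
    Taxa missing from one sample are treated as abundance 0.
    """
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


if __name__ == "__main__":
    # Invented counts, for illustration only (not real data).
    illumina_counts = {"Proteobacteria": 600, "Firmicutes": 300, "Chloroflexi": 100}
    nanopore_counts = {"Proteobacteria": 50, "Firmicutes": 30, "Actinobacteria": 20}

    illumina_profile = relative_abundance(illumina_counts)
    nanopore_profile = relative_abundance(nanopore_counts)
    print(bray_curtis(illumina_profile, nanopore_profile))
```

Note that this only normalizes for sequencing depth. It does not correct for genome size, read length, or database bias, all of which can differ between the two platforms.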
Your last comment makes sense to me now too. Thanks.