Hi,
I'm in the early stages of preparing my data for publication, and I'd like bioinformaticians' opinions on how I've handled it, to make sure I'm not doing anything wrong and to avoid any pitfalls.
The data were generated on a MiSeq (2 x 250 bp paired-end) from bacterial whole genomes.
The FASTQ files are assembled using SPAdes with the --careful flag; I then check the final assemblies for genome size and N50.
For assemblies that are larger or smaller than expected, I generally use sickle to trim short/low-quality reads and then reassemble.
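To make the size/N50 check concrete, here is a minimal sketch of how those two numbers can be computed from an assembly FASTA. The function names are made up for illustration and are not part of SPAdes or any other tool; it assumes a plain (uncompressed) FASTA such as the contigs.fasta that SPAdes writes to its output directory.

```python
def contig_lengths(lines):
    """Return the length of each record in FASTA-formatted lines.

    `lines` can be any iterable of strings, e.g. an open file handle.
    """
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            lengths.append(0)  # start a new contig
        elif lengths:
            lengths[-1] += len(line)  # sequence may wrap over several lines
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly
```

For one assembly this could be used as `with open("contigs.fasta") as fh: lengths = contig_lengths(fh)`, after which `sum(lengths)` gives the total assembly size to compare against the expected genome size and `n50(lengths)` gives the N50.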
Now, a few things I'm not clear on:
Do I need to trim adapters before assembling with SPAdes? The MiSeq trims adapter sequences as part of bcl2fastq, but should I be doing it anyway as a failsafe?
Do I need to order my contigs after assembly? I'm making phylogenies and trying to work out whether any two isolates are the same based on the number of SNPs between them.
Some of my FASTQ files contain contaminant reads (BLAST identifies a non-target organism with high identity). Should I just discard these files, or are they salvageable?
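On the SNP comparison in the contig-ordering question, this is a minimal sketch of what counting SNPs between two isolates could look like. It assumes the two sequences are already aligned to the same coordinates and have equal length (for example, two rows of a core-genome or reference-based alignment), and it skips gaps and N positions. The function name is hypothetical and this is only an illustration, not a description of any particular pipeline.

```python
def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a character listed in `ignore`
    (by default N or a gap) are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set(ignore.upper())
    count = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # ambiguous or missing in at least one isolate
        if a != b:
            count += 1
    return count
```

For example, `snp_distance("ACGTN-A", "ACTTAAG")` counts the G/T and A/G mismatches and skips the N and gap columns. Because the comparison is made per aligned column, it depends on the alignment and not on the order of contigs in the assembly file.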