I have MiSeq data (fastq.gz format) that I am trying to preprocess for microbiome analyses.
The workflow I've come up with is the following:
- Join paired end reads
- Trim sequences to remove primers & barcodes
- Demultiplex
- Quality Filter
I have tools for steps 3 and 4 above (Qiime or mothur), but I can't seem to get anything to work for steps 1 and 2.
I have two questions:
- Is the workflow above in the correct order?
- Is there a reasonably straightforward tool (decent documentation/workflow examples) to join my paired reads and trim sequences? So far, I've tried using fastqjoin, but haven't been able to figure out how to use it. Mothur has a trim.seqs function, but I've been having issues with that, too.
If these are the best tools, I'll start a different topic for trying to get them to work. I just don't want to spend hours trying to get something to work if it isn't the best way for a beginner to do it. Thanks in advance.
I was able to run join_paired_ends.py with little issue. Do I need to trim primer or barcode sequences before moving on? If so, does Qiime have a script for this? (I see it's built into split_libraries.py, but it doesn't seem to be built into split_libraries_fastq.py.) If not, is there anything easy/straightforward you can recommend?
If you are trimming, you should trim before you join. If the reads have adapters on the end, they should not join successfully. However, if this is 16S, your fragments should be big enough that the reads do not run into the adapter regions. If you do have adapters, there is a problem with that fragment and it should probably be eliminated anyway.
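To make the "trim before you join" idea concrete, here is a minimal Python sketch of what primer trimming on a read pair looks like. It is only an illustration under stated assumptions, not how Qiime or mothur do it: the function names trim_primer and trim_pair are made up for this example, the primer strings are placeholders, and matching is exact (real 16S primers often contain degenerate bases and reads contain sequencing errors, which a dedicated trimming tool handles).

```python
def trim_primer(seq, qual, primer):
    """Remove an exact-match primer from the 5' end of one read.

    Returns (trimmed_seq, trimmed_qual), or None if the read does not
    start with the primer. Hypothetical helper for illustration only.
    """
    if not seq.startswith(primer):
        return None
    cut = len(primer)
    # Sequence and quality strings must stay the same length.
    return seq[cut:], qual[cut:]


def trim_pair(fwd, rev, fwd_primer, rev_primer):
    """Trim both mates of a pair; drop the pair if either primer is missing.

    fwd and rev are (seq, qual) tuples. Keeping or dropping mates together
    keeps the forward and reverse files in sync for the later join step.
    """
    fwd_trimmed = trim_primer(fwd[0], fwd[1], fwd_primer)
    rev_trimmed = trim_primer(rev[0], rev[1], rev_primer)
    if fwd_trimmed is None or rev_trimmed is None:
        return None
    return fwd_trimmed, rev_trimmed


if __name__ == "__main__":
    # Placeholder primers and reads, not real 16S sequences.
    pair = trim_pair(
        ("ACGTTTGGA", "IIIIIHHHH"),
        ("GGCCAATTC", "HHHHIIIII"),
        fwd_primer="ACGT",
        rev_primer="GGCC",
    )
    print(pair)
```

The key design point is that both mates are kept or discarded together: if only one file loses a read, the forward and reverse files fall out of order and the join step will pair the wrong reads.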